[PYTHON] Patterns that files generated by OpenPyXl cannot be opened in Excel and their countermeasures

When I output a certain log to an xlsx file with OpenPyXl and open it in Excel 2011 on Mac, the following dialog appears. I did.

スクリーンショット 2015-12-10 10.42.11.png

After analyzing the problematic part, we found that it could not be opened due to the following two reasons.

  1. There is a mobile emoji in the cell
  2. Numerical cells start with'=' (half-width equal)

Mobile pictogram measures

Mobile emojis (U + E000-U + F8FF, very loosely) will pass OpenPyXl itself without any problems and an xlsx file will be generated. However, Excel does not seem to assume that these invisible characters will be written as they are in the file, and the above error will occur and character conversion (?) Will be performed. If you read the tsv file or copy and paste it into a cell, it seems that it will be converted by Excel at this stage, and the converted characters will be stored in the cell.

In this work, I'm glad I scraped the things that can't be displayed, so I scraped them below.

invalid_code = re.compile(ur'[\u0000-\u001F\u007f\uE000-\uF8FF]')
invalid_code.sub('', s)

In fact, the one that supports emoji is \ uE000- \ uF8FF. \ u0000- \ u001F \ u007f is a control character, but it is included because OpenPyXl does not pass it. (Aside from the problem of what a log contains such characters) Also, s is a unicode string.

Equal measures

Like pictograms, it seems that Excel considers numbers and the beginning of'=' as an error. (I think that if it is a formula, it should be entered in the cell of the formula) And, by default, OpenPyXl writes to the cell as a number ([Be wary of setting Cell values text starting with "=".](Https:: //bitbucket.org/openpyxl/openpyxl/issues/442)), so it seems that Excel is still throwing an error.

As a countermeasure,

  1. Convert to a string cell
  2. Put a'(single quote) at the beginning of the cell

There is, and this time I chose the former.

ws.cell('A1').data_type = 's'

With the above, the generated file can be opened without any error. If there are other patterns, I would like to write them again.

Recommended Posts

Patterns that files generated by OpenPyXl cannot be opened in Excel and their countermeasures
Excel sheet cannot be specified in sheet_name of pd.read_excel
Dealing with the error that HTTP fetch error occurs in gpg and the key cannot be obtained