[PYTHON] [openpyxl] What to do when IllegalCharacterError appears in pandas.DataFrame.to_excel

Prerequisites

The problem you want to deal with this time

import pandas
target_df = pandas.DataFrame(Given data)

So, when I try to output this data frame to an Excel file,

---------------------------------------------------------------------------
IllegalCharacterError                     Traceback (most recent call last)
<ipython-input-88-d629d1ba2e9a> in <module>()
      1 # subset_df = subset_df.applymap(illegal_char_remover)
      2 
----> 3 subset_df.to_excel("./test.xlsx")

/Users/noname/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/core/frame.py in to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep, verbose)
   1462         formatted_cells = formatter.get_formatted_cells()
   1463         excel_writer.write_cells(formatted_cells, sheet_name,
-> 1464                                  startrow=startrow, startcol=startcol)
   1465         if need_save:
   1466             excel_writer.save()

/Users/noname/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/io/excel.py in write_cells(self, cells, sheet_name, startrow, startcol)
   1313                 column=startcol + cell.col + 1
   1314             )
-> 1315             xcell.value = _conv_value(cell.val)
   1316 
   1317             style_kwargs = {}

/Users/noname/.pyenv/versions/3.5.2/lib/python3.5/site-packages/openpyxl/cell/cell.py in value(self, value)
    290     def value(self, value):
    291         """Set the value and infer type and display options."""
--> 292         self._bind_value(value)
    293 
    294     @property

/Users/noname/.pyenv/versions/3.5.2/lib/python3.5/site-packages/openpyxl/cell/cell.py in _bind_value(self, value)
    189 
    190         elif isinstance(value, STRING_TYPES):
--> 191             value = self.check_string(value)
    192             self.data_type = self.TYPE_STRING
    193             if len(value) > 1 and value.startswith("="):

/Users/noname/.pyenv/versions/3.5.2/lib/python3.5/site-packages/openpyxl/cell/cell.py in check_string(self, value)
    154         value = value[:32767]
    155         if next(ILLEGAL_CHARACTERS_RE.finditer(value), None):
--> 156             raise IllegalCharacterError
    157         return value
    158 

IllegalCharacterError: 

The problem of getting angry. Even if I searched, there was no workaround, so I will post it.

approach

IllegalCharacter is defined in openpyxl as follows. (https://bitbucket.org/openpyxl/openpyxl/src/dad834128adcb5ca4330bd2bc4bc714b54ccfbb9/openpyxl/cell/cell.py?at=default&fileviewer=file-view-default Line69)

ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')

So I think I should remove this IllegalCharacter from the data frame.

import re

# ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')
ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]|[\x00-\x1f\x7f-\x9f]|[\uffff]')
def illegal_char_remover(data):
    """Remove ILLEGAL CHARACTER."""
    if isinstance(data, str):
        return ILLEGAL_CHARACTERS_RE.sub("", data)
    else:
        return data

target_df = target_df.applymap(illegal_char_remover)

So, if you remove it, this error will not occur.

This fall (solved)

After removing \ uffff, the error below no longer appears.

~~ Unfortunately, even if you remove this error, at the time of export ~~

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

Please note that you may need to make some improvements as you will get angry. ~~ ~~ A story about cleaning data before experimenting. ~~

Postscript

This is my first post, so if you have any problems, please point it out. Then.

Recommended Posts

[openpyxl] What to do when IllegalCharacterError appears in pandas.DataFrame.to_excel
What to do when "SSL: CERTIFICATE_VERIFY_FAILED _ssl.c: 1056" appears in Python
What to do when "Invalid HTTP_HOST header" appears in Django
What to do when the warning "The environment is in consistent ..." appears in the Anaconda environment
What to do when a warning appears around Python integration in Neovim's CheckHealth
What to do when "TypeError: data type not understood" appears in python's numpy.zeros
What to do when [Errno 2] No such file or directory appears in Python
What to do when Ubuntu crashes
OSError: [Errno 40] What to do when Message too long appears
What to do if Insecure Platform Warning appears when running Python
What to do when UnicodeDecodeError occurs during read_csv in pandas (pd.read_table ())
What to do when ModuleNotFoundError: No module named'XXX' occurs in Python
What to do when the value type is ambiguous in Python?
[Memorandum] What to do when a warning appears after executing pip list
What to do when the result downloaded via scrapy is in English
What to do when pyinstaller: error: argument --add-binary: invalid add_data_or_binary value: appears
[Docker] What to do when error Couldn't find the binary git appears
What to do if No Python documentation found for ... appears in pydoc
[OSX] [pyenv] What to do when an SSL error occurs in pip
What to do when a warning message is displayed in pip list
What to do if pipreqs results in UnicodeDecodeError
What to do when PermissionError of tempfile.mkstemp occurs
What to do to get google spreadsheet in python
What to do when "OS Error: [WinError 126] The specified module cannot be found" appears in import torch
[python] What to do when an error occurs in send_keys of headless chrome
What to do when "TypeError: must be string, not int…" appears when using strptime
What to do when SSL error occurs in pip in Windows10, miniconda, VScode environment
What to do if abort is displayed when inputting camera video in OpenCV
What to do when the graph does not appear in jupyter (ipython) notebook
What to do if NotADirectoryError: [Errno 20] Not a directory:'xdg-settings' appears in jupyter notebook
What I do when imitating embedded go in python
What to do if pip install fails in Xcode 5.1
curl: (60) What to do when Issuer certificate is invalid.
What to do when gdal_merge creates a huge file
What to do when raise ValueError, "unsupported hash type"
What to do if a UnicodeDecodeError occurs in pip
What to do when "cannot import name xxx" [Python]
I want to do something in Python when I finish
What to do when you can't bind CaboCha to Python
What to do when is not in the sudoers file.This incident will be reported.
[Python] What to do when No module named'pyproj.datadir' appears when Exe is done with PyInstaller
What to do when only the window is displayed and nothing is displayed in pygame Note
What to do if you get an error when importing matplotlib in Python (Mac)
What to do when Python starts up in Anaconda does not come out unexpectedly
What to do when "Type Error: must be _socket.socket, not socket" appears on GAE
[AWS] What to do when you want to pip with Lambda
What to do if ʻarguments [0] .scrollIntoView ();` fails in python selenium
What to do when Japanese is not displayed on matplotlib
What to do if pip gives a DistributionError in Homebrew
What to do when PyCharm font is strange or garbled
What to do when a Remove Error occurs when updating conda
What to do when Unalignable boolean Series provided as indexer
What to do if you get "coverage unknown" in Coveralls
What to do if a 0xC0000005 error occurs in tf.train.start_queue_runners ()
What to do when an error occurs with import _ssl
What to do if package installation fails when deploying to heroku
What to do if you can't log in as root
ImportError: No module What to do when you are told
What to do when there is no response due to Proxy setting in Python web scraping
What to do when a Missing artifact occurs in a jar that is not defined in pom.xml
What to do if you get an error when running "certbot renew" in CakePHP environment