[PYTHON] About problems and solutions of OpenPyXL (Ver 3.0 version)

Introduction

This article is for Ver 3.0.2 or earlier. OpenPyXL is being improved day by day, and in the future it may be possible to resolve the issues described here.

About the problem of OpenPyXL (2.6, 3.0)

OpenPyXL allows you to brute force create an Excel sheet from a brand new workbook with just Python code, but creating it from scratch makes the code more difficult to read and separates the code and design. It is more advantageous for maintenance, so I think it is a standard practice to read a template Excel workbook designed to some extent with OpenPyXL and set the necessary data.

However, OpenPyXL works in a completely different way from Excel, so there are some things that can be done easily with Excel VBA but not with OpenPyXL.

Except for the following conditions, it is often difficult to implement processing using the template Excel workbook only with the current OpenPyXL.

--Those without images or shapes --Those that do not require a copy of the worksheet --Those that do not make heavy use of merged cells --Do not add / remove rows or columns in the middle of the worksheet

Why is it difficult to process with OpenPyXL?

--Things with images and shapes

In OpenPyXL, images and shape information are ignored when the book is opened. If you want to handle images and shapes in OpenPyXL, you need to set it in the OpenPyXL process after opening the book.

--What you need to copy the worksheet

The function workbook.copy_worksheet () that copies worksheets behaves differently than that of Excel (Worksheets.Copy) and has some information that is not copied, such as conditional formatting. Rewriting the OpenPyXL source code can help to some extent, but it still doesn't copy as perfectly as Excel.

--Those that make heavy use of merged cells

OpenPyXL can handle merged cells themselves, but there are many restrictions such as drawing ruled lines and setting formats. (It is easy to cause trouble) For the parts operated by OpenPyXL, it is necessary to avoid using merged cells as much as possible by devising the design, or to confirm that there is no problem in the operation of OpenPyXL before using it.

--Do not add rows or columns in the middle of the worksheet

worksheet.insert_rows () and worksheet.insert_col () that insert rows / columns are ** pretty ** behaviors of Excel's (row (). Insert, col (). Insert) Is different. Even if the above function is executed, the attribute information (defined name, format, conditional format) given to the cell, merged cell, image information placed on the cell, function formula, etc. do not follow, so the current situation The version is ** virtually useless **. If you add (overwrite) rows and columns to the end instead of in the middle of the worksheet, the problem should be relatively small.

Problem solution

OpenPyXL can handle a template Excel workbook (xlsm) containing VBA macros by adding keep_vba = True to the open argument. Therefore, it is possible to realize difficult / impossible processing with OpenPyXL by letting Excel VBA macros perform difficult processing with OpenPyXL. (Unfortunately, it cannot be done in an environment where execution of Excel VBA macros is not permitted due to internal circumstances etc.)

We will organize the processing performed by OpenPyXL and the processing performed by Excel VBA macros, and perform code processing and settings for Python and Excel VBA respectively. For example, you can copy a worksheet and add / delete matrices in the middle of a worksheet with an Excel VBA macro.

For the data required by the Excel VBA macro, create a data worksheet separate from the template worksheet and set the data in it with OpenPyXL.

If you need to handle images that can only be created on the OpenPyXL side, create a data worksheet and place the images on it with OpenPyXL. If you cannot set an identification name for the image created by ʻadd_image () `of OpenPyXL and you need to pass multiple images, you can distinguish the images because the Excel VBA macro cannot distinguish the images. It is also necessary to devise something like this.

Sample example

I think it's faster to take a look at the sample than to explain it in a long way. The main processing of both Python and Excel VBA is about 100 lines. To run the sample, Python requires the OpenPyXL package as well as the Pillow package for editing images. (If you get an error, please install the package if necessary)

Sample publishing location

It is available on Github (https://github.com/umazular/openpyxl).

--Standard Python 2.7.5 + OpenPyXL 2.6.4 on CentOS 7 --Python 3.7.3 + OpenPyXL 3.0.2 on Raspbian (Debian 10.2) --Python 3.7.2 + OpenPyXL 3.0.0 installed on Windows10

I have confirmed at. (I try not to depend on the environment as much as possible)

The sample was created by imagining the output of the receipt. Read one or more CSV receipt data + images and output the result to Excel. In the sample, CSV files and images for 2 sheets are created in advance. The default CSV file of the sample is Shift-JIS, so replace it with UTF-8 according to your environment.

I tried to set the template report sheet ("receipt") that seems difficult with OpenPyXL.

--Conditional formatting to change the background color for each line in the line --A function formula that sets the product of quantity and unit price in the amount --A function formula that sets the total amount of money --Format when printing a sheet (settings such as margins and printing of all columns)

Excel VBA can be opened by selecting "Development"-> "Visual Basic" from the Excel menu. If the "Development" tab is not displayed, open the Excel options screen with File-> Options and check "Development" on the main tab of the user settings on the ribbon to display it.

How to run the sample

Running a python program will generate an Excel workbook (sampleoutput.xlsm) with the data set.

$ ls *.xlsm
sample.xlsm
$ python sample.py
$ ls *.xlsm
sample.xlsm  sampleoutput.xlsm
$

When the output Excel workbook is opened in an environment where macro execution is permitted, the data setting macro processing is executed only for the first time and the result is created.

Reference materials, etc.

--Official document (English) https://openpyxl.readthedocs.io/

--Notes on how to use openpyxl in python https://qiita.com/sky_jokerxx/items/dc9d8827d946b467ba4b

Recommended Posts

About problems and solutions of OpenPyXL (Ver 3.0 version)
About problems and solutions of OpenPyXL (Ver 3.0 version)
Problems of liars and honesty
Problems of liars and honesty
[Tips] Problems and solutions in the development of python + kivy
About the virtual environment of python version 3.7
Comparison of solutions in weight matching problems
About _ and __
About left justification and right justification of Kivy Label
About the behavior of copy, deepcopy and numpy.copy
A note about the python version of python virtualenv
About the * (asterisk) argument of python (and itertools.starmap)
About shallow and deep copies of Python / Ruby
"Linear regression" and "Probabilistic version of linear regression" in Python "Bayesian linear regression"
About import error of numpy and scipy in anaconda
Think about the next generation of Rack and WSGI
[Competition Pro] Summary of stock buying and selling problems
Personal notes about the integration of vscode and anaconda
[Python] Chapter 01-02 About Python (Execution and installation of development environment)
Version control of Node, Ruby and Python with anyenv