This article was published in November 2019 by Benedikt Droste, "[Boost your efficiency and process Excel-files with Python](https:: //towardsdatascience.com/boost-your-efficiency-and-process-excel-files-with-python-cae650c85d6c) ”is a Japanese translation. This article is published with permission from the original author.
When dealing with data, you will inevitably come into contact with Excel. Even if you don't use it for yourself, your clients and colleagues will need it. Excel is good at spreadsheets with small datasets.
But I was always lamenting when I saw an Excel sheet with tens of thousands of rows and hundreds of columns. These sheets are heavy and prone to crash during calculations.
So I decided to use Python to handle such a huge Excel file. It also comes with the great advantage of being able to write reusable code and provide documentation. Let's get started!
The file I want to process this time contains about 1 million rows and 16 columns.
Python provides a read_excel ()
function to read an Excel file as a DataFrame object.
import pandas as pd
import numpy as np
df = pd.read_excel(...\\Excel-Tutorial.xlsx')
** The Pandas library is not included with Python and must be installed with pip etc. ** **
As you can see, the data looks pretty good in general, but the column headers seem to be wrong. Many Excel maps contain headings and other information. Let's skip this part and define the header line.
df = pd.read_excel('…\\Excel-Tutorial.xlsx', header=[1]).reset_index()
The argument header = [1] specifies that the second row of the Excel sheet should be used as the header. All previous lines are ignored.
A common question in the marketing department is annual sales in each country.
This calculation was completed in 86ms. One of the great advantages of working with Excel files in Python is that it can handle all kinds of calculations much faster than Excel itself. The more complex the process, the greater the speed advantage.
You may also need national data grouped by year and category as a sales department request. You need to save the calculation results in separate worksheets to output the market data for each country.
The next step is to save the file again as an Excel file and provide it to your sales and marketing departments. Create a pd.ExcelWriter object and create a different worksheet for each.
It's easy? Let's take a look at the newly created workbook.
As you can see, the DataFrame is saved correctly in the specified worksheet. I sent this great achievement to both departments and received an email the next day. I was asked for formatting and visualization. Since such data needs to be converted every month, I decided to run the task in Python as well.
You will need to recreate the writer object for formatting and visualization.
As you can see, the beginning of the code is the same as in the first example, creating a writer object. You can use xlsxwriter
to access Excel features such as graphs and formats. To access these features, you need to get a workbook object with workbook = writer.book
and a worksheet object withworksheet = writer.sheet ['Sales_Sums']
. In this example, we will make changes to the first sheet. Add a graph, specify a range of data (= Sales_Sums! $ B $ 2: $ B $ 7'
), and add it to cell A9 of the worksheet.
Format your sales data in the same way. Add a 3-color scale to the B2: B7
range to visually highlight the low and high values. Adjust the width of the worksheets in the first and second columns with worksheet.set_column (0,1,30)
. It also formats the sales data column header and renames it to " 2019 Sales Data
". And finally save the file.
The results are much better and have significant advantages over Excel. And next month, you can create exactly the same thing with just one click.
Python is very good at handling Excel files. With Python, you can easily handle large files, write reusable code, and even provide documentation to your colleagues. We also confirmed that we could easily access the advanced features of Python. You could also fully automate the reporting process itself.
Original Author: Benedikt Droste Thank you for letting us share your knowledge!
This article was published with the cooperation of the following people. Thank you again. Selector: yumika tomita Translator: @ siho1 Auditor: @nyorochan Publisher: @aoharu
We translate high-quality articles from overseas into Japanese with the cooperation of several excellent engineers and publish the articles. Please contact us if you can sympathize with the activity or if you are interested in spreading good articles to many people. Please send a message with the title "I want to participate" in [Mail](mailto: [email protected]), or send a message in Twitter. For example, we can introduce the parts that can help you after the selection.
How was this article? ・ I wish I had done this, I want you to do more, I think it would be better ・ This kind of place was good We are looking for frank opinions such as. Please feel free to post in the comments section as we will use your feedback to improve the quality of future articles. We also welcome your comments on Twitter. We look forward to your message.
Recommended Posts