Improve your productivity by processing huge Excel files with Python

This article was published in November 2019 by Benedikt Droste, "[Boost your efficiency and process Excel-files with Python](https:: //towardsdatascience.com/boost-your-efficiency-and-process-excel-files-with-python-cae650c85d6c) ”is a Japanese translation. This article is published with permission from the original author.

When dealing with data, you will inevitably come into contact with Excel. Even if you don't use it for yourself, your clients and colleagues will need it. Excel is good at spreadsheets with small datasets.

But I was always lamenting when I saw an Excel sheet with tens of thousands of rows and hundreds of columns. These sheets are heavy and prone to crash during calculations.

So I decided to use Python to handle such a huge Excel file. It also comes with the great advantage of being able to write reusable code and provide documentation. Let's get started!

Read an Excel file with Python

The file I want to process this time contains about 1 million rows and 16 columns.

Python provides a read_excel () function to read an Excel file as a DataFrame object.

import pandas as pd
import numpy as np
df = pd.read_excel(...\\Excel-Tutorial.xlsx')

** The Pandas library is not included with Python and must be installed with pip etc. ** **

As you can see, the data looks pretty good in general, but the column headers seem to be wrong. Many Excel maps contain headings and other information. Let's skip this part and define the header line.

df = pd.read_excel('…\\Excel-Tutorial.xlsx', header=[1]).reset_index()

The argument header = [1] specifies that the second row of the Excel sheet should be used as the header. All previous lines are ignored.

Do calculations using Pandas

A common question in the marketing department is annual sales in each country.

This calculation was completed in 86ms. One of the great advantages of working with Excel files in Python is that it can handle all kinds of calculations much faster than Excel itself. The more complex the process, the greater the speed advantage.

You may also need national data grouped by year and category as a sales department request. You need to save the calculation results in separate worksheets to output the market data for each country.

Save the result as Excel

The next step is to save the file again as an Excel file and provide it to your sales and marketing departments. Create a pd.ExcelWriter object and create a different worksheet for each.

It's easy? Let's take a look at the newly created workbook.

As you can see, the DataFrame is saved correctly in the specified worksheet. I sent this great achievement to both departments and received an email the next day. I was asked for formatting and visualization. Since such data needs to be converted every month, I decided to run the task in Python as well.

Formatting and visualization

You will need to recreate the writer object for formatting and visualization.

As you can see, the beginning of the code is the same as in the first example, creating a writer object. You can use xlsxwriter to access Excel features such as graphs and formats. To access these features, you need to get a workbook object with workbook = writer.book and a worksheet object withworksheet = writer.sheet ['Sales_Sums']. In this example, we will make changes to the first sheet. Add a graph, specify a range of data (= Sales_Sums! $ B $ 2: $ B $ 7'), and add it to cell A9 of the worksheet.

Format your sales data in the same way. Add a 3-color scale to the B2: B7 range to visually highlight the low and high values. Adjust the width of the worksheets in the first and second columns with worksheet.set_column (0,1,30). It also formats the sales data column header and renames it to " 2019 Sales Data ". And finally save the file.

The results are much better and have significant advantages over Excel. And next month, you can create exactly the same thing with just one click.

Summary

Python is very good at handling Excel files. With Python, you can easily handle large files, write reusable code, and even provide documentation to your colleagues. We also confirmed that we could easily access the advanced features of Python. You could also fully automate the reporting process itself.

Translation cooperation

Original Author: Benedikt Droste Thank you for letting us share your knowledge!

This article was published with the cooperation of the following people. Thank you again. Selector: yumika tomita Translator: @ siho1 Auditor: @nyorochan Publisher: @aoharu

Would you like to write an article with us?

We translate high-quality articles from overseas into Japanese with the cooperation of several excellent engineers and publish the articles. Please contact us if you can sympathize with the activity or if you are interested in spreading good articles to many people. Please send a message with the title "I want to participate" in [Mail](mailto: [email protected]), or send a message in Twitter. For example, we can introduce the parts that can help you after the selection.

We look forward to your opinions and impressions.

How was this article? ・ I wish I had done this, I want you to do more, I think it would be better ・ This kind of place was good We are looking for frank opinions such as. Please feel free to post in the comments section as we will use your feedback to improve the quality of future articles. We also welcome your comments on Twitter. We look forward to your message.

Recommended Posts

Improve your productivity by processing huge Excel files with Python
Sort huge files with python
Handle Excel CSV files with Python
[Easy Python] Reading Excel files with openpyxl
[Easy Python] Reading Excel files with pandas
Excel with Python
Communication processing by Python
Handle Excel with python
Image processing with Python
Operate Excel with Python (1)
Operate Excel with Python (2)
Manipulate excel files from python with xlrd (personal notes)
Learn Python asynchronous processing / coroutines by comparing with Node.js
Image processing with Python (Part 2)
[Python] Send gmail with python: Send one by one with multiple image files attached
Operate Excel with Python openpyxl
100 Language Processing with Python Knock 2015
Let's run Excel with Python
Acoustic signal processing with Python (2)
Acoustic signal processing with Python
Sorting image files with Python (2)
Sorting image files with Python (3)
Image processing with Python (Part 1)
Sorting image files with Python
Integrate PDF files with Python
Image processing with Python (Part 3)
Reading .txt files with Python
Image processing by python (Pillow)
[Python] Image processing with scikit-image
Visualize your pocket money files with the Python web framework Dash
solver> Link> Solve Excel Solver with python
[Python] Easy parallel processing with Joblib
Recursively unzip zip files with python
100 Language Processing Knock with Python (Chapter 1)
Manipulating EAGLE .brd files with Python
[Python] POST wav files with requests [POST]
Create an Excel file with Python3
Decrypt files encrypted with OpenSSL with Python 3
100 Language Processing Knock with Python (Chapter 3)
Image processing with Python 100 knocks # 3 Binarization
Let's play with Excel with Python [Beginner]
Read files in parallel with Python
Image processing with Python 100 knocks # 2 Grayscale
100 Language Processing Knock Chapter 1 by Python
Excel aggregation with Python pandas Part 1
Sorting files by Python naming convention
Basics of binarized image processing with Python
Get property information by scraping with python
Image processing with Python 100 knock # 10 median filter
[AWS] Using ini files with Lambda [Python]
Socket communication and multi-thread processing by Python
Play audio files from Python with interrupts
Create wordcloud from your tweet with python3
Save video frame by frame with Python OpenCV
Image processing with Python 100 knocks # 8 Max pooling
Draw Nozomi Sasaki in Excel with python
Periodically perform arbitrary processing with Python Twisted
Quickly create an excel file with Python #python
Convert Excel data to JSON with python
Let Heroku do background processing with Python
100 Language Processing Knock with Python (Chapter 2, Part 2)