Excel aggregation with Python pandas Part 1

motivation

It slows down when the calculation formula is messed up in the sheet to aggregate the Excel information. I've come to see books related to Python + Excel at bookstores, so let's study.

environment

windows10 python3.9 VS Code Pandas openpyxl

Preparation

Assuming that python and pandas have been installed (using pip)

theme

There is such Excel. As a feature, the columns of departments differ depending on the seat. (Yellow column) I want to count the number of people in each department.

sheet1 sheet2 sheet3
image.png image.png image.png

Answer immediately

2 function definition in pandas_lib.py

pandas_lib.py


import pandas as pd #pandas is a library that provides functions to support data analysis in Python.
import numpy as np

#Group by key and count
def countByKeyFromFileAndSheet(filename, sheetname, key):
  df = pd.read_excel(filename, sheet_name=sheetname, engine="openpyxl")
  dict = df[key].value_counts().to_dict()
  print(dict)
  return dict

def concatenateDict(dict1, dict2):
  newdict=dict1
  for k in dict2:
    if k in newdict:
      newdict[k] += dict2[k]
    else:
      #newdict.update(k, dict2[k])
      newdict[k] = dict2[k]
  return newdict

pandas_main.py Use countByKeyFromFileAndSheet to aggregate the columns of department names of Sheet1 to Sheet3, store them in dict, and connect dicts with concatenateDict to generate dict.

pandas_main.py


import pandas_lib as pl

dict1 = pl.countByKeyFromFileAndSheet("./example.xlsx", "Sheet1", "Department")
dict2 = pl.countByKeyFromFileAndSheet("./example.xlsx", "Sheet2", "Department")
dict3 = pl.countByKeyFromFileAndSheet("./example.xlsx", "Sheet3", "Department")

dict = pl.concatenateDict(dict1, dict2)
dict = pl.concatenateDict(dict, dict3)

print(dict)

Execution result

After totaling to dict1-3, you can see that the total of 3 sheets is finally totaled.

{'Sales department': 4, 'Development department': 3, 'General Affairs Department': 3}
{'Sales department': 5, 'Development department': 3, 'Accounting department': 2}
{'Development department': 9, 'Accounting department': 1}
{'Sales department': 9, 'Development department': 15, 'General Affairs Department': 3, 'Accounting department': 3}

Finally

I implemented it muddy using the for statement in concatenateDict, but if there is another smarter implementation method, please let me know.

Recommended Posts

Excel aggregation with Python pandas Part 1
Excel aggregation with Python pandas Part 2 Variadic
Excel with Python
[Easy Python] Reading Excel files with pandas
Operate Excel with Python (1)
Operate Excel with Python (2)
[Python] How to read excel file with pandas
Image processing with Python (Part 2)
Studying Python with freeCodeCamp part1
Read csv with python pandas
Bordering images with python Part 1
Python application: Pandas Part 1: Basic
Python application: Pandas Part 2: Series
Scraping with Selenium + Python Part 1
Studying Python with freeCodeCamp part2
[Python] Change dtype with pandas
Image processing with Python (Part 1)
Solving Sudoku with Python (Part 2)
Image processing with Python (Part 3)
Scraping with Selenium + Python Part 2
solver> Link> Solve Excel Solver with python
Playing handwritten numbers with python Part 1
[Automation with python! ] Part 1: Setting file
Create an Excel file with Python3
Let's play with Excel with Python [Beginner]
[Python] Join two tables with pandas
Handle Excel CSV files with Python
1. Statistics learned with Python 1-1. Basic statistics (Pandas)
Automate simple tasks with Python Part0
[Automation with python! ] Part 2: File operation
How to read an Excel file (.xlsx) with Pandas [Python]
[Python] Format when to_csv with pandas
Play handwritten numbers with python Part 2 (identify)
Process Pubmed .xml data with python [Part 2]
Automate simple tasks with Python Part1 Scraping
Draw Nozomi Sasaki in Excel with python
Quickly create an excel file with Python #python
Python application: Pandas Part 4: DataFrame concatenation / combination
[Easy Python] Reading Excel files with openpyxl
100 Language Processing Knock with Python (Chapter 2, Part 1)
Create Excel file with Python + similarity matrix
FM modulation and demodulation with Python Part 2
[Part1] Scraping with Python → Organize to csv!
Excel table creation with Python [Progress management table]
QGIS + Python Part 2
FizzBuzz with Python3
Scraping with Python
My pandas (python)
Statistics with python
Scraping with Python
Python with Go
QGIS + Python Part 1
Excel-> pandas-> sqlite
Twilio with Python
Integrate with Python
Play with 2016-Python
AES256 with python
Tested with Python
python starts with ()
with syntax (Python)
Python: Scraping Part 1