[PYTHON] How to format a table using Pandas apply, pivot and swaplevel

Overview

Let's create a table like this with Pandas. スクリーンショット 2016-03-16 19.35.24.png

Input data

import pandas as pd

tokyo = pd.DataFrame(data={'year':['2013','2014','2015'],
                           'cnt':[10,15,20],
                           'price':[100,100,90],},
                     columns=['year','cnt','price'])

osaka = pd.DataFrame(data={'year':['2013','2014','2015'],
                           'cnt':[5,6,7],
                           'price':[100,100,100],},
                     columns=['year','cnt','price'])

yokohama = pd.DataFrame(data={'year':['2015'],
                              'cnt':[1],
                              'price':[50],},
                        columns=['year','cnt','price'])

result


tokyo
-----------------------
   year  cnt  price
0  2013   10    100
1  2014   15    100
2  2015   20     90

osaka
-----------------------
   year  cnt  price
0  2013    5    100
1  2014    6    100
2  2015    7    100

yokohama
-----------------------
   year  cnt  price
0  2015    1     50

The sample data assumes sales of a certain product in Tokyo, Osaka, and Yokohama for three years. cnt is the number and price is the unit price. Yokohama has data for 2015 only.

Concatenate input data (concat)

Give each DataFrame a city name and join them vertically.

#Add city name to each DataFrame
tokyo['city'] = 'tokyo'
osaka['city'] = 'osaka'
yokohama['city'] = 'yokohama'

#Combine vertically
df = pd.concat([tokyo,osaka,yokohama], ignore_index=True)

result


df
-------------------------------------
   year  cnt  price      city
0  2013   10    100     tokyo
1  2014   15    100     tokyo
2  2015   20     90     tokyo
3  2013    5    100     osaka
4  2014    6    100     osaka
5  2015    7    100     osaka
6  2015    1     50  yokohama

Create a new column using the information of an existing column (apply)

With the apply method, you can perform uniform processing for all columns and all rows. In this example, axis = 1 is specified, so the lambda expression is called line by line.

** Calculate sales by multiplying the number (cnt) and unit price (price) **

#axis=In 1, apply function to each line. axis axis='columns'But the same effect.
#x.cnt * x.The price part is the number*unit price
df["sales"] = df.apply(lambda x:x.cnt * x.price,axis=1)

result


df
------------------------------------------------
   year  cnt  price      city  sales
0  2013   10    100     tokyo   1000
1  2014   15    100     tokyo   1500
2  2015   20     90     tokyo   1800
3  2013    5    100     osaka    500
4  2014    6    100     osaka    600
5  2015    7    100     osaka    700
6  2015    1     50  yokohama     50

** Determine the Kanto / Kansai area from the city name **

#'west' if x.city == 'osaka' else 'east'Is a ternary operation.
df["area"] = df.apply(lambda x:'west' if x.city == 'osaka' else 'east',axis=1)

result


df
------------------------------------------------
   year  cnt  price      city  sales  area
0  2013   10    100     tokyo   1000  east
1  2014   15    100     tokyo   1500  east
2  2015   20     90     tokyo   1800  east
3  2013    5    100     osaka    500  west
4  2014    6    100     osaka    600  west
5  2015    7    100     osaka    700  west
6  2015    1     50  yokohama     50  east

Create a table of cities and years (pivot_table)

List the sales for each year in each city. Pandas automatically edits with pivot_table, but the arguments are complicated and you need to get used to it. .. This time, I referred to this page.

#Create a table of cnt and sales with area and city on the horizontal axis and year on the vertical axis.
#fill_value fills the missing part of the data with 0.
df2 = pd.pivot_table(df,
                     index=['area','city'], 
                     columns=['year'],
                     values=['cnt','sales'],
                     fill_value=0) 

result


df2
------------------------------------------------
               cnt           sales            
year          2013 2014 2015  2013  2014  2015
area city                                     
east tokyo      10   15   20  1000  1500  1800
     yokohama    0    0    1     0     0    50
west osaka       5    6    7   500   600   700

The above is the output result of Python, but if you make it a little easier to understand, it looks like this.

              |  cnt           | sales             <Column 1st layer(No name)
              |================|===================
year          | 2013 2014 2015 | 2013  2014  2015  <Column 2nd layer(The name is year)
==============|================|===================
east|tokyo    |   10   15   20 | 1000  1500  1800
    |yokohama |    0    0    1 |    0     0    50
west|osaka    |    5    6    7 |  500   600   700
==============|================|===================
area|city <<The Index also has two levels. The names are area and city.

Change the order of columns in Pivot results (swap level)

I'm pretty close to the output, but I want the columns to be in the order of "year"> "quantity, sales", so [swaplevel](http://pandas.pydata.org/pandas-docs/stable/generated/pandas. Edit with DataFrame.swaplevel.html). (If you know how to do it with pivot alone, please let me know.)

** Swap 2 rows in column **

#The first argument, 0, points to the 0th line of colum.
#The second argument, 1 points to the first line of colum.
df2=df2.swaplevel(0,1, axis=1)

Result (year has moved to the beginning)


df2
------------------------------------------------
year          2013 2014 2015  2013  2014  2015
               cnt  cnt  cnt sales sales sales
area city                                     
east tokyo      10   15   20  1000  1500  1800
     yokohama    0    0    1     0     0    50
west osaka       5    6    7   500   600   700

** Sort by year **

df3=df2[['2013','2014','2015']]

result


df3
------------------------------------------------
year          2013       2014       2015      
               cnt sales  cnt sales  cnt sales
area city                                     
east tokyo      10  1000   15  1500   20  1800
     yokohama    0     0    0     0    1    50
west osaka       5   500    6   600    7   700

Output to excel

Now that you have created the output you want, output it to Excel.

#pip install openpyxl
writer = pd.ExcelWriter('output.xlsx')
df3.to_excel(writer,'Sheet1')
writer.save()

スクリーンショット 2016-03-16 20.52.52.png

The rest is done by drawing ruled lines and coloring. It seems that it can be done with openpyxl, but this time I did it manually with Excel.

Recommended Posts

How to format a table using Pandas apply, pivot and swaplevel
[Python] How to add rows and columns to a table (pandas DataFrame)
<Pandas> How to handle time series data in a pivot table
[Pandas] How to check duplicates and delete duplicates in a table (equivalent to deleting duplicates in Excel)
[Python] How to output a pandas table to an excel file
How to build a LAMP environment using Vagrant and VirtulBox Note
How to paste a CSV file into an Excel file using Pandas
How to split and save a DataFrame
How to draw a graph using Matplotlib
How to install a package using a repository
How to use pandas Timestamp and date_range
How to divide and process a data frame using the groupby function
How to get a specific column name and index name in pandas DataFrame
How to copy and paste the contents of a sheet in Google Spreadsheet in JSON format (using Google Colab)
How to code a drone using image recognition
How to upload to a shared drive using pydrive
How to uninstall a module installed using setup.py
How to apply updlock, rowlock, etc. with a combination of SQLAlchemy and SQLServer
The first step to log analysis (how to format and put log data in Pandas)
How to read dynamically generated table definitions using SQLAlchemy
How to write a GUI using the maya command
How to use Pandas 2
How to set up a Python environment using pyenv
How to scrape horse racing data using pandas read_html
[GCF + Python] How to upload Excel to GCS and create a new table in BigQuery
How to hold a hands-on seminar using Jupyter using docker
How to extract null values and non-null values with pandas
How to display DataFrame as a table in Markdown
How to make a Python package using VS Code
How to save a table scraped by python to csv
How to execute a command using subprocess in Python
[Python] How to delete rows and columns in a table (list of drop method options)
How to manage a README for both github and PyPI
How to output a document in pdf format with Sphinx
How to transpose a 2D array using only python [Note]
How to print characters as a table with Python's print function
How to add new data (lines and plots) using matplotlib
How to generate a query using the IN operator in Django
How to write a metaclass that supports both python2 and python3
How to create dataframes and mess with elements in pandas
How to call a function
Key additions to pandas 1.1.0 and 1.0.0
How to use Pandas Rolling
How to hack a terminal
[Python] How to scrape a local html file and output it as CSV using Beautiful Soup
How to set a shortcut to switch full-width and half-width with IBus
Overview of how to create a server socket and how to establish a client socket
How to compare lists and retrieve common elements in a list
Process Splunk execution results using Python and save to a file
How to get followers and followers from python using the Mastodon API
[Python] How to get & change rows / columns / values from a table.
Create and deploy a Django (PTVS) app using Azure Table storage
How to install pandas on EC2 (How to deal with MemoryError and PermissionError)
Create a pandas Dataflame by searching the DB table using sqlalchemy
JOI2019 / 2020 1st qualifying 3rd How to solve problems A and B
How to find the memory address of a Pandas dataframe value
How to save only a part of a long video using OpenCV
[ROS2] How to describe remap and parameter in python format launch
How to update a Tableau packaged workbook data source using Python
How to create a CSV dummy file containing Japanese using Faker
How to POST to a specified channel without using Slack's Incoming WebHooks