[PYTHON] Add totals to rows and columns in pandas

I will write it as a memorandum when using * pandas *. That's what the answer here says.

df = pd.concat([df,pd.DataFrame(df.sum(axis=0),columns=['Grand Total']).T])
df = pd.concat([df,pd.DataFrame(df.sum(axis=1),columns=['Total'])],axis=1)

The basics of * pandas * are summarized in the official manual so that it can be read in about 10 minutes.

As test data, we will use past weather information that can be downloaded from the Japan Meteorological Agency. The weather conditions for Tokyo for one year (January-December 2013) are totaled monthly and the total of rows and columns is calculated.

Japan Meteorological Agency | Past Meteorological Data Download

気象庁 過去の気象データ・ダウンロード.png

Environmental setting

Make the independent Python library available in pyvenv.

$ PYVENV_DIR=$HOME/.pyvenv/data-analytics
$ pyvenv-3.4 $PYVENV_DIR
$ source $PYVENV_DIR/bin/activate
$ pip install pandas

Let's call the file downloaded from the Japan Meteorological Agency weather-tokyo-2013.csv. As it is difficult to handle with * pandas * as it is downloaded, pre-processing is performed.

Check data reading with pandas

>>> import pandas as pd

# Read CSV file, whose "day" column is parsed with `dateutil.parser.parse`.
>>> fname = 'weather-tokyo-2013.csv'
>>> df = pd.read_csv(fname, parse_dates=['day'])

# Check DataFrame overview.
>>> df.dtypes
>>> df.head(3)
>>> df.describe()

I was able to confirm that the CSV file was loaded into * DataFrame *.

Then extract the "month" part from the "day" column and add it to the "month" column. The "day" column is recognized as a datetime when reading the CSV, so you can use the datetime API.

>>> df['month'] = df['day'].apply(lambda x: x.month)

Aggregate by groupby and add row and column total value

>>> grouped = df.groupby(['weather_day', 'month'])['day'].count()
>>> type(grouped)
>>> grouped.index
>>> grouped.head(3)

The return value of groupby is * Series *, so convert it to * DataFrame * with the ʻunstack` method. This will allow you to calculate the row and column totals in the opening process, in this case the monthly totals for the weather overview.

>>> monthly = grouped.unstack()

# Add "Total" column on each row.
>>> monthly = pd.concat([monthly,
                         pd.DataFrame(monthly.sum(axis=1), columns=['Total'])],
                        axis=1)
# Add "Grand Total" row on each column.
>>> monthly = pd.concat([monthly,
                         pd.DataFrame(monthly.sum(axis=0), columns=['Grand Total']).T])

In addition to simple sums, you can also filter by row or column name to add up. If you think of a day that includes "rain" as a rainy day and a day that includes "fine" as a sunny day, you can use the index name to calculate as follows. (Since the actual data includes weather such as "sunny temporary rain", there are days when it is duplicated.)

>>> monthly = pd.concat([monthly,
                         pd.DataFrame(monthly[monthly.index.map(lambda x: 'rain' in x)].sum(axis=0), columns=['Rainy']).T])
>>> monthly = pd.concat([monthly,
                         pd.DataFrame(monthly[monthly.index.map(lambda x: 'Fine' in x)].sum(axis=0), columns=['Fine']).T])

Finally, write the aggregation result to a text file. Headers, index names, etc. can be controlled with optional arguments. For example, specify the delimiter with * sep *.

>>> fname = 'weather-tokyo-2013-monthly-weather_name.csv'
>>> monthly.to_csv(fname, sep=',')

After that, check the result with spreadsheet software such as Excel.

Draw a graph using ipython

You can open the data file in Excel and draw the graph, but you can also draw in Python using * matplotlib *. If you want to use it for a while, use * ipython *. It will perform the chores of graph drawing processing behind the scenes.

$ pip install matplotlib ipython
$ ipython

You are now in the IPython interactive shell. Read the data you just exported and draw a graph of the monthly transition between sunny and rainy days.


In [1]: import pandas as pd

In [2]: pd.options.display.mpl_style = 'default'

In [3]: fname = 'weather-tokyo-2013-monthly-weather_name.csv'

In [4]: df = pd.read_csv(fname, index_col=0)

In [5]: df.T[['Fine', 'Rainy']][:12].plot(color=['r', 'b']).set_xlabel('Month')
Out[5]: <matplotlib.text.Text at 0x106cb89e8>

A graph like this should be drawn in a separate window.

月別の天候変化_東京.png

Since June is the rainy season, it rains a lot (less than half the number of days), and you can see that there are almost no rainy days in January. The weather changes easily in October, but it seems that there are still more sunny days than rainy days. In addition, since the rough aggregation conditions are easy to implement and the fluctuations are easy to understand, when performing detailed analysis, let's aggregate the values of the weather overview after categorizing them.

Recommended Posts

Add totals to rows and columns in pandas
[Python] Add total rows to Pandas DataFrame
Adding Series to columns in python pandas
[Python] Swapping rows and columns in Numpy data
Key additions to pandas 1.1.0 and 1.0.0
Swap columns in pandas dataframes
A handy function to add a column anywhere in a Pandas DataFrame
How to create dataframes and mess with elements in pandas
[Python] How to delete rows and columns in a table (list of drop method options)
How to write soberly in pandas
How to use calculated columns in CASTable
Load csv with duplicate columns in pandas
How to change multiple columns of csv in Pandas (Unixtime-> Japan Time)
How to get a specific column name and index name in pandas DataFrame
Create an empty array in Numpy to add rows for each loop
In Jupyter, add IPerl to the kernel.
How to reassign index in pandas dataframe
[Python] Pandas to fully understand in 10 minutes
It's really useful to add save () and load () methods to Target in Luigi
How to read CSV files in Pandas
How to use is and == in Python
How to use pandas Timestamp and date_range
The first step to log analysis (how to format and put log data in Pandas)
[Pandas] How to check duplicates and delete duplicates in a table (equivalent to deleting duplicates in Excel)
How to generate permutations in Python and C ++
Add rows to an empty array with numpy
Delete rows with arbitrary values in pandas DataFrame
Send messages to Skype and Chatwork in Python
Header shifts in read_csv () and read_table () of Pandas
To add a module to python put in Julialang
Add query to url parsed url and return to original url
Remove rows with duplicate indexes in pandas DataFrame
How to write async and await in Vue.js
To represent date, time, time, and seconds in Python
How to plot autocorrelation and partial autocorrelation in python
Display and analyze only some columns in CASTable
How to split machine learning training data into objective variables and others in Pandas
[Python] Read Japanese csv with pandas without garbled characters (and extract columns written in Japanese)
Convert timezoned date and time to Unixtime in Python2.7
[Python] [Django] How to use ChoiceField and how to add options
Various ways to extract columns in a NumPy array
Allow Keras 2.0 and OpenCV 3.2 to work in GPU environment
Add auto-completion to EV3 Micropyhon programming in VS Code
Function to return multi columns to single column in DataFrame
How to define Decorator and Decomaker in one function
numpy memorandum 2 / transpose doesn't just swap rows and columns
Convert comma-separated numeric strings to numbers in Pandas DataFrame
How to access with cache when reading_json in pandas
Add disks to extend LVM SWAP and / home area
Write tests in Python to profile and check coverage
[Spark] I'm addicted to trapping "", null and [] in DataFrame
How to add page numbers to PDF files (in Python)
I tried to summarize how to use pandas in python
How to use Decorator in Django and how to make it
Output a binary dump in binary and revert to a binary file
Convert numeric variables to categorical with thresholds in pandas
Go language to see and remember Part 7 C language in GO language
How to get RGB and HSV histograms in OpenCV
[python] Use DataFrame to label arbitrary variables and arrays together and save them in csv [pandas]