[PYTHON] Calculate the time difference between two columns with Pandas DataFrame

Pandas and date and time data

Pandas is a very useful library that gives you the flexibility to read data and perform type conversions. However, compared to int and float type, ** datetime type (datetime64 type in 64bit OS) requires careful handling, and subtraction is especially easy to get hooked on **.

Date and time subtraction

Let's perform column subtraction between dates and times.

Data to use

time_diff_test.csv


A,B
2020-06-02 13:45:16,2020-06-02 13:50:23
2020-06-02 13:50:15,2020-06-02 14:55:19
2020-06-02 13:52:10,2020-06-03 13:57:21

Code that reads data and performs subtraction

Please note that if you do not specify an argument when reading with pd.read_csv (), it will be read as str type instead of datetime type. There are several ways to read it as a datetime type column, but it's easier to specify the argument parse_dates as shown below.

time_diff_test.py


import pandas as pd
df = pd.read_csv('time_diff_test.csv',parse_dates=['A', 'B'])
df['diff'] = df['B'] - df['A']
print(df)

#0   0 days 00:05:07
#1   0 days 01:05:04
#2   1 days 00:05:11
#Name: diff, dtype: timedelta64[ns]

If you simply subtract columns as described above, it will be output as timedelta64 type.

Get date and time subtraction in seconds (minutes, hours, days ...)

In actual use, there are many situations where you want to get it in seconds or minutes.

Code that doesn't work

If you want to convert timedelta type variables to seconds **. Total_seconds () ** is effective, It doesn't work even if I execute it as it is in the column (Series) of DateFrame

python


import pandas as pd
df = pd.read_csv('time_diff_test.csv',parse_dates=['A', 'B'])
df['diff'] = df['B'] - df['A']
print(df['diff'].total_seconds())

#'Series' object has no attribute 'total_seconds'

Code that works

There are several ways to convert a timedelta type Series to seconds, By combining ** map and lambda as shown below, it is easy to write in one line **

time_diff_test_seconds.py


import pandas as pd
df = pd.read_csv('time_diff_test.csv',parse_dates=['A', 'B'])
df['diff'] = df['B'] - df['A']
df['diff_ts'] = df['diff'].map(lambda x: x.total_seconds())
print(df['diff_ts'])

#0       307.0
#1     3904.0
#2    86711.0
#Name: diff_ts, dtype: float64

[Addition] Use Series.dt.total_seconds ()

df['diff_ts'] = df['diff'].dt.total_seconds()
print(df['diff_ts'])

#0      307.0
#1     3904.0
#2    86711.0
#Name: diff, dtype: float64

Time difference in minutes and hours

total_minutes (), total_hours () ‥ ** There is no method, lol ** Divide total_seconds () by 60, 3600

print(df['diff'].map(lambda x: x.total_seconds()/60.0))
print(df['diff'].map(lambda x: x.total_seconds()/3600.0))

#0       5.116667
#1      65.066667
#2    1445.183333
#Name: diff, dtype: float64
#0     0.085278
#1     1.084444
#2    24.086389
#Name: diff, dtype: float64

Difference between total_seconds () and seconds

In addition to ** total_seconds () **, there are ** seconds ** as a way to convert timedelta to seconds. Let's find out the difference

total_seconds()


print(df['diff'].map(lambda x: x.total_seconds()))
#0      307.0
#1     3904.0
#2    86711.0
#Name: diff, dtype: float64

seconds


print(df['diff'].map(lambda x: x.total_seconds()))
#0     307
#1    3904
#2     311
#Name: diff, dtype: int64

According to here ** total_seconds (): Difference expressed in seconds (float type) ** ** seconds: The number of seconds (int type) of the difference decomposed into days, seconds, microseconds, milliseconds, minutes, hours, and weeks ** It seems that it is.

It seems that total_seconds () ** represents the so-called ** "time difference".

** This article is over ** Thank you for watching till the end!

Recommended Posts

Calculate the time difference between two columns with Pandas DataFrame
Type conversion of multiple columns of pandas DataFrame with astype at the same time
Calculate the angle between n-dimensional vectors with TensorFlow
Bayesian modeling-estimation of the difference between the two groups-
Check what line caused the error with apply () (dataframe, Pandas)
Reformat the timeline of the pandas time series plot with matplotlib
Calculate the similarity between sentences with Doc2Vec, an evolution of Word2Vec
Browse .loc and .iloc at the same time in pandas DataFrame
Examine the relationship between two variables (2)
Extract the maximum value with pandas.
Estimate the delay between two signals
Examine the relationship between two variables (1)
[Python] Join two tables with pandas
Extract specific multiple columns with pandas
Bulk Insert Pandas DataFrame with psycopg2
Examine the relationship between two variables (2)
Calculate the time difference between two columns with Pandas DataFrame
Estimate the delay between two signals
Examine the relationship between two variables (1)
Calculate the angle between n-dimensional vectors with TensorFlow
Bayesian modeling-estimation of the difference between the two groups-
Various ways to calculate the similarity between data in python
Calculate the number of changes
Calculate the similarity between sentences with Doc2Vec, an evolution of Word2Vec
How to get the date and time difference in seconds with python
Adjust the spacing between figures with Matplotlib
Plot the Nikkei Stock Average with pandas
[Pandas] Expand the character string to DataFrame
Replace column names / values with pandas dataframe
Is there NaN in the pandas DataFrame?
Convert the world time zone time string to Japan time without calculating the time difference with python.