Predicting the goal time of a full marathon with machine learning-③: Visualizing data with Python-

Overview

This is a continuation of the previous article. Under the title "Predicting the goal time of a full marathon by machine learning", this series describes the whole flow, from data collection through model creation to prediction, for predicting the goal time of a full marathon (42.195 km) from data recorded during running practice.

In the previous article (Predicting the goal time of a full marathon by machine learning-②: I tried to create learning data with Garmin-), I created the learning data and described the procedure for deleting unnecessary items and adding the necessary data.

This time, before building a model that predicts the goal time of a full marathon from the created training data, I describe how to visualize the data and look at its overall trends. Some of this is easy to do in Excel, but I hope it is a useful reference for how to write the code if you want to do the same in Python.

Contents of learning data

The learning data consists of the following 14 features, which are thought to affect distance and pace during running.

  1. Practice date (yyyy / mm / dd HH: MM: ss) Item name: Practice Time
  2. Distance (km) Item name: Distance
  3. Time (HH: MM: ss) Item name: Time
  4. Average heart rate (bpm) Item name: Average heart rate
  5. Maximum heart rate (bpm) Item name: Max heart rate
  6. Aerobic TE Item name: Aerobic TE
  7. Average pitch (steps / minute) Item name: Average pitch
  8. Average pace per 1km (HH: MM: ss / km) Item name: Average pace
  9. Maximum pace per 1km (HH: MM: ss / km) Item name: Max pace
  10. Average stride (cm / step) Item name: Average stride
  11. Temperature at the start of running (℃) Item name: temperature
  12. Wind speed at the start of running (m / sec) Item name: Wind speed
  13. Working hours of the week (h / week) Item name: Work
  14. Average sleep time per day of the week (HH: MM: ss / day) Item name: Average sleep time

Sample data for one record

Practice Time: 2020/2/23 16:18:00
Distance: 8.19
Time: 0:59:35
Average heart rate: 161
Max heart rate: 180
Aerobic TE: 3.6
Average pitch: 176
Average pace: 00:07:16
Max pace: 00:06:11
Average stride: 0.78
temperature: 7.9
Wind speed: 9
Work: 44.5
Average sleep time: 6:12:00

Monthly mileage

First, import the libraries needed to visualize the data. For now, the following should be enough.

RunnningDataVisualization.ipynb


import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
import seaborn as sns

You can draw a graph of the monthly mileage with the following code.

RunnningDataVisualization.ipynb


# Read "PracticeTime" as a date type and use it as the index:
# index_col selects the column, and parse_dates=True parses that index as dates
df = pd.read_csv(r'Activities.csv', index_col=["PracticeTime"], parse_dates=True)

# Draw the graph: total distance per month
# (newer pandas versions prefer rule="ME" for month-end)
df_m_graph = df['Distance'].resample(rule="M").sum()
df_m_graph.plot.bar()

# Set the graph display format
plt.title("Distance per month", fontsize=22)  # Give the graph a title
plt.grid(True)  # Add grid lines to the graph
plt.xlabel("month", fontsize=15)  # Label the horizontal axis
plt.ylabel("km", fontsize=15)  # Label the vertical axis
plt.yticks(np.arange(0, 60, 5))  # Put y-axis ticks every 5 km from 0 to 55


Execution result
[Figure: bar chart "Distance per month"]

Viewed this way, it is clear how little I practiced during the hot summer months.
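To make the monthly aggregation step concrete, here is a minimal sketch on a few invented runs (the dates and distances are assumptions, not the article's data). It groups by calendar month with `to_period("M")` instead of `resample`, and fills months without any run with 0 so that gaps such as a summer break still show up as empty bars.

```python
import pandas as pd

# Hypothetical runs; the dates and distances are invented for illustration.
runs = pd.DataFrame(
    {"Distance": [5.0, 8.19, 10.2, 6.5]},
    index=pd.to_datetime(
        ["2020-02-02 07:00", "2020-02-23 16:18", "2020-03-08 06:30", "2020-05-17 18:00"]
    ),
)

# Group the runs by calendar month and sum the distance.
monthly = runs["Distance"].groupby(runs.index.to_period("M")).sum()

# Insert months without any run (here April) with a total of 0 km.
all_months = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
monthly = monthly.reindex(all_months, fill_value=0.0)

print(monthly)
```

Calling `monthly.plot.bar()` on the result would give the same kind of chart as above.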

Scatter plot-Relationship between pace and pitch-

Next, I draw a scatter plot to see whether the pace per kilometer correlates with the pitch. Generally speaking, the pitch (steps per minute) is expected to drop as the pace slows down, but what does the actual data show?

RunnningDataVisualization.ipynb


plt.figure(figsize=(50, 4))  # Create the figure first so that the size applies to this plot
df = df.sort_values("Average pace")  # Sort by pace, fastest first
plt.scatter(df['Average pace'], df['Average pitch'], s=40, marker="*", linewidths=4, edgecolors="orange")  # Draw a scatter plot

plt.title("Scatter plot of pace and pitch", fontsize=22)
plt.ylabel('Average pitch', fontsize=15)
plt.xlabel('Average pace', fontsize=15)
plt.grid(True)
plt.xticks(rotation=90)  # Rotate the pace labels so that they do not overlap


Execution result
[Figure: scatter plot of pace and pitch]

The pitch varies from run to run, regardless of whether the pace is fast or slow.
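The visual impression can also be checked with a number. The following is a minimal sketch, assuming the pace is stored as "HH:MM:SS" strings as in the sample record; the four rows are invented for illustration. It converts the pace to seconds per kilometer and computes the Pearson correlation coefficient with the pitch.

```python
import pandas as pd

# Hypothetical records in the same format as the training data (values invented).
sample = pd.DataFrame(
    {
        "Average pace": ["00:06:30", "00:06:50", "00:07:16", "00:07:40"],
        "Average pitch": [178, 174, 176, 175],
    }
)

# Convert the "HH:MM:SS" strings to seconds per kilometer.
pace_sec = pd.to_timedelta(sample["Average pace"]).dt.total_seconds()

# Pearson correlation coefficient between pace and pitch.
r = pace_sec.corr(sample["Average pitch"])
print(f"pace [s/km]: {pace_sec.tolist()}, r = {r:.2f}")
```

A value of r close to 0 would support the reading that pitch does not depend on pace. Using the pace in seconds for the x-axis also gives the scatter plot a true numeric scale instead of one tick per string.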

Scatter plot-Relationship between pace and stride-

Then what about the relationship between pace and stride? If the pace slows down, the stride (distance covered per step) is likely to shorten.

RunnningDataVisualization.ipynb


plt.figure(figsize=(10, 10), dpi=200)  # Create the figure first so that the size applies to this plot
df = df.sort_values("Average pace")  # Sort by pace, fastest first
plt.scatter(df['Average pace'], df['Average stride'], s=40, marker="*", linewidths=4, edgecolors="blue")  # Draw a scatter plot
plt.title("Scatter plot of pace and stride", fontsize=22)
plt.ylabel('Average stride', fontsize=15)
plt.xlabel('Average pace', fontsize=15)
plt.grid(True)
plt.xticks(rotation=90)  # Rotate the pace labels so that they do not overlap
plt.show()


Execution result
[Figure: scatter plot of pace and stride]

Unlike the scatter plot of pace and pitch, the points here trend downward to the right. In other words, the slower the pace, the shorter the stride, by up to about 25 cm.

When you run long distances, there is always a moment when the pace drops. Was a shortening stride one of the causes? Visualizing the data with Python makes this convincing.
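The two scatter plots fit together arithmetically, because speed equals pitch times stride: if the pitch stays roughly constant, a slower pace must come from a shorter stride. A small sketch of that relationship follows; the function name `stride_m` is hypothetical, and the result for the sample record matches its stride value of 0.78 if that value is read in meters.

```python
def stride_m(pace: str, pitch_spm: float) -> float:
    """Stride in meters per step from pace ("HH:MM:SS" per km) and pitch (steps/min)."""
    hours, minutes, seconds = (int(part) for part in pace.split(":"))
    minutes_per_km = hours * 60 + minutes + seconds / 60
    steps_per_km = minutes_per_km * pitch_spm
    return 1000 / steps_per_km


# Sample record: pace 00:07:16 per km at 176 steps per minute.
print(round(stride_m("00:07:16", 176), 2))  # 0.78

# Same pitch at a hypothetical slower pace: the stride has to be shorter.
print(round(stride_m("00:08:00", 176), 2))  # 0.71
```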

Correlation coefficient between features

Finally, let's compute the correlation coefficients between the features. If any of the four features added to the training data on top of what Garmin records (temperature, wind speed, weekly working hours, average sleep time) shows a strong correlation with mileage, heart rate, and so on, it can be considered to have some influence on pace and mileage.

This time, I did not know how to calculate correlation coefficients for the time data, so I calculated them only between the numerical features.
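One possible way to include the time data, shown here only as a sketch and not as what this article does, is to convert each "H:MM:SS" column to seconds with `pd.to_timedelta` so that it becomes an ordinary numeric feature. The rows below are invented for illustration.

```python
import pandas as pd

# Hypothetical rows in the same format as the training data (values invented).
sample = pd.DataFrame(
    {
        "Distance": [8.19, 5.02, 10.3, 6.4],
        "Time": ["0:59:35", "0:34:10", "1:18:02", "0:45:30"],
        "Average sleep time": ["6:12:00", "7:05:00", "5:48:00", "6:40:00"],
    }
)

# Convert every time-like column to a number of seconds.
time_cols = ["Time", "Average sleep time"]
numeric = sample.copy()
for col in time_cols:
    numeric[col] = pd.to_timedelta(numeric[col]).dt.total_seconds()

# All columns are now numeric, so they all appear in the correlation matrix.
print(numeric.corr())
```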

Before calculating the correlation coefficients, convert the average and maximum heart rate values, which were read as strings when loading the CSV, to numerical values.
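As a side note, if the string column contains entries that are not numbers, `pd.to_numeric` with `errors="coerce"` is one way to convert it without an error. This is a sketch under the assumption that missing readings appear as a placeholder such as "--"; the article does not say what the raw values look like.

```python
import pandas as pd

# Hypothetical raw values; "--" stands in for a missing reading (an assumption).
raw = pd.Series(["161", "158", "--", "170"], name="Average heart rate")

# Strings that cannot be parsed become NaN instead of raising an error.
heart_rate = pd.to_numeric(raw, errors="coerce")

print(heart_rate.isna().sum())  # number of entries that could not be parsed
print(heart_rate.mean())        # NaN values are skipped by default
```

Leaving such entries as NaN rather than 0 keeps artificial zero heart rates out of the correlation, since `DataFrame.corr` ignores missing values.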

RunnningDataVisualization.ipynb



# Type conversion (missing values are filled with 0 before converting to integers)
df['Average heart rate'] = df['Average heart rate'].fillna(0).astype(np.int64)
df['Max heart rate'] = df['Max heart rate'].fillna(0).astype(np.int64)

# Visualize the correlation coefficients
df_corr = df.select_dtypes(include="number").corr()  # Use the numeric columns only
print(df_corr)  # Display the correlation coefficients between features as a table
fig, ax = plt.subplots(figsize=(8, 8))  # plt.subplots returns a (figure, axes) pair
sns.heatmap(df_corr, annot=True, fmt='.2f', cmap='Blues', square=True, ax=ax)  # Show them as a heatmap


Execution result
[Figures: printed table of correlation coefficients and the heatmap]

Of the three features I focused on (temperature, wind speed, and weekly working hours), none has a correlation coefficient with any other feature whose absolute value exceeds 0.5. In other words, these three features show no strong linear relationship with mileage or pace.
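Instead of reading the heatmap by eye, the check against 0.5 can be written as a small filter over the correlation matrix. The matrix below is invented for illustration; with the real data, `df_corr` would take its place.

```python
import pandas as pd

# Hypothetical correlation matrix (values invented for illustration).
names = ["Distance", "Aerobic TE", "temperature"]
corr = pd.DataFrame(
    [[1.00, 0.62, -0.10],
     [0.62, 1.00, 0.31],
     [-0.10, 0.31, 1.00]],
    index=names,
    columns=names,
)

# List each feature pair once (upper triangle) whose |r| exceeds the threshold.
threshold = 0.5
pairs = [
    (a, b, float(corr.loc[a, b]))
    for i, a in enumerate(corr.index)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(pairs)  # [('Distance', 'Aerobic TE', 0.62)]
```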

Come to think of it, you don't practice running on days that are too hot, too cold, or windy, and if you work a lot during the week, you are physically tired and choose not to practice. So this result is convincing as well.

Unfortunately, calculating the correlation coefficients alone did not reveal any features that affect mileage and pace. Still, visualizing the data from various angles like this is a good opportunity to look back on my own tendencies when running and on how I practice.

Next time, I will finally create a prediction model and run the prediction process.
