[PYTHON] Aggregate steps by day from iPhone healthcare data to create a CSV file

I enjoyed wearing Android Wear and Apple Watch devices when they first came out, but I prefer to wear whichever watch I like, so I no longer use one as an activity tracker. Even without a tracker, the iPhone I carry every day has the Health app installed as standard, and it records activity data. Since that data has been accumulating, I want to export it from the iPhone and use it for data analysis.

Export healthcare data from iPhone

Open the Health app on the iPhone and tap the profile icon in the upper right.

Tap Export Health Data on the profile page, then tap Export in the confirmation dialog.

Tap the service to which you want to export the health data.

If you select iCloud Drive, the archive is saved as Exported data.zip in the iCloud Drive folder on a synced PC.

CSV converter

The health data is an XML file, Exported data.xml, inside Exported data.zip. I manage my step counts in Excel, so I wrote a script that converts the data to CSV for easy copy and paste. To use it, first clone the repository.

$ git clone https://github.com/masato/health-data-csv.git
$ cd health-data-csv

Copy the Exported data.zip file to the cloned directory. On macOS, iCloud Drive is in the following directory. The path is double-quoted because it contains a space.

$ cp "$HOME/Library/Mobile Documents/com~apple~CloudDocs/Exported data.zip" .

convert.py is a Python script that extracts the health data XML from the zip file, aggregates the step counts by day, and writes the result to a CSV file. Only the step count data is kept, by selecting the Record elements whose type attribute is HKQuantityTypeIdentifierStepCount. I am currently studying Introduction to Data Analysis with Python: Data Processing Using NumPy and pandas, so I implemented the aggregation and the CSV export with the data analysis tool pandas (pandas.pydata.org).

According to the article Handling a zip file containing a Japanese file name in Python 3, a Japanese file name inside the archive, such as the one rendered here as Exported data.xml, appears to be decoded as cp437 when it is extracted. The script therefore encodes the name as UTF-8 and decodes it as cp437 to get the path of the extracted file.
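
The file name handling can be illustrated on its own. Python's zipfile module decodes an entry name as cp437 when the archive does not flag the name as UTF-8, so a non-ASCII name ends up on disk as mojibake. The following is a minimal sketch of that round trip. The file name `データ.xml` and both helper names are hypothetical, chosen only for illustration.

```python
def name_as_extracted(original_name):
    """Return the name as it looks when its UTF-8 bytes are decoded as cp437."""
    return original_name.encode('utf-8').decode('cp437')


def name_restored(extracted_name):
    """Reverse the conversion: cp437 bytes back to the UTF-8 name."""
    return extracted_name.encode('cp437').decode('utf-8')


# Hypothetical Japanese file name, used only for illustration
original = 'データ.xml'
extracted = name_as_extracted(original)

# The extracted name is mojibake, but the conversion is lossless
print(repr(extracted))
print(name_restored(extracted) == original)

# An ASCII-only name passes through unchanged
print(name_as_extracted('export.xml') == 'export.xml')
```

This is why the script can build the on-disk path from the readable name: the conversion is deterministic, and for an ASCII-only name it has no effect.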

convert.py


# -*- coding: utf-8 -*-

from lxml import objectify
import pandas as pd
from pandas import DataFrame
from datetime import datetime
import zipfile
import argparse
import os
import sys

def main(argv):

    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--file',
                        default='Exported.zip',
                        type=str,
                        help='zip file name(Exported.zip)')
    parser.add_argument('-s', '--start',
                        action='store',
                        default='2016-01-01',
                        type=str,
                        help='start date(2016-12-01)')

    # Parse the arguments passed in by the caller
    args = parser.parse_args(argv)

    if not os.path.exists(args.file):
        print('Please specify the zip file name.')
        parser.print_help()
        sys.exit(1)

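    # Extract the archive into the current directory
    with zipfile.ZipFile(args.file) as zf:
        zf.extractall()

    # The extracted XML file name is the UTF-8 name decoded as cp437
    xml_path = ('apple_health_export/Exported data.xml'
                .encode('utf-8').decode('cp437'))
    with open(xml_path, 'rb') as f:
        parsed = objectify.parse(f)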

    root = parsed.getroot()

    headers = ['type', 'unit', 'startDate', 'endDate', 'value']

    data = [({k: v for k, v in elt.attrib.items() if k in headers})
            for elt in root.Record]

    df = DataFrame(data)
    df.index = pd.to_datetime(df['startDate'])

    # Keep only the step count records
    steps = df[df['type'] == 'HKQuantityTypeIdentifierStepCount'].copy()
    steps['value'] = steps['value'].astype(float)

    # Keep only records on or after the start date, if one was given
    if args.start:
        steps = steps.loc[args.start:]

    # Group by day and sum the step counts
    # (pd.TimeGrouper was removed from pandas; pd.Grouper replaces it)
    steps_sum = steps[['value']].groupby(pd.Grouper(freq='D')).sum()

    steps_sum.T.to_csv('./steps_{0}.csv'.format(datetime.now().strftime('%Y%m%d%H%M%S')),
                       index=False, float_format='%.0f')

if __name__ == '__main__':
    main(sys.argv[1:])
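
The core of the script, filtering Record elements by type and summing the values per day, can also be sketched with the standard library alone. This is not the author's implementation. It swaps lxml and pandas for xml.etree.ElementTree and a dictionary so that it runs without extra packages. The sample XML is invented, the function name `steps_by_day` is hypothetical, and the sketch assumes that startDate begins with the date in YYYY-MM-DD form.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample data. The attribute names follow the headers list in
# convert.py; the timestamp format is an assumption.
SAMPLE_XML = """<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          startDate="2016-11-30 23:50:00 +0900" value="50"/>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          startDate="2016-12-01 08:00:00 +0900" value="120"/>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          startDate="2016-12-01 12:00:00 +0900" value="80"/>
  <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min"
          startDate="2016-12-01 12:05:00 +0900" value="72"/>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          startDate="2016-12-02 09:00:00 +0900" value="300"/>
</HealthData>"""


def steps_by_day(xml_text, start=None):
    """Sum step counts per day, optionally from `start` (YYYY-MM-DD) onward."""
    totals = defaultdict(float)
    for record in ET.fromstring(xml_text).iter('Record'):
        # Skip everything that is not a step count record
        if record.get('type') != 'HKQuantityTypeIdentifierStepCount':
            continue
        # Assumed: the first ten characters of startDate are the date
        day = record.get('startDate')[:10]
        # ISO dates compare correctly as plain strings
        if start is None or day >= start:
            totals[day] += float(record.get('value'))
    return dict(sorted(totals.items()))


print(steps_by_day(SAMPLE_XML, start='2016-12-01'))
```

For a full export, pandas remains the more convenient choice, since it parses the time zone offsets and fills in empty days. The sketch only shows the shape of the computation.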

Executing a Python script

To run the script, I use the continuumio/anaconda3 Docker image, which bundles Anaconda for data analysis and also has Jupyter installed. The -f flag specifies the name of the zip file exported from the Health app, located in the current directory. The -s flag specifies the start date of the records to convert to CSV.

$ docker pull continuumio/anaconda3
$ docker run -it --rm \
  -v "$PWD":/app \
  -w /app \
  continuumio/anaconda3 \
  python convert.py -f "Exported data.zip" -s 2016-12-01

A CSV file with a name like steps_xxx.csv, containing the step counts aggregated by day, is created in the current directory.

$ cat steps_20161212013800.csv
2016-12-01,2016-12-02,2016-12-03,2016-12-04,2016-12-05,2016-12-06,2016-12-07,2016-12-08,2016-12-09,2016-12-10,2016-12-11
7217,8815,2260,1828,3711,6980,7839,5079,7197,7112,2958
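
Because the script transposes the result, the CSV has the dates as its header row and the step counts as its single data row. A minimal sketch of reading such a file back into date and count pairs follows, using the first three columns of the output above. The function name `read_daily_steps` is hypothetical.

```python
import csv
import io

# First three columns of the example output above
SAMPLE_CSV = """2016-12-01,2016-12-02,2016-12-03
7217,8815,2260
"""


def read_daily_steps(fileobj):
    """Map each date in the header row to the count in the data row."""
    rows = list(csv.reader(fileobj))
    dates, counts = rows[0], rows[1]
    return {date: int(count) for date, count in zip(dates, counts)}


daily = read_daily_steps(io.StringIO(SAMPLE_CSV))

# Day with the most steps in the sample
print(max(daily, key=daily.get))
```

With a real file, pass an open file object instead of the `io.StringIO` wrapper.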
