[PYTHON] Aggregate daily step counts from iPhone Health data into a CSV file

I enjoyed wearing Android Wear devices and the Apple Watch when they first came out, but I would rather wear a watch I actually like, so I no longer use either as an activity tracker. Even without a tracker, the iPhone I carry every day comes with the Health app installed by default, and it has been recording data all along. Now that the data has accumulated, I want to export it from the iPhone and use it for data analysis.

Export Health data from the iPhone

Open the Health app on the iPhone and tap the profile icon in the upper right.

Tap Export Health Data on the profile page.

Tap Export in the confirmation dialog.

Tap the service you want to send the exported data to.

If you select iCloud Drive, the archive is saved as Exported data.zip in the iCloud Drive folder on the synced PC.

CSV converter

The Health data is stored as an XML file, Exported data.xml, inside Exported data.zip. Since I manage my step counts in Excel, I wrote a script that converts the data to CSV so that it can be copied and pasted easily. To use it, first clone the repository.

$ git clone https://github.com/masato/health-data-csv.git
$ cd health-data-csv

Copy Exported data.zip into the cloned directory. On macOS, iCloud Drive is in the following directory. The path is double-quoted because it contains a space.

$ cp "$HOME/Library/Mobile Documents/com~apple~CloudDocs/Exported data.zip" .
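Before converting, it can be useful to confirm what the archive actually contains. The following is a minimal sketch and not part of the repository: `list_members` is a hypothetical helper, and the small in-memory archive only stands in for the real export so that the snippet runs anywhere. The member name is an assumption modelled on the path used in convert.py.

```python
import io
import zipfile


def list_members(zip_source):
    """Return the member names stored in a zip archive (path or file object)."""
    with zipfile.ZipFile(zip_source) as archive:
        return archive.namelist()


# Stand-in archive built in memory; the member name is an assumption
# modelled on the path that convert.py opens.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('apple_health_export/Exported data.xml', '<HealthData/>')

members = list_members(buffer)
print(members)
```

For the real export, calling `list_members('Exported data.zip')` in the cloned directory should show the XML file that the script expects.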

convert.py is a Python script that extracts the Health data XML from the zip file, aggregates the step counts by day, and writes the result to a CSV file. It picks out only the step count data by selecting Record elements whose type attribute is HKQuantityTypeIdentifierStepCount. I'm studying Introduction to Data Analysis with Python: Data Processing Using NumPy and pandas, so I implemented the aggregation and the CSV export with the data analysis tool pandas (pandas.pydata.org).

According to the article Handling a zip file containing a Japanese file name in Python 3, a Japanese file name such as that of Exported data.xml appears to be decoded as cp437 when the archive is extracted, so the script converts the expected name the same way before opening the file.
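The file name issue can be reproduced in isolation. As far as I understand it, `zipfile` falls back to cp437 when a member name is not flagged as UTF-8, so a UTF-8 encoded name comes out garbled but can still be recovered. The name `データ.xml` below is only an illustrative example, not the actual name of the exported file.

```python
# Hypothetical non-ASCII member name, used only for illustration.
original = 'データ.xml'

# What the name looks like when its UTF-8 bytes are misread as cp437.
garbled = original.encode('utf-8').decode('cp437')

# Reversing the two steps recovers the intended name.
restored = garbled.encode('cp437').decode('utf-8')

# A pure ASCII name is unaffected, because both encodings agree on ASCII.
ascii_name = 'Exported data.xml'
ascii_roundtrip = ascii_name.encode('utf-8').decode('cp437')

print(garbled != original, restored == original, ascii_roundtrip == ascii_name)
```

This is why the script builds the path with `.encode('utf-8').decode('cp437')`: it produces the same garbled name that ends up on disk after extraction.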

`convert.py`


# -*- coding: utf-8 -*-

import argparse
import os
import sys
import zipfile
from datetime import datetime

import pandas as pd
from lxml import objectify
from pandas import DataFrame

def main(argv):

    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--file',
                        default='Exported data.zip',
                        type=str,
                        help='zip file name (Exported data.zip)')
    parser.add_argument('-s', '--start',
                        action='store',
                        default='2016-01-01',
                        type=str,
                        help='start date(2016-12-01)')

    # Parse the arguments that were passed to main()
    args = parser.parse_args(argv)

    if not os.path.exists(args.file):
        print('Please specify the zip file name.')
        parser.print_help()
        sys.exit(1)

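    # Extract the archive into the current directory
    with zipfile.ZipFile(args.file) as archive:
        archive.extractall()

    # zipfile decodes member names that lack the UTF-8 flag as cp437,
    # so build the expected path the same way before parsing it
    xml_path = ('apple_health_export/Exported data.xml'
                .encode('utf-8').decode('cp437'))
    parsed = objectify.parse(xml_path)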

    root = parsed.getroot()

    headers = ['type', 'unit', 'startDate', 'endDate', 'value']

    data = [({k: v for k, v in elt.attrib.items() if k in headers})
            for elt in root.Record]

    df = DataFrame(data)
    df.index = pd.to_datetime(df['startDate'])

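    # Keep only the step count records
    steps = df[df['type'] == 'HKQuantityTypeIdentifierStepCount'].copy()
    steps['value'] = steps['value'].astype(float)

    # Slice from the start date if one was given
    # (sort first so label-based slicing works on the datetime index)
    if args.start:
        steps = steps.sort_index().loc[args.start:]

    # Group by day and sum the step values
    # (pd.Grouper replaces pd.TimeGrouper, which newer pandas versions removed)
    steps_sum = steps.groupby(pd.Grouper(freq='D'))[['value']].sum()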

    steps_sum.T.to_csv('./steps_{0}.csv'.format(datetime.now().strftime('%Y%m%d%H%M%S')),
                       index=False, float_format='%.0f')

if __name__ == '__main__':
    main(sys.argv[1:])
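To see the filtering and aggregation steps on their own, here is a minimal sketch that runs on a handful of made-up records (the values are not real data). It uses `pd.Grouper`, which recent pandas versions provide in place of `pd.TimeGrouper`, and it assumes pandas is installed.

```python
import pandas as pd

# Made-up records shaped like the attributes that convert.py keeps.
records = [
    {'type': 'HKQuantityTypeIdentifierStepCount',
     'startDate': '2016-12-01 08:00:00', 'value': '120'},
    {'type': 'HKQuantityTypeIdentifierStepCount',
     'startDate': '2016-12-01 18:30:00', 'value': '80'},
    {'type': 'HKQuantityTypeIdentifierHeartRate',
     'startDate': '2016-12-01 18:31:00', 'value': '72'},
    {'type': 'HKQuantityTypeIdentifierStepCount',
     'startDate': '2016-12-02 09:15:00', 'value': '300'},
]

df = pd.DataFrame(records)
# Use the start time of each record as the index.
df.index = pd.to_datetime(df.pop('startDate'))

# Keep step counts only and make the values numeric.
steps = df[df['type'] == 'HKQuantityTypeIdentifierStepCount'].copy()
steps['value'] = steps['value'].astype(float)

# One bucket per calendar day, summing the step values in each bucket.
daily = steps.groupby(pd.Grouper(freq='D'))['value'].sum()
print(daily)
```

The heart rate record is dropped by the type filter, and the two step records from the same day end up in a single daily total.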

Executing a Python script

To run the script, I use the continuumio/anaconda3 Docker image, which packages Anaconda for data analysis and also includes Jupyter. The script takes the -f flag to specify the name of the zip file exported from the Health app, which must be in the current directory. The -s flag specifies the start date of the records to convert to CSV.

$ docker pull continuumio/anaconda3
$ docker run -it --rm \
  -v "$PWD":/app \
  -w /app \
  continuumio/anaconda3 \
  python convert.py -f "Exported data.zip" -s 2016-12-01

A CSV file with a name like steps_xxx.csv, containing the step counts aggregated by day, is created in the current directory.

$ cat steps_20161212013800.csv
2016-12-01,2016-12-02,2016-12-03,2016-12-04,2016-12-05,2016-12-06,2016-12-07,2016-12-08,2016-12-09,2016-12-10,2016-12-11
7217,8815,2260,1828,3711,6980,7839,5079,7197,7112,2958
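Because the file is wide (one column per day), pairing each date with its count takes one extra step when reading it back. The following is a minimal sketch using only the standard library, with the first three columns of the output above inlined as a string; `read_daily_steps` is a hypothetical helper name.

```python
import csv
import io


def read_daily_steps(csv_file):
    """Pair the header row of dates with the single row of step counts."""
    reader = csv.reader(csv_file)
    dates = next(reader)
    counts = next(reader)
    return {day: int(count) for day, count in zip(dates, counts)}


# First three columns of the sample output, inlined so the snippet runs as is.
sample = io.StringIO('2016-12-01,2016-12-02,2016-12-03\n7217,8815,2260\n')
daily_steps = read_daily_steps(sample)

# Day with the most steps in the sample.
busiest_day = max(daily_steps, key=daily_steps.get)
print(busiest_day, daily_steps[busiest_day])
```

With a real file, pass `open('steps_20161212013800.csv', newline='')` instead of the in-memory sample.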