Introducing "scikit-mobility", a library for easily analyzing human flow data with Python (Part 1)

Content of this article

This article introduces **scikit-mobility**, a library for handling human flow data in Python. The content is introductory, so that even readers who are wondering "what is human flow data in the first place?" can get interested.

1. Introduction
2. What is human flow data?
3. What can you do with scikit-mobility?
4. Assumptions
   - Library installation
5. Handling of movement history data
   - Data set used this time
   - Data reading
   - Try to visualize the movement history on the map
6. Summary


[GitHub](https://github.com/scikit-mobility/scikit-mobility)

1. Introduction

Do you know the Python library **scikit-mobility**? It was released only last year, so you may not have heard of it yet. It is **a library with features for analyzing human movement data (hereinafter referred to as human flow data)**. In recent years, map applications and social networks have accumulated large amounts of location information, and the library provides algorithms for processing and analyzing that data, including the evaluation of privacy risk.

First, I would like to briefly cover two questions: **What is human flow data?** and **What can scikit-mobility do?**

2. What is human flow data?

scikit-mobility mainly handles **two types** of data.

  1. **Movement history data (trajectories)**: latitude / longitude data showing the path of a movement. Familiar examples are the current-location information that map apps and social networks collect by GPS, and the long-term behavior logs gathered in research and surveys.

map_0_1.png

  2. **Movement flow data (fluxes)**: data on the number of people moving between locations. Like an origin-destination (OD) survey, it shows how many people went from one specific place (the origin) to another specific place (the destination).

map_0_2.png
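To make the difference between the two data types concrete, here is a minimal sketch in plain pandas. The values, the location labels "A", "B", "C" and the column names `origin`, `destination` and `flow` are made up for illustration; they are not taken from the sample data or from scikit-mobility itself.

```python
import pandas as pd

# Hypothetical trajectory records: one row per GPS fix (who, where, when).
trajectories = pd.DataFrame(
    {
        "uid": [1, 1, 2, 2],
        "lat": [39.984094, 39.984198, 39.900000, 39.910000],
        "lng": [116.319236, 116.319322, 116.400000, 116.410000],
        "datetime": pd.to_datetime(
            [
                "2008-10-23 13:53:05",
                "2008-10-23 13:53:06",
                "2008-10-23 14:00:00",
                "2008-10-23 14:05:00",
            ]
        ),
    }
)

# Hypothetical flow records: one row per origin-destination pair with a count.
flows = pd.DataFrame(
    {
        "origin": ["A", "A", "B"],
        "destination": ["B", "C", "A"],
        "flow": [120, 45, 80],
    }
)

# Total number of people leaving each origin.
outflow = flows.groupby("origin")["flow"].sum()
print(outflow)
```

Trajectories keep individual points in time, while flows only keep aggregated counts between places, so the two call for different kinds of analysis.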

3. What can you do with scikit-mobility?

With scikit-mobility, you can easily perform the following analyses on human flow data.

- Data preprocessing
- Behavior analysis (measuring)
- Data generation (synthesis)
- Flow prediction (predicting)
- Privacy risk assessment (assessing)

I would like to dig into each of these in future articles. This time, before going that far, I will say a little more about **what human flow data is in the first place**.

4. Assumptions

Execution environment

Installation

First, let's install the library.

$ pip install scikit-mobility
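To confirm that the installation worked, one option is to ask Python for the installed version. This is a small sketch using only the standard library; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the pip install above succeeded, otherwise None.
print(installed_version("scikit-mobility"))
```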

5. Handling of movement history data

Data set used this time

Use the sample data provided on GitHub. (Note that it is downloaded automatically. It is a text file of about 2 MB.)

This data comes from Microsoft's [GeoLife GPS Trajectories](https://www.microsoft.com/en-us/download/details.aspx?id=52367&from=https%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2Fb16d359d-d164-469e-9fd4-daa38f2b2e13%2F). The Microsoft Research Asia Geolife project collected GPS log data from 182 users in Beijing between 2007 and 2012.

The sample data contains the data of two of those users.

Data reading

Let's read the downloaded data.

Creating a TrajDataFrame

The movement history data is read into the data type TrajDataFrame, which is an extension of the pandas DataFrame.

import skmob

# Read the data; the keyword arguments name the columns in the file
tdf = skmob.TrajDataFrame.from_file(
    'geolife_sample.txt.gz',
    latitude='lat',
    longitude='lon',
    user_id='user',
    datetime='datetime',
)
# Check the contents
print(tdf.head())

The contents look like this.


   uid        lat         lng            datetime
0    1  39.984094  116.319236 2008-10-23 13:53:05
1    1  39.984198  116.319322 2008-10-23 13:53:06
2    1  39.984224  116.319402 2008-10-23 13:53:11
3    1  39.984211  116.319389 2008-10-23 13:53:16

Required arguments

To create a TrajDataFrame, you must specify the column names that correspond to three arguments.

- latitude: latitude
- longitude: longitude
- datetime: date and time

These are the basic information of a movement history: when and where someone was.
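Before handing a table to TrajDataFrame, it can help to check that those three columns exist and hold plausible values. The following is a minimal sketch in plain pandas; the function `check_trajectory_columns` is hypothetical and not part of scikit-mobility, and the default column names simply follow the sample file used in this article.

```python
import pandas as pd


def check_trajectory_columns(df, latitude="lat", longitude="lon", datetime="datetime"):
    """Return a list of problems found in the three required columns."""
    problems = []
    for column in (latitude, longitude, datetime):
        if column not in df.columns:
            problems.append(f"missing column: {column}")
    if problems:
        return problems
    if not df[latitude].between(-90, 90).all():
        problems.append("latitude outside [-90, 90]")
    if not df[longitude].between(-180, 180).all():
        problems.append("longitude outside [-180, 180]")
    # errors="coerce" turns unparseable values into NaT so they can be detected.
    if pd.to_datetime(df[datetime], errors="coerce").isna().any():
        problems.append("unparseable datetime values")
    return problems


sample = pd.DataFrame(
    {
        "lat": [39.984094, 39.984198],
        "lon": [116.319236, 116.319322],
        "datetime": ["2008-10-23 13:53:05", "2008-10-23 13:53:06"],
    }
)
print(check_trajectory_columns(sample))
```

A check like this catches a common mistake, swapping latitude and longitude, because a longitude such as 116.3 is outside the valid latitude range.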

Optional arguments

You can also optionally specify the following arguments:

- user_id: user ID. It shows whose movement history the data is. It can be omitted when the data is for one person, but it is required when data for multiple people is mixed.

- tid: trajectory ID (passed with the trajectory_id argument and stored in the tid column). An ID attached to one continuous series of movements. For example, when the means of transportation switches, as in "walk → bus → train", it is assigned when you want to tell each leg apart.

Of course, any columns other than these are read as well, without any problem.
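If your data has no trajectory ID, one simple way to create one is to start a new trajectory whenever a user's log pauses for a long time. The sketch below does this in plain pandas. It is only an illustration and not how scikit-mobility itself assigns IDs; the 30-minute threshold and the function name `assign_trajectory_ids` are assumptions made for this example.

```python
import pandas as pd


def assign_trajectory_ids(df, max_gap="30min"):
    """Add a 'tid' column that increases whenever a user's log pauses too long."""
    df = df.sort_values(["uid", "datetime"]).reset_index(drop=True)
    gap = df.groupby("uid")["datetime"].diff()
    # The first point of each user has no previous point (NaT), so it starts a trip.
    new_trip = gap.isna() | (gap > pd.Timedelta(max_gap))
    # Counting the trip starts per user gives 1, 2, 3, ... as the trajectory ID.
    df["tid"] = new_trip.astype(int).groupby(df["uid"]).cumsum()
    return df


logs = pd.DataFrame(
    {
        "uid": [1, 1, 1, 2, 2],
        "datetime": pd.to_datetime(
            [
                "2008-10-23 13:53:05",
                "2008-10-23 13:53:06",
                "2008-10-23 15:00:00",
                "2008-10-23 09:00:00",
                "2008-10-23 09:10:00",
            ]
        ),
    }
)
print(assign_trajectory_ids(logs))
```

In this made-up log, user 1 is silent for more than an hour before the third point, so that point starts a second trajectory.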

Creating from a data frame

It is also possible to convert a pandas DataFrame to a TrajDataFrame.

import pandas as pd
import skmob
# Prepare sample data
data_list = [[1, 39.984094, 116.319236, '2008-10-23 13:53:05'],
             [1, 39.984198, 116.319322, '2008-10-23 13:53:06'],
             [1, 39.984224, 116.319402, '2008-10-23 13:53:11'],
             [1, 39.984211, 116.319389, '2008-10-23 13:53:16']]
# Create a data frame
data_df = pd.DataFrame(data_list, columns=['user', 'lat', 'lon', 'datetime'])
print('Before conversion: ', type(data_df))
# Convert to TrajDataFrame
tdf = skmob.TrajDataFrame(data_df, latitude='lat', longitude='lon', datetime='datetime', user_id='user')
print('After conversion: ', type(tdf))
print(tdf.head())

The output is as follows.

Before conversion:  <class 'pandas.core.frame.DataFrame'>
After conversion:  <class 'skmob.core.trajectorydataframe.TrajDataFrame'>

   uid        lat         lng            datetime
0    1  39.984094  116.319236 2008-10-23 13:53:05
1    1  39.984198  116.319322 2008-10-23 13:53:06
2    1  39.984224  116.319402 2008-10-23 13:53:11
3    1  39.984211  116.319389 2008-10-23 13:53:16
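Because TrajDataFrame extends the pandas DataFrame, the usual pandas operations should carry over to it. As a sketch, the following counts the points per user and measures the typical interval between logs. It uses a plain DataFrame with the same column names as the output above, so that it runs without scikit-mobility; the name `summary` and its columns are made up for this example.

```python
import pandas as pd

# Same four rows as the output above, as a plain pandas DataFrame.
points = pd.DataFrame(
    {
        "uid": [1, 1, 1, 1],
        "lat": [39.984094, 39.984198, 39.984224, 39.984211],
        "lng": [116.319236, 116.319322, 116.319402, 116.319389],
        "datetime": pd.to_datetime(
            [
                "2008-10-23 13:53:05",
                "2008-10-23 13:53:06",
                "2008-10-23 13:53:11",
                "2008-10-23 13:53:16",
            ]
        ),
    }
)

points = points.sort_values(["uid", "datetime"])
# Seconds elapsed since the previous log of the same user (NaN for the first log).
interval_s = points.groupby("uid")["datetime"].diff().dt.total_seconds()

summary = pd.DataFrame(
    {
        "n_points": points.groupby("uid").size(),
        "median_interval_s": interval_s.groupby(points["uid"]).median(),
    }
)
print(summary)
```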

Try to visualize the movement history on the map

You cannot tell where a latitude / longitude pair is just by looking at the numbers, so it is important to check it on a map. A TrajDataFrame can be visualized easily as follows.

Plot the movement history on the map

tdf.plot_trajectory(zoom=12, weight=3, opacity=0.9, tiles='Stamen Toner')

- zoom: how far to zoom the map
- weight: the thickness of the line to draw
- opacity: the opacity of the line to draw
- tiles: the type of background map

Each uid is automatically drawn in a different color. On the map, you can see where a user has moved, how wide their range of activity is, and where they go.

map_1.png

Zoom out the map

By zooming out until the whole range of activity is visible, you can see how far each user travels. One user has gone quite far.

map_3.png
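"Quite far" can also be put into numbers. The following sketch computes the great-circle distance between two points with the haversine formula and uses it to find how far a user got from their first logged point. It uses only the standard library and is not scikit-mobility's own implementation; the function names and the Earth radius of 6371 km are assumptions made for this example.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two latitude / longitude points."""
    radius_km = 6371.0  # Mean Earth radius (assumed value for this sketch).
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def max_distance_from_start_km(points):
    """Largest distance from the first point; points is a list of (lat, lng) in time order."""
    start_lat, start_lng = points[0]
    return max(haversine_km(start_lat, start_lng, lat, lng) for lat, lng in points)


# The first four points of the sample output shown earlier in this article.
track = [
    (39.984094, 116.319236),
    (39.984198, 116.319322),
    (39.984224, 116.319402),
    (39.984211, 116.319389),
]
print(max_distance_from_start_km(track))
```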

Pop-up display of start and end points

Markers are also displayed for each user's first log (green) and last log (red). Clicking a marker pops up its time and latitude / longitude.

map_2.png
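The points behind those markers can also be pulled out as a table. This is a minimal sketch in plain pandas with made-up rows; it assumes the `uid`, `lat`, `lng` and `datetime` column names seen in the output earlier, and the names `first_logs` and `last_logs` are made up for this example.

```python
import pandas as pd

# Made-up logs for two users, deliberately not in time order.
points = pd.DataFrame(
    {
        "uid": [1, 2, 1, 2, 1],
        "lat": [39.984224, 39.910000, 39.984094, 39.900000, 39.984198],
        "lng": [116.319402, 116.410000, 116.319236, 116.400000, 116.319322],
        "datetime": pd.to_datetime(
            [
                "2008-10-23 13:53:11",
                "2008-10-23 14:05:00",
                "2008-10-23 13:53:05",
                "2008-10-23 14:00:00",
                "2008-10-23 13:53:06",
            ]
        ),
    }
)

# Sort by time within each user, then take the first and last row per user.
ordered = points.sort_values(["uid", "datetime"])
first_logs = ordered.groupby("uid").head(1)
last_logs = ordered.groupby("uid").tail(1)

print(first_logs)
print(last_logs)
```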

Visualizing it on a map in this way makes it easier to understand the user's movements.

6. Summary

What did you think? This time, I briefly introduced scikit-mobility and the kinds of data it handles. Movement history data is not something you usually see, so this may have been your first look at it. I hope this article gets you interested in human flow data analysis.

If you use Google Maps, it may be interesting to download and analyze your own location history. (Download Google Map History (timeline))

In the next and subsequent articles, I would like to introduce flow data and specific functions and algorithms. That's all for this time. Thank you for reading!
