[PYTHON] Analysis of shared space usage by machine learning

Background

The giftee office where I work moved on May 8, 2017.

The new office has many more kinds of shared space, such as sofa seating and family-restaurant-style booths. https://www.wantedly.com/companies/giftee/post_articles/64703

I wanted to know which types of shared space are used and how much, so we looked into how to measure their usage.

Approach

Motion sensors and pressure sensors were also candidates for measuring usage, but a motion sensor apparently cannot tell how many people are present, and pressure sensors would require one sensor per seat. Instead, I photographed each shared space at regular intervals and counted the people in each photo.

Counting people

I decided to use a machine learning framework to count the people in each image.

I chose darknet for two reasons:

  1. Pre-trained weights are publicly available
  2. It is easy to use

https://pjreddie.com/darknet/

darknet

Installation

Installation is very easy: just clone the repository from GitHub and run make.

git clone https://github.com/pjreddie/darknet
cd darknet
make

Then download the pre-trained weights into the darknet directory, and you're ready to go.

wget https://pjreddie.com/media/files/yolo.weights

Image analysis

Run darknet with the detect command, passing the config file, the weights, and the target image. For example, to analyze data/person.jpg, which ships with the source:

./darknet detect cfg/yolo.cfg yolo.weights data/person.jpg

The annotated result is written to "predictions.png" in the directory where darknet was run.

The original image

person.jpg

Image after analysis

predictions.png

How to get the number of people

darknet also prints the detection results to standard output:

data/person.jpg: Predicted in 14.067749 seconds.
person: 86%
horse: 82%
dog: 86%

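To get the number of people in the image, I simply grep the standard output for person and count the matching lines. The pattern should be anchored to the start of the line: a plain "grep person" also matches the "data/person.jpg: Predicted in ..." header line, so for this sample image it would report two people instead of one.

./darknet detect cfg/yolo.cfg yolo.weights data/person.jpg | grep '^person:' | wc -l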
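If the count is done in Python instead of with grep, the same line-anchored match looks like the sketch below. It assumes darknet prints one "label: NN%" line per detection, as in the output shown above; count_people is a hypothetical helper name.

```python
def count_people(darknet_stdout: str) -> int:
    """Count detections labelled "person" in darknet's standard output.

    Only lines that start with "person:" are counted, so the
    "<path>: Predicted in ... seconds." header line is never matched,
    even when the image path itself contains the word "person".
    """
    return sum(1 for line in darknet_stdout.splitlines()
               if line.startswith("person:"))


# Output copied from the darknet run on data/person.jpg
sample = """data/person.jpg: Predicted in 14.067749 seconds.
person: 86%
horse: 82%
dog: 86%
"""
print(count_people(sample))  # → 1
```

A substring test such as '"person" in line' would return 2 here, which is exactly the mistake the unanchored grep makes.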

Overall flow of the system

The system consists of three phases.

  1. Take a picture of the target area with an iPhone at regular intervals and upload it to S3
  2. Periodically run an analysis script from cron that counts the people in each photo and inserts the count into DynamoDB
  3. Visualize the DynamoDB data with re:dash

Phase 1

An iPhone app by @koh518 photographs the shared space at regular intervals (every 5 minutes in this setup). I didn't have a stand for the iPhone, so I propped it up in a mug.

iphone_in_mug.jpg

Each captured image is uploaded to S3 with the key "<shared space name>/<time>.jpeg" (e.g. dining/20170620131500.jpeg).
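The iPhone app's code isn't part of this article, so the following is only a Python sketch of the key convention. make_key and parse_key are hypothetical helpers; parse_key splits the key the same way the Phase 2 script does.

```python
from datetime import datetime
from typing import Tuple


def make_key(shared_space_name: str, taken_at: datetime) -> str:
    """Build an S3 key of the form "<shared space name>/<YYYYMMDDhhmmss>.jpeg"."""
    return f"{shared_space_name}/{taken_at:%Y%m%d%H%M%S}.jpeg"


def parse_key(key: str) -> Tuple[str, str]:
    """Split a key back into (shared_space_name, created_at timestamp string)."""
    shared_space_name, filename = key.split("/")
    created_at, _extension = filename.split(".")
    return shared_space_name, created_at


key = make_key("dining", datetime(2017, 6, 20, 13, 15, 0))
print(key)             # → dining/20170620131500.jpeg
print(parse_key(key))  # → ('dining', '20170620131500')
```

Putting the space name in the prefix lets each space be listed separately, and the timestamp doubles as the DynamoDB created_at value.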

Phase 2

The analysis script, started by cron, fetches the images accumulated in S3 one by one and counts the people in each. darknet ignores the EXIF orientation tag in JPEG files, so each image is first rotated to the correct orientation with ImageMagick's convert command.

convert iphone.jpg -auto-orient converted.png

Without this step, the image is analyzed sideways and accuracy drops considerably.
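To illustrate what -auto-orient conceptually does, the toy model below applies the rotation or flip implied by each EXIF Orientation value (1 to 8) to an image represented as a list of pixel rows. It is not ImageMagick's implementation and it does not read EXIF data from a file; in Python, a library function such as Pillow's ImageOps.exif_transpose would do the real work. The helper names are hypothetical.

```python
def rotate_cw(img):
    """Rotate an image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_lr(img):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in img]


# Operations that make the stored image display upright, per EXIF Orientation value
ORIENT_OPS = {
    1: [],                                          # already upright
    2: [flip_lr],                                   # mirrored horizontally
    3: [rotate_cw, rotate_cw],                      # rotated 180 degrees
    4: [rotate_cw, rotate_cw, flip_lr],             # mirrored vertically
    5: [flip_lr, rotate_cw, rotate_cw, rotate_cw],  # transpose
    6: [rotate_cw],                                 # needs 90 degrees clockwise
    7: [flip_lr, rotate_cw],                        # transverse
    8: [rotate_cw, rotate_cw, rotate_cw],           # needs 90 degrees counter-clockwise
}


def auto_orient(img, orientation):
    """Return img transformed so that it displays upright."""
    for op in ORIENT_OPS[orientation]:
        img = op(img)
    return img


stored = [[1, 2, 3],
          [4, 5, 6]]           # 2x3 as stored in the file
print(auto_orient(stored, 6))  # → [[4, 1], [5, 2], [6, 3]]
```

A portrait iPhone photo is typically stored landscape with Orientation 6, which is why darknet sees it sideways unless it is rotated first.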

Then pass the converted image to darknet and count the people.

./darknet detect cfg/yolo.cfg yolo.weights ../converted.png | grep person | wc -l

The image was analyzed as follows.

The original image

iphone.jpeg

After analysis

detected.png

The purple boxes overlap and are a little hard to make out, but all three people are correctly detected.

The count is then inserted into DynamoDB and the analyzed photo is moved to another S3 bucket. darknet's annotated output image is also uploaded to that bucket for later verification.

Phase 2 is implemented as a Python script. The source is below.

import os
import subprocess

import boto3

BUCKET_NAME = os.environ["BUCKET_NAME"]            # bucket the iPhone uploads to
BUCKET_NAME_DONE = os.environ["BUCKET_NAME_DONE"]  # bucket for processed images
DYNAMODB_REGION = os.environ["DYNAMODB_REGION"]
DYNAMODB_TABLE = os.environ["DYNAMODB_TABLE"]

s3 = boto3.resource('s3')
bucket = s3.Bucket(BUCKET_NAME)
client = boto3.client('s3')

# Create the DynamoDB table handle once, not on every loop iteration
dynamodb = boto3.resource('dynamodb', region_name=DYNAMODB_REGION)
table = dynamodb.Table(DYNAMODB_TABLE)

for obj in bucket.objects.all():
    key = obj.key
    shared_space_name, filename = key.split('/')

    # "prefix/" folder-marker objects are listed too but have no filename, so skip them
    if not filename:
        continue

    created_at, _extension = filename.split('.')

    # Download the photo
    bucket.download_file(key, 'iphone.jpg')

    # Rotate the image upright using its EXIF orientation (darknet ignores EXIF)
    subprocess.run(['convert', 'iphone.jpg', '-auto-orient', 'converted.png'],
                   check=True)

    # Run YOLO from inside the darknet directory and count the "person:" lines
    result = subprocess.run(
        ['./darknet', 'detect', 'cfg/yolo.cfg', 'yolo.weights', '../converted.png'],
        cwd='darknet', stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
    value = sum(1 for line in result.stdout.decode('utf-8').splitlines()
                if line.startswith('person:'))

    # Record the count in DynamoDB
    table.put_item(
        Item={
            'shared_space_name': shared_space_name,
            'created_at': created_at,
            'value': value,
        }
    )

    # Move the photo to the done bucket: copy it, then delete the original
    copy_source = {'Bucket': BUCKET_NAME, 'Key': key}
    s3.meta.client.copy(copy_source, BUCKET_NAME_DONE, key)
    obj.delete()

    # Upload darknet's annotated output image for later verification
    yolo_key = shared_space_name + '/' + created_at + '_yolo.png'
    client.upload_file('darknet/predictions.png', BUCKET_NAME_DONE, yolo_key)

Phase 3

re:dash is used to visualize the DynamoDB data.

The timestamp was stored as a String in YYYYMMDDhhmmss format. Queried as-is, it is not displayed as a time, so I wrapped it in the TIMESTAMP function to display it as a proper date and time.
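For checking the data outside re:dash, the stored string can be parsed in Python with datetime.strptime. This sketch only illustrates the YYYYMMDDhhmmss format; it does not reproduce DQL's TIMESTAMP function, and since no time zone is stored, the result is a naive datetime. parse_created_at is a hypothetical helper name.

```python
from datetime import datetime


def parse_created_at(created_at: str) -> datetime:
    """Parse a created_at string stored as YYYYMMDDhhmmss into a naive datetime."""
    return datetime.strptime(created_at, "%Y%m%d%H%M%S")


dt = parse_created_at("20170620131500")
print(dt)  # → 2017-06-20 13:15:00

# Fixed-width, zero-padded strings in this format sort in chronological order,
# so they still work as a sort key without parsing.
stamps = ["20170620131500", "20170620130500", "20170620131000"]
print(sorted(stamps) == sorted(stamps, key=parse_created_at))  # → True
```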

Below is the DQL query used to fetch the data.

SCAN shared_space_name,TIMESTAMP(created_at),value FROM tbl_name

The result is a graph like the one below.

redash.png

re:dash and the analysis script both run as Docker containers on a single EC2 instance.

Open issues

Threshold

In rare cases, an image showing only one person was counted as two. darknet accepts a detection threshold through its -thresh option, so detections with a low probability of being a person can be excluded from the count. This time I ran it with the default of 25%, but tuning this value may improve accuracy.
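A stricter threshold can also be approximated after the fact by filtering the confidences darknet prints. The sketch below assumes the "label: NN%" output format shown earlier in this article; count_people_above is a hypothetical helper. darknet only prints detections above its own threshold, so this filter can raise the effective threshold but never lower it; lowering it requires rerunning darknet with -thresh.

```python
import re

# Matches darknet's per-detection lines such as "person: 86%"
DETECTION_LINE = re.compile(r"^(?P<label>[^:]+): (?P<pct>\d+)%$")


def count_people_above(darknet_stdout: str, min_confidence: int) -> int:
    """Count "person" detections whose printed confidence is at least min_confidence percent."""
    count = 0
    for line in darknet_stdout.splitlines():
        match = DETECTION_LINE.match(line.strip())
        if (match and match.group("label") == "person"
                and int(match.group("pct")) >= min_confidence):
            count += 1
    return count


# Output copied from the darknet run on data/person.jpg
sample = """data/person.jpg: Predicted in 14.067749 seconds.
person: 86%
horse: 82%
dog: 86%
"""
print(count_people_above(sample, 25))  # → 1
print(count_people_above(sample, 90))  # → 0
```

Because the printed percentage is rounded to a whole number, this post-filter works at 1% granularity.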

Passers-by who are not using the space

If someone happens to be walking past the shared space when the photo is taken, the count will be higher than the actual usage. If that level of accuracy isn't needed, this can be ignored. Otherwise, taking several shots a few tens of seconds apart for each measurement and using the minimum count should filter out passers-by. (I haven't tried this yet.)
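Since the idea above is untested, here is only a hypothetical sketch of the aggregation step: shots are grouped into 5-minute slots (matching the capture interval used here) and the minimum count per slot is kept. usage_per_slot is a hypothetical helper, and the burst counts in the example are made up for illustration.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple


def usage_per_slot(shots: Iterable[Tuple[datetime, int]],
                   slot_minutes: int = 5) -> Dict[datetime, int]:
    """Group (taken_at, people_count) shots into time slots and keep the minimum per slot.

    A passer-by usually appears in only some of the shots taken for one
    measurement, so the minimum approximates the people actually using the space.
    Assumes slot_minutes divides 60.
    """
    usage: Dict[datetime, int] = {}
    for taken_at, count in shots:
        # Round the time down to the start of its slot, e.g. 13:16:30 -> 13:15:00
        slot = taken_at.replace(minute=taken_at.minute - taken_at.minute % slot_minutes,
                                second=0, microsecond=0)
        usage[slot] = min(count, usage.get(slot, count))
    return usage


# Hypothetical burst: three shots 30 seconds apart, someone walks past during the second
shots = [
    (datetime(2017, 6, 20, 13, 15, 0), 3),
    (datetime(2017, 6, 20, 13, 15, 30), 4),
    (datetime(2017, 6, 20, 13, 16, 0), 3),
]
for slot, people in sorted(usage_per_slot(shots).items()):
    print(slot, people)  # → 2017-06-20 13:15:00 3
```

Taking the minimum also discards one-off false positives like the double count mentioned above, at the cost of undercounting if a real user is briefly hidden in one shot.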

In closing

Giftee Co., Ltd. is hiring engineers. If you are interested in this system or would like to visit the new office, feel free to contact us here.
