[PYTHON] Detect video objects with Video Intelligence API

Things to do

Identify what appears where, and in which frame, in a video. We use Google's Video Intelligence API to detect objects in videos. The code presented in this article is based on the official Getting Started Guide.

Preparation

Video Intelligence API

Follow [Authentication for the API](https://cloud.google.com/video-intelligence/docs/how-to?hl=ja) in the official guide and obtain a service account key file.

Google Colaboratory

Use Google Colaboratory to implement the code and check the results.

Upload the service account key file to Colaboratory as described in [this article](https://qiita.com/sosuke/items/533909d31244f986ad47). If you don't want to do this every time, you can keep the file in the Google Drive that you mount, but be careful not to share it accidentally.
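
As a minimal sketch, the key file can also be uploaded directly from a code cell with Colaboratory's `files` helper (the uploaded file lands in the current working directory):

# Upload the service account key file from a code cell (opens a file picker)
from google.colab import files

uploaded = files.upload()  # select {YOUR KEY.json} in the dialog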

Video to be analyzed

The Video Intelligence API can analyze:

- Video files stored in GCP Storage (Cloud Storage)
- Video files stored locally

If you use a video file stored in Cloud Storage, you are charged a Storage usage fee on top of the API fee, so a local file is recommended if you just want to give it a try (a sketch of the Cloud Storage variant appears after the API call below).

This time we analyze a "video file saved locally", so save the video you want to analyze to Google Drive and mount the drive in Colaboratory. The drive can be mounted from the left pane of Colaboratory. (Screenshot: mounting Drive from the Colaboratory left pane)
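
The mount can also be done from a code cell instead of the left pane. A minimal sketch, assuming the default Colaboratory mount point `/content/drive`:

# Mount Google Drive so the video can be read from a path such as
# /content/drive/My Drive/{YOUR FILE}
from google.colab import drive

drive.mount('/content/drive')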

When preparing a video, keep the Video Intelligence API usage fees (below) in mind.

[Supplement] Video Intelligence API usage fee (as of March 2020)

The Video Intelligence API is billed according to the length of the video being analyzed. The length is counted in minutes, and anything under one minute is rounded up, so annotating with the following three patterns results in the same usage fee.

  1. One video of 2 minutes and 30 seconds
  2. One 1 minute 30 second video and one 30 second video
  3. Three 30-second videos
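
To make the rounding rule concrete, here is a small sketch that computes the billable minutes for the three patterns above (the helper function is just for illustration; the rule itself is the per-video ceiling to whole minutes described above):

import math

# Each video is rounded up to whole minutes, then the minutes are summed
def billable_minutes(durations_sec):
    return sum(math.ceil(sec / 60) for sec in durations_sec)

print(billable_minutes([150]))         # 1. one 2:30 video          -> 3
print(billable_minutes([90, 30]))      # 2. a 1:30 and a 0:30 video -> 3
print(billable_minutes([30, 30, 30]))  # 3. three 0:30 videos       -> 3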

The price of each annotation is as follows.

| Feature | First 1,000 minutes | Over 1,000 minutes |
| --- | --- | --- |
| Label detection | Free | $0.10/minute |
| Shot detection | Free | $0.05/minute (free when used with label detection) |
| Explicit content detection | Free | $0.10/minute |
| Speech transcription | Free | $0.048/minute (charged only for en-US among the supported languages) |
| Object tracking | Free | $0.15/minute |
| Text detection | Free | $0.15/minute |
| Logo detection | Free | $0.15/minute |
| Celebrity recognition | Free | $0.10/minute |

Implementation

Getting ready to use Video Intelligence

Install the video intelligence client

# The code below uses the 1.x client API (videointelligence.enums), so pin to <2
!pip install -U "google-cloud-videointelligence<2"

Create a videointelligence client

First, authenticate using the service account key file obtained in [Authentication for the API](https://cloud.google.com/video-intelligence/docs/how-to?hl=ja). `service_account_key_name` is the path to the service account key file uploaded to Colaboratory.

import json
from google.cloud import videointelligence
from google.oauth2 import service_account

# API authentication
service_account_key_name = "{YOUR KEY.json}"
info = json.load(open(service_account_key_name))
creds = service_account.Credentials.from_service_account_info(info)

# Create the client
video_client = videointelligence.VideoIntelligenceServiceClient(credentials=creds)

Run the API

First, load the video from the drive.

# Specify the video to be processed and load it
import io

path = '{YOUR FILE PATH}'
with io.open(path, 'rb') as file:
    input_content = file.read()

Then run the API and get the result

features = [videointelligence.enums.Feature.OBJECT_TRACKING]
timeout = 300
operation = video_client.annotate_video(input_content=input_content, features=features, location_id='us-east1')

print('\nProcessing video for object annotations.')
result = operation.result(timeout=timeout)
print('\nFinished processing.\n')
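
For reference, a video stored in GCP Storage (the other input option mentioned above) is passed by URI instead of by content. A minimal sketch, assuming a hypothetical gs:// path:

# Analyze a file in Cloud Storage instead of a local file
# (gs://{YOUR BUCKET}/{YOUR FILE} is a hypothetical path)
operation = video_client.annotate_video(
    input_uri='gs://{YOUR BUCKET}/{YOUR FILE}',
    features=features,
    location_id='us-east1')
result = operation.result(timeout=timeout)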

Check the result

Display a list of detected objects

Jupyter notebooks render pandas DataFrames nicely, so we extract only the necessary information from the [response](https://cloud.google.com/video-intelligence/docs/object-tracking?hl=ja#vision-object-tracking-gcs-protocol) and build a DataFrame.

This time, we take the following from the response's `object_annotations`.

| Column name | Contents | Source |
| --- | --- | --- |
| Description | Object description (name) | entity.description |
| Confidence | Detection confidence | confidence |
| SegmentStartTime | Start time of the segment in which the object appears | segment.start_time_offset |
| SegmentEndTime | End time of the segment in which the object appears | segment.end_time_offset |
| FrameTime | Seconds from the start of the video at which the frame containing the object occurs | frames[i].time_offset |
| Box{XXX} | Coordinates of each side of the object's bounding box, as a fraction of the image dimensions | frames[i].normalized_bounding_box |

# List the detected objects
import pandas as pd

columns=['Description', 'Confidence', 'SegmentStartTime', 'SegmentEndTime', 'FrameTime', 'BoxLeft', 'BoxTop', 'BoxRight', 'BoxBottom', 'Box', 'Id']
object_annotations = result.annotation_results[0].object_annotations
result_table = []
for object_annotation in object_annotations:
    for frame in object_annotation.frames:
        box = frame.normalized_bounding_box
        result_table.append([
                object_annotation.entity.description,
                object_annotation.confidence,
                object_annotation.segment.start_time_offset.seconds + object_annotation.segment.start_time_offset.nanos / 1e9,
                object_annotation.segment.end_time_offset.seconds + object_annotation.segment.end_time_offset.nanos / 1e9,
                frame.time_offset.seconds + frame.time_offset.nanos / 1e9,
                box.left,
                box.top,
                box.right,
                box.bottom,
                [box.left, box.top, box.right, box.bottom],
                object_annotation.entity.entity_id
        ])
        # Since the output would be huge, keep only the first frame of each segment for now
        break

df = pd.DataFrame(result_table, columns=columns)
pd.set_option('display.max_rows', len(result_table))
# Sort by Confidence and display
df.sort_values('Confidence', ascending=False)

When executed, you get results like the following. (Screenshot: the detected-object DataFrame sorted by Confidence)

List the frames in which objects were detected as still images

First, frames are extracted from the video based on the `time_offset` information above. We use `OpenCV` to get still images from the video. Since the still image to cut out has to be specified by frame number, the approximate frame number is calculated from the video's FPS and `time_offset` (seconds).

import cv2

images = []
cap = cv2.VideoCapture(path)

if cap.isOpened():
    fps = cap.get(cv2.CAP_PROP_FPS)
    for sec in df['FrameTime']:
        #Calculate the number of frames from fps and seconds
        cap.set(cv2.CAP_PROP_POS_FRAMES, round(fps * sec))
        ret, frame = cap.read()
        if ret:
            images.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

Next, draw a box around the object shown in each extracted still image with OpenCV's `rectangle`. Since `rectangle` takes the top-left and bottom-right vertices of the rectangle to draw, we need to obtain these two points. The `normalized_bounding_box` returned by the API contains values for the four sides (`left`, `top`, `right`, `bottom`). For example, in the box shown below, which indicates the position of `person`, the value of `left` is l (the distance from the left edge of the image to the left edge of the box) divided by width (the width of the whole image), so the x coordinate of vertex 1 (`pt1`) can be recovered from `left` as `width * left`. We prepare helper methods for this.

(Figure: a bounding box around a person, illustrating left = l / width)

import math

# Convert a ratio (0-1) to a pixel coordinate on the image
def ratio_to_pics(size_pics, ratio):
    return math.ceil(size_pics * ratio)

# Get the top-left and bottom-right vertices from the box
def rect_vertex(image, box):
    height, width  = image.shape[:2]
    return[
        (
            ratio_to_pics(width, box[0]), ratio_to_pics(height, box[1])
        ),
        (
            ratio_to_pics(width, box[2]), ratio_to_pics(height, box[3])
        )
    ]

Using the methods above to compute the vertex positions, we now actually draw the box onto each image.

boxed_images = []
color = (0, 255, 255)
thickness = 5
for index, row in df.iterrows():
    image = images[index]
    boxed_images.append(cv2.rectangle(image,  *rect_vertex(image, row.Box), color,  thickness = thickness))

Finally, each image is displayed together with its Description and Confidence. Depending on the length of the video and the number of detected objects, displaying all of them takes time, so a threshold is set on Confidence.

import math
import matplotlib.pyplot as plt

# Filter by a Confidence threshold
min_confidence = 0.7

# Set up the figure
col_count = 4
row_count = math.ceil(len(images) / col_count)
fig = plt.figure(figsize = (col_count * 4, row_count * 3), dpi = 100)
num = 0

# Display the still images side by side
for index, row in df.iterrows():
    if row.Confidence < min_confidence:
        continue
    num += 1
    fig.add_subplot(row_count, col_count, num, title = '%s : (%s%s)' % (row.Description, round(row.Confidence * 100, 2), '%'))
    plt.imshow(boxed_images[index], cmap='gray')
    plt.axis('off')

When executed, you get results like the following. (Screenshot: the detected frames displayed with Description and Confidence)

Reference

- [Display the results of analyzing a video with the Cloud Video Intelligence API from Colaboratory](https://qiita.com/sosuke/items/533909d31244f986ad47)
- Official Getting Started Guide
