Identify what appears at which position in which frame of a video. This article uses Google's Video Intelligence API to detect objects in videos. The code presented here is based on the official Getting Started guide.
Follow [Authenticating to the API](https://cloud.google.com/video-intelligence/docs/how-to?hl=ja) in the official Video Intelligence API guide and obtain a service account key file.
Google Colaboratory is used to implement the code and check the results.
Upload the service account key file to Colaboratory as described in [this article](https://qiita.com/sosuke/items/533909d31244f986ad47). If you don't want to do this every time, you can keep the file in a mounted Google Drive, but be careful not to share it by accident.
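If you prefer to upload the key file from a notebook cell rather than the file pane, a minimal sketch using Colaboratory's upload helper looks like this (the key file name is just a placeholder):

```python
# Upload the service account key file into the Colaboratory runtime
from google.colab import files

uploaded = files.upload()  # select the key file (e.g. your-key.json) in the dialog
print(list(uploaded.keys()))  # the uploaded file name can then be used as the key path
```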
The Video Intelligence API can analyze:

- Video files saved in GCP Storage
- Video files saved locally

If you use a video file saved in GCP Storage, a Storage usage fee is charged in addition to the API fee, so local files are recommended if you just want to try it out.
This time we will analyze a locally saved video file, so save the video you want to analyze in Google Drive and mount the drive in Colaboratory. The drive can be mounted from the left pane of Colaboratory, or from code as in the sketch below.
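A minimal sketch of mounting Drive from code (the mount point `/content/drive` is the usual default):

```python
# Mount Google Drive into the Colaboratory runtime
from google.colab import drive

drive.mount('/content/drive')
# Videos saved in My Drive then appear under /content/drive/MyDrive/
```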
When preparing a video, keep the usage fees of the Video Intelligence API in mind.
The Video Intelligence API is billed according to the length of the video being analyzed. Length is counted in minutes, with anything under one minute rounded up, so, for example, annotating a 10-second, a 59-second, or a 60-second video all costs the same.
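For reference, a minimal sketch of this rounding rule (the helper `billed_minutes` is just an illustration, not part of the API):

```python
import math

def billed_minutes(duration_seconds):
    # Durations are rounded up to the next full minute for billing
    return math.ceil(duration_seconds / 60)

# A 10-second, a 59-second and a 60-second video are all billed as 1 minute
for sec in (10, 59, 60):
    print(sec, 'sec ->', billed_minutes(sec), 'minute(s)')
```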
The price of each annotation feature is as follows:

Feature | First 1,000 minutes | Over 1,000 minutes |
---|---|---|
Label detection | Free | $0.10/min |
Shot detection | Free | $0.05/min, free when used with label detection |
Explicit content detection | Free | $0.10/min |
Speech transcription | Free | $0.048/min (charged only for en-US among the supported languages) |
Object tracking | Free | $0.15/min |
Text detection | Free | $0.15/min |
Logo detection | Free | $0.15/min |
Celebrity recognition | Free | $0.10/min |
Install the client library:

```
!pip install -U google-cloud-videointelligence
```
First, authenticate with the service account key file obtained in [Authenticating to the API](https://cloud.google.com/video-intelligence/docs/how-to?hl=ja).
`service_account_key_name` is the path to the service account key file uploaded to Colaboratory.
```python
import json
from google.cloud import videointelligence
from google.oauth2 import service_account

# API authentication
service_account_key_name = "{YOUR KEY.json}"
info = json.load(open(service_account_key_name))
creds = service_account.Credentials.from_service_account_info(info)

# Create the client
video_client = videointelligence.VideoIntelligenceServiceClient(credentials=creds)
```
Next, load the video from Drive.
```python
# Specify the video to be processed and load it
import io

path = '{YOUR FILE PATH}'
with io.open(path, 'rb') as file:
    input_content = file.read()
```
Then run the API and get the result.
```python
features = [videointelligence.enums.Feature.OBJECT_TRACKING]
timeout = 300
operation = video_client.annotate_video(input_content=input_content, features=features, location_id='us-east1')
print('\nProcessing video for object annotations.')
result = operation.result(timeout=timeout)
print('\nFinished processing.\n')
```
Jupyter notebooks render pandas DataFrames nicely, so extract only the necessary information from the [response](https://cloud.google.com/video-intelligence/docs/object-tracking?hl=ja) and build a DataFrame.
This time, take the following fields from `object_annotations` in the response.
Column name | Contents | Source |
---|---|---|
Description | Object description (name) | entity.description |
Confidence | Detection reliability | confidence |
SegmentStartTime | Start time of the segment in which the object appears | segment.start_time_offset |
SegmentEndTime | End time of the segment in which the object appears | segment.end_time_offset |
FrameTime | Time, in seconds from the start of the video, of the frame in which the object was detected | frames[i].time_offset |
Box{XXX} | Coordinates of each side of the object's bounding box, normalized to the range 0–1 | frames[i].normalized_bounding_box |
```python
# List the detected objects
import pandas as pd

columns = ['Description', 'Confidence', 'SegmentStartTime', 'SegmentEndTime', 'FrameTime',
           'BoxLeft', 'BoxTop', 'BoxRight', 'BoxBottom', 'Box', 'Id']
object_annotations = result.annotation_results[0].object_annotations
result_table = []
for object_annotation in object_annotations:
    for frame in object_annotation.frames:
        box = frame.normalized_bounding_box
        result_table.append([
            object_annotation.entity.description,
            object_annotation.confidence,
            object_annotation.segment.start_time_offset.seconds + object_annotation.segment.start_time_offset.nanos / 1e9,
            object_annotation.segment.end_time_offset.seconds + object_annotation.segment.end_time_offset.nanos / 1e9,
            frame.time_offset.seconds + frame.time_offset.nanos / 1e9,
            box.left,
            box.top,
            box.right,
            box.bottom,
            [box.left, box.top, box.right, box.bottom],
            object_annotation.entity.entity_id
        ])
        # Since the output would be huge, keep only the first frame of each segment for now
        break

df = pd.DataFrame(result_table, columns=columns)
pd.set_option('display.max_rows', len(result_table))
# Sort and display by Confidence
df.sort_values('Confidence', ascending=False)
```
When executed, the following results will be obtained.
First, frames are extracted from the video based on the `time_offset` values above. Use OpenCV to capture still images from the video. Since the frame to capture has to be specified by frame number, the approximate frame number is calculated from the video's FPS and `time_offset` (in seconds).
```python
import cv2

images = []
cap = cv2.VideoCapture(path)
if cap.isOpened():
    fps = cap.get(cv2.CAP_PROP_FPS)
    for sec in df['FrameTime']:
        # Calculate the frame number from fps and the elapsed seconds
        cap.set(cv2.CAP_PROP_POS_FRAMES, round(fps * sec))
        ret, frame = cap.read()
        if ret:
            images.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
```
Next, draw a box around the object detected in each extracted still image using OpenCV's `rectangle`. Since `rectangle` takes the upper-left and lower-right vertices of the rectangle to draw, these two points have to be computed. The `normalized_bounding_box` returned by the API contains values for the four sides (`left`, `top`, `right`, `bottom`). For example, the box shown below indicates the position of a `person`. The value of `left` is `l (distance from the left edge of the image to the left edge of the box) / width (width of the whole image)`, so the x coordinate of vertex 1 (`pt1`) can be recovered from `left` as `width * left`.

Prepare helper methods for this.
```python
import math

# Convert a normalized ratio to a pixel coordinate on the image
def ratio_to_pics(size_pics, ratio):
    return math.ceil(size_pics * ratio)

# Get the top-left and bottom-right vertices from the box
def rect_vertex(image, box):
    height, width = image.shape[:2]
    return [
        (
            ratio_to_pics(width, box[0]), ratio_to_pics(height, box[1])
        ),
        (
            ratio_to_pics(width, box[2]), ratio_to_pics(height, box[3])
        )
    ]
```
Using the methods above to calculate the vertex positions, draw the boxes onto the images.
```python
boxed_images = []
color = (0, 255, 255)
thickness = 5
for index, row in df.iterrows():
    image = images[index]
    boxed_images.append(cv2.rectangle(image, *rect_vertex(image, row.Box), color, thickness=thickness))
```
Finally, display each image together with its Description and Confidence. Depending on the length of the video and the number of detected objects, displaying everything can take a while, so a threshold is set on Confidence.
```python
import math
import matplotlib.pyplot as plt

# Cut off at an appropriate confidence
min_confidence = 0.7

# Set up the figure
col_count = 4
row_count = math.ceil(len(images) / col_count)
fig = plt.figure(figsize=(col_count * 4, row_count * 3), dpi=100)
num = 0

# Display the still images side by side
for index, row in df.iterrows():
    if row.Confidence < min_confidence:
        continue
    num += 1
    fig.add_subplot(row_count, col_count, num, title='%s : (%s%s)' % (row.Description, round(row.Confidence * 100, 2), '%'))
    plt.imshow(boxed_images[index], cmap='gray')
    plt.axis('off')
```
When executed, the following results will be obtained.
[Displaying the results of video analysis using the Cloud Video Intelligence API from Colaboratory](https://qiita.com/sosuke/items/533909d31244f986ad47)
Official Getting Started Guide