[PYTHON] Let's make Splatoon AI! part.1

Background

I'm a gamer, so I'm always thinking about building AI for games.

This time, the idea was to build a video-analysis AI for Splatoon players.

This article covers:
・a rough overview of the task
・results with image classification
・a demo video

The next article will cover the results of a video classification model.

What I want to do

In task terms, the closest match is "action segmentation": a classification model for video that predicts which action class each frame belongs to. For a golf swing, for example:
・1F ~ 30F: "Backswing"
・31F ~ 45F: "Downswing"
・46F ~ 65F: "Follow-through"
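As a minimal sketch of what that frame-wise output looks like in code (the frame counts follow the golf-swing example above):

```python
# Action segmentation output: one class label per frame of the clip.
labels = (
    ["backswing"] * 30        # frames 1-30
    + ["downswing"] * 15      # frames 31-45
    + ["follow_through"] * 20 # frames 46-65
)
assert len(labels) == 65  # exactly one label for every frame
```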

I want to do the same thing with gameplay footage.

Simply put, the model outputs the action label shown in the bottom left of the demo footage.
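As a rough, hypothetical sketch of how such an overlay could be rendered with OpenCV (the `predict_label` function is a placeholder, not the actual demo code):

```python
import cv2

def overlay_labels(video_path: str, predict_label) -> None:
    """Draw a predicted action label in the bottom-left corner of each frame."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        label = predict_label(frame)  # placeholder: any frame -> class-name function
        height = frame.shape[0]
        cv2.putText(frame, label, (10, height - 20),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
        cv2.imshow("demo", frame)
        if cv2.waitKey(1) == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```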

Classification classes

label Action
Paint (painting) Painting around; checking the surroundings is included here
Attack (attack) Facing an opponent and attacking them; the motion itself is the same as painting
Move (moving) Moving in squid or humanoid form; includes moving while painting
Hidden (hidden) Includes scouting and ink recovery; the on-screen state is the same as moving
Map (map) The map is open while alive
Special (special) Using a special
Super jump (superjump) Super jumping
Object (object) Objective play: anything related to the area, hoko, clams, or yagura; it overlaps easily with other classes, but it's an important factor in a match
Respawn (respawn) The state unique to being dead
Opening (opening) The match opening
Ending (ending) The match ending

What makes this difficult is that the classes overlap on the input side. As the figure below shows, the single act of spraying ink can serve the purpose of "attacking", "painting", or "playing the objective". In other words, this is really action-purpose segmentation: the goal is to recover the purpose behind the action. Because of the overlap, class priorities are fixed in advance.

(image: class priority order)

Classes further to the left take priority. For example, objective play can involve moving around while checking the map; in that case the object label wins.
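As a minimal sketch of priority-based label resolution (the exact order below is my assumption from the examples in the text, not the full order in the figure):

```python
# When several classes apply to a frame, keep the one earliest in PRIORITY.
# The order is assumed from the text (object > map > moving), not the figure.
PRIORITY = ["opening", "ending", "respawn", "superjump", "special",
            "object", "attack", "map", "painting", "moving", "hidden"]

def resolve(candidates: set[str]) -> str:
    """Return the highest-priority class among the candidate labels."""
    return min(candidates, key=PRIORITY.index)

print(resolve({"map", "moving", "object"}))  # -> "object"
```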

These properties also limit what classifying single images can achieve. The frame below is "moving" because the player is in transit, but visually the state is identical to "hidden".

(image: a "moving" frame that looks identical to "hidden")

It is "attacking" because there are enemies, but it is quite similar to "painting". image.png

Image-only methods therefore have inherent limits, so from the start I worked on the assumption that I would eventually use a model built for the action segmentation task.

What this would enable

A single frame label doesn't mean much by itself, but aggregated over a whole video it produces a table like the one below.

(image: per-video action distribution table)

Once you can turn one video into an action distribution, you can do things like:

・compare strong players with weak players
・compare the same player across modes
・compare the same player's good games and bad games

I think distilling numerical information out of raw gameplay footage like this could feed all sorts of analysis.
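Computing that distribution from per-frame predictions is straightforward; a minimal sketch:

```python
from collections import Counter

def action_distribution(frame_labels: list[str]) -> dict[str, float]:
    """Fraction of frames spent in each action class for one video."""
    counts = Counter(frame_labels)
    total = len(frame_labels)
    return {label: n / total for label, n in counts.most_common()}

# Example with made-up labels:
dist = action_distribution(["moving"] * 50 + ["attack"] * 30 + ["painting"] * 20)
print(dist)  # {'moving': 0.5, 'attack': 0.3, 'painting': 0.2}
```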

Dataset

  1. Purchase a capture board
  2. Play Splatoon (26 games)
  3. Annotate using the action segmentation annotation tool called ELAN

Annotating takes about 6 minutes per video, but reviewing it all still took quite a while...
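ELAN can export annotations as tab-delimited text. As a sketch of turning such an export into per-frame labels (the three-column layout, the 30 fps rate, and the default class are my assumptions; adjust to your own export settings):

```python
import csv

def elan_to_frame_labels(tsv_path: str, n_frames: int, fps: float = 30.0,
                         default: str = "hidden") -> list[str]:
    """Expand (start_sec, end_sec, label) rows from an ELAN tab-delimited
    export into one label per frame. The column layout is an assumption;
    change the indices to match your own export."""
    labels = [default] * n_frames
    with open(tsv_path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            start, end, label = float(row[0]), float(row[1]), row[2]
            for frame in range(int(start * fps), min(int(end * fps), n_frames)):
                labels[frame] = label
    return labels
```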

After weighing things up in advance, I decided to constrain the setup to some degree:

・modes (Gachihoko, Gachi Area, etc.) are reasonably spread out
・stages are whatever happened to be in rotation at the time, so somewhat biased
・weapons are the Hero Roller and Hero Roller Betchu
・I'm wearing Ninja Squid
・my rank (Udemae) is X across the board
・21 videos for training and 5 for testing (one each of area, hoko, clam, yagura, and nawabari)
・the average video is about 4 minutes

That's the rough setup. I did consider varying the weapons, but gear loadouts change and things like splatling and charger charge times would complicate matters, so for now I went with my usual weapon.

Also (perhaps forgivable since this is casual play), the annotation quality and criteria drift slightly between the first video (r1.mp4) and the last (r20.mp4) lol

For example, the sequence "spot an opponent, move in, attack" could be annotated as hidden → moving → attacking, or, treating the movement as part of the attack, as hidden → attacking. Either could be right. (On top of that, I had assumed the special only counted at the moment of landing, and it was practically my first time using the Roller Betchu.)

Dataset link

Still considering whether to publish it.

Image-based technique

First, there's the choice between transfer learning and fine tuning. Two concerns come to mind:

・the Splatoon domain is highly idiosyncratic; will the distributed representations learned elsewhere even transfer well?
・the number of videos is small, so fine tuning may overfit

So I tested which of the two works better.

The network structure looks like this

(image: network structure)
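As a hedged sketch of the two settings being compared, using Keras' VGG16 (the classifier head, input size, and hyperparameters here are my assumptions, not the exact network in the figure):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 11  # the 11 action classes above

def build_model(fine_tune: bool):
    """VGG16 backbone + small classifier head.
    fine_tune=False: transfer learning (backbone frozen).
    fine_tune=True:  fine tuning (backbone weights also updated)."""
    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = fine_tune
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```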

Evaluation metrics can get fiddly, so I'm just using accuracy. Action segmentation has various dedicated evaluation methods, but I'll think about those later.
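Frame-wise accuracy (and per-class figures like those shown further down) can be computed directly from the annotated and predicted label sequences; a minimal sketch:

```python
import numpy as np

def frame_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of frames whose predicted class matches the annotation."""
    return float(np.mean(y_true == y_pred))

def per_class_accuracy(y_true, y_pred, classes):
    """Accuracy restricted to the frames annotated with each class.
    (A class absent from y_true would yield nan.)"""
    return {c: float(np.mean(y_pred[y_true == c] == c)) for c in classes}
```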

The results look like this:

model mode accuracy (%)
VGG transfer learning 64.7
mobileNet transfer learning 61.6
VGG fine tuning 62.7
mobileNet fine tuning 59.7

By the way, fine tuning takes some time.

VGG with transfer learning came out on top, which was a little surprising. I had expected fine tuning to do better, but it didn't.

Class Precision
total_accuracy 0.64768
opening 0.97024
moving 0.57446
hidden 0.12668
painting 0.53247
battle 0.68719
respawn 0.93855
superjump 0.45745
object 0.22072
special 0.76923
map 0.38922
ending 0.98046

As somewhat expected, the classes I anticipated would be tricky (hidden, object, and map score lowest) are indeed hard to classify.

Impressions

Building a dataset from scratch and experimenting with machine learning on it is fun. Next up is the video model.

That said, from what I can tell so far, a stage that appears in the training data (Arowana Mall in the test set) qualitatively does quite well, so for now it may be worth creating training data that covers every stage.

Test videos

YouTube playlist: https://www.youtube.com/playlist?list=PL0Al7LkpEHRJPkI6HZfrUP9HKv1TI_tHn

There are 5 test videos, one each for Nawabari, Area, Hoko, Yagura, and Clam. Some of their stages appear in the training data and some don't; I included both so I can examine whether the model holds up on stages it has never seen. Whether each stage is in the training data is noted in the video description.

GitHub: https://github.com/daikiclimate/action_segmentation
I've uploaded the weights and the demo code, so it should run if you have the videos and the environment set up (unverified).
