I started studying image recognition with OpenCV and wanted to make my own cascade classifier, but my own PC could not handle the computation. While looking for alternatives, I learned about the GPU provided by Google Colaboratory and decided to offload the calculation to it.

0. Goal

Use Google Colaboratory to create a cascade classifier that recognizes instruments of the violin family in the photos below. Ideally it would distinguish between violin, viola, cello, and contrabass, but that is difficult even for people who don't know much about classical music, so I lowered the bar and only aim to detect the violin family as a whole.

1. Preparation

1-1. Preparation of Google Colab environment
1-2. Preparation of image data
1-3. Download mergevec.py

1-1. Preparation of Google Colab environment

Google Colab notebooks can be created in any folder on Google Drive. This time I will work in /drive/My Drive/Colab Notebooks/opencv/instruments/.

The structure of the instruments folder is as follows.

`instruments`


<DIR>  pos          #Positive (correct) images
<DIR>  vec          #vec files created from the positive images
<DIR>  neg          #Negative (incorrect) images
<DIR>  neglist      #List of the negative images
<DIR>  cascade      #Training output
       mergevec.py  #Script used to combine vec files
       train_flow   #The Colab notebook created in step 2
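The folder layout above can also be created from a notebook cell. This is a minimal sketch using pathlib; the helper name `make_layout` and the `BASE` path are my own assumptions, not part of the original setup (mergevec.py and train_flow are files, so they are not created here).

```python
from __future__ import annotations

from pathlib import Path

# Assumed location; change it to match your own Drive layout.
BASE = Path("/content/drive/My Drive/Colab Notebooks/opencv/instruments")
SUBDIRS = ("pos", "vec", "neg", "neglist", "cascade")


def make_layout(base: Path, subdirs: tuple[str, ...] = SUBDIRS) -> list[Path]:
    """Create base/<name> for each name in subdirs; existing folders are left as-is."""
    created = []
    for name in subdirs:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Usage in a Colab cell after mounting Drive:
# make_layout(BASE)
```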

1-2. Preparation of image data

**Correct images (positives)**

Take the photos yourself or collect them from the web. This time I prepared six JPEG files processed to 200 x 300 pixels and stored them in pos.

**Incorrect images (negatives)**

For now, all you need is a large number of color images that do not contain the target. This time I downloaded "2017 Val images", listed on the page "50 free Machine Learning Datasets: Image Datasets", for a total of about 8000 images.

Store them in neg. Uploading takes time and uses more than 1 GB of Drive space, so consider using fewer images depending on your situation.
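If you want to use fewer negatives, one option is to copy a random subset of the downloaded set into neg. A minimal sketch, assuming the downloaded images sit in a local folder; the folder name `val2017` and the helper `copy_random_subset` are hypothetical.

```python
from __future__ import annotations

import random
import shutil
from pathlib import Path


def copy_random_subset(src_dir: Path, dst_dir: Path, n: int, seed: int = 0) -> list[Path]:
    """Copy up to n randomly chosen .jpg files from src_dir into dst_dir."""
    images = sorted(src_dir.glob("*.jpg"))
    # A seeded generator makes the selection reproducible between runs.
    chosen = random.Random(seed).sample(images, min(n, len(images)))
    dst_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy(p, dst_dir / p.name)) for p in chosen]


# Usage (hypothetical source folder):
# copy_random_subset(Path("val2017"), Path("neg"), 3000)
```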

1-3. Download mergevec.py

Download it from the mergevec page on GitHub.

2. Notebook creation

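Create a new Colab notebook called train_flow and fill it with the cells below. To use a GPU, enable it under Edit > Notebook settings. The calculation speed may differ depending on the type of GPU assigned, but I won't go into that here. (As far as I know, opencv_traincascade itself only runs on the CPU, so the GPU setting may not speed up training much.)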

2-1. Environment setup

2-1. Environment setup

There are three things to do:
・ Downgrade OpenCV
・ Mount Google Drive
・ Change the current directory

**Downgrade OpenCV**

OpenCV 4, which Colab installs by default, does not provide the opencv_createsamples and opencv_traincascade tools used later, so downgrade to OpenCV 3.

`Cell 1`


#Uninstall the preinstalled OpenCV packages
#-y answers 'yes' to the confirmation prompts
!pip3 uninstall -y opencv-python opencv-contrib-python

`Cell 2`


#Install OpenCV 3.4.4
!pip3 install opencv-python==3.4.4.19 opencv-contrib-python==3.4.4.19
#Restart the session so the downgraded version is loaded
exit()
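After the restart, it is worth checking both the installed version and whether the command-line tools can actually be found. As far as I know, the opencv-python wheels only contain the Python cv2 module, so the opencv_createsamples and opencv_traincascade executables have to come from elsewhere in the Colab image; `shutil.which` tells you whether they are on the PATH. A small sketch; the helper names are my own.

```python
import shutil

TOOLS = ("opencv_createsamples", "opencv_traincascade")


def major_version(version: str) -> int:
    """Return the major number of a version string such as '3.4.4'."""
    return int(version.split(".")[0])


def find_tools(tools=TOOLS) -> dict:
    """Map each command-line tool to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


# Usage in a new cell after the restart:
# import cv2
# print(cv2.__version__, major_version(cv2.__version__) == 3)
# print(find_tools())
```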

**Mount Google Drive**

Mount Google Drive so that the notebook can read and write data on it.

`Cell 3`


from google.colab import drive
drive.mount('/content/drive')

**Change the current directory**

Set this according to your own directory structure. Here, the instruments folder is made the current directory.

`Cell 4`


import os
os.chdir('/content/drive/My Drive/Colab Notebooks/opencv/instruments')

3. Preparation of training file

Create training files from the correct and incorrect images.

3-1. Creating a correct answer data file
3-2. Creating an incorrect data file

3-1. Creating a correct answer data file

Use opencv_createsamples to mass-produce positive (correct answer) samples and save them as a vec file. There are two ways to do this.

**Specifying the image directly**

The following command creates 1000 positive samples from a single image by applying random distortions. The width and height (-w 40 -h 60) follow the 2:3 aspect ratio of the correct images, but I'm not sure how much difference that makes.

`Cell 5-1`


#Create 1000 distorted samples from one image and write them to a vec file
!opencv_createsamples -img pos/Va001.jpg -vec vec/Va001.vec -num 1000 -w 40 -h 60

`Out` (log from the same command run on Va006.jpg)


Info file name: (NULL)
Img file name: drive/My Drive/Colab Notebooks/opencv/instruments/pos/Va006.jpg
Vec file name: drive/My Drive/Colab Notebooks/opencv/instruments/vec/Va006.vec
BG  file name: (NULL)
Num: 1000
BG color: 0
BG threshold: 80
Invert: FALSE
Max intensity deviation: 40
Max x angle: 1.1
Max y angle: 1.1
Max z angle: 0.5
Show samples: FALSE
Width: 40
Height: 60
Max Scale: -1
Create training samples from single image applying distortions...
Done

**Specifying a list of images**

My method was wrong and this did not work, but I'll record it anyway. Prepare a list in the pos folder that gives, for each correct image, its path, the number of target objects in it, and the bounding box of each object. The bounding box is written as the x and y of the top-left corner followed by the width and height (not the end coordinates); since each object fills the whole 200 x 300 image here, both readings give the same numbers.

`poslist.txt`


#Created in pos folder
#File name Number Coordinates
pos/Va001.jpg 1 0 0 200 300
pos/Va002.jpg 1 0 0 200 300
pos/Va003.jpg 1 0 0 200 300
pos/Va004.jpg 1 0 0 200 300
pos/Va005.jpg 1 0 0 200 300
pos/Va006.jpg 1 0 0 200 300

`Cell 5-2`


!opencv_createsamples -info pos/poslist.txt -vec vec/pos.vec -num 6000 -w 40 -h 60

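I expected this to create a total of 6000 positive samples from the images listed in the txt file, but it didn't work on Colaboratory. As far as I can tell from the OpenCV 3.4 createsamples code, there are two likely causes. First, image paths in an info file are resolved relative to the folder that contains the info file, so pos/Va001.jpg inside pos/poslist.txt points to pos/pos/Va001.jpg. Second, in -info mode createsamples only crops and resizes the listed objects without generating distorted copies, so this list can yield at most 6 samples regardless of -num. Comment lines starting with # would also break parsing if they are in the actual file.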
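If you want to retry the info-file method, the file can be generated so that it follows those rules. A minimal sketch; `write_info_file` is a hypothetical helper, and it assumes one object per image. Even with a correct file, -num should not exceed the number of listed objects (6 here).

```python
from __future__ import annotations

from pathlib import Path


def write_info_file(info_path: Path,
                    entries: list[tuple[str, int, int, int, int]]) -> str:
    """Write an opencv_createsamples info file with one object per image.

    Each entry is (image_name, x, y, width, height). image_name must be
    relative to the folder that contains info_path; no comment lines are written.
    """
    lines = [f"{name} 1 {x} {y} {w} {h}" for name, x, y, w, h in entries]
    text = "\n".join(lines) + "\n"
    info_path.write_text(text)
    return text


# Hypothetical usage for the six 200 x 300 images, saved as pos/poslist.txt:
# names = [f"Va{i:03d}.jpg" for i in range(1, 7)]
# write_info_file(Path("pos/poslist.txt"), [(n, 0, 0, 200, 300) for n in names])
```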

Training on a vec file made from a single image, as in Cell 5-1, is possible, but one image is unlikely to give sufficient recognition accuracy. Looking for a workaround, I found a script that combines multiple vec files into one.

**Combining multiple vec files**

Create a vec file from each image in the pos folder as in Cell 5-1, then run mergevec.py from the instruments folder.

`Cell 6`


!opencv_createsamples -img pos/Va002.jpg -vec vec/Va002.vec -num 1000 -w 40 -h 60
!opencv_createsamples -img pos/Va003.jpg -vec vec/Va003.vec -num 1000 -w 40 -h 60
!opencv_createsamples -img pos/Va004.jpg -vec vec/Va004.vec -num 1000 -w 40 -h 60
!opencv_createsamples -img pos/Va005.jpg -vec vec/Va005.vec -num 1000 -w 40 -h 60
!opencv_createsamples -img pos/Va006.jpg -vec vec/Va006.vec -num 1000 -w 40 -h 60
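Instead of writing one line per image, the same commands can be generated for every image in pos. A sketch using subprocess; the helper names are hypothetical. It also covers Va001.jpg, which Cell 5-1 already processed, and `sample_size` shows how 40 x 60 follows from the 200 x 300 images.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def sample_size(img_w: int, img_h: int, target_w: int = 40) -> tuple[int, int]:
    """Scale an image size down to width target_w, keeping the aspect ratio."""
    return target_w, round(target_w * img_h / img_w)


def createsamples_commands(pos_dir: Path, vec_dir: Path, num: int = 1000,
                           w: int = 40, h: int = 60) -> list[list[str]]:
    """Build one opencv_createsamples command per .jpg file in pos_dir."""
    commands = []
    for img in sorted(pos_dir.glob("*.jpg")):
        vec = vec_dir / f"{img.stem}.vec"
        commands.append(["opencv_createsamples", "-img", str(img), "-vec", str(vec),
                         "-num", str(num), "-w", str(w), "-h", str(h)])
    return commands


def run_all(commands: list[list[str]]) -> None:
    """Run the commands one by one, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Usage from the instruments folder:
# w, h = sample_size(200, 300)  # (40, 60)
# run_all(createsamples_commands(Path("pos"), Path("vec"), w=w, h=h))
```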

`Cell 7`


#Merge all vec files in the vec folder and save the result as vec/pos.vec
!python mergevec.py -v vec -o vec/pos.vec

You now have a vec file containing 6 x 1000 = 6000 positive samples.
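Before training, the merged file can be sanity-checked by reading its header. This sketch assumes the layout that mergevec.py itself writes: a 12-byte little-endian header (sample count, values per sample, two unused 16-bit fields), then for each sample one zero byte followed by w x h 16-bit values. `read_vec_info` is a hypothetical helper.

```python
from __future__ import annotations

import struct
from pathlib import Path

# Assumed header layout: sample count, values per sample (w * h),
# then two unused 16-bit fields, all little-endian.
HEADER = struct.Struct("<iihh")


def read_vec_info(path: Path) -> tuple[int, int]:
    """Return (sample_count, values_per_sample) after checking the file size."""
    data = path.read_bytes()
    count, vecsize, _, _ = HEADER.unpack_from(data, 0)
    # Each sample: one zero byte, then vecsize 16-bit pixel values.
    expected = HEADER.size + count * (1 + 2 * vecsize)
    if len(data) != expected:
        raise ValueError(f"file has {len(data)} bytes, expected {expected}")
    return count, vecsize


# Usage: if the merge worked, read_vec_info(Path("vec/pos.vec"))
# should report 6000 samples of 2400 values (40 x 60).
```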

3-2. Creating an incorrect data file

Create nglist.txt, which lists the path of every image file in the neg folder.

`Cell 8`


#Write the path of every file in the neg folder to neglist/nglist.txt
!ls neg | xargs -I {} echo neg/{} > neglist/nglist.txt

`neglist`


neg/000000000139.jpg
neg/000000000285.jpg
neg/000000000632.jpg

...
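The same list can be written in Python while skipping anything that is not an image (the ls version lists every file). A sketch with `write_neglist` as a hypothetical helper; as far as I know, opencv_traincascade reads these paths relative to the directory it is run from, which is why they start with neg/.

```python
from __future__ import annotations

from pathlib import Path


def write_neglist(neg_dir: str, out_file: Path,
                  exts: tuple[str, ...] = (".jpg", ".jpeg", ".png")) -> int:
    """Write one '<neg_dir>/<file name>' line per image and return the count.

    Paths are written relative to the current directory, like Cell 8 does.
    """
    names = sorted(p.name for p in Path(neg_dir).iterdir()
                   if p.is_file() and p.suffix.lower() in exts)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text("".join(f"{neg_dir}/{name}\n" for name in names))
    return len(names)


# Usage from the instruments folder:
# write_neglist("neg", Path("neglist/nglist.txt"))
```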

This completes all preparations.

4. Creating a classifier

Execute the following code.

`Cell 9`


!opencv_traincascade -data cascade -vec vec/pos.vec -bg neglist/nglist.txt -numPos 5500 -numNeg 3000 -numStages 20 -featureType LBP -w 40 -h 60

・ numPos: the number of positive samples used to train each stage. If the vec file runs out of samples, training can stop partway, so set it somewhat lower than the number of samples in the vec file.
・ numNeg: a ratio of roughly 2:1 between positive and negative samples seems to work well.
・ featureType: HAAR, LBP, etc. can be selected. HAAR takes far longer to train.
・ w, h: use the same values as when the vec files were created.
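How much lower numPos should be can be estimated with a rule of thumb that is often quoted on the OpenCV Q&A forum (it is not in the official documentation): the vec file needs at least numPos + (numStages - 1) * (1 - minHitRate) * numPos + S samples, where S is the number of samples rejected as background. A small sketch of that arithmetic; `max_num_pos` is a hypothetical helper.

```python
import math


def max_num_pos(vec_count: int, num_stages: int,
                min_hit_rate: float = 0.995, skipped: int = 0) -> int:
    """Largest numPos allowed by the rule of thumb
    vec_count >= numPos + (numStages - 1) * (1 - minHitRate) * numPos + skipped,
    where skipped is the number of samples rejected as background (S)."""
    factor = 1 + (num_stages - 1) * (1 - min_hit_rate)
    return math.floor((vec_count - skipped) / factor)


print(max_num_pos(6000, 20))  # 5479
print(max_num_pos(6000, 15))  # 5607
```

By this estimate, 5500 is slightly more than a 6000-sample vec file comfortably supports for 20 stages with the default minHitRate of 0.995.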

5. Execution history

A Colab GPU session runs for at most 12 hours. I let training run overnight, but it only got through stage 14 of 20. The training time grew exponentially, roughly doubling with each stage, so I decided the parameters needed adjusting.

`Out`



PARAMETERS:
cascadeDirName: cascade/trained_data/
vecFileName: vec/pos.vec
bgFileName: neglist/nglist.txt
numPos: 5500
numNeg: 3000
numStages: 20
precalcValBufSize[Mb] : 1024
precalcIdxBufSize[Mb] : 1024
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: LBP
sampleWidth: 40
sampleHeight: 60
boostType: GAB
minHitRate: 0.995
maxFalseAlarmRate: 0.5
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
Number of unique features given windowSize [40,60] : 153400

===== TRAINING 0-stage =====
<BEGIN
POS count : consumed   5500 : 5500
NEG count : acceptanceRatio    3000 : 1
tcmalloc: large alloc 1073758208 bytes == 0x5650ef23e000 @  0x7f54c034f1e7 0x7f54bf549382 0x7f54bf64821b 0x5650e5fc5608 0x5650e5fc5d42 0x5650e5fc5e1a 0x5650e5fcf1a9 0x5650e5fbbfff 0x7f54be80cb97 0x5650e5fbcc1a
Precalculation time: 20
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2| 0.997818| 0.225333|
+----+---------+---------+
END>
Training until now has taken 0 days 0 hours 5 minutes 21 seconds.

...

===== TRAINING 14-stage =====
<BEGIN
POS count : consumed   5500 : 5725
NEG count : acceptanceRatio    3000 : 2.41651e-06
Precalculation time: 17
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2|        1|        1|
+----+---------+---------+
|   3| 0.998364| 0.712667|
+----+---------+---------+
|   4| 0.997455|    0.632|
+----+---------+---------+
|   5| 0.996545|    0.449|
+----+---------+---------+
END>
Training until now has taken 0 days 6 hours 59 minutes 8 seconds.

So I set numStages to 15 and trained again. This time training ended before reaching stage 15 because the required accuracy had already been reached, and it took only a little over an hour, which surprised me.
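The early finish can be checked with a little arithmetic. As far as I can tell from the OpenCV 3.4 source of opencv_traincascade, a new stage is only trained while the acceptance ratio of the negatives is above maxFalseAlarmRate^numStages / maxDepth; once it falls to that level, training ends with a message along the lines of "Required leaf false alarm rate achieved". This sketch compares that threshold with the ratio logged at stage 14 of the 20-stage run; `required_leaf_fa_rate` is a hypothetical helper.

```python
def required_leaf_fa_rate(max_false_alarm: float, num_stages: int,
                          max_depth: int = 1) -> float:
    """Negative acceptance ratio at which training stops adding stages (assumed rule)."""
    return max_false_alarm ** num_stages / max_depth


stage14_ratio = 2.41651e-06  # acceptanceRatio printed at stage 14 of the 20-stage run
for stages in (15, 20):
    target = required_leaf_fa_rate(0.5, stages)
    print(stages, target, stage14_ratio <= target)
# 15 3.0517578125e-05 True
# 20 9.5367431640625e-07 False
```

With 15 stages, the target had already been passed before stage 14 of the first run, at a point where each stage was still cheap, which would explain the much shorter training time.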

6. Evaluation

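Colab cannot open a window with cv2.imshow, so I did the evaluation locally (Colab does offer google.colab.patches.cv2_imshow for showing images inside the notebook). Of the files written to the cascade folder, only cascade.xml needs to be downloaded.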

`Cell 10`


import cv2

img = cv2.imread('instruments.jpg')
if img is None:
    raise FileNotFoundError('instruments.jpg could not be read')
cascade = cv2.CascadeClassifier('cascade.xml')
if cascade.empty():
    raise IOError('cascade.xml could not be loaded')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

#Consider changing minSize when recognition accuracy is poor
Va = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3, minSize=(140, 210))

#Mark each detection with a red frame (BGR order)
for (x, y, w, h) in Va:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 200), 3)

#Save the result as result.jpg
cv2.imwrite('result.jpg', img)
#Display the image in a separate window until a key is pressed
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

The result is as follows. It's a pity that the cello was detected in two places and that the contrabass was the only instrument not detected.

When I trained with only 2000 negative images, the result was as follows. Something that looks like a cymbal or a euphonium was also detected, and interestingly so was the gap between the viola and the cello, whose curved outline apparently looks like an instrument. Increasing the number of negative images does seem to help.
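For duplicate detections like the one on the cello, one simple post-processing option (not something I used above) is to keep the larger boxes and drop any box that mostly overlaps one already kept. A sketch in plain Python; `drop_nested`, `overlap_ratio` and the 0.5 threshold are my own assumptions, and raising minNeighbors in detectMultiScale is another knob to try.

```python
from __future__ import annotations


def overlap_ratio(a, b) -> float:
    """Intersection area divided by the smaller box's area; boxes are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    return (iw * ih) / min(aw * ah, bw * bh)


def drop_nested(boxes, threshold: float = 0.5) -> list[tuple[int, int, int, int]]:
    """Keep larger boxes first and drop any box that mostly overlaps a kept one."""
    kept: list[tuple[int, int, int, int]] = []
    as_tuples = [tuple(int(v) for v in box) for box in boxes]
    for box in sorted(as_tuples, key=lambda b: b[2] * b[3], reverse=True):
        if all(overlap_ratio(box, k) < threshold for k in kept):
            kept.append(box)
    return kept


# Usage in Cell 10, before drawing the frames:
# Va = drop_nested(Va)
```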

7. Summary

・ I was able to create a classifier that recognizes violin-family instruments.
・ I learned how to create a classifier with Google Colaboratory.
・ I gained a rough understanding of how image recognition is computed and what the parameters do.

Sites that I referred to

Studying OpenCV ③ (Create a classifier) (https://qiita.com/takanorimutoh/items/5bd88f3d17239a147581)

Learn OpenCV Cascade classifier with multiple pos images (https://pfpfdev.hatenablog.com/entry/20200715/1594799186)

[PYTHON] Make a cascade classifier with google colaboratory
