Introduction

Wouldn't it be fun if you could easily **recognize things** with AI using a webcam? You can do that **easily** by using a published pretrained model. Let's try it now!

What we will do

Capture images from the webcam on the PC, let the AI **recognize in real time** what appears in them, and display the top 3 results on the screen. This time we use a **trained model**, so there is no time-consuming training step and you can start playing **right away**.

Development environment

We use a library called `OpenCV` to capture image data from the webcam and an AI library called `keras` to classify the image data. Install the required library packages.

What you need	Remarks
Notebook PC with a webcam, etc.	A webcam connected to the PC via USB also works
Development language	Python 3 (version used: 3.7.7)
Main required libraries	【OpenCV】 Library for processing images and videos (version used: 4.3.0); 【keras】 Neural network library for Python (version used: 2.3.1)
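
Before copying the program, it may help to check that the libraries can actually be found by Python. The following is a minimal sketch, not part of the original program. The helper name `missing_modules` is made up for this example, and it assumes the usual import names (`cv2` for OpenCV, `keras`, and `numpy`).

```python
import importlib.util

# Import names assumed for the libraries used in this article:
# OpenCV is imported as "cv2", Keras as "keras", NumPy as "numpy".
REQUIRED_MODULES = ["cv2", "keras", "numpy"]


def missing_modules(names):
    """Return the module names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All required libraries were found.")
```

This only checks that the modules exist. It does not import them, so it runs quickly and does not check the versions.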

DenseNet121

"I started this because it was supposed to be easy, and now you suddenly bring up DenseNet121? I have no idea what that is! I give up!"

It's okay. Please calm down. DenseNet121 is simply the name of the **trained model** we use this time. You can use it without knowing the details!

Why did I choose the DenseNet121 model?

Various image classification models such as VGG16 and ResNet50 can be used easily from the keras library. I chose DenseNet121 because **the model is relatively small at 33MB and still has a good recognition rate**. According to the Keras Documentation, DenseNet121 reaches a top-5 accuracy of about 92% on the ImageNet validation data (about 75% for top-1 only).

Source: Keras Documentation https://keras.io/ja/applications/#documentation-for-individual-models
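
To make the difference between "top-1" and "top-5" concrete, here is a small sketch of how such an accuracy can be counted. It is only an illustration: the function name `top_k_accuracy` and the toy data are made up for this example and are not real model output.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of samples whose true label is among the first k predictions.

    ranked_predictions: one list per sample, labels ordered best guess first.
    true_labels: the correct label for each sample.
    """
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Made-up toy data for illustration only (not real model output)
ranked = [
    ["toy_poodle", "miniature_poodle", "Dandie_Dinmont"],
    ["goldfish", "tench", "stingray"],
    ["hen", "cock", "ostrich"],
    ["tench", "goldfish", "hammerhead"],
]
truth = ["toy_poodle", "tench", "ostrich", "stingray"]

top1 = top_k_accuracy(ranked, truth, k=1)   # only the best guess counts
top3 = top_k_accuracy(ranked, truth, k=3)   # any of the first three counts
print("top-1: {:.2f}, top-3: {:.2f}".format(top1, top3))
```

In this toy data the best guess is right for one sample out of four, while the correct label appears somewhere in the first three guesses for three samples, so the top-3 value is higher than the top-1 value.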

AI program

After installing the libraries to be used (keras, opencv, etc.), copy the following AI program.

`main.py`


# -------------------------------------------------------------------------------------
# Display the camera image on the screen.
# Classify the image with DenseNet121.
# [+] key: change the camera device
# [s] key: save the image
# [ESC] or [q] key: exit
# -------------------------------------------------------------------------------------
from keras.applications.densenet import DenseNet121
from keras.applications.densenet import preprocess_input, decode_predictions
from keras.preprocessing import image
import numpy as np
import cv2
import datetime


# -------------------------------------------------------------------------------------
# capture_device
# -------------------------------------------------------------------------------------
def capture_device(capture, dev):

    while True:

        # Capture an image from the camera device
        ret, frame = capture.read()
        if not ret:
            # Reading failed: behave as if [+] was pressed and try the next device
            k = ord('+')
            return k

        # Classify the image with DenseNet121
        resize_frame = cv2.resize(frame, (300, 224))            # 640x480 (4:3) -> about 300x224 (4:3)
        trim_x, trim_y = int((300-224)/2), 0                    # Trim the center to 224x224 for classification
        trim_h, trim_w = 224, 224
        trim_frame = resize_frame[trim_y : (trim_y + trim_h), trim_x : (trim_x + trim_w)]
        rgb_frame = cv2.cvtColor(trim_frame, cv2.COLOR_BGR2RGB) # OpenCV frames are BGR; the model expects RGB
        x = image.img_to_array(rgb_frame)
        x = np.expand_dims(x, axis=0)
        x = preprocess_input(x)
        preds = model.predict(x)                                # Run the AI classification

        # Usage
        disp_frame = frame
        txt1 = "model is DenseNet121"
        txt2 = "camera device No.(" + str(dev) + ")"
        txt3 = "[+] : Change Device"
        txt4 = "[s] : Image Capture"
        txt5 = "[ESC] or [q] : Exit"

        cv2.putText(disp_frame, txt1, (10,  30), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1, cv2.LINE_AA)
        cv2.putText(disp_frame, txt2, (10,  60), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1, cv2.LINE_AA)
        cv2.putText(disp_frame, txt3, (10,  90), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1, cv2.LINE_AA)
        cv2.putText(disp_frame, txt4, (10, 120), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1, cv2.LINE_AA)
        cv2.putText(disp_frame, txt5, (10, 150), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1, cv2.LINE_AA)

        # Decode the top-3 predictions once and format them for display
        top3 = decode_predictions(preds, top=3)[0]
        output1 = 'No.1:{0}:{1}%'.format(top3[0][1], int(top3[0][2] * 100))
        output2 = 'No.2:{0}:{1}%'.format(top3[1][1], int(top3[1][2] * 100))
        output3 = 'No.3:{0}:{1}%'.format(top3[2][1], int(top3[2][2] * 100))

        cv2.putText(disp_frame, output1, (10, 300), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1, cv2.LINE_AA)
        cv2.putText(disp_frame, output2, (10, 330), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1, cv2.LINE_AA)
        cv2.putText(disp_frame, output3, (10, 360), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1, cv2.LINE_AA)

        # Show the camera image
        cv2.imshow('camera', disp_frame)

        # Wait 1 msec and read the key
        k = cv2.waitKey(1) & 0xFF

        # Keep displaying until [ESC] or [q] is pressed
        if (k == ord('q')) or (k == 27):
            return k

        # [+] key: change the camera device
        if k == ord('+'):
            txt = "Change Device. Please wait... "
            XX = int(disp_frame.shape[1] / 4)
            YY = int(disp_frame.shape[0] / 2)
            cv2.putText(disp_frame, txt, (XX, YY), cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1, cv2.LINE_AA)
            cv2.imshow('camera', disp_frame)
            cv2.waitKey(1)                                      # Give the window a moment to draw the message
            return k

        # [s] key: save the image displayed on the screen
        elif k == ord('s'):
            cv2.imwrite('camera_dsp{}.{}'.format(datetime.datetime.now().strftime('%Y%m%d_%H%M%S_%f'), "png"), disp_frame)
#           cv2.imwrite('camera_rsz{}.{}'.format(datetime.datetime.now().strftime('%Y%m%d_%H%M%S_%f'), "png"), resize_frame)
#           cv2.imwrite('camera_trm{}.{}'.format(datetime.datetime.now().strftime('%Y%m%d_%H%M%S_%f'), "png"), trim_frame)
#           cv2.imwrite('camera_raw{}.{}'.format(datetime.datetime.now().strftime('%Y%m%d_%H%M%S_%f'), "png"), frame)


# -------------------------------------------------------------------------------------
# camera
# -------------------------------------------------------------------------------------
def camera(dev):

    while True:

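        capture = cv2.VideoCapture(dev)
        ret = capture_device(capture, dev)

        # Release the current device before exiting or switching cameras
        capture.release()

        if (ret == ord('q')) or (ret == 27):
            # Close the display window and stop
            cv2.destroyAllWindows()
            break

        if ret == ord('+'):
            # Cycle through device numbers 0 to 8
            dev += 1

            if dev == 9:
                dev = 0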


# -------------------------------------------------------------------------------------
# main
# -------------------------------------------------------------------------------------
# ●DenseNet121
# https://keras.io/ja/applications/#densenet
#
# Running DenseNet121 automatically downloads two files:
# (1) the DenseNet121 model and (2) the classification file.
# The first startup therefore takes a long time, because the DenseNet121 model
# (about 33MB) and the classification file have to be downloaded.
# From the second startup on, the download is skipped and startup is faster.
#
# The downloaded files are stored in the following directory.
# "C:/Users/xxxx/.keras/models/"
#
# (1) DenseNet121 model: densenet121_weights_tf_dim_ordering_tf_kernels.h5
# (2) Classification file (all 1000 categories): imagenet_class_index.json

# Image classification model
model = DenseNet121(weights='imagenet')

# Start the camera
camera(dev=0)
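
The program above hard-codes the resize to 300x224 and the trim offset for a 640x480 (4:3) camera. If your camera delivers a different size, the same idea (scale the shorter side to 224, then cut a centered 224x224 square) could be computed as in the sketch below. This is only an illustration and not part of main.py; the function name `center_crop_plan` is made up for this example.

```python
def center_crop_plan(width, height, target=224):
    """Return ((resized_w, resized_h), (x0, y0)) for a centered square crop.

    The shorter side is scaled to `target`. The longer side keeps the aspect
    ratio and is trimmed equally on both ends.
    """
    shorter = min(width, height)
    resized_w = max(target, round(width * target / shorter))
    resized_h = max(target, round(height * target / shorter))
    x0 = (resized_w - target) // 2
    y0 = (resized_h - target) // 2
    return (resized_w, resized_h), (x0, y0)


# How the result could be used with OpenCV (frame is a captured image):
#   (w, h), (x0, y0) = center_crop_plan(frame.shape[1], frame.shape[0])
#   resized = cv2.resize(frame, (w, h))
#   trimmed = resized[y0:y0 + 224, x0:x0 + 224]
print(center_crop_plan(640, 480))
```

For 640x480 this gives a resize to 299x224 with an offset of 37 pixels, which is almost the same as the rounded 300x224 and 38 pixels used in the program.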

Running the AI program

At the first startup

Running the DenseNet121 library function automatically downloads both the DenseNet121 model and the class classification file. The first startup therefore takes a long time, because the model (about 33MB) and the classification file have to be downloaded. From the second startup on, the download is skipped and startup is faster.

File storage location: C:/Users/xxxx/.keras/models/

- **DenseNet121 model** (densenet121_weights_tf_dim_ordering_tf_kernels.h5)
- **Class classification file** (imagenet_class_index.json)
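
If you are curious whether the files are already there, a small check like the following sketch can tell you. It is an illustration only: the helper name `cached_files` is made up, the file names are taken from the download log of this article, and it assumes that Keras uses its default `.keras/models` folder under your home directory.

```python
from pathlib import Path

# File names as they appear in the download log of this article
MODEL_FILES = [
    "densenet121_weights_tf_dim_ordering_tf_kernels.h5",
    "imagenet_class_index.json",
]


def cached_files(models_dir, names=MODEL_FILES):
    """Map each file name to True if it already exists in models_dir."""
    models_dir = Path(models_dir)
    return {name: (models_dir / name).is_file() for name in names}


if __name__ == "__main__":
    # Assumption: Keras stores downloads in the default folder under the home directory
    default_dir = Path.home() / ".keras" / "models"
    for name, found in cached_files(default_dir).items():
        print("{}: {}".format(name, "cached" if found else "not downloaded yet"))
```

If both files are reported as cached, the next startup should skip the download.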

When the model download finishes

If a window opens showing the webcam image together with the top 3 classes recognized by the AI, it worked. By the way, when I showed it our dog (a toy poodle), the AI judged it to be a toy poodle with a probability of 76%, as shown below, so the AI's recognition is correct.

**[Example of results] AI recognition rate TOP3**

toy_poodle : 76%

miniature_poodle : 20%

Dandie_Dinmont : 1%

The recognition rate fluctuates in real time.
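
The lines shown on the screen are built from the list that `decode_predictions` returns, where each entry is a (class ID, class name, score) tuple. The sketch below shows one way to turn such a list into display text. It is an illustration, not the code from main.py: the function name `format_top_predictions` is made up, the scores are the ones from the example result above, and the class IDs are placeholders.

```python
def format_top_predictions(decoded, limit=3):
    """Build display lines such as 'No.1:toy_poodle:76%'.

    decoded: list of (class_id, class_name, score) tuples, best first,
             i.e. the shape of decode_predictions(preds, top=3)[0].
    """
    lines = []
    for rank, (_class_id, name, score) in enumerate(decoded[:limit], start=1):
        percent = int(round(float(score) * 100))
        lines.append("No.{}:{}:{}%".format(rank, name, percent))
    return lines


# Scores from the example result above; the class IDs are placeholders
sample = [
    ("id-1", "toy_poodle", 0.76),
    ("id-2", "miniature_poodle", 0.20),
    ("id-3", "Dandie_Dinmont", 0.01),
]
for line in format_top_predictions(sample):
    print(line)
```

Note that this sketch rounds the percentage, while main.py truncates it with `int()`, so the two can differ by 1% for some scores.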

AI program execution log

The download log below is displayed only on the first run.

C:\Users\xxxx\anaconda3\envs\python37\python.exe C:/Users/xxxx/PycharmProjects/OpenCV/sample09.py
Using TensorFlow backend.
2020-08-12 10:38:59.579123: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
Downloading data from https://github.com/keras-team/keras-applications/releases/download/densenet/densenet121_weights_tf_dim_ordering_tf_kernels.h5

    8192/33188688 [..............................] - ETA: 12:39
   16384/33188688 [..............................] - ETA: 8:26 
   40960/33188688 [..............................] - ETA: 5:03
  106496/33188688 [..............................] - ETA: 2:59
  245760/33188688 [..............................] - ETA: 1:42

~ Omitted ~

32743424/33188688 [============================>.] - ETA: 0s
32776192/33188688 [============================>.] - ETA: 0s
32956416/33188688 [============================>.] - ETA: 0s
33005568/33188688 [============================>.] - ETA: 0s
33193984/33188688 [==============================] - 32s 1us/step
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json

 8192/35363 [=====>........................] - ETA: 0s
40960/35363 [==================================] - 0s 0us/step
[ WARN:0] global C:\projects\opencv-python\opencv\modules\videoio\src\cap_msmf.cpp (436) `anonymous-namespace'::SourceReaderCB::~SourceReaderCB terminating async callback

Process finished with exit code 0

Identifiable classes

If you look at the `imagenet_class_index.json` file, you can see that image data is classified into **1000 classes**, numbered 0 to 999, as listed below. This is also a **limitation**: even an object that is not in this list will be classified as one of these classes. If you want to classify things that are not listed, create a new model or look into `transfer learning`, `fine tuning`, and so on.

`imagenet_class_index.json`


{
	"0": ["n01440764", "tench"],
	"1": ["n01443537", "goldfish"],
	"2": ["n01484850", "great_white_shark"],
	"3": ["n01491361", "tiger_shark"],
	"4": ["n01494475", "hammerhead"],
	"5": ["n01496331", "electric_ray"],
	"6": ["n01498041", "stingray"],
	"7": ["n01514668", "cock"],
	"8": ["n01514859", "hen"],
	"9": ["n01518878", "ostrich"],

~ Omitted ~

	"990": ["n12768682", "buckeye"],
	"991": ["n12985857", "coral_fungus"],
	"992": ["n12998815", "agaric"],
	"993": ["n13037406", "gyromitra"],
	"994": ["n13040303", "stinkhorn"],
	"995": ["n13044778", "earthstar"],
	"996": ["n13052670", "hen-of-the-woods"],
	"997": ["n13054560", "bolete"],
	"998": ["n13133613", "ear"],
	"999": ["n15075141", "toilet_tissue"]
}
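
To find out whether something you want to recognize is among the 1000 classes, you could search the class names by keyword. The following sketch shows the idea on the first few entries listed above. The function name `find_classes` is made up for this example; for the real file you would load `imagenet_class_index.json` from the storage directory with `json.load` instead of the inline sample.

```python
import json

# The first few entries of imagenet_class_index.json, as listed above
SAMPLE_JSON = """
{
    "0": ["n01440764", "tench"],
    "1": ["n01443537", "goldfish"],
    "2": ["n01484850", "great_white_shark"],
    "3": ["n01491361", "tiger_shark"]
}
"""


def find_classes(class_index, keyword):
    """Return (number, name) pairs whose class name contains keyword."""
    matches = [
        (int(number), name)
        for number, (_wordnet_id, name) in class_index.items()
        if keyword.lower() in name.lower()
    ]
    return sorted(matches)


class_index = json.loads(SAMPLE_JSON)
print(find_classes(class_index, "shark"))
```

With the sample entries, searching for "shark" finds classes No. 2 and No. 3. An empty result means the keyword is not in the list, so the model can only answer with some other class.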

That's all

Thank you for your hard work!

[Easy] AI automatic recognition with a webcam!