Real-time object detection Android app with TensorFlow Lite and CameraX

What to do this time

We will use CameraX's image analysis use case to quickly build an app that detects objects in real time with CameraX and TensorFlow Lite. (Note: the CameraX implementation targets 1.0.0-rc01.) The GitHub repository is linked at the bottom of this article, so refer to it as needed. The article is a bit long, so if you just want to try the app, head straight to the repository.

I will make something like this ↓

It displays the bounding box and score.

Preparing the model

First, find a pretrained model to use for object detection. This time we will use ssd_mobilenet_v1 from TensorFlow Hub; download the tflite version of the model. ssd_mobilenet_v1 looks like this:

input
shape: 300 x 300
color channels: 3
output shapes
location [1, 10, 4]: bounding boxes
category [1, 10]: category label index (the model was trained on the 91-class coco_dataset)
score [1, 10]: detection result score
number of detections [1]: number of detected objects (constant at 10 for this model)

There are many other trained models on TensorFlow Hub, so choose whichever you like. Be careful, though: if the input size is large, the parameter count is large and inference on Android becomes slow. You may also need to export the tflite model yourself.

This time I will use the model as is, but it would also be interesting to do transfer learning with the TensorFlow API.

Implementation in Android Studio

In Gradle, add dependencies for the TensorFlow Lite API, CameraX, and PermissionsDispatcher (for camera permission handling). Note that the kapt line requires the kotlin-kapt plugin to be applied.

build.gradle


    // permissionDispatcher
    implementation "org.permissionsdispatcher:permissionsdispatcher:4.7.0"
    kapt "org.permissionsdispatcher:permissionsdispatcher-processor:4.7.0"

    // cameraX
    def camerax_version = "1.0.0-rc01"
    implementation "androidx.camera:camera-core:${camerax_version}"
    implementation "androidx.camera:camera-camera2:$camerax_version"
    implementation "androidx.camera:camera-lifecycle:$camerax_version"
    implementation "androidx.camera:camera-view:1.0.0-alpha20"

    // tensorflow lite
    implementation 'org.tensorflow:tensorflow-lite:2.2.0'
    implementation 'org.tensorflow:tensorflow-lite-support:0.0.0-nightly'

Preparing the assets folder

Put the .tflite model you downloaded earlier into the assets folder of Android Studio. (The assets folder can be created by right-clicking the project: "New -> Folder -> Assets Folder".) Also prepare the label file that maps detection result indices to class names: download the coco_dataset labels from here and put the txt file into the assets folder in the same way.

The assets folder in Android Studio should now contain two files: ssd_mobilenet_v1.tflite and coco_dataset_labels.txt.
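One caveat I'll add here (based on the TensorFlow Lite documentation, not the original article): depending on your build configuration, assets can be compressed at build time, and a compressed .tflite file cannot be memory-mapped by the Interpreter. If model loading fails later, try disabling compression for tflite assets:

build.gradle


    android {
        aaptOptions {
            // keep .tflite assets uncompressed so they can be memory-mapped
            noCompress "tflite"
        }
    }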

CameraX implementation

(Note: the CameraX implementation targets 1.0.0-rc01.) Basically, we just follow the official tutorial.

Add the camera permission to the manifest.

AndroidManifest.xml


<uses-permission android:name="android.permission.CAMERA" />

Layout file definition: define the camera preview and a SurfaceView. Since the bounding boxes and so on are drawn in real time, we use a SurfaceView instead of a regular View to display the detection results.

activity_main.xml


<androidx.constraintlayout.widget.ConstraintLayout
    ... >

    <androidx.camera.view.PreviewView
        android:id="@+id/cameraView"
        android:layout_width="0dp"
        android:layout_height="0dp"
        ... />

    <SurfaceView
        android:id="@+id/resultView"
        android:layout_width="0dp"
        android:layout_height="0dp"
        ... />
</androidx.constraintlayout.widget.ConstraintLayout>

Implement CameraX in MainActivity. We will add PermissionsDispatcher later. This part is the same as the tutorial, so you may want to refer to the latest version of it. (The snippets reference views such as cameraView and resultView directly, e.g. via Kotlin synthetics or View Binding.)

MainActivity.kt


private lateinit var cameraExecutor: ExecutorService

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)

    cameraExecutor = Executors.newSingleThreadExecutor()
    setupCamera()
}
    
fun setupCamera() {
    val cameraProviderFuture = ProcessCameraProvider.getInstance(this)

    cameraProviderFuture.addListener({
        val cameraProvider: ProcessCameraProvider = cameraProviderFuture.get()

        //Preview use case
        val preview = Preview.Builder()
            .build()
            .also { it.setSurfaceProvider(cameraView.surfaceProvider) }

        //Use rear camera
        val cameraSelector = CameraSelector.DEFAULT_BACK_CAMERA

        //Image analysis use case (object detection this time)
        val imageAnalyzer = ImageAnalysis.Builder()
            .setTargetRotation(cameraView.display.rotation)
            .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST) //Analyze only the latest camera frame, dropping older ones
            .build()
        //TODO: implement the Analyzer for the object detection image analysis use case here

        try {
            cameraProvider.unbindAll()

            //Bind each use case to cameraX
            cameraProvider.bindToLifecycle(this, cameraSelector, preview, imageAnalyzer)

        } catch (exc: Exception) {
            Log.e("ERROR: Camera", "Use case binding failed", exc)
        }
    }, ContextCompat.getMainExecutor(this))
}

override fun onDestroy() {
    super.onDestroy()
    cameraExecutor.shutdown()
}

At this point you should be able to see the camera preview, provided you manually grant the camera permission from the device settings. Note, however, that a SurfaceView is black by default, so if the screen appears black, comment out the SurfaceView and check again.

Implementation of permission dispatcher

Implement PermissionsDispatcher for the camera permission request. (If you don't mind granting the permission manually, feel free to skip this.)

MainActivity.kt


@RuntimePermissions
class MainActivity : AppCompatActivity() {
    //(omitted)
    @NeedsPermission(Manifest.permission.CAMERA)
    fun setupCamera() {...}
}

Add the annotations to the target class and method, then build once. A function for the permission request will be generated automatically.

Change onCreate so that setupCamera is invoked via the generated permission check, and forward the permission request result, as follows. Note that this time we will not implement handling for cases such as the permission being denied.

MainActivity.kt


override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)

    cameraExecutor = Executors.newSingleThreadExecutor()
    //setupCamera() is removed; instead, call the
    //setupCameraWithPermissionCheck() method generated by PermissionsDispatcher
    setupCameraWithPermissionCheck()
}

override fun onRequestPermissionsResult(
    requestCode: Int,
    permissions: Array<String>,
    grantResults: IntArray
) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults)
    onRequestPermissionsResult(requestCode, grantResults) //delegate to the function generated by PermissionsDispatcher
}

This completes the camera preview implementation. Next, we implement the image analysis use case, model loading, result display, and so on.

Implementation of model loading function

Implement functions in MainActivity that read the tflite model and the label file from assets. There is nothing particularly difficult here, so it's fine to copy them as is.

MainActivity.kt



companion object {
    private const val MODEL_FILE_NAME = "ssd_mobilenet_v1.tflite"
    private const val LABEL_FILE_NAME = "coco_dataset_labels.txt"
}

//Interpreter: the wrapper for running the tflite model
private val interpreter: Interpreter by lazy {
    Interpreter(loadModel())
}

//List of the model's class labels
private val labels: List<String> by lazy {
    loadLabels()
}

//Load tflite model from assets
private fun loadModel(fileName: String = MainActivity.MODEL_FILE_NAME): ByteBuffer {
    lateinit var modelBuffer: ByteBuffer
    var file: AssetFileDescriptor? = null
    try {
        file = assets.openFd(fileName)
        val inputStream = FileInputStream(file.fileDescriptor)
        val fileChannel = inputStream.channel
        modelBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, file.startOffset, file.declaredLength)
    } catch (e: Exception) {
        Toast.makeText(this, "Model file read error", Toast.LENGTH_SHORT).show()
        finish() //note: finish() does not stop execution, so a failed load will still throw when modelBuffer is read
    } finally {
        file?.close()
    }
    return modelBuffer
}

//Load the class label data from assets
private fun loadLabels(fileName: String = MainActivity.LABEL_FILE_NAME): List<String> {
    var labels = listOf<String>()
    var inputStream: InputStream? = null
    try {
        inputStream = assets.open(fileName)
        val reader = BufferedReader(InputStreamReader(inputStream))
        labels = reader.readLines()
    } catch (e: Exception) {
        Toast.makeText(this, "txt file read error", Toast.LENGTH_SHORT).show()
        finish()
    } finally {
        inputStream?.close()
    }
    return labels
}

Implementation of the image analysis use case

Now we implement the main object detection inference pipeline. CameraX's image analysis use case makes this much easier to implement (though it still can't be done in just a few lines...). In the official tutorial, the example analyzer simply averages the pixel values of each frame.

Implement the ImageAnalysis.Analyzer interface provided by CameraX to create an ObjectDetector class that receives the camera preview and returns the analysis results. Define a typealias so the analysis results can be received through a callback.

ObjectDetector.kt


typealias ObjectDetectorCallback = (image: List<DetectionObject>) -> Unit
/**
 * CameraX object detection image analysis use case
 * @param yuvToRgbConverter Converts the camera image buffer from YUV_420_888 to RGB format
 * @param interpreter Interpreter for running the tflite model
 * @param labels List of class labels
 * @param resultViewSize Size of the surfaceView that displays the results
 * @param listener Callback that receives the list of detection results
 */
class ObjectDetector(
    private val yuvToRgbConverter: YuvToRgbConverter,
    private val interpreter: Interpreter,
    private val labels: List<String>,
    private val resultViewSize: Size,
    private val listener: ObjectDetectorCallback
) : ImageAnalysis.Analyzer {
    override fun analyze(image: ImageProxy) {
        //TODO: implement the inference code
    }
}

/**
 * Class holding a single detection result
 */
data class DetectionObject(
    val score: Float,
    val label: String,
    val boundingBox: RectF
)

Rewrite the "TODO: implement the Analyzer for the object detection image analysis use case" part of MainActivity as follows.

MainActivity.kt


//Image analysis use case (object detection this time)
val imageAnalyzer = ImageAnalysis.Builder()
    .setTargetRotation(cameraView.display.rotation)
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST) //Analyze only the latest camera frame, dropping older ones
    .build()
    .also {
        it.setAnalyzer(
            cameraExecutor,
            ObjectDetector(
                yuvToRgbConverter,
                interpreter,
                labels,
                Size(resultView.width, resultView.height)
            ) { detectedObjectList ->
               //TODO: display the detection results
            }
        )
    }

See the comments for each constructor parameter. YuvToRgbConverter will be an unresolved reference at this point, but that's fine; we'll cover it next.

Next we implement the analyze method of the ImageAnalysis.Analyzer interface. The camera preview frames arrive as the analyze method's argument, with the type ImageProxy. We can't run inference without converting this ImageProxy into a bitmap or tensor, and that is a bit of a pain...

An android.media.Image is contained in the ImageProxy, and its pixel data is grouped and stored as one or more Planes. The Android camera produces Images in the YUV_420_888 format, so we need a converter that turns this into an RGB bitmap.

If I remember correctly, PyTorch Mobile provides such a converter, but TensorFlow does not. Searching the repositories, I found an implementation in the CameraX samples, so I will use that this time. (You could also implement it yourself.)

So, copy this official sample converter to create the YuvToRgbConverter class, and add an instance of it to MainActivity as follows:

MainActivity.kt


//Converter to convert YUV image of camera to RGB
private val yuvToRgbConverter: YuvToRgbConverter by lazy {
    YuvToRgbConverter(this)
}
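If you would rather not copy the official sample, here is a minimal fallback sketch with the same class name and signature (my own simplified stand-in, not the official converter; imports come from android.content, android.graphics, android.media, and java.io). It converts the YUV_420_888 Image to NV21 and decodes it via an in-memory JPEG, assuming the common plane layout where the U and V planes are interleaved with a pixel stride of 2. It is slower than the RenderScript-based official converter, but easy to follow.

YuvToRgbConverter.kt


class YuvToRgbConverter(@Suppress("UNUSED_PARAMETER") context: Context) {

    //Convert the YUV image to NV21, encode it to an in-memory JPEG, decode that to a bitmap,
    //and draw it into the caller's bitmap so the signature matches the official converter
    fun yuvToRgb(image: Image, output: Bitmap) {
        val nv21 = toNv21(image)
        val yuvImage = YuvImage(nv21, ImageFormat.NV21, image.width, image.height, null)
        val jpegStream = ByteArrayOutputStream()
        yuvImage.compressToJpeg(Rect(0, 0, image.width, image.height), 100, jpegStream)
        val jpegBytes = jpegStream.toByteArray()
        val decoded = BitmapFactory.decodeByteArray(jpegBytes, 0, jpegBytes.size)
        Canvas(output).drawBitmap(decoded, 0f, 0f, null)
    }

    //Pack the Y plane followed by the interleaved V/U bytes, which is exactly the NV21 layout.
    //planes[2] is the V plane; with a pixel stride of 2 its buffer already reads V,U,V,U,...
    private fun toNv21(image: Image): ByteArray {
        val yBuffer = image.planes[0].buffer
        val vuBuffer = image.planes[2].buffer
        val ySize = yBuffer.remaining()
        val vuSize = vuBuffer.remaining()
        val nv21 = ByteArray(ySize + vuSize)
        yBuffer.get(nv21, 0, ySize)
        vuBuffer.get(nv21, ySize, vuSize)
        return nv21
    }
}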

Model-related variable definitions

In the ObjectDetector class, define the model's input image size and the variables that receive the results. These must match the shapes of the model you are using.

ObjectDetector.kt


companion object {
    //Model input and output sizes
    private const val IMG_SIZE_X = 300
    private const val IMG_SIZE_Y = 300
    private const val MAX_DETECTION_NUM = 10

    //The tflite model used this time is quantized, so the normalization parameters are not mean=127.5f / std=127.5f but the following
    private const val NORMALIZE_MEAN = 0f
    private const val NORMALIZE_STD = 1f

    //Detection result score threshold
    private const val SCORE_THRESHOLD = 0.5f
}

private var imageRotationDegrees: Int = 0
private val tfImageProcessor by lazy {
    ImageProcessor.Builder()
        .add(ResizeOp(IMG_SIZE_X, IMG_SIZE_Y, ResizeOp.ResizeMethod.BILINEAR)) //Resize the image to fit the model's input
        .add(Rot90Op(-imageRotationDegrees / 90)) //The incoming ImageProxy is rotated, so correct for that here
        .add(NormalizeOp(NORMALIZE_MEAN, NORMALIZE_STD)) //Normalization
        .build()
}

private val tfImageBuffer = TensorImage(DataType.UINT8)

//Detection result bounding boxes [1, 10, 4]
//Each bounding box has the form [top, left, bottom, right] in normalized coordinates
private val outputBoundingBoxes: Array<Array<FloatArray>> = arrayOf(
    Array(MAX_DETECTION_NUM) {
        FloatArray(4)
    }
)

//Detection result class label indices [1, 10]
private val outputLabels: Array<FloatArray> = arrayOf(
    FloatArray(MAX_DETECTION_NUM)
)

//Detection result scores [1, 10]
private val outputScores: Array<FloatArray> = arrayOf(
    FloatArray(MAX_DETECTION_NUM)
)

//Number of detected objects (fixed at 10 for this model, set at tflite conversion time)
private val outputDetectionNum: FloatArray = FloatArray(1)

//Put together in a map to receive detection results
private val outputMap = mapOf(
    0 to outputBoundingBoxes,
    1 to outputLabels,
    2 to outputScores,
    3 to outputDetectionNum
)

It's a hard-to-read wall of variables, but we need all of them. Image preprocessing is done with the ImageProcessor from the TensorFlow Lite support library. See the comments for a description of each variable. Essentially, the model info shown earlier is what is being defined here in Kotlin.

Inference code implementation

Next, use the Interpreter to run inference with the model.

ObjectDetector.kt


//Convert the image YUV -> RGB bitmap -> TensorImage -> TensorBuffer, run inference, and return the results as a list
private fun detect(targetImage: Image): List<DetectionObject> {
    val targetBitmap = Bitmap.createBitmap(targetImage.width, targetImage.height, Bitmap.Config.ARGB_8888)
    yuvToRgbConverter.yuvToRgb(targetImage, targetBitmap) //Convert to rgb
    tfImageBuffer.load(targetBitmap)
    val tensorImage = tfImageProcessor.process(tfImageBuffer)

    //Performing inference with the tflite model
    interpreter.runForMultipleInputsOutputs(arrayOf(tensorImage.buffer), outputMap)

    //Format the inference result and return it as a list
    val detectedObjectList = arrayListOf<DetectionObject>()
    loop@ for (i in 0 until outputDetectionNum[0].toInt()) {
        val score = outputScores[0][i]
        val label = labels[outputLabels[0][i].toInt()]
        val boundingBox = RectF(
            outputBoundingBoxes[0][i][1] * resultViewSize.width,
            outputBoundingBoxes[0][i][0] * resultViewSize.height,
            outputBoundingBoxes[0][i][3] * resultViewSize.width,
            outputBoundingBoxes[0][i][2] * resultViewSize.height
        )

        //Add only those larger than the threshold
        if (score >= ObjectDetector.SCORE_THRESHOLD) {
            detectedObjectList.add(
                DetectionObject(
                    score = score,
                    label = label,
                    boundingBox = boundingBox
                )
            )
        } else {
            //The detection results are sorted in descending order of score, so if the score falls below the threshold, the loop ends.
            break@loop
        }
    }
    return detectedObjectList.take(4) //show at most 4 objects (the overlay only has 4 path colors)
}

First, the CameraX image is converted YUV -> RGB bitmap -> TensorImage -> TensorBuffer, and inference is run with the Interpreter. Since the inference results are written into the outputMap passed as an argument, the detect function reads each of the defined output variables, formats the results, and returns them as a list. Note that the bounding box coordinates come out normalized to [0, 1], which is why they are multiplied by the resultView size (for example, a normalized left of 0.25 on a 1080-px-wide view becomes x = 270).

Then call this detect function from the analyze function to complete the ObjectDetector class.

ObjectDetector.kt


//Infer the preview image flowing from cameraX by putting it in the object detection model.
@SuppressLint("UnsafeExperimentalUsageError")
override fun analyze(image: ImageProxy) {
    if (image.image == null) return
    imageRotationDegrees = image.imageInfo.rotationDegrees
    val detectedObjectList = detect(image.image!!)
    listener(detectedObjectList) //Receive detection result in callback
    image.close()
}

Note that you must always call image.close(): the underlying android.media.Image holds system resources that must be released.
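As a small defensive variant (my own tweak, not in the original sample), you can wrap the body in try/finally so that the frame is released even if detect() throws or returns early:

ObjectDetector.kt


@SuppressLint("UnsafeExperimentalUsageError")
override fun analyze(image: ImageProxy) {
    try {
        if (image.image == null) return
        imageRotationDegrees = image.imageInfo.rotationDegrees
        listener(detect(image.image!!)) //deliver the detection results to the callback
    } finally {
        image.close() //the finally block runs even on the early return above
    }
}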

With this, the inference pipeline is complete. Finally, implement the display of the detection results.

Implementing the detection result display

Since the view is redrawn in real time, we use a SurfaceView instead of a regular View to display the bounding boxes and other overlays. Create an OverlaySurfaceView class and write the initialization there. As for what SurfaceView and its callbacks are, I will skip that here since many other articles cover it.

OverlaySurfaceView.kt


class OverlaySurfaceView(surfaceView: SurfaceView) :
    SurfaceView(surfaceView.context), SurfaceHolder.Callback {

    init {
        surfaceView.holder.addCallback(this)
        surfaceView.setZOrderOnTop(true)
    }

    private var surfaceHolder = surfaceView.holder
    private val paint = Paint()
    private val pathColorList = listOf(Color.RED, Color.GREEN, Color.CYAN, Color.BLUE)

    override fun surfaceCreated(holder: SurfaceHolder) {
        //Make surfaceView transparent
        surfaceHolder.setFormat(PixelFormat.TRANSPARENT)
    }

    override fun surfaceChanged(holder: SurfaceHolder, format: Int, width: Int, height: Int) {
    }

    override fun surfaceDestroyed(holder: SurfaceHolder) {
    }
}

Next, create a draw function that displays the bounding boxes.

OverlaySurfaceView.kt


fun draw(detectedObjectList: List<DetectionObject>) {
    //Get the canvas via surfaceHolder (drawing may be attempted while the screen is inactive, which can fail, so treat the canvas as nullable and handle it below)
    val canvas: Canvas? = surfaceHolder.lockCanvas()
    //Clear what was previously drawn
    canvas?.drawColor(0, PorterDuff.Mode.CLEAR)

    detectedObjectList.forEachIndexed { i, detectionObject ->
        //Bounding box display
        paint.apply {
            color = pathColorList[i]
            style = Paint.Style.STROKE
            strokeWidth = 7f
            isAntiAlias = false
        }
        canvas?.drawRect(detectionObject.boundingBox, paint)

        //Label and score display
        paint.apply {
            style = Paint.Style.FILL
            isAntiAlias = true
            textSize = 77f
        }
        canvas?.drawText(
            detectionObject.label + " " + "%,.2f".format(detectionObject.score * 100) + "%",
            detectionObject.boundingBox.left,
            detectionObject.boundingBox.top - 5f,
            paint
        )
    }

    surfaceHolder.unlockCanvasAndPost(canvas ?: return)
}

The canvas acquired via surfaceHolder is handled as nullable because acquiring it can fail, for example while the surface is inactive. All we do with the canvas is draw the bounding box (a Rect) and the text. Incidentally, a SurfaceView can be drawn from a background thread, which is why draw() can be called directly from the analyzer callback.

All that remains is to set up the SurfaceView callback and related wiring.

MainActivity.kt


private lateinit var overlaySurfaceView: OverlaySurfaceView

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)
    overlaySurfaceView = OverlaySurfaceView(resultView)
    //(omitted)
}

Change the "TODO: display the detection results" callback in the image analysis use case of MainActivity as follows.

MainActivity.kt


//Image analysis use case (object detection this time)
val imageAnalyzer = ImageAnalysis.Builder()
    .setTargetRotation(cameraView.display.rotation)
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST) //Analyze only the latest camera frame, dropping older ones
    .build()
    .also {
        it.setAnalyzer(
            cameraExecutor,
            ObjectDetector(
                yuvToRgbConverter,
                interpreter,
                labels,
                Size(resultView.width, resultView.height)
            ) { detectedObjectList ->
                //Draw the detection results
                overlaySurfaceView.draw(detectedObjectList)
            }
        )
    }

And that's it! Did everything come together nicely?

In closing

Now that CameraX has reached RC, I suspect many people feel it's about time to adopt it. It provides a variety of use cases, and it is appealing that implementations built on them are both straightforward and extensible. Personally, I think it's fine to use it in production now.

The GitHub repository for this article is here.
