We will use the CameraX image analysis use case together with TensorFlow Lite to quickly create an app that detects objects in real time.
(Note: the CameraX implementation in this article is based on 1.0.0-rc01.)
The GitHub repository is linked at the bottom of this article, so refer to it as needed. The article is a bit long, so if you just want to try it out, head straight to the repository.
I will make something like this ↓ It displays bounding boxes and scores.
Find a trained model to use for object detection. This time we will use ssd_mobilenet_v1 from TensorFlow Hub; download the tflite model. ssd_mobilenet_v1 is a model like this:
| input | |
|---|---|
| shape | 300 x 300 |
| color channels | 3 |
| output | shape | |
|---|---|---|
| location | [1, 10, 4] | Bounding boxes |
| category | [1, 10] | Category label index (91 classes; the model was trained on the COCO dataset) |
| score | [1, 10] | Detection score |
| number of detections | [1] | Number of detected objects (constant 10 for this model) |
There are many other trained models on TensorFlow Hub, so choose the one you like. Be careful, though: if the input size is large, the model has many parameters and inference on Android takes a long time. You may also need to export the tflite model yourself. This time I will use the model as it is, but using the TensorFlow API for transfer learning also looks interesting.
In Gradle, add the dependencies for the TensorFlow Lite API and CameraX, plus PermissionsDispatcher for handling the camera permission.
build.gradle
// permissionDispatcher
implementation "org.permissionsdispatcher:permissionsdispatcher:4.7.0"
kapt "org.permissionsdispatcher:permissionsdispatcher-processor:4.7.0"
// cameraX
def camerax_version = "1.0.0-rc01"
implementation "androidx.camera:camera-core:${camerax_version}"
implementation "androidx.camera:camera-camera2:$camerax_version"
implementation "androidx.camera:camera-lifecycle:$camerax_version"
implementation "androidx.camera:camera-view:1.0.0-alpha20"
// tensorflow lite
implementation 'org.tensorflow:tensorflow-lite:2.2.0'
implementation 'org.tensorflow:tensorflow-lite-support:0.0.0-nightly'
Put the .tflite model you downloaded earlier into the assets folder of Android Studio. (The assets folder can be created by right-clicking the project: "New -> Folder -> Assets Folder".)
Also prepare the label file that maps the detection result indices to class labels. Download the coco_dataset label file from here and put the txt file into the assets folder in the same way.
You should now have two items in the Android Studio assets folder: ssd_mobilenet_v1.tflite and coco_dataset_labels.txt.
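(Optional) If you want to quickly confirm at runtime that both files really ended up under assets, you can log the contents of the assets folder, for example from onCreate. This is a small sketch of my own, not part of the repository:

// Optional sanity check: log everything bundled under assets/
assets.list("")?.forEach { name -> Log.d("Assets", name) }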
For the camera setup, basically just follow the official CameraX tutorial.
Add the camera permission to the manifest
AndroidManifest.xml
<uses-permission android:name="android.permission.CAMERA" />
Layout file definition
Define the camera preview view and a SurfaceView. Because the bounding boxes are drawn in real time, the detection results are rendered on a SurfaceView rather than an ordinary View.
activity_main.xml
<androidx.constraintlayout.widget.ConstraintLayout
    ... >

    <androidx.camera.view.PreviewView
        android:id="@+id/cameraView"
        android:layout_width="0dp"
        android:layout_height="0dp"
        ... />

    <SurfaceView
        android:id="@+id/resultView"
        android:layout_width="0dp"
        android:layout_height="0dp"
        ... />

</androidx.constraintlayout.widget.ConstraintLayout>
Implement CameraX in MainActivity. We will add PermissionsDispatcher later. This part is essentially the same as the tutorial, so you may want to refer to the latest version of it.
MainActivity.kt
private lateinit var cameraExecutor: ExecutorService

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)
    cameraExecutor = Executors.newSingleThreadExecutor()
    setupCamera()
}

fun setupCamera() {
    val cameraProviderFuture = ProcessCameraProvider.getInstance(this)
    cameraProviderFuture.addListener({
        val cameraProvider: ProcessCameraProvider = cameraProviderFuture.get()

        // Preview use case
        val preview = Preview.Builder()
            .build()
            .also { it.setSurfaceProvider(cameraView.surfaceProvider) }

        // Use the rear camera
        val cameraSelector = CameraSelector.DEFAULT_BACK_CAMERA

        // Image analysis (object detection) use case
        val imageAnalyzer = ImageAnalysis.Builder()
            .setTargetRotation(cameraView.display.rotation)
            .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST) // Analyze only the latest camera frame
            .build()

        // TODO: set the ImageAnalysis.Analyzer for the object detection use case here

        try {
            cameraProvider.unbindAll()
            // Bind each use case to the camera lifecycle
            cameraProvider.bindToLifecycle(this, cameraSelector, preview, imageAnalyzer)
        } catch (exc: Exception) {
            Log.e("ERROR: Camera", "Use case binding failed", exc)
        }
    }, ContextCompat.getMainExecutor(this))
}

override fun onDestroy() {
    super.onDestroy()
    cameraExecutor.shutdown()
}
At this point, if you manually allow the camera permission from the device settings, you should be able to see the camera preview. However, the SurfaceView is black by default, so if the screen is black, temporarily comment out the SurfaceView in the layout and check again.
Next, implement PermissionsDispatcher for the camera permission request. (If you don't mind granting the permission manually from settings, you can skip this step.)
MainActivity.kt
@RuntimePermissions
class MainActivity : AppCompatActivity() {
    // Abbreviated

    @NeedsPermission(Manifest.permission.CAMERA)
    fun setupCamera() { ... }
}
Add the annotations to the target class and method and build once; the functions for the permission request are generated automatically.
Change the earlier call to setupCamera in onCreate as follows so that it goes through the permission request. This time we will not implement handling for rejection and the other cases (a minimal sketch of a denial handler is shown after the next code block).
MainActivity.kt
override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)
    cameraExecutor = Executors.newSingleThreadExecutor()
    // setupCamera() removed
    // Call the PermissionsDispatcher-generated method instead
    setupCameraWithPermissionCheck()
}

override fun onRequestPermissionsResult(
    requestCode: Int,
    permissions: Array<String>,
    grantResults: IntArray
) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults)
    // Delegate the result to the PermissionsDispatcher-generated function
    onRequestPermissionsResult(requestCode, grantResults)
}
This completes the camera preview part. Next, implement the image analysis use case, model loading, result display, and so on.
Implement functions in MainActivity that read the tflite model and the label file from assets. There is nothing particularly difficult here, so it's fine to just copy.
MainActivity.kt
companion object {
    private const val MODEL_FILE_NAME = "ssd_mobilenet_v1.tflite"
    private const val LABEL_FILE_NAME = "coco_dataset_labels.txt"
}

// Interpreter: the wrapper used to run the tflite model
private val interpreter: Interpreter by lazy {
    Interpreter(loadModel())
}

// List of class labels for the model
private val labels: List<String> by lazy {
    loadLabels()
}

// Load the tflite model from assets
private fun loadModel(fileName: String = MODEL_FILE_NAME): ByteBuffer {
    lateinit var modelBuffer: ByteBuffer
    var file: AssetFileDescriptor? = null
    try {
        file = assets.openFd(fileName)
        val inputStream = FileInputStream(file.fileDescriptor)
        val fileChannel = inputStream.channel
        // Memory-map the model file
        modelBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, file.startOffset, file.declaredLength)
    } catch (e: Exception) {
        Toast.makeText(this, "Model file read error", Toast.LENGTH_SHORT).show()
        finish()
    } finally {
        file?.close()
    }
    return modelBuffer
}

// Load the label data for the model from assets
private fun loadLabels(fileName: String = LABEL_FILE_NAME): List<String> {
    var labels = listOf<String>()
    var inputStream: InputStream? = null
    try {
        inputStream = assets.open(fileName)
        val reader = BufferedReader(InputStreamReader(inputStream))
        labels = reader.readLines()
    } catch (e: Exception) {
        Toast.makeText(this, "txt file read error", Toast.LENGTH_SHORT).show()
        finish()
    } finally {
        inputStream?.close()
    }
    return labels
}
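As an aside, the TensorFlow Lite support library that is already among the dependencies ships helpers for exactly this. If the version you are using includes FileUtil (recent versions do), the two functions above could be replaced with something like the following sketch; this is just an alternative, not what the repository does:

// Alternative using the TF Lite support library (org.tensorflow.lite.support.common.FileUtil)
private val interpreter: Interpreter by lazy {
    Interpreter(FileUtil.loadMappedFile(this, MODEL_FILE_NAME))
}

private val labels: List<String> by lazy {
    FileUtil.loadLabels(this, LABEL_FILE_NAME)
}

Either way, if openFd() (or loadMappedFile) fails complaining that the file cannot be opened as a file descriptor, the asset is most likely being compressed by the build; the usual fix is to add noCompress "tflite" under aaptOptions in build.gradle.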
Now we implement the main object detection inference pipeline. The CameraX image analysis use case makes this much easier to implement, although unlike the official tutorial's analyzer, which simply averages the pixel values, it still cannot be done in just a few lines.
Implement the ImageAnalysis.Analyzer interface provided by CameraX and create an ObjectDetector class that receives the camera preview and returns the analysis results. Define a typealias so that the analysis results can be received through a callback.
ObjectDetector.kt
typealias ObjectDetectorCallback = (image: List<DetectionObject>) -> Unit

/**
 * CameraX object detection image analysis use case
 * @param yuvToRgbConverter Converts the camera image buffer from YUV_420_888 to RGB format
 * @param interpreter The tflite interpreter used to run the model
 * @param labels List of class labels
 * @param resultViewSize Size of the SurfaceView that displays the results
 * @param listener Callback that receives the list of detection results
 */
class ObjectDetector(
    private val yuvToRgbConverter: YuvToRgbConverter,
    private val interpreter: Interpreter,
    private val labels: List<String>,
    private val resultViewSize: Size,
    private val listener: ObjectDetectorCallback
) : ImageAnalysis.Analyzer {
    override fun analyze(image: ImageProxy) {
        // TODO: inference code will be implemented here
    }
}

/**
 * Class holding a single detection result
 */
data class DetectionObject(
    val score: Float,
    val label: String,
    val boundingBox: RectF
)
Rewrite the "TODO: set the ImageAnalysis.Analyzer for the object detection use case" part of MainActivity as follows.
MainActivity.kt
// Image analysis (object detection) use case
val imageAnalyzer = ImageAnalysis.Builder()
    .setTargetRotation(cameraView.display.rotation)
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST) // Analyze only the latest camera frame
    .build()
    .also {
        it.setAnalyzer(
            cameraExecutor,
            ObjectDetector(
                yuvToRgbConverter,
                interpreter,
                labels,
                Size(resultView.width, resultView.height)
            ) { detectedObjectList ->
                // TODO: display the detection results
            }
        )
    }
See the comments for a description of each constructor parameter. At this point **YuvToRgbConverter** will show up as an error, but we will create it next, so that's fine.
Next, we implement the analyze method of the ImageAnalysis.Analyzer interface. The camera preview frames arrive as the analyze method's argument with the type ImageProxy. We cannot run inference without first converting this ImageProxy into a bitmap or a tensor, and that part is a bit annoying. The ImageProxy wraps an android.media.Image whose pixel data is stored as one or more Planes, and the Android camera produces this Image in YUV_420_888 format, so we need a converter that turns it into an RGB bitmap.
PyTorch Mobile has such a converter, as far as I remember, but TensorFlow does not. Searching around, I found one in the official CameraX sample, so I will use that this time. (You could also implement it yourself.)
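Just to illustrate what such a converter has to do, here is a deliberately naive sketch of my own: it repacks the three planes into an NV21 byte array and round-trips through a JPEG. It ignores row and pixel strides (so it only works when the planes happen to be tightly packed) and it is slow, which is exactly why this article uses the official sample converter (based on RenderScript's YUV-to-RGB intrinsic) instead.

// Naive illustration only - NOT the converter used in this article
fun Image.toBitmapNaive(): Bitmap {
    val yBuffer = planes[0].buffer
    val uBuffer = planes[1].buffer
    val vBuffer = planes[2].buffer
    val ySize = yBuffer.remaining()
    val uSize = uBuffer.remaining()
    val vSize = vBuffer.remaining()

    // NV21 = full Y plane followed by interleaved V/U samples
    // (only correct here if the U/V planes are already interleaved with pixel stride 2)
    val nv21 = ByteArray(ySize + uSize + vSize)
    yBuffer.get(nv21, 0, ySize)
    vBuffer.get(nv21, ySize, vSize)
    uBuffer.get(nv21, ySize + vSize, uSize)

    val out = ByteArrayOutputStream()
    YuvImage(nv21, ImageFormat.NV21, width, height, null)
        .compressToJpeg(Rect(0, 0, width, height), 100, out)
    val jpegBytes = out.toByteArray()
    return BitmapFactory.decodeByteArray(jpegBytes, 0, jpegBytes.size)
}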
So, copy the converter from the official sample to create the YuvToRgbConverter class, and add an instance of it to MainActivity as follows:
MainActivity.kt
// Converter to convert the camera's YUV image to RGB
private val yuvToRgbConverter: YuvToRgbConverter by lazy {
    YuvToRgbConverter(this)
}
Define the model's input image size and the variables that will receive the results in the ObjectDetector class. These must match the shapes of the model you are using.
ObjectDetector.kt
companion object {
    // Model input and output sizes
    private const val IMG_SIZE_X = 300
    private const val IMG_SIZE_Y = 300
    private const val MAX_DETECTION_NUM = 10

    // The tflite model used this time is quantized, so the normalization
    // parameters are 0 and 1 rather than 127.5f
    private const val NORMALIZE_MEAN = 0f
    private const val NORMALIZE_STD = 1f

    // Score threshold for detection results
    private const val SCORE_THRESHOLD = 0.5f
}

private var imageRotationDegrees: Int = 0
private val tfImageProcessor by lazy {
    ImageProcessor.Builder()
        .add(ResizeOp(IMG_SIZE_X, IMG_SIZE_Y, ResizeOp.ResizeMethod.BILINEAR)) // Resize the image to the model's input size
        .add(Rot90Op(-imageRotationDegrees / 90)) // The incoming ImageProxy is rotated, so correct for it
        .add(NormalizeOp(NORMALIZE_MEAN, NORMALIZE_STD)) // Normalization
        .build()
}

private val tfImageBuffer = TensorImage(DataType.UINT8)

// Bounding boxes of the detection results [1, 10, 4]
// Each bounding box has the form [top, left, bottom, right]
private val outputBoundingBoxes: Array<Array<FloatArray>> = arrayOf(
    Array(MAX_DETECTION_NUM) {
        FloatArray(4)
    }
)

// Class label indices of the detection results [1, 10]
private val outputLabels: Array<FloatArray> = arrayOf(
    FloatArray(MAX_DETECTION_NUM)
)

// Scores of the detection results [1, 10]
private val outputScores: Array<FloatArray> = arrayOf(
    FloatArray(MAX_DETECTION_NUM)
)

// Number of detected objects (fixed at 10 for this model, set at tflite conversion time)
private val outputDetectionNum: FloatArray = FloatArray(1)

// Map that gathers the output buffers to receive the detection results
private val outputMap = mapOf(
    0 to outputBoundingBoxes,
    1 to outputLabels,
    2 to outputScores,
    3 to outputDetectionNum
)
It is a little hard to read because it is all variable declarations, but every one of them is needed. Image preprocessing is done with the ImageProcessor from the TensorFlow Lite support library. See the comments for a description of each variable; basically, the model information shown earlier is expressed in Kotlin.
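If you want to double-check that these buffers really match the model you downloaded, the Interpreter can report its tensor shapes. A small optional sketch of my own (place it anywhere the interpreter is in scope):

// Optional: log the model's actual tensor shapes to verify the buffer definitions above
private fun logModelShapes() {
    val input = interpreter.getInputTensor(0)
    Log.d("ObjectDetector", "input shape=${input.shape().contentToString()} type=${input.dataType()}") // expect [1, 300, 300, 3], UINT8
    for (i in 0 until interpreter.outputTensorCount) {
        Log.d("ObjectDetector", "output[$i] shape=${interpreter.getOutputTensor(i).shape().contentToString()}")
    }
}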
Then use the interpreter to infer with the model.
ObjectDetector.kt
// Convert the Image: YUV -> RGB bitmap -> TensorImage -> TensorBuffer,
// run inference, and return the results as a list
private fun detect(targetImage: Image): List<DetectionObject> {
    val targetBitmap = Bitmap.createBitmap(targetImage.width, targetImage.height, Bitmap.Config.ARGB_8888)
    yuvToRgbConverter.yuvToRgb(targetImage, targetBitmap) // Convert to RGB
    tfImageBuffer.load(targetBitmap)
    val tensorImage = tfImageProcessor.process(tfImageBuffer)

    // Run inference with the tflite model
    interpreter.runForMultipleInputsOutputs(arrayOf(tensorImage.buffer), outputMap)

    // Format the inference results and return them as a list
    val detectedObjectList = arrayListOf<DetectionObject>()
    loop@ for (i in 0 until outputDetectionNum[0].toInt()) {
        val score = outputScores[0][i]
        val label = labels[outputLabels[0][i].toInt()]
        val boundingBox = RectF(
            outputBoundingBoxes[0][i][1] * resultViewSize.width,
            outputBoundingBoxes[0][i][0] * resultViewSize.height,
            outputBoundingBoxes[0][i][3] * resultViewSize.width,
            outputBoundingBoxes[0][i][2] * resultViewSize.height
        )

        // Keep only results above the score threshold
        if (score >= SCORE_THRESHOLD) {
            detectedObjectList.add(
                DetectionObject(
                    score = score,
                    label = label,
                    boundingBox = boundingBox
                )
            )
        } else {
            // Results are sorted in descending score order, so once a score
            // falls below the threshold the loop can end
            break@loop
        }
    }
    // Keep at most 4 results (the overlay only defines 4 box colors)
    return detectedObjectList.take(4)
}
First, the CameraX image is converted YUV -> RGB bitmap -> TensorImage -> TensorBuffer, and inference is run with the interpreter. The inference results are written into the outputMap passed as an argument, so the detect function formats them from the output variables defined earlier and returns them as a list.
Then call this detect function from the analyze function to complete the ObjectDetector class.
ObjectDetector.kt
// Run the preview image streamed from CameraX through the object detection model
@SuppressLint("UnsafeExperimentalUsageError")
override fun analyze(image: ImageProxy) {
    if (image.image == null) return
    imageRotationDegrees = image.imageInfo.rotationDegrees
    val detectedObjectList = detect(image.image!!)
    listener(detectedObjectList) // Deliver the detection results via the callback
    image.close()
}
Note that you must always call image.close(). The underlying android.media.Image holds system resources that need to be released.
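One small variation I would suggest: if detect() ever throws and close() is skipped, CameraX will not deliver the next frame, so the analysis simply stalls. Wrapping the body in try/finally guarantees the frame is always released. This is my own variation, not what the repository does:

// Variation: guarantee the ImageProxy is closed even if detection throws
@SuppressLint("UnsafeExperimentalUsageError")
override fun analyze(image: ImageProxy) {
    try {
        val mediaImage = image.image ?: return
        imageRotationDegrees = image.imageInfo.rotationDegrees
        listener(detect(mediaImage))
    } finally {
        image.close() // always release the frame so the next one can be delivered
    }
}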
If you have implemented everything up to this point, the inference pipeline is complete. Finally, implement the display of the detection results. Since the view is redrawn in real time, the bounding boxes and labels are drawn on a SurfaceView rather than an ordinary View.
Create an OverlaySurfaceView class and write the initialization code. I won't explain what callbacks or SurfaceViews are here, since plenty of other articles cover them.
OverlaySurfaceView.kt
class OverlaySurfaceView(surfaceView: SurfaceView) :
    SurfaceView(surfaceView.context), SurfaceHolder.Callback {

    init {
        surfaceView.holder.addCallback(this)
        surfaceView.setZOrderOnTop(true)
    }

    private var surfaceHolder = surfaceView.holder
    private val paint = Paint()
    private val pathColorList = listOf(Color.RED, Color.GREEN, Color.CYAN, Color.BLUE)

    override fun surfaceCreated(holder: SurfaceHolder) {
        // Make the SurfaceView transparent
        surfaceHolder.setFormat(PixelFormat.TRANSPARENT)
    }

    override fun surfaceChanged(holder: SurfaceHolder, format: Int, width: Int, height: Int) {
    }

    override fun surfaceDestroyed(holder: SurfaceHolder) {
    }
}
Next, create a draw function that displays the bounding boxes.
OverlaySurfaceView.kt
fun draw(detectedObjectList: List<DetectionObject>) {
    // Get the canvas via the surfaceHolder.
    // (Drawing may be attempted while the screen is not active, which can fail,
    // so treat the canvas as nullable and handle that below.)
    val canvas: Canvas? = surfaceHolder.lockCanvas()

    // Clear what was drawn previously
    canvas?.drawColor(0, PorterDuff.Mode.CLEAR)

    detectedObjectList.mapIndexed { i, detectionObject ->
        // Draw the bounding box
        paint.apply {
            color = pathColorList[i]
            style = Paint.Style.STROKE
            strokeWidth = 7f
            isAntiAlias = false
        }
        canvas?.drawRect(detectionObject.boundingBox, paint)

        // Draw the label and score
        paint.apply {
            style = Paint.Style.FILL
            isAntiAlias = true
            textSize = 77f
        }
        canvas?.drawText(
            detectionObject.label + " " + "%,.2f".format(detectionObject.score * 100) + "%",
            detectionObject.boundingBox.left,
            detectionObject.boundingBox.top - 5f,
            paint
        )
    }
    surfaceHolder.unlockCanvasAndPost(canvas ?: return)
}
The canvas obtained through the surfaceHolder is handled as nullable because the surface may not be valid, for example while the view is not active. Beyond that, the canvas is simply used to draw the bounding box (Rect) and the text.
All that remains is to hook up the SurfaceView callback and related setup.
MainActivity.kt
private lateinit var overlaySurfaceView: OverlaySurfaceView

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)
    overlaySurfaceView = OverlaySurfaceView(resultView)
    // Abbreviated
}
Finally, change the "TODO: display the detection results" callback in the image analysis use case of MainActivity as follows.
MainActivity.kt
// Image analysis (object detection) use case
val imageAnalyzer = ImageAnalysis.Builder()
    .setTargetRotation(cameraView.display.rotation)
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST) // Analyze only the latest camera frame
    .build()
    .also {
        it.setAnalyzer(
            cameraExecutor,
            ObjectDetector(
                yuvToRgbConverter,
                interpreter,
                labels,
                Size(resultView.width, resultView.height)
            ) { detectedObjectList ->
                // Display the analysis results
                overlaySurfaceView.draw(detectedObjectList)
            }
        )
    }
That's it! Did your implementation turn out well? With CameraX now at the RC stage, I imagine many people feel it is about time to adopt it. The use-case based design makes it easy to work with and easy to extend, and personally I think it is fine to use in production.
The GitHub repository for this article is linked here.