In this article, we create an Android application that recognizes images captured by the camera in real time, running a trained model on the device with PyTorch Mobile.
The finished app looks like this ↓
The sample app I made is linked at the bottom of this article, so please take a look if you like.
First, add the dependencies for CameraX and PyTorch Mobile (versions as of February 2020):
build.gradle
def camerax_version = '1.0.0-alpha06'
implementation "androidx.camera:camera-core:${camerax_version}"
implementation "androidx.camera:camera-camera2:${camerax_version}"
implementation 'org.pytorch:pytorch_android:1.4.0'
implementation 'org.pytorch:pytorch_android_torchvision:1.4.0'
Also add the following inside the **android {}** block of the same build.gradle:
build.gradle
compileOptions {
    sourceCompatibility JavaVersion.VERSION_1_8
    targetCompatibility JavaVersion.VERSION_1_8
}
With the dependencies in place, we implement the camera functionality using **CameraX**, a library that makes the camera easy to work with on Android.
The implementation below follows the official CameraX tutorial. The details are covered in other articles, so I will skip the explanations and show just the code.
First, declare the camera permission in AndroidManifest.xml:
<uses-permission android:name="android.permission.CAMERA" />
Next, place a button to start the camera and a TextureView for the preview display:
activity_main.xml
<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
tools:context=".MainActivity">
<TextureView
android:id="@+id/view_finder"
android:layout_width="0dp"
android:layout_height="0dp"
android:layout_marginBottom="16dp"
app:layout_constraintBottom_toTopOf="@+id/activateCameraBtn"
app:layout_constraintEnd_toEndOf="parent"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toTopOf="parent" />
<androidx.constraintlayout.widget.ConstraintLayout
android:layout_width="match_parent"
android:layout_height="wrap_content"
android:alpha="0.7"
android:animateLayoutChanges="true"
android:background="@android:color/white"
app:layout_constraintEnd_toEndOf="@+id/view_finder"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toTopOf="@+id/view_finder">
<TextView
android:id="@+id/inferredCategoryText"
android:layout_width="0dp"
android:layout_height="wrap_content"
android:layout_marginStart="8dp"
android:layout_marginTop="16dp"
android:layout_marginEnd="8dp"
android:text="Inference result"
android:textSize="18sp"
app:layout_constraintEnd_toEndOf="parent"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toTopOf="parent" />
<TextView
android:id="@+id/inferredScoreText"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_marginStart="24dp"
android:layout_marginTop="16dp"
android:text="Score"
android:textSize="18sp"
app:layout_constraintStart_toStartOf="parent"
app:layout_constraintTop_toBottomOf="@+id/inferredCategoryText" />
</androidx.constraintlayout.widget.ConstraintLayout>
<Button
android:id="@+id/activateCameraBtn"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_marginBottom="16dp"
android:text="Camera activation"
app:layout_constraintBottom_toBottomOf="parent"
app:layout_constraintEnd_toEndOf="parent"
app:layout_constraintStart_toStartOf="parent" />
</androidx.constraintlayout.widget.ConstraintLayout>
CameraX provides three use cases: **preview, image capture, and image analysis**. This time we use preview and image analysis. Organizing the code to match the use cases also keeps it easy to follow. By the way, the combinations of use cases that can be bound together are listed in a table in the official documentation.
First we implement the preview use case. This is almost the same as the official tutorial.
MainActivity.kt
import android.Manifest
import android.content.pm.PackageManager
import android.graphics.Matrix
import android.os.Bundle
import android.util.Size
import android.view.Surface
import android.view.TextureView
import android.view.ViewGroup
import android.widget.Toast
import androidx.appcompat.app.AppCompatActivity
import androidx.camera.core.CameraX
import androidx.camera.core.Preview
import androidx.camera.core.PreviewConfig
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat
import androidx.lifecycle.LifecycleOwner
import kotlinx.android.synthetic.main.activity_main.* // resolves activateCameraBtn etc.
import java.util.concurrent.Executors

private const val REQUEST_CODE_PERMISSIONS = 10
private val REQUIRED_PERMISSIONS = arrayOf(Manifest.permission.CAMERA)

class MainActivity : AppCompatActivity(), LifecycleOwner {

    private val executor = Executors.newSingleThreadExecutor()
    private lateinit var viewFinder: TextureView

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        viewFinder = findViewById(R.id.view_finder)

        // Start the camera (after checking the permission)
        activateCameraBtn.setOnClickListener {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                ActivityCompat.requestPermissions(
                    this, REQUIRED_PERMISSIONS, REQUEST_CODE_PERMISSIONS
                )
            }
        }

        viewFinder.addOnLayoutChangeListener { _, _, _, _, _, _, _, _, _ ->
            updateTransform()
        }
    }

    private fun startCamera() {
        // Implementation of the preview use case
        val previewConfig = PreviewConfig.Builder().apply {
            setTargetResolution(Size(viewFinder.width, viewFinder.height))
        }.build()
        val preview = Preview(previewConfig)

        preview.setOnPreviewOutputUpdateListener {
            val parent = viewFinder.parent as ViewGroup
            parent.removeView(viewFinder)
            parent.addView(viewFinder, 0)
            viewFinder.surfaceTexture = it.surfaceTexture
            updateTransform()
        }

        /** We will implement the image-analysis use case here later. **/

        CameraX.bindToLifecycle(this, preview)
    }

    private fun updateTransform() {
        val matrix = Matrix()
        val centerX = viewFinder.width / 2f
        val centerY = viewFinder.height / 2f
        val rotationDegrees = when (viewFinder.display.rotation) {
            Surface.ROTATION_0 -> 0
            Surface.ROTATION_90 -> 90
            Surface.ROTATION_180 -> 180
            Surface.ROTATION_270 -> 270
            else -> return
        }
        matrix.postRotate(-rotationDegrees.toFloat(), centerX, centerY)
        // Apply the rotation to the TextureView
        viewFinder.setTransform(matrix)
    }

    override fun onRequestPermissionsResult(
        requestCode: Int, permissions: Array<String>, grantResults: IntArray
    ) {
        if (requestCode == REQUEST_CODE_PERMISSIONS) {
            if (allPermissionsGranted()) {
                viewFinder.post { startCamera() }
            } else {
                Toast.makeText(
                    this,
                    "Permissions not granted by the user.",
                    Toast.LENGTH_SHORT
                ).show()
                finish()
            }
        }
    }

    private fun allPermissionsGranted() = REQUIRED_PERMISSIONS.all {
        ContextCompat.checkSelfPermission(
            baseContext, it
        ) == PackageManager.PERMISSION_GRANTED
    }
}
For the model, we use a pretrained ResNet-18. Convert it to TorchScript with the following Python script:
import torch
import torchvision
model = torchvision.models.resnet18(pretrained=True)
model.eval()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("resnet.pt")
If the script runs successfully, a file called resnet.pt is generated in the same directory. This trained ResNet-18 is what performs the image recognition.
Put the generated model in the **assets folder** of Android Studio. (It does not exist by default; you can create it by right-clicking the res folder -> New -> Folder -> Assets Folder.)
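As a quick sanity check (my own addition, not part of the original tutorial), you can load the copied model and push a dummy input through it before wiring up the camera. A minimal sketch: `smokeTestModel` is a hypothetical name, and `getAssetFilePath` is the asset-copying helper shown later in ImageAnalyze.kt.

import android.content.Context
import android.util.Log
import org.pytorch.IValue
import org.pytorch.Module
import org.pytorch.Tensor

// Smoke test: load resnet.pt and run an all-zero 1x3x224x224 input through it.
fun smokeTestModel(context: Context) {
    val module = Module.load(getAssetFilePath(context, "resnet.pt"))
    // Same shape as the example input used when tracing the model
    val dummy = Tensor.fromBlob(FloatArray(1 * 3 * 224 * 224), longArrayOf(1, 3, 224, 224))
    val output = module.forward(IValue.from(dummy)).toTensor()
    Log.d("ModelCheck", "output shape: ${output.shape().contentToString()}") // expect [1, 1000]
}

If the logged shape is [1, 1000], the model loaded correctly and returns one score per ImageNet class.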
To convert the inference result into a class name, write the ImageNet classes to a file. Create a new **ImageNetClasses.kt** and put the 1,000 ImageNet class labels in it. The list is far too long to reproduce here, so copy it from GitHub.
ImageNetClasses.kt
class ImageNetClasses {
    var IMAGENET_CLASSES = arrayOf(
        "tench, Tinca tinca",
        "goldfish, Carassius auratus",
        // (abbreviated -- copy the full list from GitHub)
        "ear, spike, capitulum",
        "toilet tissue, toilet paper, bathroom tissue"
    )
}
Next, we implement CameraX's image-analysis use case. Create a new file called ImageAnalyze.kt for the image recognition processing.
The flow is: load the model from the assets folder, convert each preview frame delivered by the image-analysis use case into a tensor that PyTorch Mobile can handle, run it through the model, and read off the result.
On top of that, I wrote an interface and a custom listener to reflect the inference result in the view. (I'm not sure this is the right way to write this part, so please let me know if there is a smarter approach.)
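One candidate for a smarter approach (my own suggestion, not from the original tutorial): inject the callback through the constructor as a lambda, so no lateinit listener or setter is needed. A rough sketch, with the inference body elided; `LambdaAnalyzer` is a hypothetical name:

import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy

// Sketch: pass the result callback as a constructor lambda instead of a
// mutable lateinit listener plus a setter.
class LambdaAnalyzer(
    private val onResult: (category: String, score: Float) -> Unit
) : ImageAnalysis.Analyzer {
    override fun analyze(image: ImageProxy, rotationDegrees: Int) {
        // ...run inference exactly as in ImageAnalyze.kt below, then:
        onResult("inferred category here", 0f) // placeholder values
    }
}

MainActivity could then create it with `LambdaAnalyzer { category, score -> viewFinder.post { /* update the text views */ } }`. That said, the code below keeps the interface-based listener as I originally wrote it.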
ImageAnalyze.kt
import android.content.Context
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import org.pytorch.IValue
import org.pytorch.Module
import org.pytorch.torchvision.TensorImageUtils
import java.io.File
import java.io.FileOutputStream

class ImageAnalyze(context: Context) : ImageAnalysis.Analyzer {

    private lateinit var listener: OnAnalyzeListener // custom listener for updating the view
    private var lastAnalyzedTimestamp = 0L

    // Load the trained model from the assets folder
    private val resnet = Module.load(getAssetFilePath(context, "resnet.pt"))

    interface OnAnalyzeListener {
        fun getAnalyzeResult(inferredCategory: String, score: Float)
    }

    override fun analyze(image: ImageProxy, rotationDegrees: Int) {
        val currentTimestamp = System.currentTimeMillis()
        if (currentTimestamp - lastAnalyzedTimestamp >= 500) { // infer at most once every 0.5 s
            lastAnalyzedTimestamp = currentTimestamp

            // Convert to a tensor (checking the image format showed it was YUV_420_888)
            val inputTensor = TensorImageUtils.imageYUV420CenterCropToFloat32Tensor(
                image.image,
                rotationDegrees,
                224,
                224,
                TensorImageUtils.TORCHVISION_NORM_MEAN_RGB,
                TensorImageUtils.TORCHVISION_NORM_STD_RGB
            )

            // Run inference with the trained model
            val outputTensor = resnet.forward(IValue.from(inputTensor)).toTensor()
            val scores = outputTensor.dataAsFloatArray

            // Find the index with the highest score
            // (start from negative infinity, since raw logits can be negative)
            var maxScore = Float.NEGATIVE_INFINITY
            var maxScoreIdx = 0
            for (i in scores.indices) {
                if (scores[i] > maxScore) {
                    maxScore = scores[i]
                    maxScoreIdx = i
                }
            }

            // Look up the category name from the index
            val inferredCategory = ImageNetClasses().IMAGENET_CLASSES[maxScoreIdx]
            listener.getAnalyzeResult(inferredCategory, maxScore) // update the view
        }
    }

    // Copy the asset to the app's files directory and return its absolute path
    private fun getAssetFilePath(context: Context, assetName: String): String {
        val file = File(context.filesDir, assetName)
        if (file.exists() && file.length() > 0) {
            return file.absolutePath
        }
        context.assets.open(assetName).use { inputStream ->
            FileOutputStream(file).use { outputStream ->
                val buffer = ByteArray(4 * 1024)
                var read: Int
                while (inputStream.read(buffer).also { read = it } != -1) {
                    outputStream.write(buffer, 0, read)
                }
                outputStream.flush()
            }
            return file.absolutePath
        }
    }

    fun setOnAnalyzeListener(listener: OnAnalyzeListener) {
        this.listener = listener
    }
}
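One note on the code above: maxScore is a raw output (logit) of the final layer, not a probability. If you would rather show something probability-like in the view, you could run the scores through softmax first. A small helper of my own, not part of the original code:

import kotlin.math.exp

// Convert raw logits to softmax probabilities
// (subtracting the max for numerical stability).
fun softmax(logits: FloatArray): FloatArray {
    var max = Float.NEGATIVE_INFINITY
    for (v in logits) if (v > max) max = v
    val exps = DoubleArray(logits.size) { exp((logits[it] - max).toDouble()) }
    val sum = exps.sum()
    return FloatArray(logits.size) { (exps[it] / sum).toFloat() }
}

Displaying softmax(scores)[maxScoreIdx] would then give a value between 0 and 1.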
I was initially confused because the image arrives as an unfamiliar type called ImageProxy, and when I checked the format (YUV_420_888) I assumed I would have to convert it to a Bitmap first. But PyTorch Mobile provides a method that converts YUV_420 directly to a tensor, so inference is as simple as throwing the image in.
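For reference, if you ever do need a Bitmap (for example, to use TensorImageUtils.bitmapToFloat32Tensor instead), a common workaround is to repack the YUV planes as NV21 and go through JPEG. A rough sketch of my own, assuming a typical plane layout, and not production-grade; `toBitmapRough` is a hypothetical name:

import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.graphics.ImageFormat
import android.graphics.Rect
import android.graphics.YuvImage
import androidx.camera.core.ImageProxy
import java.io.ByteArrayOutputStream

// Rough fallback: ImageProxy (YUV_420_888) -> Bitmap via NV21 and JPEG.
fun ImageProxy.toBitmapRough(): Bitmap {
    val yBuf = planes[0].buffer
    val uBuf = planes[1].buffer
    val vBuf = planes[2].buffer
    val ySize = yBuf.remaining()
    val uSize = uBuf.remaining()
    val vSize = vBuf.remaining()
    val nv21 = ByteArray(ySize + uSize + vSize)
    yBuf.get(nv21, 0, ySize)
    vBuf.get(nv21, ySize, vSize)          // NV21 stores V first...
    uBuf.get(nv21, ySize + vSize, uSize)  // ...then U
    val yuv = YuvImage(nv21, ImageFormat.NV21, width, height, null)
    val jpeg = ByteArrayOutputStream()
    yuv.compressToJpeg(Rect(0, 0, width, height), 90, jpeg)
    val bytes = jpeg.toByteArray()
    return BitmapFactory.decodeByteArray(bytes, 0, bytes.size)
}

Fortunately, the direct YUV-to-tensor method makes all of this unnecessary here.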
By the way, looking at the code you may have noticed that it is not strictly real-time: inference runs at most once every 0.5 seconds.
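If you want to tune that interval, the throttling logic can also be pulled out into a small helper. A minimal sketch (my own refactoring; the Throttle name is hypothetical):

// Minimal throttle: shouldRun() returns true at most once per intervalMs.
class Throttle(private val intervalMs: Long) {
    private var lastRunAt = 0L
    fun shouldRun(now: Long = System.currentTimeMillis()): Boolean {
        if (now - lastRunAt < intervalMs) return false
        lastRunAt = now
        return true
    }
}

ImageAnalyze could then hold `private val throttle = Throttle(500)` and begin analyze() with `if (!throttle.shouldRun()) return`.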
Now register the ImageAnalyze class we just created with CameraX as a use case, and finally implement the ImageAnalyze listener interface in MainActivity with an anonymous object so the view gets updated.
Add the following code to startCamera(), at the spot where we left the comment /** We will implement the image-analysis use case here later. **/, and replace the existing CameraX.bindToLifecycle(this, preview) call with the last line below.
MainActivity.kt
// Implementation of the image-analysis use case
val analyzerConfig = ImageAnalysisConfig.Builder().apply {
    setImageReaderMode(
        ImageAnalysis.ImageReaderMode.ACQUIRE_LATEST_IMAGE
    )
}.build()

// Create the analyzer instance
val imageAnalyzer = ImageAnalyze(applicationContext)

// Show the inference results
imageAnalyzer.setOnAnalyzeListener(object : ImageAnalyze.OnAnalyzeListener {
    override fun getAnalyzeResult(inferredCategory: String, score: Float) {
        // analyze() runs on a background thread, so post the view update to the main thread
        viewFinder.post {
            inferredCategoryText.text = "Inference result: $inferredCategory"
            inferredScoreText.text = "Score: $score"
        }
    }
})

val analyzerUseCase = ImageAnalysis(analyzerConfig).apply {
    setAnalyzer(executor, imageAnalyzer)
}

// The bound use cases are now preview and image analysis
CameraX.bindToLifecycle(this, preview, analyzerUseCase)
Done! If you have implemented everything above, the app shown at the beginning should now be complete. Please play around with it.
The full code is on GitHub, so refer to it as needed.
CameraX is really convenient! Combined with PyTorch Mobile, image analysis is easy to implement, although the extra processing inevitably makes things heavier. If you can prepare a model, you can easily build all kinds of camera-based image recognition apps. In the end, the quickest route is probably to build an app around your own model, for example one made with transfer learning.
I want to build and release a machine learning app... ~~I plan to make a sample app in the near future. (Currently under review)~~
Update: the app passed review, so I'm adding this here. I built the content of this article into an app and published it on the Play Store.
If you'd like to try it out quickly, I'd appreciate it if you gave it a download.
Play Store: Object Analyzer (supports English and Japanese)
To be honest, there is a big gap between the objects it can recognize and the ones it can't...