I used the AudioRecord class to do some audio processing on Android. This post is largely a working memo for myself: I referred to several Japanese pages, but the specifications still leave a lot of ambiguity.
It may be hard to follow without basic knowledge of audio processing in the first place, and for several APIs, such as `positionNotificationPeriod` and `notificationMarkerPosition`, it is not obvious how they differ.
So I am leaving a memo (code with comments) based on the official documentation and on behavior I verified in local tests.
AudioRecordSample.kt
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import android.util.Log
import kotlin.math.max

/**
 * Sample code for the AudioRecord class
 */
class AudioRecordSample {

    // Sampling rate (Hz)
    // 44100 is the only rate guaranteed to be supported on all devices
    private val samplingRate = 44100

    // Frame rate (fps)
    // How many times per second you want to process the audio data
    // Decide this yourself
    private val frameRate = 10

    // Number of audio data points (= Short values) in one frame
    private val oneFrameDataCount = samplingRate / frameRate

    // Size of one frame of audio data in bytes
    // Byte = 8 bit and Short = 16 bit, so twice the Short count
    private val oneFrameSizeInByte = oneFrameDataCount * 2

    // Audio data buffer size (bytes)
    // Requirement 1: must be larger than oneFrameSizeInByte
    // Requirement 2: must be larger than the minimum the device requires
    private val audioBufferSizeInByte =
        max(oneFrameSizeInByte * 10, // Arbitrarily reserve a buffer of 10 frames
            AudioRecord.getMinBufferSize(samplingRate,
                AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT))
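
    // Working through the numbers above (the device minimum below is only an assumption):
    //   oneFrameDataCount  = 44100 / 10 = 4410 Short values per frame
    //   oneFrameSizeInByte = 4410 * 2   = 8820 bytes per frame
    //   oneFrameSizeInByte * 10         = 88200 bytes,
    //   which is typically well above what getMinBufferSize() returns for mono 16-bit PCM at 44100 Hz.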
    fun startRecording() {
        // Create an instance
        val audioRecord = AudioRecord(
            MediaRecorder.AudioSource.MIC, // Audio source
            samplingRate, // Sampling rate
            AudioFormat.CHANNEL_IN_MONO, // Channel setting. MONO and STEREO are guaranteed to be supported on all devices
            AudioFormat.ENCODING_PCM_16BIT, // PCM 16 bit is guaranteed to be supported on all devices
            audioBufferSizeInByte) // Buffer

        // How many audio data points to process at a time (= the number of data points in one frame)
        audioRecord.positionNotificationPeriod = oneFrameDataCount

        // When the recording position reaches the value set here, onMarkerReached (below) is called
        // This does not seem necessary for ordinary streaming processing
        audioRecord.notificationMarkerPosition = 40000 // Do not set this if you do not use it

        // Array to store the audio data
        val audioDataArray = ShortArray(oneFrameDataCount)

        // Register the callbacks
        audioRecord.setRecordPositionUpdateListener(object : AudioRecord.OnRecordPositionUpdateListener {

            // Processing for each frame
            override fun onPeriodicNotification(recorder: AudioRecord) {
                recorder.read(audioDataArray, 0, oneFrameDataCount) // Read the audio data
                Log.v("AudioRecord", "onPeriodicNotification size=${audioDataArray.size}")
                // Process it however you like
            }

            // Processing at the marker timing
            // Called when notificationMarkerPosition is reached
            override fun onMarkerReached(recorder: AudioRecord) {
                recorder.read(audioDataArray, 0, oneFrameDataCount) // Read the audio data
                Log.v("AudioRecord", "onMarkerReached size=${audioDataArray.size}")
                // Process it however you like
            }
        })
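
        // Note: setRecordPositionUpdateListener also has an overload that takes a Handler as a
        // second argument, in case you want the callbacks delivered on a specific thread.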
        audioRecord.startRecording()
    }
}
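
For completeness, here is a minimal teardown sketch of my own (not part of the original sample). It assumes the AudioRecord created in `startRecording()` is kept in a nullable property instead of a local variable, and that the RECORD_AUDIO permission has already been granted before recording starts.

```kotlin
// Minimal teardown sketch. Assumes a property such as
// `private var audioRecord: AudioRecord? = null`, assigned inside startRecording()
// (in the sample above it is only a local variable).
fun stopRecording() {
    audioRecord?.let { record ->
        if (record.recordingState == AudioRecord.RECORDSTATE_RECORDING) {
            record.stop() // Stop capturing audio
        }
        record.release() // Free the native recording resources
    }
    audioRecord = null
}
```

In a real app you would also check the return value of `read()`, which reports the number of Shorts actually read or a negative error code such as `AudioRecord.ERROR_INVALID_OPERATION`, instead of ignoring it as the sample does.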
This covers just the basics. I will write a follow-up if I gain more advanced insights, for example about performance.