[PYTHON] Distinguishing t+pazolite songs by machine learning (NNC Challenge, follow-up)

1. Introduction

[Last time] 1, as an entry for the [NNC Challenge] 2, I showed that a machine can discriminate t+pazolite's music. I was satisfied with the result, but I spent almost all of my effort on preprocessing the data so that it could be input to NNC, and never got to think about the machine learning itself, which is the important part. This time I would like to build on what I noticed in the previous challenge and make the content more practical.

2. Did the machine really judge the song?

Last time, I built a classifier from a mix of the music data provided by [Audiostock] 3 and t+pazolite's music, which determines whether or not a given piece is by t+pazolite. In evaluation it reached a correct answer rate of 99% or more, suggesting that the music of a single composer can be identified by machine learning. The data consisted of about 10,000 songs provided by Audiostock, plus songs I had on hand, cut to the same length as the provided data. The question is whether the provided data was diverse enough to show that the classifier had learned what is unique about t+pazolite's songs. I did not actually listen to all of the provided data. It may not have contained "fast" songs above BPM 150, which are t+pazolite's specialty, and the classifier may simply have been reacting to the distinctive synthesizer sounds used in t+pazolite songs, which would have made the task easy. This time I would like to verify this point.
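As a minimal sketch of the "cut to the same length" step mentioned above, the following splits a sequence of audio samples into equal-length clips. The function name `split_into_clips`, the sample rate, and the clip length are assumptions made for illustration; the original preprocessing code and the actual clip length of the Audiostock data are not given here.

```python
def split_into_clips(samples, sample_rate, clip_seconds):
    """Cut a sequence of samples into non-overlapping clips of equal length.

    Any trailing remainder shorter than one clip is dropped, so every
    returned clip has exactly int(sample_rate * clip_seconds) samples.
    """
    clip_len = int(sample_rate * clip_seconds)
    if clip_len <= 0:
        raise ValueError("clip length must be positive")
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


# Dummy data for illustration: 80 samples at a (hypothetical) rate of 8 Hz,
# cut into 3-second clips. Real audio would use a rate such as 44100 Hz.
dummy = list(range(80))
clips = split_into_clips(dummy, sample_rate=8, clip_seconds=3)
```

Dropping the remainder keeps every input the same size, at the cost of discarding the last few seconds of each song.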

3. Verification

First, I verify the possibility that the provided data was biased. If the provided data was too different from the songs to be identified, the classification would naturally be easy. As verification data, I therefore prepared a mixture of "similar" songs and t+pazolite songs, and checked whether the t+pazolite songs could still be identified. The data set I prepared is small, but the result is as shown in the photo. As expected, the accuracy rate dropped. The likely cause is that the training data contained few songs that were "similar" to t+pazolite's. Although the data provided by Audiostock differs enough from t+pazolite songs, the classifier cannot separate TANO-C songs from t+pazolite songs at the same level of accuracy.
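When checking a small verification set like this, looking at precision and recall alongside the accuracy rate helps show whether the errors are similar songs mistaken for t+pazolite or the reverse. The following is a minimal sketch in plain Python; the helper `binary_metrics` and the labels are made up for illustration and are not the results of this experiment.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only (1 = t+pazolite, 0 = similar song).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

In this made-up case, a low precision would indicate that similar songs are being labeled as t+pazolite, which is the kind of confusion this verification is meant to expose.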

4. Improvement

To address this, I mixed music by other composers of "HARDCORE TANO-C", the composer group to which t+pazolite belongs, into the training data. The result of training with the TANO-C music data mixed in is as follows. The judgment accuracy came out about one digit higher than in the previous result. This also shows that t+pazolite songs can be identified even among songs of the same genre.
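One way to mix the TANO-C songs into the training data is to list them in the dataset CSV as additional "not t+pazolite" examples. The sketch below assumes a two-column CSV of file path and label; the header names `x:wav` and `y:label`, the file paths, and both helper functions are assumptions for illustration and should be checked against the NNC dataset format actually in use.

```python
import csv
import io
import random


def build_rows(audiostock, tano_c, tpazolite, seed=0):
    """Label t+pazolite clips 1 and every other clip 0, then shuffle."""
    rows = [(path, 0) for path in audiostock]
    rows += [(path, 0) for path in tano_c]      # same-genre negatives
    rows += [(path, 1) for path in tpazolite]
    random.Random(seed).shuffle(rows)           # fixed seed for repeatability
    return rows


def write_dataset_csv(rows, out):
    """Write a header row followed by one (path, label) row per clip."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["x:wav", "y:label"])       # assumed header names
    writer.writerows(rows)


# Hypothetical file names, for illustration only.
audiostock = ["audiostock/0001.wav", "audiostock/0002.wav"]
tano_c = ["tano_c/0001.wav"]
tpazolite = ["tpazolite/0001.wav", "tpazolite/0002.wav"]

rows = build_rows(audiostock, tano_c, tpazolite)
buf = io.StringIO()
write_dataset_csv(rows, buf)
csv_text = buf.getvalue()
```

Writing to an `io.StringIO` buffer keeps the example free of side effects; passing a file opened with `open(path, "w", newline="")` instead would write the CSV to disk.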

5. Conclusion

These results show that a specific composer can be identified by machine learning. They also show that accuracy can be improved by mixing many songs by other composers of the same genre into the training data. Next time, I would like to consider improving accuracy through the machine learning method itself rather than through data preprocessing.