[PYTHON] SoundFont mapping problem

This article is the December 24 entry of the Music Tools / Libraries / Technology Advent Calendar 2019.

(I prepared it in a hurry this time, so it may stray a little from the theme of the calendar. Next year, I would like to write about building my own sound source or about audio compression.)

Introduction

I have been doing DTM (desktop music) for a long time with Music Studio Producer, which I installed 10 years ago, combined with Timidity++ and SoundFonts, which I added 8 years ago. However, the MIDI driver of Timidity++ is no longer updated and has compatibility problems with Windows 10, so I finally decided to retire Timidity++.

Ten years ago I created the mapping files by hand, but I have learned to program since then, so this article is about getting a comfortable DTM environment back with the help of a program.

SoundFont mapping

To use a SoundFont with a DAW that can only handle MIDI, you use a virtual sound source that can load SoundFonts, such as Timidity++, as the MIDI sound source. A tone in a SoundFont is identified by the combination of **bank number and preset number**. These correspond to MIDI **bank select and program change**, so any tone in the SoundFont can be selected with MIDI messages.

However, when multiple SoundFonts are used at the same time, their bank and preset numbers may collide. The arrangement of tones also differs considerably from one SoundFont to another, which is inconvenient. The solution is to remap the tones.

Fortunately, Timidity++ has a mapping feature, and I used it to arrange the tones for ease of use. (At that time I was only a user, so of course I did not know the SoundFont file format, and I was not familiar with MIDI either.)

I arranged the tones in Excel in a 256 * 256 space, and then wrote the Timidity++ configuration file and the DAW tone file by hand. (The tone file is a table of instrument names, program numbers, and bank numbers. I created it because, once it is set, the instrument names are displayed in the DAW GUI.)

[Figure: image.png]

There were many SoundFonts I wanted to add, but assigning them in Excel and then extending the configuration files by hand was tedious.

From Timidity++ to Virtual MIDI Synth

After a major Windows 10 update around this summer, Timidity++ finally stopped working. Timidity++ runs with an unsigned driver in the first place and is no longer updated, so I decided it had reached its limit and switched. As the replacement, I installed Virtual MIDI Synth as the virtual MIDI sound source.

However, **it has no SoundFont mapping function**. It can load multiple SoundFont files, but when presets collide, it seems that the tone from the higher-priority SoundFont file is used. This was a problem.

So I decided to rewrite the SoundFont files themselves to prevent collisions, and to generate the tone map automatically. (That was a long introduction.)

Specification

I added tone deletion because some SoundFont files contain garbage tone data, and I added tone name abbreviation because some tone names were so long that the DAW display broke.

Note that the configuration files of Music Studio Producer and Virtual MIDI Synth are not explained here.

SoundFont file structure

Specifications: http://freepats.zenvoid.org/sf2/sfspec24.pdf

SoundFonts are stored in RIFF format. RIFF stores data in units called chunks, and chunks are made up of IDs, sizes, and data.

RIFF structure

**Basic structure of a chunk**

| Item | Size | Remarks |
|---|---|---|
| Chunk ID | 4 bytes | Chunk identifier (RIFF, LIST, etc.) |
| Data size | 4 bytes | Size of the data (little endian) |
| Data | N bytes | |
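As a rough illustration of the header layout above, here is a minimal Python sketch that reads one chunk header with the struct module. The function name `read_chunk_header` is made up for this example and is not taken from the author's script. The sample bytes imitate an `ifil` chunk (the SoundFont version chunk, which holds two 16-bit numbers).

```python
import io
import struct

def read_chunk_header(stream):
    """Read one 8-byte chunk header and return (chunk_id, data_size)."""
    header = stream.read(8)
    if len(header) < 8:
        raise EOFError("truncated chunk header")
    # '<' = little endian, '4s' = 4-byte chunk ID, 'I' = unsigned 32-bit size
    chunk_id, size = struct.unpack("<4sI", header)
    return chunk_id, size

# Build a tiny chunk in memory: ID "ifil" followed by 4 bytes of data.
raw = b"ifil" + struct.pack("<I", 4) + struct.pack("<HH", 2, 1)
stream = io.BytesIO(raw)

chunk_id, size = read_chunk_header(stream)
data = stream.read(size)  # the N data bytes follow the header directly
print(chunk_id, size, struct.unpack("<HH", data))
```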

In addition, two special chunks are defined: the RIFF chunk, which is the first chunk of the file, and the LIST chunk, which groups multiple chunks. (Chunks other than RIFF and LIST cannot contain other chunks.)

**RIFF chunk structure**

| Item | Size | Remarks |
|---|---|---|
| Chunk ID | 4 bytes | RIFF |
| Data size | 4 bytes | N+4 |
| File identifier | 4 bytes | Identifier of the data stored in the RIFF file (sfbk for SoundFont) |
| Data | N bytes | Contains chunks and LIST chunks |

**LIST chunk structure**

| Item | Size | Remarks |
|---|---|---|
| Chunk ID | 4 bytes | LIST |
| Data size | 4 bytes | N+4 |
| List identifier | 4 bytes | Identifier of the data stored in the list (INFO/data etc.) |
| Data | N bytes | Contains chunks and LIST chunks |

RIFF files can use these chunks to represent nested data.

[Figure: image.png]

Since the size of the data part is written at the beginning of every chunk, unneeded chunks can be skipped. The goal can therefore be achieved by processing only the SoundFont chunks related to bank numbers and preset numbers and skipping all the other chunks as they are.
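The skipping idea can be sketched as a small recursive walker. This is only an illustration under my own assumptions, not the code in the author's repository: `walk_chunks`, `chunk`, and `container` are hypothetical helpers, and the sample file contains dummy data. The walker descends into RIFF and LIST chunks and jumps over the data of every other chunk with `seek`, so sample data is never read. It also honors the RIFF rule that odd-sized chunk data is padded to an even length.

```python
import io
import struct

def walk_chunks(stream, end, path=()):
    """Yield (path, data_offset, size) for every non-container chunk before `end`."""
    while stream.tell() + 8 <= end:
        chunk_id, size = struct.unpack("<4sI", stream.read(8))
        name = chunk_id.decode("ascii")
        data_start = stream.tell()
        if name in ("RIFF", "LIST"):
            # Container: a 4-byte identifier, then nested chunks (size = N + 4).
            kind = stream.read(4).decode("ascii")
            yield from walk_chunks(stream, data_start + size, path + (kind,))
        else:
            yield path + (name,), data_start, size
        # Jump past the data without reading it (pad byte if the size is odd).
        stream.seek(data_start + size + (size & 1))

# Hypothetical helpers that build a miniature SoundFont-like file for the demo.
def chunk(cid, data):
    return cid + struct.pack("<I", len(data)) + data

def container(cid, kind, *children):
    return chunk(cid, kind + b"".join(children))

sf2 = container(
    b"RIFF", b"sfbk",
    container(b"LIST", b"INFO", chunk(b"ifil", struct.pack("<HH", 2, 1))),
    container(b"LIST", b"sdta", chunk(b"smpl", bytes(8))),
    container(b"LIST", b"pdta", chunk(b"phdr", bytes(38))),
)

found = [(path, size) for path, _, size in walk_chunks(io.BytesIO(sf2), len(sf2))]
for path, size in found:
    print("/".join(path), size)
```

A caller that only cares about presets would read the data of the chunks whose path ends in phdr, pbag, pmod, or pgen and ignore the rest.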

RIFF structure of SoundFont files

The RIFF structure of a SoundFont is as follows. Of these, the chunks under pdta contain the instrument names and preset numbers.

[Figure: image.png]

pdta contains sub-chunks for **presets**, **instruments**, and **samples**. An instrument is a unit used inside the SoundFont that combines multiple samples, and a preset is the unit the user selects, which groups multiple instruments. Therefore, this time only the preset-related sub-chunks are handled.

Note that each sub-chunk is stored as an array of structures, and the last element holds a special value that marks the end. The size of the sub-chunk is therefore an integer multiple of sizeof(structure).

phdr sub-chunk

The phdr sub-chunk contains the preset headers (preset name, bank number, preset number, etc.).

struct phdr {
  char achPresetName[20];  // Preset name, null-terminated ASCII
  WORD wPreset;            // Preset number
  WORD wBank;              // Bank number: 0~127 for instruments, 128 for percussion
  WORD wPresetBagNdx;      // Index of the first pbag record of this preset
  DWORD dwLibrary;         // Reserved, 0
  DWORD dwGenre;           // Reserved, 0
  DWORD dwMorphology;      // Reserved, 0
};
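For reference, the structure above maps onto the Python struct format `<20sHHHIII` (38 bytes per record, little endian, no padding). The following is a minimal sketch of decoding a phdr sub-chunk this way. `PresetHeader` and `parse_phdr` are names chosen for this example, and the sample records are invented.

```python
import struct
from collections import namedtuple

# 20-byte name, three WORDs, three DWORDs = 38 bytes per record.
PHDR = struct.Struct("<20sHHHIII")
PresetHeader = namedtuple(
    "PresetHeader", "name preset bank bag_index library genre morphology"
)

def parse_phdr(data):
    """Decode the data of a phdr sub-chunk into a list of PresetHeader records."""
    if len(data) % PHDR.size:
        raise ValueError("phdr size is not a multiple of 38 bytes")
    records = []
    for fields in PHDR.iter_unpack(data):
        # The name is null-terminated inside its fixed 20-byte field.
        name = fields[0].split(b"\0", 1)[0].decode("ascii", errors="replace")
        records.append(PresetHeader(name, *fields[1:]))
    return records

# Invented sample: one preset followed by the terminal EOP record.
sample = (
    PHDR.pack(b"Grand Piano", 0, 0, 0, 0, 0, 0)
    + PHDR.pack(b"EOP", 0, 0, 2, 0, 0, 0)
)
presets = parse_phdr(sample)
for p in presets:
    print(p.name, p.bank, p.preset, p.bag_index)
```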

Note that wPresetBagNdx must increase monotonically from the first phdr record to the last.

At first I overlooked this requirement. I assumed it would be enough to rewrite only phdr and erase the unneeded records, but when I implemented that, the wrong sounds were played. To satisfy the requirement, the pbag, pmod, and pgen sub-chunks must be edited as well.

The terminal record (EOP) of phdr has the following values.

| Variable name | Value |
|---|---|
| achPresetName | EOP |
| wPreset | 0 |
| wBank | 0 |
| wPresetBagNdx | Index at the end of pbag |
| dwLibrary | 0 |
| dwGenre | 0 |
| dwMorphology | 0 |

pbag sub-chunk

The pbag sub-chunk indicates which modulators (pmod) and generators (pgen) a preset uses. A preset owns the pbag records from the one pointed to by its own wPresetBagNdx up to the one at (wPresetBagNdx of the next preset) - 1. (A single preset can therefore have multiple pbag records.)

struct pbag {
  WORD wGenNdx;  // Index of the first pgen record of this bag
  WORD wModNdx;  // Index of the first pmod record of this bag
};

Like wPresetBagNdx in phdr, wGenNdx and wModNdx must increase monotonically from the first pbag record.

The terminal record of pbag has the following values.

| Variable name | Value |
|---|---|
| wGenNdx | Index at the end of pgen |
| wModNdx | Index at the end of pmod |
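The ordering rule for wPresetBagNdx, wGenNdx, and wModNdx can be checked mechanically. The sketch below is a hypothetical helper, not part of the author's script. It assumes that "index at the end" means the index of the terminal record of the target sub-chunk, that is, the record count minus one, which is how I read the specification.

```python
def check_indices(indices, target_count, label):
    """Return a list of problems found in one index column.

    indices: the index field of every record, terminal record included.
    target_count: number of records in the sub-chunk being pointed at,
                  including its own terminal record.
    """
    problems = []
    for i in range(1, len(indices)):
        if indices[i] < indices[i - 1]:
            problems.append(
                f"{label}[{i}] = {indices[i]} is smaller than "
                f"{label}[{i - 1}] = {indices[i - 1]}"
            )
    # Assumption: the terminal record points at the target's terminal record.
    if indices and indices[-1] != target_count - 1:
        problems.append(
            f"terminal {label} = {indices[-1]}, expected {target_count - 1}"
        )
    return problems

# wPresetBagNdx of three presets plus the terminal record, checked against
# a pbag that holds 6 bags plus its terminal record (7 records in total).
good = check_indices([0, 2, 3, 6], target_count=7, label="wPresetBagNdx")
bad = check_indices([0, 3, 2, 6], target_count=7, label="wPresetBagNdx")
print(good)
print(bad)
```

The same function can be applied to the wGenNdx column against pgen and to the wModNdx column against pmod.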

pgen sub-chunk

The pgen sub-chunk stores the parameters (generators) associated with a preset, such as the instrument, volume, and filter settings.

Each record is a key-value pair of a parameter type and its value.

struct pgen {
  WORD sfGenOper;  // Parameter type
  WORD genAmount;  // Parameter value
};

Note that genAmount holds a pair of bytes, a SHORT, or a WORD, depending on the parameter type. (The size is always that of a WORD.)
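To make the three interpretations concrete, the following sketch decodes the same two genAmount bytes in each way. `gen_amount_views` is a name invented for this example. Which view applies depends on the generator type. As far as I know, a key range is stored as a low/high byte pair and the instrument index as a WORD, but please check the specification for the full list.

```python
import struct

def gen_amount_views(raw):
    """Decode the 2 genAmount bytes in all three possible ways."""
    lo, hi = struct.unpack("<BB", raw)      # pair of bytes (low, high)
    (signed,) = struct.unpack("<h", raw)    # SHORT
    (unsigned,) = struct.unpack("<H", raw)  # WORD
    return {"range": (lo, hi), "short": signed, "word": unsigned}

# A byte pair such as a key range from note 36 to note 96.
as_range = gen_amount_views(bytes([36, 96]))

# A negative signed amount stored in the same 2 bytes.
as_short = gen_amount_views(struct.pack("<h", -120))

print(as_range["range"], as_short["short"], as_short["word"])
```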

The terminal record of pgen has the following values.

| Variable name | Value |
|---|---|
| sfGenOper | 0 |
| genAmount | 0 |

pmod sub-chunk

The pmod sub-chunk describes how dynamic parameters such as MIDI control changes and velocity change the sound (for example, the volume or the filter).

struct pmod {
  WORD sfModSrcOper;     // Type of the modulation source parameter (CC, velocity, etc.)
  WORD sfModDestOper;    // Type of the parameter to modify (volume, filter strength, etc.)
  SHORT modAmount;       // Amount of modulation
  WORD sfModAmtSrcOper;  // Type of a second source parameter that scales the modulation amount
  WORD sfModTransOper;   // Transform applied to the input amount (linear, curve)
};

The terminal record of pmod has the following values.

| Variable name | Value |
|---|---|
| sfModSrcOper | 0 |
| sfModDestOper | 0 |
| modAmount | 0 |
| sfModAmtSrcOper | 0 |
| sfModTransOper | 0 |

Sub-chunk relationship

The relationship between the sub-chunks looks like this.

[Figure: image.png]

In the example of this figure, preset 0 is associated with bag0 and bag1. bag0 is associated with gen0, gen1, and mod0, and bag1 is associated with gen2, mod1, and mod2. The preset therefore uses generators gen0 through gen2 and modulators mod0 through mod2.

(I have not read the specification closely, so this may not be accurate, but it seems that each bag produces sound with its own generators and modulators. As long as you are only editing the file, you do not need to worry about this much.)
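The example above can be reproduced with a few lines of Python. This is a sketch with a hypothetical function name (`resolve_preset`). The index values are my reconstruction of the part of the figure that the text describes, with the terminal records added.

```python
def resolve_preset(preset_no, bag_index, pbag):
    """Return (generator indices, modulator indices) used by one preset.

    bag_index: wPresetBagNdx of every phdr record, terminal record included.
    pbag: (wGenNdx, wModNdx) of every pbag record, terminal record included.
    """
    gens, mods = [], []
    # Bags from this preset's index up to (next preset's index - 1).
    for b in range(bag_index[preset_no], bag_index[preset_no + 1]):
        # Each bag owns the records up to where the next bag starts.
        gens.extend(range(pbag[b][0], pbag[b + 1][0]))
        mods.extend(range(pbag[b][1], pbag[b + 1][1]))
    return gens, mods

bag_index = [0, 2]               # preset 0 starts at bag0; terminal points at bag2
pbag = [(0, 0), (2, 1), (3, 3)]  # bag0, bag1, terminal record

gens, mods = resolve_preset(0, bag_index, pbag)
print(gens, mods)
```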

SoundFont parser

Source code: https://github.com/mmitti/sf2conv/blob/master/riff.py

I wrote a script in Python, using the struct module, that can parse (part of) the RIFF and SoundFont structure.

It supports reading and writing RIFF chunks, LIST chunks, and the phdr, pbag, pmod, and pgen sub-chunks. The other chunks are not edited, so they are written back exactly as they were read.

When a phdr record is deleted, the corresponding pbag, pmod, and pgen records must be deleted as well, so this update is performed at write time.
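The following is a minimal sketch of such an update, written for this article's explanation and not copied from riff.py. It assumes the four sub-chunks have already been decoded into Python lists with the terminal record last. `drop_presets` is a hypothetical name, and the generator and modulator records are represented by placeholder strings. Instead of patching indices in place, it rebuilds all four tables from the presets that are kept, so the indices come out in increasing order automatically.

```python
def drop_presets(phdr, pbag, pgen, pmod, keep):
    """Rebuild the four preset tables, keeping only presets where keep(header) is true.

    phdr: list of (name, preset, bank, bag_index), terminal record last.
    pbag: list of (gen_index, mod_index), terminal record last.
    pgen, pmod: lists of records, terminal record last.
    """
    new_phdr, new_pbag, new_pgen, new_pmod = [], [], [], []
    for header, following in zip(phdr, phdr[1:]):
        if not keep(header):
            continue  # its bags, generators, and modulators are dropped too
        name, preset, bank, bag_start = header
        new_phdr.append((name, preset, bank, len(new_pbag)))
        for b in range(bag_start, following[3]):
            gen_start, mod_start = pbag[b]
            gen_stop, mod_stop = pbag[b + 1]
            new_pbag.append((len(new_pgen), len(new_pmod)))
            new_pgen.extend(pgen[gen_start:gen_stop])
            new_pmod.extend(pmod[mod_start:mod_stop])
    # Terminal records point at the terminal record of the next table.
    name, preset, bank, _ = phdr[-1]
    new_phdr.append((name, preset, bank, len(new_pbag)))
    new_pbag.append((len(new_pgen), len(new_pmod)))
    new_pgen.append(pgen[-1])
    new_pmod.append(pmod[-1])
    return new_phdr, new_pbag, new_pgen, new_pmod

# Invented sample data: three presets, the second one is garbage.
phdr = [("Piano", 0, 0, 0), ("Junk", 1, 0, 2), ("Flute", 73, 0, 3), ("EOP", 0, 0, 4)]
pbag = [(0, 0), (2, 1), (3, 1), (4, 2), (6, 2)]
pgen = ["g0", "g1", "g2", "g3", "g4", "g5", "END"]
pmod = ["m0", "m1", "END"]

result = drop_presets(phdr, pbag, pgen, pmod, keep=lambda h: h[0] != "Junk")
for table in result:
    print(table)
```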

(I implemented it during waiting time at driving school over the summer vacation, and looking at it now, it is messy. What are RiffRoot and Element supposed to be?)

A program that maps SoundFonts

Source code: https://github.com/mmitti/sf2conv/blob/master/main.py

I wrote a script that converts SoundFonts using the RIFF parser above (or rather, the SoundFont parser) and outputs the tone map for Music Studio Producer and the configuration file for Virtual MIDI Synth.

You simply list the input SoundFonts, the tones to exclude, the replacement rules for tone names, and so on in a JSON file. The script rewrites the SoundFont files accordingly and writes the tone names, banks, and program numbers to the tone map.

Some tones have strange names (such as ------), and some SoundFonts assign wind instruments to the piano program numbers. As I added rules to handle such cases, the number of settings grew. However, once the configuration file is written, tones are assigned to free slots and added to the DAW's tone list, which makes adding new SoundFonts much easier.
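The idea of assigning tones to free slots could look roughly like the sketch below. This is not how main.py works; its JSON rules are not reproduced here. It is only one possible policy under my own assumptions: SoundFonts are processed in priority order, and a tone whose (bank, program) slot is already taken keeps its program number and moves to the next bank where that program number is free. The function name, the file names, and the data layout are hypothetical, and the percussion bank 128 is not treated specially.

```python
def assign_slots(fonts, max_bank=127):
    """Map every (font, bank, program) to a collision-free (bank, program).

    fonts: list of (font_name, [(bank, program), ...]) in priority order.
    """
    used = set()
    mapping = {}
    for font, presets in fonts:
        for bank, program in presets:
            candidate = bank
            # Keep the program number and search upward for a free bank.
            while (candidate, program) in used:
                candidate += 1
                if candidate > max_bank:
                    raise ValueError(f"no free bank left for program {program}")
            used.add((candidate, program))
            mapping[(font, bank, program)] = (candidate, program)
    return mapping

fonts = [
    ("a.sf2", [(0, 0), (0, 1)]),
    ("b.sf2", [(0, 0), (0, 40)]),  # (0, 0) collides with a.sf2
]
mapping = assign_slots(fonts)
for key, slot in mapping.items():
    print(key, "->", slot)
```

The resulting mapping would then be applied to wBank and wPreset in each phdr record and written to the tone map together with the tone name.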

In conclusion

This time my DTM environment broke, so I had to look inside SoundFont files, which deepened my understanding a little. Since the script made it easier to add SoundFonts, I immediately installed sinfon, which I had wanted to use, and entered one song.

Once the FPGA USB MIDI device I am building is finished, I would like to make a MIDI sound module that reads SoundFonts, so I may research SoundFonts further and write about them again soon.

See you again
