ESP32DrumPad is a drum pad MIDI controller.
If you are new to MIDI, please read the following reference first: https://www.music.mcgill.ca/~ich/classes/mumt306/StandardMIDIfileformat.html
A MIDI controller is any hardware or software that generates and transmits Musical Instrument Digital Interface (MIDI) data to MIDI-enabled devices, typically to trigger sounds and control parameters of an electronic music performance. They most often use a musical keyboard to send data about the pitch of notes to play, although a MIDI controller may trigger lighting and other effects. A wind controller has a sensor that converts breath pressure to volume information and lip pressure to control pitch. Controllers for percussion and stringed instruments exist, as well as specialized and experimental devices. Some MIDI controllers are used in association with specific digital audio workstation software. The original MIDI specification has been extended to include a greater range of control features.
Keyboards can be used to trigger drum sounds, but are impractical for playing repeated patterns such as rolls, due to the length of key travel. After keyboards, drum pads are the next most significant MIDI performance controllers. Drum controllers may be built into drum machines, may be standalone control surfaces, or may emulate the look and feel of acoustic percussion instruments. MIDI triggers can also be installed into acoustic drum and percussion instruments. The pads built into drum machines are typically too small and fragile to be played with sticks, and are played with fingers.
In this project, we will make a drum pad MIDI controller that uses a piezoelectric ceramic disc as its transducer. The transducer converts the pressure it receives into an electrical voltage. An analog-to-digital converter (ADC) converts that voltage into an integer. Logic and arithmetic are then applied to this number to obtain the velocity of the strike received by the drum pad. Velocity values range from 0 to 127, in line with the MIDI standard. A drum pad consists of several pads, and each pad represents a particular instrument such as kick drum, snare drum, hi-hat, tom, cymbal, and so on.
- 15 instrument pads (16 for ESP32 S3). We will only develop the MIDI controller with 12 pads
- enable velocity
- low latency
- WIFI access point
- WIFI client
- built-in web server
- configurable via web application
- support USB
- support websocket
- support bluetooth low energy (BLE)
The ESP32 has 15 ADC pins, and each pin can read an analog signal that is converted into velocity.
The ESP32 reads the signal as an analog value so that it can be converted into velocity. It may read more than one sample of the signal to get a more accurate value.
The ESP32 reads each ADC asynchronously, which keeps latency low.
The WIFI access point is used to connect to the ESP32 before it is connected to another WIFI network. The user can also configure the MIDI controller through this access point by opening the MIDI controller's IP address.
The ESP32 can connect to another WIFI network so that it can send MIDI signals over the internet. The user can also configure the MIDI controller by opening the MIDI controller's IP address.
The web server is used to configure the MIDI controller.
The user can configure the instrument, threshold, and velocity of each pad via the web application. The user can open the configuration URL after connecting to the MIDI controller's access point.
HTML, CSS and JavaScript code must be written in such a way that it can be compressed into the smallest possible size. One byte of resource on the ESP32 means so much that inefficient code becomes a very big problem, even fatal. Images, whether JPEG, GIF or PNG, are not permitted at all because they will take up so many resources unnecessarily.
The favicon format is SVG so it is easier to create compared to raster formats such as JPEG, PNG, ICO, or GIF. SVG has better image quality with smaller file size.
The arrangement of the pads on the configuration page depends on the arrangement set by the developer before the program is deployed into the ESP32. This arrangement is fixed and cannot be changed by the user. Users can only change the instrument, threshold, headroom and duration on each pad.
The ESP32 can send MIDI signals over USB and is recognized as a MIDI controller.
The ESP32 can send MIDI signals over a LAN using the WebSocket protocol.
The ESP32 can send MIDI signals over Bluetooth and is recognized as a MIDI controller.
Key | Note | Drum Sound |
---|---|---|
35 | B1 | Acoustic Bass Drum |
36 | C2 | Bass Drum 1 |
37 | C#2 | Side Stick |
38 | D2 | Acoustic Snare |
39 | D#2 | Hand Clap |
40 | E2 | Electric Snare |
41 | F2 | Low Floor Tom |
42 | F#2 | Closed Hi Hat |
43 | G2 | High Floor Tom |
44 | G#2 | Pedal Hi-Hat |
45 | A2 | Low Tom |
46 | A#2 | Open Hi-Hat |
47 | B2 | Low-Mid Tom |
48 | C3 | Hi Mid Tom |
49 | C#3 | Crash Cymbal 1 |
50 | D3 | High Tom |
51 | D#3 | Ride Cymbal 1 |
52 | E3 | Chinese Cymbal |
53 | F3 | Ride Bell |
54 | F#3 | Tambourine |
55 | G3 | Splash Cymbal |
56 | G#3 | Cowbell |
57 | A3 | Crash Cymbal 2 |
58 | A#3 | Vibraslap |
59 | B3 | Ride Cymbal 2 |
60 | C4 | Hi Bongo |
61 | C#4 | Low Bongo |
62 | D4 | Mute Hi Conga |
63 | D#4 | Open Hi Conga |
64 | E4 | Low Conga |
65 | F4 | High Timbale |
66 | F#4 | Low Timbale |
67 | G4 | High Agogo |
68 | G#4 | Low Agogo |
69 | A4 | Cabasa |
70 | A#4 | Maracas |
71 | B4 | Short Whistle |
72 | C5 | Long Whistle |
73 | C#5 | Short Guiro |
74 | D5 | Long Guiro |
75 | D#5 | Claves |
76 | E5 | Hi Wood Block |
77 | F5 | Low Wood Block |
78 | F#5 | Mute Cuica |
79 | G5 | Open Cuica |
80 | G#5 | Mute Triangle |
81 | A5 | Open Triangle |
`1000nnnn 0kkkkkkk 0vvvvvvv` is a note off event, where `nnnn` is the channel number (0 - 15), `kkkkkkk` is the key (0 - 127), and `vvvvvvv` is the velocity (0 - 127).
`1001nnnn 0kkkkkkk 0vvvvvvv` is a note on event, where `nnnn` is the channel number (0 - 15), `kkkkkkk` is the key (0 - 127), and `vvvvvvv` is the velocity (0 - 127).
So, to send Bass Drum 1 at 90 percent velocity, the MIDI controller sends `10011001 00100100 01110010`, where the first `1001` represents note on, the second `1001` represents the channel number (9, which is channel 10 when channels are counted from 1), `00100100` represents the Bass Drum 1 code (36), and `01110010` represents velocity 114 (90 percent of 127).
To send the matching note off event, the MIDI controller sends `10001001 00100100 01110010`.
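The bit layout above can be sketched as a small helper that assembles the three bytes. This is only an illustration: the name `makeNoteMessage` and its 1-based `channel` parameter are assumptions, not part of the project's code.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: builds a 3-byte MIDI note on / note off message.
// channel is 1-based (1 - 16) as shown in most MIDI software;
// on the wire it is stored 0-based in the low nibble of the status byte.
std::array<uint8_t, 3> makeNoteMessage(bool noteOn, uint8_t channel, uint8_t key, uint8_t velocity)
{
    // High nibble: 1001 for note on, 1000 for note off.
    uint8_t status = (noteOn ? 0x90 : 0x80) | ((channel - 1) & 0x0F);
    // Data bytes always have their top bit cleared (7-bit values).
    return { status, static_cast<uint8_t>(key & 0x7F), static_cast<uint8_t>(velocity & 0x7F) };
}
```

For Bass Drum 1 (key 36) on channel 10 with velocity 114, this yields the bytes `0x99 0x24 0x72`, which is the same bit pattern as the note on example above.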
We need an ESP32 expansion board so that we have enough pins for the piezoelectric sensor with minimal effort. Each piezoelectric sensor will be connected to the expansion board with two cables. Because GND and analog input are separated by VCC, we need 3x1 dupont headers for each piezoelectric sensor.
We do not need the VCC pin to supply power to the piezoelectric sensor because the sensor generates an electrical voltage by itself when it receives enough vibration. Thus, we can remove the VCC jumper. However, if we want to add another device, for example a sound generator integrated with the MIDI controller we are building, we can keep the jumper and set it to the voltage we need.
We still use the built-in USB on the ESP32 board to connect it to the PC because we will use this USB as a MIDI controller. The USB on the expansion board is only used as a power supply if needed or in scenarios using BLE (Bluetooth Low Energy) to transmit MIDI signals to the PC.
GPIO | Input | Output | Pad CH | Notes |
---|---|---|---|---|
0 | pulled up | OK |  | outputs PWM signal at boot, must be LOW to enter flashing mode |
1 | TX pin | OK |  | debug output at boot |
2 | OK | OK | CH 14 | connected to on-board LED, must be left floating or LOW to enter flashing mode |
3 | OK | RX pin |  | HIGH at boot |
4 | OK | OK | CH 15 |  |
5 | OK | OK |  | outputs PWM signal at boot, strapping pin |
6 | x | x |  | connected to the integrated SPI flash |
7 | x | x |  | connected to the integrated SPI flash |
8 | x | x |  | connected to the integrated SPI flash |
9 | x | x |  | connected to the integrated SPI flash |
10 | x | x |  | connected to the integrated SPI flash |
11 | x | x |  | connected to the integrated SPI flash |
12 | OK | OK | CH 11 | boot fails if pulled high, strapping pin |
13 | OK | OK | CH 12 |  |
14 | OK | OK | CH 10 | outputs PWM signal at boot |
15 | OK | OK | CH 13 | outputs PWM signal at boot, strapping pin |
16 | OK | OK |  | (U2_RXD) |
17 | OK | OK |  | (U2_TXD) |
18 | OK | OK |  | (SCK / VSPI_SCL) |
19 | OK | OK |  | (MISO) |
21 | OK | OK |  | (SDA) |
22 | OK | OK |  | (SCL) |
23 | OK | OK |  | (MOSI) |
25 | OK | OK | CH 7 |  |
26 | OK | OK | CH 8 |  |
27 | OK | OK | CH 9 |  |
32 | OK | OK | CH 5 |  |
33 | OK | OK | CH 6 |  |
34 | OK |  | CH 3 | input only |
35 | OK |  | CH 4 | input only |
36 | OK |  | CH 1 | input only |
39 | OK |  | CH 2 | input only |
The ESP32 WROOM-32 only allows us to create a drum pad with 15 channels, because only 15 of its pins can function as ADC (analog-to-digital converter) inputs. However, the application also supports the ESP32 S3, which has 16 ADC pins. In this project, we will use only 12 ADC channels because of the board's limitations, so channels 13, 14 and 15 are left out.
When using the ESP32 WROOM-32, the application must be limited so that it does not scan a 16th ADC pin, because scanning it would make the application unstable. This limit is hard-coded in the application before it is flashed to the ESP32.
Keep in mind that GPIO 2 is connected to the onboard LED. The onboard LED can no longer be used as an indicator because it is already used as an ADC input. In addition, reading the ADC input must also be done after the ESP32 has finished booting. We also have to remove the onboard LED so that GPIO 2 can be used effectively for the ADC thereby preventing a voltage drop entering the ADC and causing reading errors.
Pad | Pin |
---|---|
1 | GPIO 36 |
2 | GPIO 39 |
3 | GPIO 34 |
4 | GPIO 35 |
5 | GPIO 32 |
6 | GPIO 33 |
7 | GPIO 25 |
8 | GPIO 26 |
9 | GPIO 27 |
10 | GPIO 14 |
11 | GPIO 12 |
12 | GPIO 13 |
13 | GPIO 15 |
14 | GPIO 2 |
15 | GPIO 4 |
Pad | Pin |
---|---|
1 | GPIO 36 |
2 | GPIO 39 |
3 | GPIO 34 |
4 | GPIO 35 |
5 | GPIO 32 |
6 | GPIO 33 |
7 | GPIO 25 |
8 | GPIO 26 |
9 | GPIO 27 |
10 | GPIO 14 |
11 | GPIO 12 |
12 | GPIO 13 |
With 12 pads, we can use GPIO 2 as a system indicator and don't need to remove the onboard LED.
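The pad tables above can be sketched as a lookup array with a compile-time pad limit. The names `PAD_COUNT`, `PAD_PINS` and `padToGpio` are hypothetical and only illustrate how the hard-coded limit might look; the project's actual code may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: a 12-pad build. Set to 15 to also enable GPIO 15, 2 and 4.
constexpr std::size_t PAD_COUNT = 12;

// Pad order follows the tables above (index 0 is pad 1).
constexpr uint8_t PAD_PINS[15] = {36, 39, 34, 35, 32, 33, 25, 26, 27, 14, 12, 13, 15, 2, 4};

// Returns the GPIO number for a 1-based pad number,
// or -1 if the pad is outside the enabled range.
constexpr int padToGpio(std::size_t pad)
{
    return (pad >= 1 && pad <= PAD_COUNT) ? PAD_PINS[pad - 1] : -1;
}
```

With `PAD_COUNT` set to 12, pads 13 to 15 are never scanned, so GPIO 2 stays free for the onboard LED.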
The MIDI controller configuration is stored in EEPROM which ensures that the configuration is maintained even if the MIDI controller is not powered. This data will be read every time the MIDI controller is turned on and when the configuration is required by the software. The configuration will be rewritten to EEPROM every time the user makes configuration changes.
Each channel has an EEPROM address allocation to store the channel configuration. Even though the per-channel configuration is listed as a string, it is divided into several segments that have different data types.
We only have 512 bytes (4096 bits) of EEPROM space as storage. We have to manage it carefully so that it can hold dozens of values of various types and sizes. This allocation is tight and leaves little room for growth.
With the internal EEPROM built in on the ESP32, we can map data addresses as follows:
Data | Offset | Length | Type | Description |
---|---|---|---|---|
offsetAPSSID | 0 | 30 | String | SSID AP |
offsetAPSSIDPassword | 30 | 30 | String | Password AP |
offsetAPIP | 60 | 4 | IPAddress | IP Address AP |
offsetAPGateway | 64 | 4 | IPAddress | Gateway AP |
offsetAPSubnet | 68 | 4 | IPAddress | Subnet AP |
offsetAPHidden | 72 | 1 | Byte | Hidden AP |
offsetWifiEnable | 73 | 1 | Byte | Enable Wifi |
offsetWifiSSID | 74 | 30 | String | SSID Wifi |
offsetWifiSSIDPassword | 104 | 30 | String | Password Wifi |
offsetWSEnable | 134 | 1 | Byte | Enable WS |
offsetWSScheme | 135 | 5 | String | Scheme WS |
offsetWSHost | 140 | 40 | String | Host WS |
offsetWSPort | 180 | 2 | Word | Port WS |
offsetWSPath | 182 | 50 | String | Path WS |
offsetWSUsername | 232 | 20 | String | Username WS |
offsetWSPassword | 252 | 20 | String | Password WS |
offsetConfigured | 272 | 1 | Byte | Flag that controller was configured |
offsetSolo | 273 | 1 | Byte | Flag solo channel |
offsetSoloChannel | 274 | 1 | Byte | Solo channel number |
offsetMCUser | 275 | 20 | String | MIDI Controller username |
offsetMCPassword | 295 | 20 | String | MIDI Controller password |
offsetMidiChannel | 315 | 1 | Byte | MIDI channel |
offsetReadInterval | 316 | 4 | DWord | Read analog interval |
offsetCh1 | 320 | 10 | String | Channel 1 |
offsetCh2 | 330 | 10 | String | Channel 2 |
offsetCh3 | 340 | 10 | String | Channel 3 |
offsetCh4 | 350 | 10 | String | Channel 4 |
offsetCh5 | 360 | 10 | String | Channel 5 |
offsetCh6 | 370 | 10 | String | Channel 6 |
offsetCh7 | 380 | 10 | String | Channel 7 |
offsetCh8 | 390 | 10 | String | Channel 8 |
offsetCh9 | 400 | 10 | String | Channel 9 |
offsetCh10 | 410 | 10 | String | Channel 10 |
offsetCh11 | 420 | 10 | String | Channel 11 |
offsetCh12 | 430 | 10 | String | Channel 12 |
offsetCh13 | 440 | 10 | String | Channel 13 |
offsetCh14 | 450 | 10 | String | Channel 14 |
offsetCh15 | 460 | 10 | String | Channel 15 |
offsetCh16 | 470 | 10 | String | Channel 16 |
offsetTestVelocity | 480 | 1 | Byte | Test velocity |
So far we do not need additional storage media.
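As a sanity check on the map above, the layout can be verified at compile time. The sketch below reuses `offsetCh1` and `offsetTestVelocity` from the table; `EEPROM_SIZE`, `channelRecordSize`, `maxChannels` and `eepromBytesFree` are assumed names introduced only for this illustration.

```cpp
#include <cstddef>

constexpr std::size_t EEPROM_SIZE = 512;        // bytes available, as stated above
constexpr std::size_t offsetCh1 = 320;          // first channel record
constexpr std::size_t channelRecordSize = 10;   // bytes per channel record
constexpr std::size_t maxChannels = 16;         // offsetCh1 .. offsetCh16
constexpr std::size_t offsetTestVelocity = 480; // last entry in the map, 1 byte

// The 16 channel records must end exactly where the next entry begins.
static_assert(offsetCh1 + maxChannels * channelRecordSize == offsetTestVelocity,
              "channel records overlap the test velocity byte");

// The whole map must fit in the available EEPROM space.
static_assert(offsetTestVelocity + 1 <= EEPROM_SIZE, "EEPROM map exceeds 512 bytes");

// Bytes left over after the last entry.
constexpr std::size_t eepromBytesFree()
{
    return EEPROM_SIZE - (offsetTestVelocity + 1);
}
```

The map ends at byte 480, so 481 bytes are used and 31 bytes remain free.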
If we want more comfortable storage space, we can add an external EEPROM module connected to the ESP32 via SPI or I2C. For example, the AT24C256 module provides a much larger capacity of 256 kilobits (32 kilobytes) over I2C.
The wiring of AT24C256 is as follows:
AT24C256 | ESP32 |
---|---|
VCC | VIN 3.3V |
GND | GND |
SCL | SCL (GPIO 22) |
SDA | SDA (GPIO 21) |
We can use 8 AT24C256 simultaneously on the same bus. By configuring the address pins, we can determine the AT24C256 addresses ranging from 80 to 87 with the following configuration:
A2 | A1 | A0 | Hexadecimal | Decimal |
---|---|---|---|---|
0 | 0 | 0 | 50 | 80 |
0 | 0 | 1 | 51 | 81 |
0 | 1 | 0 | 52 | 82 |
0 | 1 | 1 | 53 | 83 |
1 | 0 | 0 | 54 | 84 |
1 | 0 | 1 | 55 | 85 |
1 | 1 | 0 | 56 | 86 |
1 | 1 | 1 | 57 | 87 |
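The address table above follows a simple rule: the three address pins form the low three bits of the base address 0x50. A minimal sketch, where `eepromI2cAddress` and `AT24C256_BASE_ADDRESS` are hypothetical names:

```cpp
#include <cstdint>

// Base 7-bit I2C address when A2, A1 and A0 are all tied low (0x50 = 80).
constexpr uint8_t AT24C256_BASE_ADDRESS = 0x50;

// Computes the 7-bit I2C address from the state of the address pins.
constexpr uint8_t eepromI2cAddress(bool a2, bool a1, bool a0)
{
    return static_cast<uint8_t>(AT24C256_BASE_ADDRESS | (a2 << 2) | (a1 << 1) | (a0 << 0));
}
```

For example, A2 = 0, A1 = 1, A0 = 0 gives 0x52 (82), matching the third row of the table.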
The AT24C128/256 has a hardware data protection scheme that allows the user to write protect the whole memory when the WP pin is at VCC.
Offset | Length | Type | Range | Usage |
---|---|---|---|---|
0 | 1 | Byte | 0-127 | Configuration flag |
1 | 1 | Byte | 0-127 | Instrument code |
2 | 2 | Word | 0-65535 | Threshold (minimum value) |
4 | 2 | Word | 0-65535 | Head room |
6 | 4 | DWord | 0-4294967295 | Duration (microseconds) |
Configuration flag, instrument code, threshold, head room and duration are read separately by different functions. To read data of type `byte`, the function reads one byte at the specified address. To read data of type `word`, the function reads two bytes starting from the specified address. To read data of type `double word`, the function reads four bytes starting from the specified address.
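    // Start address of a channel record; channels are numbered from 1.
    // memOffset and memSize are assumed to be defined elsewhere in the
    // application (320 and 10 according to the EEPROM map above).
    int getChannelOffset(int channel)
    {
        return memOffset + ((channel - 1) * memSize);
    }

    // Reads every field of one channel record from EEPROM.
    void readChannelConfig(int channel)
    {
        int offset = getChannelOffset(channel);
        byte configured = readByte(offset);              // configuration flag
        byte instrumentCode = readByte(offset + 1);      // MIDI key, e.g. 36
        uint16_t threshold = readWord(offset + 2);       // minimum ADC value
        uint16_t headRoom = readWord(offset + 4);        // ADC value mapped to velocity 127
        uint32_t duration = readDoubleWord(offset + 6);  // microseconds
        // add code here
    }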
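To make the record layout concrete, the sketch below decodes one 10-byte channel record from a plain byte buffer instead of the real EEPROM. The struct `ChannelConfig` and the helpers `wordAt`, `doubleWordAt` and `decodeChannel` are hypothetical, and the little-endian byte order is an assumption: the source does not state how multi-byte values are stored.

```cpp
#include <cstddef>
#include <cstdint>

// One channel record, following the per-channel layout table above.
struct ChannelConfig
{
    uint8_t configured;
    uint8_t instrumentCode;
    uint16_t threshold;
    uint16_t headRoom;
    uint32_t duration; // microseconds
};

// Assumption: least significant byte is stored first (little-endian).
uint16_t wordAt(const uint8_t *mem, std::size_t address)
{
    return static_cast<uint16_t>(mem[address] | (mem[address + 1] << 8));
}

uint32_t doubleWordAt(const uint8_t *mem, std::size_t address)
{
    return static_cast<uint32_t>(wordAt(mem, address)) |
           (static_cast<uint32_t>(wordAt(mem, address + 2)) << 16);
}

// Decodes the record that starts at the given offset in the buffer.
ChannelConfig decodeChannel(const uint8_t *mem, std::size_t offset)
{
    ChannelConfig config;
    config.configured = mem[offset];
    config.instrumentCode = mem[offset + 1];
    config.threshold = wordAt(mem, offset + 2);
    config.headRoom = wordAt(mem, offset + 4);
    config.duration = doubleWordAt(mem, offset + 6);
    return config;
}
```

Working on a buffer keeps the decoding logic testable on a PC; on the device the same offsets would be read from EEPROM.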
To convert average pressure into velocity, the calculation uses the following parameters:
- 12-bit input voltage (0 - 4095), obtained from the average voltage on the ADC pin
- 12-bit threshold (0 - 4095), which is subtracted from the average voltage to prevent vibrations from one pad from triggering other pads. The smaller the threshold, the weaker the strike that can trigger neighboring pads that were not hit. The greater the threshold, the less sensitive the pad is, so the player has to hit faster to get a high velocity.
- head room (0 - 4095), the maximum voltage produced by the fastest strike of an average player. The smaller the head room, the lighter the strike that already produces a high velocity.
The threshold will have almost the same value for all pads, because its purpose is to prevent nearby pads from being triggered accidentally when the pad next to them is hit at very high speed. Head room, on the other hand, can vary widely depending on the instrument used. For example, the snare may require a harder hit than the hat.
The sampling interval must be chosen carefully. A small interval reduces the latency between the moment the pad is hit and the moment the note on signal is sent. However, if the interval is too small, the ESP32 cannot capture the maximum voltage, which represents the player's speed when hitting the pad.
The duration must be set so that when the next voltage is read, its value is already below the threshold. This prevents the ESP32 from sending note on twice when the player hits the pad once. Thus, the number of note on signals sent by the ESP32 equals the number of strikes made by the player.
The application must provide default values derived from test results.
The ESP32 takes several voltage samples and then averages them. This average value is then converted into velocity.
The average voltage is not the peak voltage but the average of several samples taken once a voltage above the threshold is detected. It may be far from the peak value depending on the sampling time. If the samples are taken close to the peak, the average voltage will be close to the peak voltage. If the samples are taken far from the peak, the average voltage will be far from the peak voltage.
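The averaging step can be sketched as follows. The function name `averageSamples` is hypothetical, and the samples are passed in as an array so the logic can be checked without hardware; on the device they would come from successive ADC reads.

```cpp
#include <cstddef>
#include <cstdint>

// Averages `count` raw 12-bit ADC readings (0 - 4095).
// Returns 0 for an empty window. The result is truncated, not rounded.
uint16_t averageSamples(const uint16_t *samples, std::size_t count)
{
    if (count == 0)
    {
        return 0;
    }
    uint32_t sum = 0; // wide enough for many 12-bit samples
    for (std::size_t i = 0; i < count; ++i)
    {
        sum += samples[i];
    }
    return static_cast<uint16_t>(sum / count);
}
```

For the samples 1000, 2000, 3000 and 2000 the average is 2000, even though the peak was 3000, which illustrates why the average can sit well below the peak.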
In this example, the threshold is 383 and the head room is 2059. Only voltages above 0.309450549 volts are converted to velocity, while voltages below 0.309450549 volts are ignored. The threshold voltage of 0.309450549 volts is subtracted from every accepted reading. A voltage of 2.0625 volts is converted to a velocity of 127, while a voltage of 0.309450549 volts is converted to a velocity of 0. The relationship between voltage and velocity is as follows:
velocity = 127 * (voltage - 0.309450549) / (2.0625 - 0.309450549)
where `voltage` is the average of the voltage samples taken.
The relationship above can be written as the following function:
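    // Converts an averaged 12-bit ADC reading into a MIDI velocity (0 - 127).
    uint8_t calcVelocity(uint16_t inp, uint16_t thd, uint16_t headRoom)
    {
        // A head room at or below the threshold is a configuration error
        // and would cause a division by zero below.
        if (headRoom <= thd)
        {
            return 0;
        }
        // Readings below the threshold are treated as silence.
        uint16_t inp2 = inp >= thd ? inp - thd : 0;
        float outp2 = 127 * (float) inp2 / (float)(headRoom - thd);
        if (outp2 > 127)
        {
            outp2 = 127;
        }
        return (uint8_t) round(outp2);
    }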
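As a quick check of the mapping with the threshold of 383 and head room of 2059 used in the example, the sketch below uses an integer-only variant of the same formula. `calcVelocityInt` is a hypothetical name, and the integer rounding is a substitution made for illustration, not the project's method.

```cpp
#include <cstdint>

// Integer-only variant of the velocity formula:
// velocity = 127 * (inp - thd) / (headRoom - thd), rounded to nearest.
uint8_t calcVelocityInt(uint16_t inp, uint16_t thd, uint16_t headRoom)
{
    // Misconfigured pad or reading at/below the threshold: no velocity.
    if (headRoom <= thd || inp <= thd)
    {
        return 0;
    }
    uint32_t span = static_cast<uint32_t>(headRoom) - thd;
    uint32_t delta = static_cast<uint32_t>(inp) - thd;
    // Adding span / 2 before dividing rounds to the nearest integer.
    uint32_t velocity = (127u * delta + span / 2) / span;
    return static_cast<uint8_t>(velocity > 127u ? 127u : velocity);
}
```

With these settings, a reading of 383 maps to velocity 0, a reading of 1000 maps to 47, and any reading of 2059 or more maps to 127.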
Duration in general is the time from the `note on` signal to the next `note off` signal on the same channel and instrument or note. For drum pads, duration is very important because it affects how fast the player can play the drums. The application must provide a default duration.
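One possible way to use the duration is as a hold-off window: after a note on, the pad ignores further readings until the duration has elapsed, then sends note off. The sketch below is only one interpretation under that assumption; `PadState`, `PadEvent` and `updatePad` are hypothetical names, and times are in microseconds.

```cpp
#include <cstdint>

// Per-pad state kept between readings.
struct PadState
{
    bool active = false;     // true between note on and note off
    uint32_t noteOnTime = 0; // time of the last note on, in microseconds
};

enum class PadEvent { None, NoteOn, NoteOff };

// Decides what to send for one reading taken at time `now`.
PadEvent updatePad(PadState &pad, uint16_t level, uint16_t threshold,
                   uint32_t duration, uint32_t now)
{
    if (!pad.active)
    {
        // Idle pad: a reading above the threshold starts a note.
        if (level > threshold)
        {
            pad.active = true;
            pad.noteOnTime = now;
            return PadEvent::NoteOn;
        }
        return PadEvent::None;
    }
    // Active pad: ignore readings until the duration has elapsed.
    // Unsigned subtraction stays correct when the timer wraps around.
    if (now - pad.noteOnTime >= duration)
    {
        pad.active = false;
        return PadEvent::NoteOff;
    }
    return PadEvent::None;
}
```

A shorter duration lets the player retrigger the pad sooner, while a duration that is too short risks a second note on from the tail of the same strike.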
Drum pads can be equipped with a TFT display that lets users see the status of the drum pad. However, the display is not an essential feature considering the limited resources of the ESP32.
Some modules that can be used for displays are as follows:
Module | Com | Resolution | Color | Touch | SD Card |
---|---|---|---|---|---|
OLED | IIC | 128 x 64 | White | No | No |
ST7735 | SPI | 128 x 160 | RGB | No | Yes |
ili9341 | SPI | 240 x 320 | RGB | Yes | Yes |
OLED | ESP32 |
---|---|
VCC | VIN 3.3V |
GND | GND |
SCL | SCL (GPIO 22) |
SDA | SDA (GPIO 21) |
ST7735 | ESP32 |
---|---|
LED | VIN 3.3V |
SCK | CLK (GPIO 18) |
SDA | MOSI (GPIO 23) |
A0 | GPIO 4 |
RESET | RESET |
CS | CS0 (GPIO 5) |
GND | GND |
VCC | VIN 3.3V |
ili9341 | ESP32 |
---|---|
LED | VIN 3.3V |
SCK | CLK (GPIO 18) |
MOSI | MOSI (GPIO 23) |
MISO | MISO (GPIO 19) |
D/C | GPIO 4 |
RESET | VIN 3.3V |
CS | CS0 (GPIO 5) |
GND | GND |
VCC | VIN 3.3V |
Please support our YouTube channel by subscribing at https://www.youtube.com/@maliktamvan