Birds

ML-based bird detection and classification tools and devices

Primary language: C++. License: MIT.

WIP

I looked at several BirdNET-Pi repos on GitHub and settled on the one from pddpauw (git@github.com:pddpauw/BirdPi.git). <add reasons why, and what I had to do here>

I tried several approaches to get high-quality audio input, including:

  • creating a remote microphone (and camera) unit that streams RTSP audio (and video) to an instance of BirdNET-Pi
  • connecting a wide variety of microphones and ADCs directly to the Raspberry Pi running BirdNET-Pi
  • using several different outdoor webcams as RTSP sources

BirdPi Sensor

Notes

  • Audio-based bird detector

    • listen for bird sounds
    • classify bird sounds with ML model (e.g., BirdNET)
    • when specified birds are detected (at or above a given confidence level)
      • take a picture
      • send a notification
      • log time/date
    • development process
      • start running ML model and audio capture on desktop
      • run ML model server and logic on desktop
        • capture and stream audio from ESP32-S3 Sense device
      • run ML model and logic on ESP device, send notifications, log on server (or local file system on SD card)
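The detector flow in the notes above (classify, compare against a confidence level, then act) can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Detection` record, the watch list, the threshold value, and the `should_trigger` and `handle` names are hypothetical and are not part of BirdNET or BirdNET-Pi. The picture and notification steps are left as a comment.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Detection:
    """One classifier result (hypothetical record, not a BirdNET type)."""
    species: str
    confidence: float  # 0.0 .. 1.0
    when: datetime


def should_trigger(det, watch_list, min_confidence):
    """True when a watched species is detected at or above the threshold."""
    return det.species in watch_list and det.confidence >= min_confidence


def handle(det, watch_list, min_confidence, log):
    """Log a triggering detection; return whether it triggered."""
    if not should_trigger(det, watch_list, min_confidence):
        return False
    # A real device would also take a picture and send a notification here.
    log.append(f"{det.when.isoformat()} {det.species} {det.confidence:.2f}")
    return True
```

Keeping the decision (`should_trigger`) separate from the side effects makes it easy to test the same logic on the desktop first and move it to the ESP device later.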
  • Links

  • BirdPi

  • BirdNET-Pi [DEPRECATED]

    • command lines
      • ~/BirdNET-Pi/scripts/birdnet_recording.sh
        • ffmpeg -hide_banner -loglevel info -nostdin -vn -thread_queue_size 512 -i rtsp://192.168.166.115:8554 -map 0:a:0 -t 15 -acodec pcm_s16le -ac 2 -ar 48000 file:/home/jdn/BirdSongs/StreamData/2023-10-19-birdnet-RTSP_1-11:24:37.wav
        • change -acodec to pcm_s16be and -ar to 16000
      • ?
        • ffmpeg -hide_banner -loglevel info -nostdin -vn -thread_queue_size 512 -i rtsp://192.168.166.115:8554 -map 0:a:0 -t 15 -acodec pcm_s16le -ac 2 -ar 48000 file:/home/jdn/BirdSongs/StreamData/2023-10-14-birdnet-RTSP_1-15:57:13.wav
      • ?
        • ffmpeg -nostdin -loglevel error -ac 1 -i rtsp://192.168.166.115:8554 -acodec libmp3lame -b:a 320k -ac 1 -content_type audio/mpeg -f mp3 icecast://source:birdnetpi@localhost:8000/stream -re
    • ?
      • arecord -f S16_LE -c1 -r48000 -t wav --max-file-time 15 -D dsnoop:CARD=Device,DEV=0 --use-strftime /home/jdn/BirdSongs/%B-%Y/%d-%A/%F-birdnet-%H:%M:%S.wav
    • ?
      • sox -V1 /home/jdn/BirdSongs/October-2023/26-Thursday/2023-10-26-birdnet-16:11:06.wav -n remix 1 rate 24k spectrogram -c BirdSongs/October-2023/26-Thursday/2023-10-26-birdnet-16:11:06.wav -o /home/jdn/BirdSongs/Extracted/spectrogram.png
    • ?
      • ffmpeg -nostdin -loglevel error -ac 1 -f alsa -i dsnoop:CARD=Device,DEV=0 -acodec libmp3lame -b:a 320k -ac 1 -content_type audio/mpeg -f mp3 icecast://source:birdnetpi@localhost:8000/stream -re
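As a sanity check on the recording commands above, the size of the raw PCM payload in each capture follows from sample rate × channels × bytes per sample × duration. The sketch below uses a hypothetical helper name and ignores the WAV header, whose size depends on the tool that writes it.

```python
def pcm_payload_bytes(sample_rate_hz, channels, bits_per_sample, seconds):
    """Size of the raw PCM data in bytes (container header not included)."""
    return sample_rate_hz * channels * (bits_per_sample // 8) * seconds


# ffmpeg RTSP capture: -ar 48000 -ac 2 -acodec pcm_s16le -t 15
print(pcm_payload_bytes(48000, 2, 16, 15))  # 2880000

# same capture with -ar 16000
print(pcm_payload_bytes(16000, 2, 16, 15))  # 960000

# arecord capture: -r48000 -c1 -f S16_LE, 15 s per file
print(pcm_payload_bytes(48000, 1, 16, 15))  # 1440000
```

A file that is much smaller than expected is a quick hint that the stream dropped out before the recording finished.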
    • Settings
      • use the GLOBAL_6K_V2.4_Model_FP16 model
      • enter lat/lon
      • setup Apprise Notifications
        • select notification mechanism
          • Google Chat: ?
          • Home Assistant: ?
          • IFTTT: ?
          • MQTT: ?
          • Telegram: ?
          • Twitter: ?
          • WhatsApp: ?
          • SMS
            • Twilio: ?
            • Vonage: ?
          • Desktop: ?
          • Email: ?
        • set white-/black-list birds
        • select min time between notifications of same species
    • Advanced Settings:
      • purge old files when disk full (or keep)
      • Audio Settings
        • Audio Card:
          • 'default': use PulseAudio (always recommended)
          • else: use ALSA sound card device from "arecord -L" list
            • USB mic: "dsnoop:CARD=Device,DEV=0"
            • ?
        • Audio Channels: 1 [1-2]
        • Recording Length: 15 seconds [6-60] (multiples of 3 recommended)
        • Extraction Length: [min=3, max=Recording Length]
        • Extractions Audio Format: s16
        • RTSP Stream: (multiple streams are allowed)
      • Password
      • Custom URL
    • ?
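The length limits listed under Audio Settings can be checked before saving the settings. This is a sketch with a hypothetical function name; the ranges are the ones noted above, and the multiple-of-3 rule is treated as a recommendation (presumably because the model analyzes audio in 3-second segments).

```python
def check_recording_settings(recording_s, extraction_s):
    """Return a list of notes about the two length settings (empty if fine)."""
    notes = []
    if not 6 <= recording_s <= 60:
        notes.append("recording length must be 6-60 s")
    elif recording_s % 3 != 0:
        notes.append("recording length is not a multiple of 3 s")
    if not 3 <= extraction_s <= recording_s:
        notes.append("extraction length must be between 3 s and the recording length")
    return notes


print(check_recording_settings(15, 6))   # []
print(check_recording_settings(16, 20))  # two notes
```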
  • ESP32-S3 Sense A/V source device

    • Micro-RTSP-Audio Example
      • comes with AudioTestSource class that emits a loop of tone bursts
      • play with ffmpeg
        • ffplay -v debug rtsp://192.168.166.115:8554
      • N.B. doesn't work with VLC
      • modify example code
        • use wifi.h for credentials
        • use default port
      • Arduino
        • Arduino2->File->Examples->Micro-RTSP-Audio->RTSPTestServer
        • Tools->Events Run on Core '0'
        • Tools->Arduino Runs on Core '0'
        • Tools->Core Debug Level 'Debug'
      • VSCode/PlatformIO
        • ? having trouble making this work ?
      • make different audio test sources
        • edit AudioTestSource.h
          • ?
        • select RTSP format
          • https://en.wikipedia.org/wiki/RTP_payload_formats
            • original code defaulted to 2-byte PCM samples at 16000 samples per second on 1 channel
              • p_fmt = new RTSPFormatPCM();
          • RTSPFormat.h
            • PCMInfo() class defines: sample rate, channels, sample size
            • RTSPFormatPCM() class can use defaults or take a pointer to a PCMInfo object
    • ESP32 AV Source
  • XIAO ESP32-S3 Sense Microphone

    • configuration
      • I2S.setAllPins(-1, 42, 41, -1, -1);
      • I2S.begin(PDM_MONO_MODE, 16000, 16);
    • Pulse Density Modulation
      • each sample is in int16_t data format
    • from: https://wiki.seeedstudio.com/xiao_esp32s3_sense_mic/
      • It should be noted that for the current ESP32-S3 chip, we can only use PDM_MONO_MODE and the sampling bit width can only be 16bit. only the sampling rate can be modified, but after testing, the sampling rate at 16kHz is relatively stable.
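Because each sample arrives as a signed 16-bit value, one way to put numbers on the "low volume" observations elsewhere in these notes is to compute peak and RMS level in dBFS (dB relative to full scale). This is a host-side sketch with hypothetical function names; it works on any sequence of int16 samples, for example ones read from a captured WAV file.

```python
import math

FULL_SCALE = 32768.0  # magnitude of the most negative int16 value


def dbfs(value):
    """Convert a linear amplitude (full scale = 1.0) to dBFS."""
    return 20.0 * math.log10(value) if value > 0 else float("-inf")


def levels(samples):
    """Return (peak_dbfs, rms_dbfs) for a sequence of int16 samples."""
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples) / FULL_SCALE
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / FULL_SCALE
    return dbfs(peak), dbfs(rms)
```

A square wave at half of full scale, `levels([16384, -16384])`, gives about -6 dBFS for both values, while digital silence gives negative infinity.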
  • Audio tools

    • audacity
      • good visualization and editing of audio streams
    • ffmpeg
    • ffplay: media player using ffmpeg libraries
      • options
        • -decoders: list decoders
        • -acodec: audio codec (pcm_s16be, pcm_s16le, etc.)
        • -vn: no video
    • jaaa: JACK and ALSA audio analyzer
    • jackd: JACK audio connection kit sound server
      • jackd1 and jackd2 exist
    • jnoisemeter: measure audio test signals
      • depends on Jack
      • can measure S/N ratio of sound card
    • rec
      • record from default device to WAV file
        • rec --channels 1 -b 24 ./.wav
    • sox
      • print information about file
        • sox -i ./.wav
      • print statistics on audio in file
        • sox ./.wav -n stat
      • print stats for all wav files in a dir
        • for f in *.wav; do echo "----"; echo "$f"; sox "$f" -n stat; sox "$f" -n stats; done
    • soxi: get information about audio file
      • equivalent to sox --i
    • spek: audio spectrum analyzer
      • give it an audio file and it displays spectrogram and info on the file
    • ?
  • USB ADCs

    • Brand: USB type, bias voltage, linux dev
    • UGREEN: Type A, 0V, n/a ==> USB device not recognized
      • ?
    • UGREEN: Type C, 0V, n/a ==> USB device not recognized
      • ?
    • JSAUX: Type A, 0V, n/a ==> USB device not recognized
      • ?
    • JSAUX i/o: Type A, 2.7V, ?
      • Audio Card: "dsnoop:CARD=AUDIO,DEV=0"
      • dmesg
        • ?
      • arecord -t wav -c 1 -r 48000 -f S16_LE -D dsnoop:CARD=AUDIO,DEV=0 /tmp/foo.wav
        • low volume
    • CULILUX: Type C, 0V, Generic USB-C Audio Adapter
      • Audio Card: "dsnoop:CARD=Adapter,DEV=0"
      • dmesg
        • New USB device found, idVendor=0bda, idProduct=4926, bcdDevice= 0.05
        • Product: USB-C Audio Adapter
        • input: Generic USB-C Audio Adapter as /devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.1/1-1.1:1.3/0003:0BDA:4926.0003/input/input4
        • hid-generic 0003:0BDA:4926.0003: input,hidraw0: USB HID v1.11 Device [Generic USB-C Audio Adapter] on usb-0000:01:00.0-1.1/input3
      • ?
    • SABRENT: Type A, 2.7V, USB Audio Device
      • input and output capable, different connectors
        • use the pink connector
      • Audio Card: "dsnoop:CARD=Device,DEV=0"
      • dmesg
        • New USB device found, idVendor=0d8c, idProduct=0014, bcdDevice= 1.00
        • Product: USB Audio Device
        • Manufacturer: C-Media Electronics Inc.
        • input: C-Media Electronics Inc. USB Audio Device as /devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.1/1-1.1:1.3/0003:0D8C:0014.0004/input/input5
        • hid-generic 0003:0D8C:0014.0004: input,hidraw0: USB HID v1.00 Device [C-Media Electronics Inc. USB Audio Device] on usb-0000:01:00.0-1.1/input3
      • arecord -t wav -c 1 -r 48000 -f S16_LE -D dsnoop:CARD=Device,DEV=0 /tmp/foo.wav
        • low volume
    • MCSPER: Type A, 0V, ?
      • input and output capable, same connector
      • high-speed USB capable?
      • Audio Card: "dsnoop:CARD=Audio,DEV=0"
      • dmesg
        • New USB device found, idVendor=001f, idProduct=0b21, bcdDevice= 1.00
        • Product: TX USB Audio
        • Manufacturer: TX Co.,Ltd
        • Warning! Unlikely big volume range (=11520), cval->res is probably wrong.
        • [2] FU [PCM Playback Volume] ch = 1, val = -11520/0/1
        • Warning! Unlikely big volume range (=8191), cval->res is probably wrong.
        • [5] FU [Mic Capture Volume] ch = 1, val = 0/8191/1
        • input: TX Co.,Ltd TX USB Audio as /devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.1/1-1.1:1.3/0003:001F:0B21.0005/input/input6
        • hid-generic 0003:001F:0B21.0005: input,hidraw0: USB HID v2.01 Device [TX Co.,Ltd TX USB Audio] on usb-0000:01:00.0-1.1/input3
    • GeneralPlus: Type A, 2.44V, USB Audio Device
      • white dongle, with input and output
      • Audio Card: "dsnoop:CARD=Device,DEV=0"
      • dmesg
        • USB HID v2.01 Device [GeneralPlus USB Audio Device]
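When comparing several USB adapters it helps to pull the vendor and product IDs out of the dmesg output automatically, since the ALSA card name alone ("Device", "Audio", ...) does not tell the adapters apart. This sketch uses hypothetical names and assumes the kernel message format seen in the notes above ("idVendor=xxxx, idProduct=xxxx").

```python
import re

# Matches the "New USB device found" line format shown in the dmesg notes.
USB_ID_RE = re.compile(r"idVendor=([0-9a-fA-F]{4}), idProduct=([0-9a-fA-F]{4})")


def usb_ids(dmesg_text):
    """Return a list of (vendor, product) lowercase hex strings found in the text."""
    return [(v.lower(), p.lower()) for v, p in USB_ID_RE.findall(dmesg_text)]


sample = "New USB device found, idVendor=0d8c, idProduct=0014, bcdDevice= 1.00"
print(usb_ids(sample))  # [('0d8c', '0014')]
```

The text could come from `dmesg` piped into the script or from a saved log file; lines without an ID pair are simply ignored.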
  • Microphones

    • USB Mic
      • PNP Sound Device
      • Audio Card: "dsnoop:CARD=Device,DEV=0"
      • dmesg:
        • New USB device found, idVendor=08bb, idProduct=2902, bcdDevice= 1.00
        • Product: USB PnP Sound Device
        • Manufacturer: C-Media Electronics Inc.
        • input: C-Media Electronics Inc. USB PnP Sound Device as /devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.1/1-1.1:1.2/0003:08BB:2902.0002/input/input3
        • hid-generic 0003:08BB:2902.0002: input,hidraw0: USB HID v1.00 Device [C-Media Electronics Inc. USB PnP Sound Device] on usb-0000:01:00.0-1.1/input2
      • arecord -t wav -c 1 -r 48000 -f S16_LE -D dsnoop:CARD=Device,DEV=0 /tmp/foo.wav
        • low volume?
    • MCm-1
      • SABRENT
        • low volume
      • MCSPER
        • nothing
      • CULILUX
        • nothing
      • JSAUX i/o
        • low volume
    • Sun Mic
      • SABRENT
        • very low volume
    • Sun Lozenge Mic
      • SABRENT
        • without battery: nothing
        • with battery: low volume
    • Yamaha Mic
      • SABRENT
        • very low volume
    • Modded Sun Mic
      • SABRENT
        • noise
    • Mic MAX4466
      • ?
    • Mic Amp - MAX4466
    • Electret Mic Amp - MAX9814
  • I2S Microphone on Raspi

    • I2S Microphones
      • Adafruit PDM MEMS Microphone
      • Adafruit MEMS Microphone SPW2430
      • Adafruit I2S Microphone
        • links
        • SPH0645LM4H
          • 1.6-3.6V input (not 5V tolerant)
        • 50 Hz-15 kHz
        • connect to the microcontroller's I2S interface
        • pins: Clock, Data, Left-Right (Word Select) Clock
          • SEL: Low=left channel, High=right channel, default low/left
          • LRCL: aka WS, high=right channel Tx, low=left channel Tx
          • DOUT: data output
          • BCLK: bit clock, 2-4MHz
          • GND: ground
          • 3V: 3V3
        • select the Left or Right channel with the Select pin (low/grounded = left, high = right)
          • a second mic on the other channel uses the opposite Select level and shares Clock, WS, and Data
        • Raspi mono mic
          • install
            • "sudo apt-get -y install git raspberrypi-kernel-headers"
            • "git clone https://github.com/adafruit/Raspberry-Pi-Installer-Scripts.git"
            • "cd Raspberry-Pi-Installer-Scripts/i2s_mic_module"
            • "make clean"
            • "make"
            • "sudo make install"
            • "echo 'snd-i2smic-rpi' | sudo tee /etc/modules-load.d/snd-i2smic-rpi.conf"
            • "echo 'options snd-i2smic-rpi rpi_platform_generation=2' | sudo tee /etc/modprobe.d/snd-i2smic-rpi.conf"
            • "sudo sed -i -e 's/#dtparam=i2s/dtparam=i2s/g' /boot/config.txt"
          • manual load drivers
            • sudo modprobe snd-i2smic-rpi rpi_platform_generation=PI_SEL
              • PI_SEL: 0=Pi0, 1=Pi2_3, 2=Pi4
          • load drivers on startup
          • connections
            • SEL: ground
            • BCLK: pin 12 (BCM 18)
            • DOUT: pin 38 (BCM 20)
            • LRCL: pin 35 (BCM 19)
          • test audio input
            • "arecord -l"
            • "arecord -D plughw:0 -c1 -r 48000 -f S32_LE -t wav -V mono -v file.wav"
          • add volume control
            • ~/.asoundrc
                pcm.dmic_hw {
                    type hw
                    card sndrpii2scard
                    channels 2
                    format S32_LE
                }
                pcm.dmic_sv {
                    type softvol
                    slave.pcm dmic_hw
                    control {
                        name "Boost Capture Volume"
                        card sndrpii2scard
                    }
                    min_dB -3.0
                    max_dB 30.0
                }

                pcm.device {
                    type hw
                    card sndrpii2scard
                    format S32_LE
                    rate 48000
                }
                pcm.mic_control {
                    type softvol
                    slave.pcm device
                    control {
                        name "Boost Capture Volume"
                        card sndrpii2scard
                    }
                    min_dB -3.0
                    max_dB 30.0
                }
                pcm.mic_sv {
                    type plug
                    slave.pcm mic_control
                }
                pcm.!default {
                    type plug
                    slave.pcm mic_sv
                }
            • volume control GUI
              • "alsamixer"
                • select I2S mic: "F6"
                • set recording volume: "F4" and arrow up/down
            • volume control
              • "arecord -D dmic_sv -c1 -r 48000 -f S32_LE -t wav -V mono -v .wav"
      • PUI Audio I2S Microphones
        • DMM-4026-B-I2S-EB-R
          • MEMS MIC EVAL BD -26DB 1.8VDC
          • Sensitivity: -26 dB
          • Freq Range: 20 Hz-20 kHz
          • Vcc: 1.8-3.6V
          • Icc: 1mA
        • DMM-4026-B-I2S-R
          • MICROPHONE -26DB 1.8VDC
          • Sensitivity: -26 dB
          • Freq Range: 20 Hz-20 kHz
          • Vcc: 1.5-3.6V
          • Icc: 0.82mA
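The softvol entries in the ~/.asoundrc notes above set the range of the "Boost Capture Volume" control with min_dB -3.0 and max_dB 30.0. To see what that means as a plain amplitude multiplier, the standard conversion is gain = 10^(dB/20). The helper name below is hypothetical.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


for db in (-3.0, 0.0, 30.0):
    print(f"{db:+.1f} dB -> x{db_to_gain(db):.3f}")
```

So the control spans roughly 0.71× to 31.6× in amplitude. Software gain raises the noise floor along with the signal, so it only helps when the microphone's raw output is quiet but clean.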

  • I2S Microphones
    • Fermion
      • MEMS, 3.3V, I2S, SPL: 140dB, SNR: 59dB
    • MakerPortal ==> obsolete, out of production
      • INMP441
      • MEMS
      • Sensitivity: -26dB
      • Freq Range: 60 Hz-15 kHz
      • Noise Floor: -87dB
      • Sample Rate: 44.1-48KHz
    • SPH0645LM4H
      • MEMS
      • Sensitivity: -26dB
      • Freq Range: 20 Hz-10 kHz
      • SNR: 65dB
      • 1.62-3.6V @ 600uA
    • ICS-43434 TDK InvenSense
      • MEMS
      • Sensitivity: -26dB
      • Freq Range: 60 Hz-20 kHz
      • SNR: 64dB
      • Noise Floor: -87dB
      • Vcc: 1.65-3.63V @ 550uA
    • ICS-43432 TDK InvenSense
      • same as ICS-43434 but lower power and smaller package

BirdPi Repos