MTG/essentia.js

How to import ML models and use them for autotagging

bianchilo opened this issue · 1 comment

What is the issue about?

  • Bug
  • Feature request
  • Usage question
  • Documentation
  • Contributing / Development

What part(s) of Essentia.js is involved?

  • essentia.js-core (vanilla algorithms)
  • essentia.js-model (machine learning algorithms)
  • essentia.js-plot (plotting utility module)
  • essentia.js-extractor (typical algorithm combinations utility)

Description

Hello everyone, I am trying to adapt the Real-time music autotagging with MusiCNN example to use a different machine learning model from those published by Essentia (https://essentia.upf.edu/models/). The goal is to recognize musical instruments in real time. I chose mtg_jamendo_instrument-discogs-effnet-1.pb because it covers more instruments. I converted it to TensorFlow.js format with tensorflowjs_converter, and now I need to handle the different input features this model requires.

The model used in the example I was modifying had the following input configuration:

"inputs": [
  {
    "name": "model/Placeholder",
    "type": "float",
    "shape": [187, 96]
  }
]
It performs inference with "algorithm": "TensorflowPredictMusiCNN".
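For reference, the [187, 96] shape means each MusiCNN input is a patch of 187 consecutive frames with 96 mel bands each. The sketch below shows one way a feature processor could accumulate such patches from a stream of frames. `MelPatchBuffer` is a hypothetical helper, not code from the essentia.js example, and it assumes each frame arrives as an array of 96 mel-band values.

```javascript
// Hypothetical helper (not part of essentia.js): keeps the most recent
// 187 frames of 96 mel bands, matching the [187, 96] MusiCNN input shape.
const PATCH_FRAMES = 187;
const MEL_BANDS = 96;

class MelPatchBuffer {
  constructor(patchFrames = PATCH_FRAMES, bands = MEL_BANDS) {
    this.patchFrames = patchFrames;
    this.bands = bands;
    this.frames = []; // oldest frame first
  }

  // Append one frame of mel-band values; drop the oldest frame once full.
  push(frame) {
    if (frame.length !== this.bands) {
      throw new Error(`expected ${this.bands} bands, got ${frame.length}`);
    }
    this.frames.push(Float32Array.from(frame));
    if (this.frames.length > this.patchFrames) this.frames.shift();
  }

  isFull() {
    return this.frames.length === this.patchFrames;
  }

  // Flatten the buffered frames row by row into patchFrames * bands values,
  // or return null while fewer than patchFrames frames have been pushed.
  toPatch() {
    if (!this.isFull()) return null;
    const out = new Float32Array(this.patchFrames * this.bands);
    this.frames.forEach((f, i) => out.set(f, i * this.bands));
    return out;
  }
}

// Usage: after 200 frames, the patch holds frames 13..199.
const buffer = new MelPatchBuffer();
for (let t = 0; t < 200; t++) buffer.push(new Array(MEL_BANDS).fill(t));
console.log(buffer.toPatch().length); // 17952 (187 * 96)
```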

However, the model I would like to use now has the following input configuration:

"inputs": [
  {
    "name": "model/Placeholder",
    "type": "float",
    "shape": [1280]
  }
]
It performs inference with "algorithm": "TensorflowPredict2D".
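Comparing the two shapes makes the mismatch concrete: a MusiCNN patch holds 187 × 96 = 17952 values, while the new model expects a single vector of 1280 values. The hypothetical helpers below check a flat feature array against an "inputs" entry like the ones above; they are illustrative only and not part of essentia.js.

```javascript
// Hypothetical helpers: compare a flat feature array with the "shape"
// field of a model's "inputs" metadata entry.
function expectedLength(shape) {
  // Total number of values for a tensor of this shape.
  return shape.reduce((n, dim) => n * dim, 1);
}

function matchesInput(features, inputSpec) {
  return features.length === expectedLength(inputSpec.shape);
}

const musicnnInput = { name: "model/Placeholder", type: "float", shape: [187, 96] };
const instrumentInput = { name: "model/Placeholder", type: "float", shape: [1280] };

const melPatch = new Float32Array(expectedLength(musicnnInput.shape)); // 17952 values
console.log(matchesInput(melPatch, musicnnInput));    // true
console.log(matchesInput(melPatch, instrumentInput)); // false: 17952 !== 1280
```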

So, at the very least, I need to change the FeatureExtractProcessor. Is there an example that fits my case, or detailed information on how to do this? I haven't found anything in the documentation that explains what I need to change in the code. Any suggestions are welcome. Thank you in advance.

Steps to reproduce / Code snippets / Screenshots

System info

Chromium-based browser, Essentia.js

These newer models, based on discogs-effnet, have a different signal flow than the older-generation models. As the Python documentation and examples show, the process is:

audio input -> embeddings -> activations (tags)

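Inference therefore uses two models: one computes embeddings (vector representations) from the audio, and the other computes tags from those embeddings. The mtg_jamendo_instrument-discogs-effnet-1.pb model belongs to the second stage, which is why its input is a 1280-value embedding rather than a mel-spectrogram patch.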
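The two-model flow can be sketched as a simple composition. `embedModel` and `classifierModel` are hypothetical stand-ins: in a browser each would presumably be a converted TensorFlow.js model (for example one loaded with `tf.loadGraphModel`), but here they are plain functions so the sketch runs on its own. The exact input patch shape of the embedding model is not given in this thread and should be taken from that model's own metadata.

```javascript
// Sketch of the two-stage flow: audio features -> embeddings -> activations.
// embedModel and classifierModel are hypothetical stand-ins for the two
// converted networks; each maps a Float32Array to a Float32Array.
const EMBEDDING_SIZE = 1280; // input size of the instrument classifier

function runTwoStage(patches, embedModel, classifierModel) {
  return patches.map((patch) => {
    const embedding = embedModel(patch); // stage 1: features -> embedding
    if (embedding.length !== EMBEDDING_SIZE) {
      throw new Error(`embedding has ${embedding.length} values, expected ${EMBEDDING_SIZE}`);
    }
    return classifierModel(embedding); // stage 2: embedding -> activations
  });
}

// Toy stand-ins so the sketch runs without TensorFlow.js.
const fakeEmbed = (patch) => new Float32Array(EMBEDDING_SIZE).fill(patch[0]);
const fakeClassifier = (emb) => Float32Array.of(emb[0], 1 - emb[0]);

const activations = runTwoStage(
  [Float32Array.of(0.25), Float32Array.of(0.75)],
  fakeEmbed,
  fakeClassifier
);
console.log(activations.map((a) => Array.from(a))); // [[0.25, 0.75], [0.75, 0.25]]
```

Keeping the stages as separate functions mirrors the two separate model files, so the classifier can be swapped without touching the embedding step.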

To make this work in JS, the following signal chain has to be added to the essentia.js-model library:

  1. audio -> embeddings
  2. embeddings -> tags
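Once the classifier returns one activation vector per patch, those vectors still have to be turned into tags. A minimal sketch, assuming activations are averaged over patches and a label is kept when its mean score reaches a threshold; `activationsToTags`, the averaging, the 0.5 threshold and the placeholder labels are all illustrative choices, not necessarily what essentia.js or the Python examples do.

```javascript
// Hypothetical post-processing for step 2: average the classifier's
// activations over all patches, then keep labels at or above a threshold,
// sorted by descending mean score.
function activationsToTags(activationsPerPatch, labels, threshold = 0.5) {
  const mean = new Array(labels.length).fill(0);
  for (const act of activationsPerPatch) {
    if (act.length !== labels.length) throw new Error("activation/label length mismatch");
    act.forEach((v, i) => {
      mean[i] += v / activationsPerPatch.length;
    });
  }
  return labels
    .map((label, i) => ({ label, score: mean[i] }))
    .filter((t) => t.score >= threshold)
    .sort((a, b) => b.score - a.score);
}

// Placeholder labels; the real class names come with the model's metadata.
const labels = ["guitar", "piano", "drums"];
const tags = activationsToTags([[0.9, 0.2, 0.6], [0.7, 0.4, 0.8]], labels);
console.log(tags.map((t) => t.label)); // ["guitar", "drums"]
```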