/max-human-pose-estimator-tfjs

Make your own music just by waving your arms in front of a webcam.


Create music using MAX human pose estimator and TensorFlow.js

In this code pattern you will create music based on the movement of your arms in front of a webcam.

It is based on Veremin but modified to use the Human Pose Estimator model from the Model Asset eXchange (MAX). The Human Pose Estimator model is trained to detect humans and their poses in a given image. For this code pattern, it has been converted to the TensorFlow.js web-friendly format.

The web application streams video from your web camera. The Human Pose Estimator model is used to predict the location of your wrists within the video. The application takes the predictions and converts them to tones in the browser or to MIDI values which get sent to a connected MIDI device.
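As a rough illustration of how a wrist location could be read from a prediction, the sketch below assumes the model yields one confidence heatmap per keypoint, as OpenPose-style models typically do, and takes the highest-scoring cell as the keypoint position. The `findPeak` helper and the sample heatmap are hypothetical and are not taken from this application's source.

```javascript
// Returns the row, column, and score of the highest value in a 2D heatmap
// (an array of equal-length rows). Returns null for an empty heatmap.
function findPeak(heatmap) {
  let best = null;
  for (let row = 0; row < heatmap.length; row++) {
    for (let col = 0; col < heatmap[row].length; col++) {
      const score = heatmap[row][col];
      if (best === null || score > best.score) {
        best = { row, col, score };
      }
    }
  }
  return best;
}

// Hypothetical 3x4 heatmap for one wrist (made-up values for illustration).
const wristHeatmap = [
  [0.0, 0.1, 0.1, 0.0],
  [0.1, 0.3, 0.9, 0.2],
  [0.0, 0.1, 0.2, 0.1]
];
const peak = findPeak(wristHeatmap);
console.log(peak);
```

The peak's row and column are in heatmap coordinates, so they would still need to be scaled up to the video frame size before being drawn or compared against the on-screen zones.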

Browsers must allow access to the webcam and support the Web Audio API. Optionally, to integrate with a MIDI device the browser will need to support the Web MIDI API (e.g., Chrome browser version 43 or later).
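A minimal sketch of how these requirements could be checked with feature detection. The `detectCapabilities` helper is hypothetical and not part of this application; it takes the global object as a parameter so it can also be tried outside a browser with a stand-in object.

```javascript
// Hypothetical helper: reports which required and optional browser
// features are available. `env` defaults to the global object.
function detectCapabilities(env = globalThis) {
  const nav = env.navigator || {};
  return {
    // getUserMedia is what provides access to the webcam
    webcam: Boolean(
      nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === 'function'
    ),
    // AudioContext (or the prefixed webkitAudioContext) signals Web Audio API support
    webAudio: typeof (env.AudioContext || env.webkitAudioContext) === 'function',
    // requestMIDIAccess signals Web MIDI API support (optional)
    webMidi: typeof nav.requestMIDIAccess === 'function'
  };
}
```

In a browser, calling `detectCapabilities()` with no argument inspects the real `navigator` and `AudioContext`. Only `webcam` and `webAudio` need to be true for the app to be usable; `webMidi` matters only when a MIDI device is to be connected.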

Architecture

Flow

  1. Human pose estimator model is converted to the TensorFlow.js web format using the TensorFlow.js converter.
  2. User launches the web application.
  3. Web application loads the TensorFlow.js model.
  4. User stands in front of webcam and moves arms.
  5. Web application captures video frame and sends to the TensorFlow.js model. Model returns a prediction of the estimated poses in the frame.
  6. Web application processes the prediction and overlays the skeleton of the estimated pose on the Web UI.
  7. Web application converts the position of the user’s wrists from the estimated pose to a MIDI message, and the message is sent to a connected MIDI device or sound is played in the browser.
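To make the last step more concrete, here is a minimal sketch of building MIDI Note On and Note Off messages as byte arrays. The status bytes (0x90 and 0x80) come from the MIDI protocol; the helper names are hypothetical and this is not necessarily how the application builds its messages.

```javascript
// Clamp a number to the 7-bit range used by MIDI data bytes (0-127).
function toDataByte(value) {
  return Math.max(0, Math.min(127, Math.round(value)));
}

// Build a Note On message: status byte 0x90 combined with the channel (0-15),
// followed by the note number and the velocity.
function noteOnMessage(note, velocity, channel = 0) {
  return [0x90 | (channel & 0x0f), toDataByte(note), toDataByte(velocity)];
}

// Build the matching Note Off message (status byte 0x80, velocity 0).
function noteOffMessage(note, channel = 0) {
  return [0x80 | (channel & 0x0f), toDataByte(note), 0];
}
```

With the Web MIDI API, such an array can be passed to the `send()` method of a `MIDIOutput`, for example `midiOutput.send(noteOnMessage(60, 100))`, where `midiOutput` is assumed to be an output obtained through `navigator.requestMIDIAccess()`.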

Included Components

Featured Technologies

  • Web MIDI API: An API supporting the MIDI protocol, enabling web applications to enumerate and select MIDI input and output devices on the client system and send and receive MIDI messages
  • Web Audio API: A high-level Web API for processing and synthesizing audio in web applications
  • Tone.js: A framework for creating interactive music in the browser

Demo

To try this application without installing anything, simply visit ibm.biz/veremax in a web browser that has access to a web camera and support for the Web Audio API.


Steps

There are three ways to run your own Veremax:

Deploy to IBM Cloud

Pre-requisites:

To deploy to the IBM Cloud, from a terminal run:

  1. Clone the max-human-pose-estimator-tfjs repository locally:

    $ git clone https://github.com/IBM/max-human-pose-estimator-tfjs
    
  2. Change to the directory of the cloned repo:

    $ cd max-human-pose-estimator-tfjs
    
  3. Log in to your IBM Cloud account:

    $ ibmcloud login
    
  4. Target a Cloud Foundry org and space:

    $ ibmcloud target --cf
    
  5. Push the app to IBM Cloud:

    $ ibmcloud cf push
    

    Deploying can take a few minutes.

  6. View the app with a browser at the URL listed in the output.

    Note: Depending on your browser, you may need to access the app using https instead of http.

Run locally

To run the app locally:

  1. From a terminal, clone the max-human-pose-estimator-tfjs repository locally:

    $ git clone https://github.com/IBM/max-human-pose-estimator-tfjs
    
  2. Point your web server to the cloned repo directory (/max-human-pose-estimator-tfjs)

    For example:

    • using the Web Server for Chrome extension (available from the Chrome Web Store)

      1. Go to your Chrome browser's Apps page (chrome://apps)
      2. Click on the Web Server
      3. From the Web Server, click CHOOSE FOLDER and browse to the cloned repo directory
      4. Start the Web Server
      5. Make note of the Web Server URL(s) (e.g., http://127.0.0.1:8887)
    • using the Python HTTP server module

      1. From a terminal shell, go to the cloned repo directory
      2. Depending on your Python version, enter one of the following commands:
        • Python 2.x: python -m SimpleHTTPServer 8080
        • Python 3.x: python -m http.server 8080
      3. Once started, the Web Server URL should be http://127.0.0.1:8080
  3. From your browser, go to the Web Server's URL

Run in Docker

Pre-requisite:

From a terminal:

  1. Clone this repository

    $ git clone https://github.com/IBM/max-human-pose-estimator-tfjs
    $ cd max-human-pose-estimator-tfjs
    
  2. Build the Docker image

    $ docker build -t veremax .
    
  3. Run the Docker container

    $ docker run -d -p 3000:80 veremax
    
  4. In your browser, open localhost:3000 and enable the web camera

To stop the Docker container, from a terminal:

  1. Obtain the container id for veremax

    $ docker ps
    
  2. Stop the Docker container using the container id obtained above

    $ docker stop container_id  
    

Using the app

For best results, use the app in a well-lit area with good contrast between you and the background, and stand far enough back from the webcam that at least half of your body appears in the video.

At a minimum, your browser must allow access to the web camera and support the Web Audio API.

In addition, if your browser supports the Web MIDI API, you may connect a MIDI synthesizer to your computer. If you do not have a MIDI synthesizer, you can download and run a software synthesizer such as SimpleSynth.

If your browser does not support the Web MIDI API or no (hardware or software) synthesizer is detected, the app defaults to using the Web Audio API to generate tones in the browser.
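The fallback described above can be sketched as follows. The `chooseOutput` helper is hypothetical and not taken from the application; it assumes its argument is either the `MIDIAccess` object obtained from `navigator.requestMIDIAccess()` or a null value when Web MIDI is unavailable.

```javascript
// Decide where notes should be sent. `midiAccess` is the MIDIAccess object
// from navigator.requestMIDIAccess(), or null/undefined when the Web MIDI
// API is unsupported or access was not granted.
function chooseOutput(midiAccess) {
  if (midiAccess && midiAccess.outputs && midiAccess.outputs.size > 0) {
    // At least one synthesizer was detected: use the first MIDI output
    const firstOutput = midiAccess.outputs.values().next().value;
    return { type: 'midi', output: firstOutput };
  }
  // No Web MIDI support or no synthesizer detected: play tones in the browser
  return { type: 'webaudio', output: null };
}
```

Picking the first output is a simplification for this sketch; when several devices are connected, the control panel lets you choose among them.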

Open your browser and go to the app URL. Depending on your browser, you may need to access the app using https instead of http. You may also have to accept the browser's prompt to allow access to the web camera. Once access is allowed, the Human Pose Estimator model is loaded.

After the model is loaded, the video stream from the web camera will appear and include an overlay with skeletal information detected by the model. The overlay will also include two adjacent zones/boxes. When your wrists are detected within each of the zones, you should hear some sound.

  • Move your right hand/arm up and down (in the right zone) to generate different notes
  • Move your left hand/arm left and right (in the left zone) to adjust the velocity of the note.
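One way such a mapping from wrist position to sound could work is sketched below. The scale, the function names, the direction of each mapping (a higher hand giving a higher note, a hand further right giving a higher velocity), and the normalized 0 to 1 coordinates are all assumptions made for illustration; the application's actual mapping may differ.

```javascript
// Hypothetical note set: one octave of C major as MIDI note numbers (C4 to C5).
const SCALE = [60, 62, 64, 65, 67, 69, 71, 72];

// Keep a value inside the 0-1 range.
function clamp01(value) {
  return Math.max(0, Math.min(1, value));
}

// Map the right wrist's vertical position to a note. `y` is normalized to
// the zone: 0 is the top edge and 1 is the bottom edge, so a higher hand
// gives a higher note.
function noteFromY(y) {
  const height = 1 - clamp01(y);
  const index = Math.min(SCALE.length - 1, Math.floor(height * SCALE.length));
  return SCALE[index];
}

// Map the left wrist's horizontal position to a MIDI velocity (0-127).
// `x` is normalized to the zone: 0 is the left edge and 1 is the right edge.
function velocityFromX(x) {
  return Math.round(clamp01(x) * 127);
}
```

Quantizing the vertical position to a fixed scale, rather than to every semitone, is a design choice in this sketch that keeps the result sounding musical even when wrist detection is jittery.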

Click on the Controls icon (top right) to open the control panel. In the control panel you are able to change MIDI devices (if more than one is connected), configure post-processing settings, set what is shown in the overlay, and configure additional options.

Converting the model

The converted MAX Human Pose Estimator model is available here, along with information on how to use the model in other applications.

Alternatively, you can convert the model to the TensorFlow.js web-friendly format yourself by following the steps below.

Note: The Human Pose Estimator model is a frozen graph. Later versions of the tensorflowjs_converter no longer support frozen graph models. To convert frozen graphs, it is recommended to use an older version of the TensorFlow.js converter (0.8.0) with --output_json=true so that the model assets are in the format accepted by TensorFlow.js 1.x.

  1. Install the tensorflowjs 0.8.0 Python module

  2. Download and extract the pre-trained Human Pose Estimator model

  3. From a terminal, run the tensorflowjs_converter:

    tensorflowjs_converter \
        --input_format=tf_frozen_model \
        --output_node_names='Openpose/concat_stage7' \
        --output_json=true \
        {model_path} \
        {output_dir}
    

    where

    • {model_path} is the path to the extracted model
    • {output_dir} is the directory to save the converted model artifacts
    • output_node_names (Openpose/concat_stage7) is obtained by inspecting the model’s graph. One useful and easy-to-use visual tool for viewing machine learning models is Netron.

When completed, the contents of {output_dir} will be the web-friendly format of the Human Pose Estimator model for TensorFlow.js 1.x.
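As a minimal sketch of how the converted artifacts might then be loaded in a web page, the snippet below uses `tf.loadGraphModel` from TensorFlow.js 1.x. The `loadPoseModel` helper and the `/model/model.json` path are hypothetical; it is assumed here that {output_dir} is served by your web server and contains a `model.json` file, so adjust the URL to match your setup.

```javascript
// Load the converted model. `tf` is the TensorFlow.js library object
// (for example, the global created by the TensorFlow.js script tag) and
// `modelUrl` is the URL of the model JSON file (a placeholder in this sketch).
async function loadPoseModel(tf, modelUrl) {
  const model = await tf.loadGraphModel(modelUrl);
  return model;
}
```

A call such as `loadPoseModel(tf, '/model/model.json')` returns a promise that resolves to the loaded model once the JSON file and the weight files it references have been fetched.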

Links

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ