This file walks you through the setting up a development environment for synthbot. This means:
- Setting up your directory structure,
- Running the docker image, and
- Running the dev environment.
Step up your project directory:
mkdir soundtools # project directory to track relevant files
cd soundtools # we'll always assume we're working in this directory henceforth
mkdir data # keep data files organized in one place
git clone "https://github.com/synthbot-anon/synthbot.git"
The data/
directory organization should reflect preprocessing flows. You'll be adding more directories in data/
as you go through the preprocessing steps. For now, we'll just add two folders for Clipper's clips and our pronunciation dictionaries.
mkdir data/clipper-samples
mkdir data/dictionaries
Make sure you've downloaded and (if needed) extracted Clipper's Master File 2.0 into the data/clipper-samples
directory. Make sure you've also downloaded cmudict-0.7b.txt
, librispeech.txt
, and horsewords.txt
, and placed them within the data/dictionaries
directory.
Within soundtools/
, your directory structure should now look something like this:
data/
clipper-samples/
Reviewed episodes/
# lots of .json files
...
Sliced Dialogue/
EQG/...
FiM/...
Label files/...
...
dictionaries/
cmudict-0.7b.txt
librispeech.txt
horsewords.txt
synthbot/
notebooks/
src/
tests/
requirements.txt
...
Make sure you've installed Docker. You can find the installation instructions here.
Once you've installed Docker, you can run the soundtools
container with the following command:
docker run \
--mount "type=bind,src=$(pwd)/data,dst=/home/celestia/data" \
--mount "type=bind,src=$(pwd)/synthbot,dst=/home/celestia/synthbot" \
--publish "127.0.0.1:8888:8888" \
--publish "127.0.0.1:6006:6006" \
--hostname "soundtools" \
--rm -it synthbot/soundtools:latest
I recommend saving that as a shell script. You can add --gpus 3
or similar to give the container access to your GPUs.
When you run the command, you should be dropped into an ubuntu 18.04 shell under the user celestia
. Note that changes you make to container files WILL NOT persist when you exit the container. The only changes that persist will be ones performed on host files that are available within the container.
Walking through the docker run
command line arguments:
--mount "type=bind,src=$(pwd)/data,dst=/home/celestia/data"
makes yourdata/
directory available in the container using a bind mount. A bind mount is what's used to connect files across different partitions and filesystems on the same system. This is how we're making host files available within the container. Any changes you make within thedata/
directory inside the container will persist.--mount "type=bind,src=$(pwd)/synthbot,dst=/home/celestia/synthbot"
makes yoursynthbot/
directory available in the container, again using a bind mount. Any changes you make within thesynthbot/
directory will persist, and any change you make in thesynthbot/
directory on your host will be reflected back as changed within the container. This means you can use your IDE of choice outside of the container to modify code while running the actual code within the container, and you won't have to keep restarting the container to do this.--publish "127.0.0.1:8888:8888"
forwards the container port 8888 to your host port 8888. The127.0.0.1
forces this port to be available on your host computer in a way that it isn't exposed to the rest of the internet. This is good because we're eventually going to run Jupyter on this port, and anyone with access to this port will be able to execute arbitrary code within your container.--publish "127.0.0.1:6006:6006"
forwards the necessary port for Tensorflow's TensorBoard.--hostname "soundtools"
sets the network name of your container. This might have some utility later, but for now it just makes the command prompt less ugly.--rm
tells Docker to clean up after itself once the container is done running. Don't remove this flag unless you know what you're doing, otherwise you'll find that Docker is consuming way more disk space than it seems to need.-it synthbot/soundtools:latest
drops you into an interactive shell in thesynthbot/soundtools:latest
container hosted on Docker Hub.
And you're set up! When the container starts, it will give you a 127.0.0.1:8888
URL you can open in your browser (outside of the container). The next step is to open the Jupyter URL in a browser, navigate to the notebooks/
directory, and run through Preprocess data.ipynb
. This will walk you through preprocessing the data.