backyard-ml

Scalable custom machine learning models for everyone

MIT License

DEMO: For Beginners (zero coding experience)

Secrets File

Step 0: Create A Secrets File

Create an empty text file, name it secrets.txt, then copy and paste the template below and save it.

LS_HOST=
TOKEN=
S3_BUCKET=
S3_REGION=us-east-1
S3_ACCESS_KEY=
S3_SECRET_KEY=
S3_ENDPOINT=s3.wasabisys.com
DB_CONNECTION_STRING=
DB_NAME=label_studio

This file is very important, and you will be using it a lot, so keep it somewhere safe.
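You don't need to write any code to use this file, but if you're curious, the scripts in this project read it as simple KEY=VALUE pairs. A minimal sketch of how such a file can be parsed in Python (this mirrors what dotenv-style tools do; the function name here is just for illustration):

```python
def load_secrets(path="secrets.txt"):
    """Read KEY=VALUE lines from a secrets file into a dictionary."""
    secrets = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and anything without a "=".
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the FIRST "=" only, so values containing "=" survive.
            key, _, value = line.partition("=")
            secrets[key.strip()] = value.strip()
    return secrets
```

Empty values (like `TOKEN=` before you fill it in) are kept as empty strings, so you can spot which secrets are still missing.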

Data Preparation (Step 1 to 4)

Step 1: Deploy Label-Studio

  1. Sign up on Heroku: https://signup.heroku.com/ and verify your email.
  2. Click the Deploy to Heroku button.
  3. Pick any name for the app (e.g., label-studio-0).
  4. Change DISABLE_SIGNUP_WITHOUT_LINK from 0 to 1. For USERNAME, type your email address.
  5. Enter a username and password for the default account that you will use for label-studio.
  6. Click deploy!

When your label-studio app is deployed, you will see something like this:

(screenshot: successful label-studio deployment)

  7. Click on View, then log in with your username and password. Copy the base URL of your home page and use it for LS_HOST in your secrets.txt file. For example: https://my-labelstudio.herokuapp.com – make sure you include https://, and don't include anything after .com.
LS_HOST=https://replace-me-with-your-app-url.herokuapp.com
...
  8. Click on your initials icon (top right) -> Account & Settings, then copy the value of Access Token. Add the token value to the TOKEN variable in your secrets.txt file. For example:
...
TOKEN=SoMe-sUpEr-sEcReT-LaBeL-StUdIo-tOkEn
...
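If you ever want to double-check that your LS_HOST and TOKEN values work, a short Python sketch (standard library only) can query Label Studio's API, which authenticates with a `Token` header. The example host and token below are placeholders, not real values:

```python
import json
import urllib.request

def auth_headers(token):
    # Label Studio expects the access token in this header format.
    return {"Authorization": f"Token {token}"}

def list_projects(ls_host, token):
    """Return the projects visible to this token, as parsed JSON."""
    req = urllib.request.Request(
        ls_host.rstrip("/") + "/api/projects",
        headers=auth_headers(token),
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a deployed instance, so it is commented out):
# print(list_projects("https://my-labelstudio.herokuapp.com", "my-token"))
```

If the call succeeds, your host URL and token are both correct; a 401 error usually means the token was copied incorrectly.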

Step 2: Create a cloud storage bucket

You will want to keep your data in the cloud so it can easily be moved between different apps. For that, you will create a Simple Storage Service (S3) bucket, where all your data will be kept. There are many options for S3-compatible object storage, but here we will use Wasabi.

  1. Sign up on Wasabi: https://wasabi.com/sign-up/ and verify your email.
  2. Open the console: https://console.wasabisys.com, then click on Buckets -> CREATE BUCKET.

(screenshot: creating a bucket)

  3. Pick a name for your bucket (e.g., data-0), select a region (e.g., us-east-1), then click Next. Copy the bucket name and region name to your secrets.txt file. For example:
...
S3_BUCKET=My-bUcKeT-NaMe
S3_REGION=us-east-1
...
  4. Keep the options as they are, then click Next, then click CREATE BUCKET.

That's it! You made an S3 bucket! Now generate an access key ID and secret key to connect to the bucket.

  5. Click on the key icon in the left bar -> click on CREATE NEW ACCESS KEY. Leave the default selection as it is, then click Create.

(screenshot: generating an access key)

  6. Copy the keys to the clipboard, then use the access key ID value for S3_ACCESS_KEY and the secret key for S3_SECRET_KEY in your secrets.txt file (don't paste the variable names that were copied automatically, just their values). For example:
...
S3_ACCESS_KEY=mY-s3-BuCkEt-aCcEsS-KeY
S3_SECRET_KEY=mY-sUpEr-sEcReT-S3-bUcKeT-SeCrEt-kEy
...
  7. Visit this page and copy the Service URL that corresponds to your bucket's region, then use its value for S3_ENDPOINT, with https:// removed (e.g., the endpoint for us-east-1 is https://s3.wasabisys.com, so you would use s3.wasabisys.com). For example:
...
S3_ENDPOINT=s3.wasabisys.com
...
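To confirm the bucket credentials work, a short sketch using the boto3 library (install it with `pip install boto3`) can list the first few objects. The helper below also shows why the scheme-less endpoint format matters: boto3 wants a full https:// URL. The function names here are illustrative, not part of this project:

```python
def endpoint_url(host):
    """secrets.txt stores the endpoint without a scheme; boto3 needs https://."""
    return host if host.startswith("https://") else "https://" + host

def list_bucket(bucket, region, access_key, secret_key, endpoint):
    """Return up to 5 object keys from the bucket, to prove access works."""
    import boto3  # imported here so endpoint_url() stays dependency-free

    s3 = boto3.client(
        "s3",
        region_name=region,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        endpoint_url=endpoint_url(endpoint),
    )
    resp = s3.list_objects_v2(Bucket=bucket, MaxKeys=5)
    return [obj["Key"] for obj in resp.get("Contents", [])]
```

An empty list is a perfectly good result here: it means the credentials work and the bucket simply has no files yet.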

Step 3: Create a backend database

This database will be used to store all the information related to the data generated by the project. We will use MongoDB as a backend database. You can create one for free with plenty of storage for what we need.

  1. Sign up on MongoDB Atlas: https://www.mongodb.com/cloud/atlas/register and verify your email.
  2. Visit https://cloud.mongodb.com and click on Build a Database -> select Shared -> Create.
  3. Keep the options as they are. You can change the cluster name if you want, or just keep the default name.
  4. Click Create Cluster, then pick a username and a strong password, then click Create user.
  5. For the IP Address, enter 0.0.0.0/0 -> Add Entry. Then, click Finish and close.
  6. Click Go to database -> click Connect. Select Python for the driver and 3.6 or later for version.

(screenshot: the MongoDB connection string dialog)

  7. Copy the connection string, replace <password> with your database password, and paste it as the value for DB_CONNECTION_STRING in your secrets.txt file; leave the database name as it is. For example:
...
DB_CONNECTION_STRING=mongodb+srv://server.mongodb.net/myFirstDatabase?retryWrites=true&w=majority
DB_NAME=label_studio
...
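If you want to verify the connection string before moving on, a small sketch with the pymongo driver (install it with `pip install pymongo`) can ping the cluster. The helper functions are illustrative names, not part of this project:

```python
def looks_like_srv_uri(conn):
    """Atlas connection strings use the mongodb+srv:// scheme."""
    return conn.startswith("mongodb+srv://")

def ping_database(conn, db_name="label_studio"):
    """Connect, ping the server, and return a handle to the database."""
    from pymongo import MongoClient  # imported here; only needed for the ping

    client = MongoClient(conn)
    client.admin.command("ping")  # raises an exception if unreachable
    return client[db_name]
```

A failed ping usually means the password placeholder was not replaced, or the 0.0.0.0/0 IP entry from step 5 was not saved.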

Step 4: Connect the cloud storage to label-studio

  1. Go back to your label-studio application. Click on Create project.
  2. Pick a name for your project, then click on labeling setup and select object detection with bounding boxes.
  3. Remove the two default labels, then add the labels that you expect to see in your dataset (you can edit this later to add more). Make sure to add one label per line (note: the label should not include a backslash \!). Click on Add, then save.
  4. Go to the project settings (top right) -> click on Cloud Storage -> Add Source Storage.
  5. For the Bucket Name, Region Name, Access Key ID, and Secret Key fields, use the values of S3_BUCKET, S3_REGION, S3_ACCESS_KEY, and S3_SECRET_KEY from your secrets.txt file, respectively.
  6. For the S3 Endpoint field, use the S3_ENDPOINT value from your secrets.txt file, but add https:// at the start of the URL.
  7. Clear the Session token field and leave it empty. Toggle Treat every bucket object as a source file and Recursive scan to turn them ON, set the expiry time to 120, then click Add storage.

Now anything you upload to the bucket can be synced to label-studio!

The upload interface:

(screenshot: the upload interface)

After upload, sync:

(screenshot: syncing the storage)

Now if you upload any image to your bucket and sync the storage, you will be able to see the uploaded images as tasks in your label-studio project. You can label the objects in an image by opening a task -> clicking a label, then drawing a bounding box around the object -> Submit.
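The upload-then-sync flow above can also be scripted. A sketch, assuming Label Studio's S3 source-storage sync endpoint (`/api/storages/s3/{id}/sync`) and a boto3-style S3 client; the storage id is whatever Label Studio assigned to the source storage you added above, and the function names are illustrative:

```python
import urllib.request

def sync_url(ls_host, storage_id):
    """Build the Label Studio endpoint that syncs an S3 source storage."""
    return f"{ls_host.rstrip('/')}/api/storages/s3/{storage_id}/sync"

def upload_and_sync(s3_client, bucket, key, image_path,
                    ls_host, token, storage_id):
    """Put one image in the bucket, then ask Label Studio to re-scan it."""
    s3_client.upload_file(image_path, bucket, key)  # boto3-style upload
    req = urllib.request.Request(
        sync_url(ls_host, storage_id),
        method="POST",
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200-range status means the sync was accepted
```

After a successful sync, the new image shows up as a task in the project, exactly as it would after clicking Sync in the web interface.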


AI Model training and prediction (Step 5 to 6)

Step 5: Set up the model prediction workflow

  1. If you don't have an account yet, sign up on GitHub here and verify your email.
  2. Fork the model repository by clicking the Fork button, then click Create fork.
  3. On your fork's page (the page URL will contain YOUR_USERNAME/BirdFSD-YOLOv5), click on Settings -> Secrets.
  4. For every variable in your secrets.txt file, copy the variable's name (before =) into the Name field and its value (after =) into the Value field. Repeat this for every secret in your secrets.txt file.
  5. Enable workflows in your fork: open the Actions tab and click the button to enable them.
  6. Then enable each of the listed workflows (highlighted with a red square in the image below):

Step 6: Training the model

  • You can train your model once you have "enough" annotations. The number of annotations required for a reliable model will differ based on your use case and the size of your dataset (you can learn more here). Train your model every now and then after you annotate a sizeable chunk of your data (e.g., after every 100, 500, or 1000 new annotations). The more annotated data you add, the better the model will learn.
  1. Sign up for W&B to track your training (optional, but recommended).

  2. Log in to your Google account, then click on this button to open the training notebook: Open In Colab

  3. Click on Copy to Drive.

  4. Click on the folder icon, then drag and drop secrets.txt into the files section in Google Colab.
  5. Right-click on the file -> Rename file, then rename it to .env (don't worry if you can't see the file after renaming it; it just became a hidden file).
  6. Follow the instructions in the notebook to train the model.