A Docker Compose deployment of the Label Studio data labeling and annotation tool.
Make
To run this project, you will need to add the following environment variables to your .env file.
DJANGO_DB
POSTGRE_NAME
POSTGRE_USER
POSTGRE_PASSWORD
POSTGRE_PORT
POSTGRE_HOST
POSTGRES_HOST_PORT
LABEL_STUDIO_VOLUME_HOST_AUDIO_PATH
Example .env file
DJANGO_DB=default
POSTGRE_NAME=YOUR_POSTGRES_DATABASE_NAME
POSTGRE_USER=YOUR_POSTGRES_DATABASE_USER
POSTGRE_PASSWORD=YOUR_POSTGRES_DATABASE_PASSWORD
POSTGRE_PORT=5432
POSTGRE_HOST=postgres
POSTGRES_HOST_PORT=YOUR_POSTGRES_DATABASE_HOST_PORT
LABEL_STUDIO_VOLUME_HOST_AUDIO_PATH=./data/files
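Before deploying, it can help to confirm that the .env file defines every variable. The following is a minimal sketch, not part of this repository: the helper names `parse_env` and `missing_vars` are hypothetical, the variable names are copied from the example file above, and the parser assumes plain `KEY=VALUE` lines without quoting or interpolation.

```python
from pathlib import Path

# Variable names as they appear in the example .env file above.
REQUIRED_VARS = [
    "DJANGO_DB",
    "POSTGRE_NAME",
    "POSTGRE_USER",
    "POSTGRE_PASSWORD",
    "POSTGRE_PORT",
    "POSTGRE_HOST",
    "POSTGRES_HOST_PORT",
    "LABEL_STUDIO_VOLUME_HOST_AUDIO_PATH",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(text: str, required=REQUIRED_VARS) -> list:
    """Return the required variables that are absent or empty."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found in the current directory.")
    else:
        missing = missing_vars(env_path.read_text(encoding="utf-8"))
        print("Missing variables:", ", ".join(missing) if missing else "none")
```

Run it from the directory that contains the .env file; it only reads the file and prints the names that still need a value.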
To deploy the Label Studio annotation tool, run
make deploy
To stop the deployment, run
make stop
To delete the deployment, run
make undeploy
# log into the postgres container and run
createdb YOUR_POSTGRES_DATABASE_NAME -O YOUR_POSTGRES_DATABASE_USER -U YOUR_POSTGRES_DATABASE_USER
# then open http://localhost:8080/
# first, sign up with email: admin@admin.com / password: passwd123
# then log into the labelstudio container and run
# cd /label-studio/label_studio
# python3 manage.py createsuperuser
Once you are on the project creation screen, navigate to Labeling Setup -> Custom template. Label Studio provides some audio templates by default.
We currently use the following template:
<View>
  <Audio name="audio" value="$audio" zoom="true" hotkey="ctrl+enter" />
  <Header value="Provide Transcription" />
  <TextArea name="transcription" toName="audio"
            rows="4" editable="true" maxSubmissions="1" />
  <Choices name="approved" toName="audio" showInLine="true">
    <Choice value="approved" />
    <Choice value="to revision" />
  </Choices>
</View>
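If you edit the template, a quick local check can catch mistakes before you paste it into Label Studio. This is a minimal sketch using only the Python standard library. The function name `check_labeling_config` is hypothetical, and treating any stray text between tags as a problem is an assumption that suits this template but not configs that legitimately contain text (for example a style block).

```python
import xml.etree.ElementTree as ET


def check_labeling_config(config: str) -> list:
    """Return a list of human-readable problems found in a labeling config."""
    root = ET.fromstring(config)  # raises ParseError if the XML is malformed
    names = {el.get("name") for el in root.iter() if el.get("name")}
    problems = []
    for el in root.iter():
        # Every toName must point at a tag that declares that name.
        for target in (el.get("toName") or "").split(","):
            target = target.strip()
            if target and target not in names:
                problems.append(f"<{el.tag}> refers to unknown name {target!r}")
        # Assumption: this template should contain no free text between tags.
        for text in (el.text, el.tail):
            if text and text.strip():
                problems.append(f"unexpected text {text.strip()!r} near <{el.tag}>")
    return problems


if __name__ == "__main__":
    example = """
    <View>
      <Audio name="audio" value="$audio" />
      <TextArea name="transcription" toName="audio" />
    </View>
    """
    print(check_labeling_config(example))
```

An empty list means the sketch found nothing to report; it does not guarantee that Label Studio will accept the config.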
How the interface should look with the labeling template above:
To listen to audio files in the interface, you must set up cloud or database storage as the source storage in the project settings.
Navigate to project Settings -> Cloud Storage -> Add Source Storage
Fill in the fields with the path to your audio files directory.
If you would like to change the audio files directory, remember to update LABEL_STUDIO_VOLUME_HOST_AUDIO_PATH in the .env file.
See the Label Studio documentation for additional information.
NOTE: We do not recommend this approach. Use it only if you don't have access to a cloud storage alternative such as S3. By default, Label Studio supports storing input and output files directly in S3.
You can customize the annotation format however you like.
If you are using local file storage for audio, make sure to keep the ?d=audios query parameter (audios is the parent directory of the audio files) before the audio path.
See the Label Studio documentation for more about input formats.
{
  "data": {
    "audio": "/data/local-files/?d=audios/wav/5621300/5621300_00_00_07_00_00_17.wav",
    "content_id": "5621300",
    "entrega": 6,
    "segment_id": "5621300_00_00_07_00_00_17"
  },
  "annotations": [
    {
      "result": [
        {
          "value": {
            "text": [
              "És el descobriment de l'any que et dic de l'any de la dècada, no del segle i el més important és la via directe per guanyar el premi Nobel ai quina alegria,"
            ]
          },
          "from_name": "transcription",
          "to_name": "audio",
          "type": "textarea"
        }
      ]
    }
  ],
  "predictions": [
    {
      "score": 0.125,
      "model_version": "version 0",
      "result": [
        {
          "value": {
            "text": [
              "és el descobriment de l'any que et dic de l'any de la dècada no del segle i el més important és la via directa per guanyar el premi nobel aque l'alegria"
            ]
          },
          "from_name": "transcription",
          "to_name": "audio",
          "type": "textarea"
        }
      ]
    }
  ]
}
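Tasks in the format shown above can be generated with a short script. The sketch below is illustrative only: the helper names `local_audio_url` and `build_task` are hypothetical, the `audios` parent directory and the `transcription`/`audio` names are taken from the example, and the score and model version are placeholder defaults.

```python
import json


def local_audio_url(relative_path: str, parent_dir: str = "audios") -> str:
    """Build a local-files URL that keeps the ?d=<parent_dir> prefix."""
    return f"/data/local-files/?d={parent_dir}/{relative_path.lstrip('/')}"


def build_task(relative_path, predicted_text=None, score=0.0,
               model_version="version 0", **extra_data):
    """Build one task dict, optionally with a pre-filled transcription."""
    task = {"data": {"audio": local_audio_url(relative_path), **extra_data}}
    if predicted_text is not None:
        task["predictions"] = [{
            "score": score,
            "model_version": model_version,
            "result": [{
                "value": {"text": [predicted_text]},
                "from_name": "transcription",  # name of the TextArea tag
                "to_name": "audio",            # name of the Audio tag
                "type": "textarea",
            }],
        }]
    return task


if __name__ == "__main__":
    task = build_task(
        "wav/5621300/5621300_00_00_07_00_00_17.wav",
        predicted_text="example transcription",  # placeholder text
        content_id="5621300",
    )
    print(json.dumps([task], ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` keeps accented characters readable in the generated file.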
There are several ways to import your data into Label Studio. Once your data is in the expected format, you can:
- Sync data from cloud or database storage.
- Import it from the Label Studio UI.
- If your data is stored locally, import it into Label Studio.
- Send data using the Label Studio REST API.
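For the REST API option in the list above, the following sketch prepares an import request with the Python standard library. It assumes the project import endpoint `/api/projects/<id>/import` and token authentication through an `Authorization: Token ...` header. The required authentication scheme can differ between Label Studio versions, so check the API documentation for your deployment. The function name, project ID, and token value are hypothetical.

```python
import json
import urllib.request


def build_import_request(base_url, project_id, api_token, tasks):
    """Prepare (but do not send) a POST request that imports a list of tasks."""
    url = f"{base_url.rstrip('/')}/api/projects/{project_id}/import"
    body = json.dumps(tasks).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    tasks = [{"data": {"audio": "/data/local-files/?d=audios/wav/example.wav"}}]
    request = build_import_request("http://localhost:8080", 1, "YOUR_API_TOKEN", tasks)
    print(request.get_method(), request.full_url)
    # To actually send it, uncomment the next two lines:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and payload before anything reaches the server.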
For additional information, please see the official Label Studio documentation. Also, don't hesitate to ask us: you can start a new discussion on this repository.