This repository belongs to the ML Record Mining project. Its sole purpose is to provide a dashboard for annotating and hand labeling sentences.
This is an open project, and contributions are welcome from anyone. All contributors are bound by a code of conduct; please review and follow it as part of your contribution.
Issues and bug reports are always welcome. Code clean-up and feature additions can be done through branches.
All products of the Throughput Annotation Project are licensed under an MIT License unless otherwise noted.
The files and directory structure of the repository are shown below. This structure might change as the project progresses.
throughput-ec/UnacquiredSitesDashboard/
├── input
│ └── predictions_train_dummy # dummy file for reproducibility / output of ML Record Mining
├── output
│ └── db_val_dummy # dummy file for reproducibility / output of Dashboard
├── src
│ └── test_preprocess_all_data.py # dashboard plotly script
├── .gitignore
├── CODE_OF_CONDUCT.md
├── db.Dockerfile
├── LICENSE
└── README.md
This project uses the ML Record Mining output files:
- predictions_train.tsv: predictions over sentences, indicating whether or not each sentence contains coordinates.
These files are used as input in a Dashboard where people can correct the output by hand labeling.
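As a rough illustration of the hand-labeling step, the sketch below reads a predictions TSV and overrides model predictions with human labels. The column names (`sentence_id`, `sentence`, `prediction`), the helper `apply_corrections`, and the example rows are all hypothetical; the actual layout of predictions_train.tsv may differ.

```python
import csv
import io


def apply_corrections(tsv_text, corrections):
    """Return rows from a predictions TSV with human labels applied.

    `corrections` maps a sentence id to the human-assigned label
    (1 = has coordinates, 0 = does not). Column names are assumptions.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        predicted = int(row["prediction"])
        # Use the human label when one exists, otherwise keep the prediction.
        label = corrections.get(row["sentence_id"], predicted)
        rows.append({
            "sentence_id": row["sentence_id"],
            "sentence": row["sentence"],
            "prediction": predicted,
            "human_label": label,
            "corrected": label != predicted,
        })
    return rows


# Made-up example data, not taken from the real files.
EXAMPLE_TSV = (
    "sentence_id\tsentence\tprediction\n"
    "s1\tThe core was taken at 45.5 N, 122.6 W.\t0\n"
    "s2\tSamples were dried overnight.\t0\n"
)

if __name__ == "__main__":
    for row in apply_corrections(EXAMPLE_TSV, {"s1": 1}):
        print(row["sentence_id"], row["human_label"], row["corrected"])
```

Keeping both the model prediction and the human label in each row makes it easy to see later which sentences the model got wrong.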
This project is developed using Python.
It runs on macOS.
The project pulls data from ML Record Mining output files. For the sake of reproducibility, two dummy data files have been included.
This project generates TSV files with human corrections in order to help improve the ML model.
The current pipeline that is followed is:
To run the Dashboard and help hand label new data, please follow these instructions.
- Clone/download this repository.
- Using the command line, go to the root directory of this repository.
- Pull the unacquired_sites_db_app image from DockerHub using the command line:
docker pull sedv8808/unacquired_sites_db_app
- Verify you are in the root directory of this project, then run the following, replacing <User's Path> with the absolute path to the root of this project on your computer:
docker run -v <User's Path>/sentences.tsv:/app/input/ -v <User's Path>/output/:/app/output -p 8050:8050 sedv8808/unacquired_sites_db_app:latest
- Go to your internet browser and enter the following address: http://0.0.0.0:8050/
- Navigate through the different articles and mark the sentences that have coordinates.
- Click the save button once you finish ONE article.
- Sentences will be saved in the output/ folder. Kindly send those outputs to us.
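Because the `docker run` command above is easy to mistype, here is a minimal sketch of a helper that assembles the same arguments from a project path. The function name `build_docker_run` and the example path are hypothetical; the mounts, port, and image name are copied from the command above.

```python
import shlex
from pathlib import PurePosixPath


def build_docker_run(project_root, image="sedv8808/unacquired_sites_db_app:latest"):
    """Build the `docker run` argument list used in the steps above.

    `project_root` must be an absolute path, as the instructions ask.
    """
    root = PurePosixPath(project_root)
    if not root.is_absolute():
        raise ValueError("project_root must be an absolute path")
    return [
        "docker", "run",
        "-v", f"{root}/sentences.tsv:/app/input/",
        "-v", f"{root}/output/:/app/output",
        "-p", "8050:8050",
        image,
    ]


if __name__ == "__main__":
    # /home/user/UnacquiredSitesDashboard is a placeholder path.
    argv = build_docker_run("/home/user/UnacquiredSitesDashboard")
    # shlex.join quotes arguments so paths with spaces stay intact.
    print(shlex.join(argv))
```

The printed line can be pasted into a terminal, or the list can be passed directly to `subprocess.run`.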
This repository consists of 1 Python script.
In order to run this project, you need to:
- Clone or download this repository.
- Run the following commands in the terminal from the project's root directory:
# From the command line.
# Load dashboard
python3 src/modules/dashboard/record_mining_dashboard.py --input_file=<data> --output_file=<directory>
# To visualize in your browser, enter the following http address.
http://127.0.0.1:8050/
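For readers unfamiliar with the `--input_file` and `--output_file` flags, the sketch below shows one way a script could accept them using `argparse`. It is an assumption for illustration only; the actual dashboard script may parse its arguments differently, and the help texts and the `parse_args` helper are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Record mining dashboard (sketch)")
    parser.add_argument("--input_file", required=True,
                        help="path to the predictions file to review")
    parser.add_argument("--output_file", required=True,
                        help="location where corrected labels are written")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Example values based on the dummy files in this repository's tree.
    args = parse_args([
        "--input_file=input/predictions_train_dummy",
        "--output_file=output/",
    ])
    print(f"Reading {args.input_file}, writing to {args.output_file}")
```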
The Record Mining Machine Learning Dashboard helps users identify sentences that are incorrectly tagged and correct them.