/amazon-sagemaker-bert-classify-pytorch

This sample shows you how to train BERT on Amazon SageMaker using Spot instances.

Amazon SageMaker BERT text classification using PyTorch

This sample shows you how to train BERT on Amazon SageMaker using Spot instances.

To get started, use the notebook BertTextClassification.ipynb.

Dataset

We use the DBpedia ontology dataset. For more details, see https://wiki.dbpedia.org/services-resources/dbpedia-data-set-2014#2

Customise for your dataset

To customise this sample for your own dataset, perform the following steps:

  1. Create a dataset class that implements the PyTorch Dataset abstract class. See dbpedia_dataset.py for an example implementation.
  2. Create a label mapper class that implements the abstract class LabelMapperBase to map string labels to zero-indexed integer labels. See dbpedia_dataset_label_mapper.py for an example implementation.
  3. Replace the classes DbpediaDataset and DbpediaLabelMapper in builder.py with your own custom dataset and label mapper classes.
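
The first two steps can be sketched roughly as below. This is a minimal illustration and not the repository's code: the `LabelMapperBase` shown here is a hypothetical stand-in (the real abstract methods in this repository may differ), and `SentimentLabelMapper`, `SentimentDataset` and the `label,text` CSV layout are made-up names and formats chosen for the example. The dataset reads from an in-memory string so the sketch runs without any files, and falls back to a plain base class if PyTorch is not installed.

```python
import csv
import io
from abc import ABC, abstractmethod

try:
    from torch.utils.data import Dataset
except ImportError:
    # Fallback so the sketch still runs without PyTorch installed.
    Dataset = object


class LabelMapperBase(ABC):
    """Hypothetical stand-in for the repository's LabelMapperBase."""

    @abstractmethod
    def map(self, label: str) -> int:
        """Return the zero-indexed integer for a string label."""

    @abstractmethod
    def reverse_map(self, index: int) -> str:
        """Return the string label for a zero-indexed integer."""

    @property
    @abstractmethod
    def num_classes(self) -> int:
        """Return the number of distinct labels."""


class SentimentLabelMapper(LabelMapperBase):
    """Example mapper for three made-up sentiment labels."""

    def __init__(self):
        self._labels = ["negative", "neutral", "positive"]
        self._index = {name: i for i, name in enumerate(self._labels)}

    def map(self, label: str) -> int:
        return self._index[label]

    def reverse_map(self, index: int) -> str:
        return self._labels[index]

    @property
    def num_classes(self) -> int:
        return len(self._labels)


class SentimentDataset(Dataset):
    """Example dataset over CSV rows of the assumed form 'label,text'."""

    def __init__(self, csv_text: str, label_mapper: LabelMapperBase):
        reader = csv.reader(io.StringIO(csv_text))
        # Store (text, integer label) pairs in memory.
        self._rows = [(text, label_mapper.map(label)) for label, text in reader]

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, idx: int):
        return self._rows[idx]


mapper = SentimentLabelMapper()
dataset = SentimentDataset("positive,great film\nnegative,too long\n", mapper)
print(len(dataset), dataset[0])
```

A real dataset class would typically read from a file path and may also need to apply whatever preprocessing the rest of the training code expects; check dbpedia_dataset.py for the interface this sample actually uses.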

Running locally

  1. Install Python 3.7.4.

  2. Install the requirements.

    pip install -r tests/requirements.txt

  3. Verify the setup by running the tests.

    export PYTHONPATH=./src
    pytest

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.