🔥 Data loaders and abstractions for text and NLP - for Ruby
Add this line to your application’s Gemfile:
gem "torchtext"
This library follows the Python API. Many methods and options are missing at the moment. PRs welcome!
Text classification
Load a dataset
train_dataset, test_dataset = TorchText::Datasets::AG_NEWS.load(root: ".data", ngrams: 2)
Supported datasets are:
- AG_NEWS
Data utilities supported:
- tokenizer
- ngrams_iterator
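To illustrate what an n-gram iterator produces (and what `ngrams: 2` implies when loading a dataset), here is a minimal pure-Ruby sketch. It mirrors the documented behavior of Python torchtext's `ngrams_iterator`: the original tokens come first, followed by the space-joined n-grams. The method name `ngrams_sketch` is hypothetical and is not part of this gem; the gem's own method may differ in signature and return type.

```ruby
# Hypothetical helper, not the gem's API: returns the tokens followed by
# all n-grams for n = 2..ngrams, each joined with a single space.
def ngrams_sketch(tokens, ngrams)
  result = tokens.dup
  (2..ngrams).each do |n|
    # each_cons(n) slides a window of n consecutive tokens over the list
    tokens.each_cons(n) { |gram| result << gram.join(" ") }
  end
  result
end

p ngrams_sketch(%w[here we are], 2)
# => ["here", "we", "are", "here we", "we are"]
```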
Compute the BLEU score
candidate_corpus = [["My", "full", "pytorch", "test"], ["Another", "Sentence"]]
references_corpus = [[["My", "full", "pytorch", "test"], ["Completely", "Different"]], [["No", "Match"]]]
TorchText::Data::Metrics.bleu_score(candidate_corpus, references_corpus)
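For readers who want to see what this score measures, the following is a rough pure-Ruby sketch of corpus-level BLEU. It assumes the defaults used by Python torchtext's `bleu_score` (n-grams up to length 4, uniform weights, brevity penalty based on the reference closest in length). The names `ngram_counts` and `bleu_sketch` are hypothetical, and this is not the gem's implementation.

```ruby
# Hypothetical helper: count every n-gram (n = 1..max_n) in a token list.
def ngram_counts(tokens, max_n)
  counts = Hash.new(0)
  (1..max_n).each do |n|
    tokens.each_cons(n) { |gram| counts[gram] += 1 }
  end
  counts
end

# Hypothetical sketch of corpus-level BLEU with uniform weights.
def bleu_sketch(candidates, references, max_n: 4)
  clipped = Array.new(max_n, 0) # matched n-gram counts, clipped by references
  total = Array.new(max_n, 0)   # n-gram counts in the candidates
  cand_len = 0
  ref_len = 0

  candidates.zip(references).each do |cand, refs|
    cand_len += cand.length
    # Use the reference whose length is closest to the candidate's
    ref_len += refs.map(&:length).min_by { |len| (len - cand.length).abs }

    # For each n-gram, keep the maximum count seen in any single reference
    ref_counts = Hash.new(0)
    refs.each do |ref|
      ngram_counts(ref, max_n).each do |gram, c|
        ref_counts[gram] = [ref_counts[gram], c].max
      end
    end

    ngram_counts(cand, max_n).each do |gram, c|
      clipped[gram.length - 1] += [c, ref_counts[gram]].min
    end
    max_n.times { |i| total[i] += [cand.length - i, 0].max }
  end

  return 0.0 if clipped.min.zero?

  # Geometric mean of the n-gram precisions, computed in log space
  log_precision = clipped.zip(total).sum { |c, t| Math.log(c.to_f / t) } / max_n
  brevity = Math.exp([1 - ref_len.to_f / cand_len, 0].min)
  brevity * Math.exp(log_precision)
end

candidate_corpus = [["My", "full", "pytorch", "test"], ["Another", "Sentence"]]
references_corpus = [[["My", "full", "pytorch", "test"], ["Completely", "Different"]], [["No", "Match"]]]
puts bleu_sketch(candidate_corpus, references_corpus).round(4)
# prints 0.8409
```

In this example the n-gram precisions are 4/6, 3/4, 2/2 and 1/1, and the brevity penalty is 1 because the candidate and reference lengths match, so the score is the fourth root of 0.5.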
NN modules supported:
- InProjContainer
- MultiheadAttentionContainer
- ScaledDotProduct
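As background for the last item in the list above, scaled dot-product attention computes softmax(QKᵀ / √d) · V. The sketch below shows that arithmetic on plain nested Ruby arrays for a single head without masking or dropout. The method names are hypothetical; the gem's `ScaledDotProduct` module operates on tensors, and its interface is not shown here.

```ruby
# Numerically stable softmax over one row of scores.
def softmax(row)
  max = row.max
  exps = row.map { |x| Math.exp(x - max) }
  sum = exps.sum
  exps.map { |e| e / sum }
end

# Hypothetical sketch: query, key and value are arrays of row vectors.
# key and value must have the same number of rows.
def scaled_dot_product_sketch(query, key, value)
  scale = Math.sqrt(query.first.length)
  query.map do |q|
    # Similarity of this query row with every key row, scaled by sqrt(d)
    scores = key.map { |k| q.zip(k).sum { |a, b| a * b } / scale }
    weights = softmax(scores)
    # Weighted sum of the value rows, one output entry per value column
    value.transpose.map { |col| col.zip(weights).sum { |v, w| v * w } }
  end
end

query = [[1.0, 0.0]]
key = [[1.0, 0.0], [1.0, 0.0]]
value = [[2.0, 0.0], [4.0, 2.0]]
p scaled_dot_product_sketch(query, key, value)
# => [[3.0, 1.0]]
```

Both keys match the query equally here, so the attention weights are 0.5 each and the output is the average of the two value rows.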
Vocab classes supported:
- Vocab
This library downloads and prepares public datasets. We don’t host any datasets. Be sure to adhere to the license for each dataset.
If you’re a dataset owner and wish to update any details or remove it from this project, let us know.
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/torchtext-ruby.git
cd torchtext-ruby
bundle install
bundle exec rake test