NLR

This repository contains NLR dataset samples.

The NLR dataset contains natural language representations (NLRs) of database results for questions spanning multiple domains. It offers a new resource for work on Natural Language Representation and lets users test the components of database (DB) interaction systems end-to-end.

Getting Started

The data can be found at dataset/NLR_labels.json.
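As a starting point, the file can be read with Python's standard `json` module. This is a minimal sketch that assumes the file holds a single JSON array of sample objects, which this README does not state explicitly. The helper name `load_nlr_samples` is hypothetical and not part of the repository.

```python
import json
from pathlib import Path

# Path as given in this README, relative to the repository root.
DATA_PATH = Path("dataset/NLR_labels.json")


def load_nlr_samples(path=DATA_PATH):
    """Load NLR samples from a JSON file.

    Assumption: the file contains one top-level JSON array of samples.
    """
    with Path(path).open(encoding="utf-8") as f:
        samples = json.load(f)
    # Fail early if the top-level structure is not the assumed array.
    if not isinstance(samples, list):
        raise ValueError(
            f"expected a JSON array of samples, got {type(samples).__name__}"
        )
    return samples
```

Calling `load_nlr_samples()` from the repository root returns the samples as a list of dictionaries. If the file turns out to use a different top-level layout, the check raises a `ValueError` rather than returning data in an unexpected shape.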

Documentation

Dataset Details

Each sample in dataset/NLR_labels.json contains the following fields:

question_id: ID of the sample question.
db_id: Domain of the database.
NLR: The natural language representation of the db_result.
result_size_complexity: Row count plus column count of the db_result.
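To make the result_size_complexity definition concrete, the sketch below computes row count plus column count for a tabular result. It assumes the db_result is available as a list of equal-length row tuples, and it treats an empty result as 0. Both are assumptions for illustration, since the dataset file stores only the final number and the README does not define the empty case.

```python
def result_size_complexity(rows):
    """Return row count + column count for a tabular query result.

    Assumptions: `rows` is a list of equal-length tuples, one per row,
    and an empty result counts as 0 rows and 0 columns.
    """
    if not rows:
        return 0
    # Column count is read from the first row.
    return len(rows) + len(rows[0])


# One row with one column (for example, a single count) gives 1 + 1 = 2.
single_value_result = [(13,)]
print(result_size_complexity(single_value_result))
```

A value of 2 therefore corresponds to a result with a single row and a single column, such as one aggregate value.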

Example

{
  "question_id": 0,
  "db_id": "financial",
  "NLR": "There are 13 accounts who choose issuance after transaction staying in the East Bohemia region.",
  "result_size_complexity": 2
}
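Before using the samples, it can help to check that each one carries the documented fields. The following sketch validates a sample against the four field names listed above and counts samples per db_id. The expected types are inferred from the single example in this README and are an assumption; `validate_sample` and `samples_per_domain` are hypothetical helper names.

```python
from collections import Counter

# Field names come from this README; the types are inferred from its
# one example and may not hold for every sample.
EXPECTED_FIELDS = {
    "question_id": int,
    "db_id": str,
    "NLR": str,
    "result_size_complexity": int,
}


def validate_sample(sample):
    """Return a list of problems found in one sample (empty if none)."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in sample:
            problems.append(f"missing field: {name}")
        elif not isinstance(sample[name], expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")
    return problems


def samples_per_domain(samples):
    """Count how many samples belong to each db_id."""
    return Counter(sample["db_id"] for sample in samples)


# The example sample from this README.
example = {
    "question_id": 0,
    "db_id": "financial",
    "NLR": "There are 13 accounts who choose issuance after transaction "
           "staying in the East Bohemia region.",
    "result_size_complexity": 2,
}
print(validate_sample(example))
print(samples_per_domain([example]))
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a sample at once.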

Dataset Creation

The data was created through a combination of synthetic generation and manual curation between October 2024 and May 2025. The research is being published by Oracle, and this data is released to the community as part of that research.

Intended Use

NLR is being shared with the research community to facilitate reproduction of our results and foster further research in this area.

NLR is intended to be used by domain experts who are independently capable of evaluating the quality of outputs before acting on them.

Contributing

This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide.

Security

Please consult the security guide for our responsible security vulnerability disclosure process.

License

Released under the Universal Permissive License v1.0 as shown at https://oss.oracle.com/licenses/upl/.
