Summary

The Hindi UD treebank is based on the Hindi Dependency Treebank (HDTB), created at IIIT Hyderabad, India.

Introduction

The Hindi Universal Dependency Treebank was automatically converted from the Hindi Dependency Treebank (HDTB), which is part of an ongoing effort to create multi-layered treebanks for Hindi and Urdu. HDTB is developed at IIIT Hyderabad (IIIT-H), India.

Acknowledgments

The project is supported by an NSF grant (Award Number: CNS 0751202; CFDA Number: 47.070).

Any publication reporting the work done using this data should cite the following references:

Riyaz Ahmad Bhat, Rajesh Bhatt, Annahita Farudi, Prescott Klassen, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Misra Sharma, Ashwini Vaidya, Sri Ramagurumurthy Vishnu, and Fei Xia. The Hindi/Urdu Treebank Project. In the Handbook of Linguistic Annotation (edited by Nancy Ide and James Pustejovsky), Springer Press.

@InCollection{bhathindi,
  Title                    = {The Hindi/Urdu Treebank Project},
  Author                   = {Bhat, Riyaz Ahmad and Bhatt, Rajesh and Farudi, Annahita and Klassen, Prescott and Narasimhan, Bhuvana and Palmer, Martha and Rambow, Owen and Sharma, Dipti Misra and Vaidya, Ashwini and Vishnu, Sri Ramagurumurthy and Xia, Fei},
  Booktitle                = {Handbook of Linguistic Annotation},
  Publisher                = {Springer Press}
}

Martha Palmer, Rajesh Bhatt, Bhuvana Narasimhan, Owen Rambow, Dipti Misra Sharma, Fei Xia. Hindi Syntax: Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure. In the Proceedings of the 7th International Conference on Natural Language Processing, ICON-2009, Hyderabad, India, Dec 14-17, 2009.

@inproceedings{palmer2009hindi,
  title={Hindi syntax: Annotating dependency, lexical predicate-argument structure, and phrase structure},
  author={Palmer, Martha and Bhatt, Rajesh and Narasimhan, Bhuvana and Rambow, Owen and Sharma, Dipti Misra and Xia, Fei},
  booktitle={The 7th International Conference on Natural Language Processing},
  address={Hyderabad, India},
  year={2009}
}

Changelog

  • 2023-05-15 v2.12
    • Fixed: a finite verb heads a clause, so it is now attached as ccomp instead of obj.
    • Split two sentences after an exclamation mark.
  • 2022-11-15 v2.11
    • Fixed various validation errors.
  • 2021-05-15 v2.8
    • Normalized lemmatization of punctuation symbols: LEMMA=FORM.
  • 2019-05-15 v2.4
    • Fixed some violations of the guidelines reported by the new validator.
  • 2018-04-15 v2.2
    • Repository renamed from UD_Hindi to UD_Hindi-HDTB.
  • 2017-03-01 v2.0
    • Converted to UD v2 guidelines (Dan Zeman).
  • 2015-11-01 v1.2
    • Initial release (Riyaz Bhat and Dan Zeman).
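
The v2.8 entry above (LEMMA=FORM for punctuation) can be illustrated with a minimal sketch. This is not the script used for the release; it assumes that "punctuation symbols" means tokens whose UPOS is PUNCT, and the function name and the three-token sample sentence are hypothetical.

```python
def normalize_punct_lemmas(conllu_text: str) -> str:
    """Return CoNLL-U text in which every PUNCT token has LEMMA equal to FORM."""
    out = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Token lines have exactly 10 tab-separated columns;
        # comment lines (starting with '#') and blank lines do not.
        if len(cols) == 10 and not line.startswith("#") and cols[3] == "PUNCT":
            cols[2] = cols[1]  # LEMMA (3rd column) := FORM (2nd column)
            line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)


# Hypothetical sample: the danda carries a non-matching lemma before the fix.
sample = (
    "# text = राम आया ।\n"
    "1\tराम\tराम\tPROPN\tNNP\t_\t2\tnsubj\t_\t_\n"
    "2\tआया\tआ\tVERB\tVM\t_\t0\troot\t_\t_\n"
    "3\t।\t.\tPUNCT\tSYM\t_\t2\tpunct\t_\t_\n"
)
fixed = normalize_punct_lemmas(sample)
```

Only the LEMMA column of PUNCT tokens is rewritten; comment lines and all other tokens pass through unchanged.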
=== Machine-readable metadata =================================================
Data available since: UD v1.2
License: CC BY-NC-SA 4.0
Includes text: yes
Genre: news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Bhat, Riyaz Ahmad; Zeman, Daniel
Contributing: here
Contact: zeman@ufal.mff.cuni.cz
===============================================================================
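
The metadata block above is a list of "Key: value" lines delimited by lines of equals signs. As a rough sketch of how it could be read (not an official UD tool; the function name and the shortened sample text are assumptions):

```python
def parse_ud_metadata(readme_text: str) -> dict:
    """Collect 'Key: value' pairs from the machine-readable metadata block."""
    meta = {}
    inside = False
    for line in readme_text.splitlines():
        if line.startswith("==="):
            if inside:
                break  # closing delimiter reached
            # The opening delimiter names the block explicitly.
            inside = "Machine-readable metadata" in line
            continue
        if inside and ":" in line:
            # Split at the first colon only; values may contain colons.
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta


# Shortened sample built from the block above.
readme = """Changelog
=== Machine-readable metadata =================================================
Data available since: UD v1.2
License: CC BY-NC-SA 4.0
Genre: news
Contributors: Bhat, Riyaz Ahmad; Zeman, Daniel
===============================================================================
"""
meta = parse_ud_metadata(readme)
contributors = [name.strip() for name in meta["Contributors"].split(";")]
```

Contributor names are written as "Surname, Given name", so the list is split on semicolons and not on commas.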