/verb_transitivity

Data on verb transitivity in English and script to extract transitivity information from Google's syntactic ngrams corpus

Primary LanguagePython

Verb Transitivity

This repo contains data on verb transitivity in English and script to extract transitivity information from Google's syntactic ngrams corpus.

For a recent project, I needed transitivity information for English verbs: For each verb what percentage of the time was it intransitive, transitive and ditransitive? After much unfruitful online searching I generated the data myself, using the Google Syntactic NGram corpus. The data and extraction script are available in this repo.

verb_transitivity.tsv

This file contains extracted transitivity information for all verb forms that occur more than 2,000 times in the Google Syntactic NGrams "English 1 Million" Corpus (7965 verb forms total) Verbs are not lemmaized, thus "give" "gives" and "gave" all appear in the top-10 most common ditransitive verbs. In some cases, verbs were recorded as occurring with indirect but no direct object. I marked these cases as cross-transitive ("xtrans" in the .tsv file).

There are 10 verb forms that always occur intransitively (for verb forms with > 10,000 occruances). They are:

Verb Percent Intransitive
targeted 1.00
mineralized 1.00
circumstanced 1.00
atrophied 1.00
dehydrated 1.00
truncated 1.00
televised 1.00
synchronized 1.00
interrelated 1.00
succumbed 1.00
is 0.999941

The Top-10 Most transitive verb forms (for verb forms with > 10,000 occruances) are:

Verb Percent Transitive
murder 0.977389
access 0.976555
avenge 0.975976
reminds 0.975470
defray 0.971862
frequent 0.968574
extricating 0.966914
extricate 0.965874
undecieve 0.965505
garrison 0.964685

The Top-10 Most ditransitive verb forms (for verb forms with > 10,000 occruances) are:

Verb Percent Ditransitive
give 0.319187
gave 0.315119
cost 0.230948
handling 0.218010
gives 0.194149
lend 0.185281
giving 0.175866
shewed 0.122527
handed 0.116729
grant 0.110312

arg_structure_extractor.py

This file is a python script for extracting the verb transitivity information from the Google Syntactic NGram Corpus. To run it, place the files you wish to extract transitivity information from in a directory titled "verb_args" in the same directory as this script.