teusbenschop/shona

A counted word list to assist with proofreading the text

Opened this issue · 5 comments

The attached Zip file contains a tab delimited counted word list for the text of the Shona translation.
The analysis was done by means of a bespoke TextPipe filter.

When punctuation was removed, special provision was made to preserve hyphenated words.
The final "count duplicate lines" filter also sorted the words; it is case-sensitive.

This may assist with proofreading the text. Browse the file to look for any misspelled words.

The data analysed is from verse and paragraph text, but excludes section headings.
Cross-references and all USFM tags were first removed.

merged.words.count.txt.zip

The analysis can be readily repeated in the future, should the need arise after further corrections.
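
For anyone without TextPipe, the counting step described above could be approximated along these lines. This is only a rough sketch, not the TextPipe filter itself: it assumes the USFM tags, cross-references and section headings have already been stripped, that a word is a run of letters optionally joined by internal hyphens (other word-internal characters are not handled), and that the output is `count<TAB>word`. The names `WORD_RE`, `count_words` and `to_tsv` are made up for illustration, and the sample words are invented, not taken from the translation.

```python
import re
from collections import Counter

# Assumed word definition: letters (any script), with internal hyphens kept,
# so "mwari-wedu" stays one word while surrounding punctuation is dropped.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def count_words(text):
    """Return (word, count) pairs, case-sensitive, sorted by word."""
    counts = Counter(WORD_RE.findall(text))
    # Plain sorted() orders by code point, so "Mwari" sorts before "mwari".
    return sorted(counts.items())


def to_tsv(pairs):
    """Format the pairs as tab-delimited lines: count, then word."""
    return "\n".join(f"{count}\t{word}" for word, count in pairs)


if __name__ == "__main__":
    sample = "Mwari wakada nyika. mwari-wedu, Mwari!"  # invented sample text
    print(to_tsv(count_words(sample)))
```

Because the count is case-sensitive, a capitalised and a lower-case form of the same word show up as separate entries, which is useful when checking the spelling of names.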

This is helpful indeed. Also for the spelling of certain names.

Updated after your recent commits.

merged.words.count.txt.zip

Here is a derived file in which the third tab field has the words reversed.
When opened with Excel, the data can be sorted on column C to get the words in column B in rhyming order, i.e. words with similar endings are grouped together.
This technique can sometimes be fruitful in finding further anomalies in spellings.
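
The same idea can be sketched without Excel. The snippet below is an illustration only, assuming rows of `(count, word)` as in the counted list; `add_reversed` and `rhyming_order` are hypothetical helper names, and the sample words are invented.

```python
def add_reversed(rows):
    """Append the reversed word as a third field: (count, word, reversed)."""
    return [(count, word, word[::-1]) for count, word in rows]


def rhyming_order(rows):
    """Sort on the reversed word so that similar endings end up adjacent."""
    return sorted(add_reversed(rows), key=lambda row: row[2])


if __name__ == "__main__":
    sample = [(1, "kuenda"), (1, "kuuya"), (2, "kutenda")]  # invented sample rows
    for count, word, rev in rhyming_order(sample):
        print(f"{count}\t{word}\t{rev}")
```

Sorting on the reversed field places words that share an ending next to each other, so a spelling that differs only near the end of a word stands out among its neighbours.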

merged.words.count.rev.txt.zip

Following the merge that removed all cross-references,
I have just rerun the filters to regenerate the counted word list.

merged.words.count.rev.txt.zip