A counted word list to assist with proofreading the text

Question

Opened this issue 7 years ago · 5 comments

The attached Zip file contains a tab-delimited counted word list for the text of the Shona translation.
The analysis was done with a bespoke TextPipe filter.

When punctuation was removed, special provision was made to preserve hyphenated words.
The final "count duplicate lines" filter is case-sensitive and also sorts the words.
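For anyone without TextPipe, the counting step described above can be approximated in a few lines of Python. This is only a rough sketch, not the actual filter: the function names, the regular expression and the count-then-word column order are my assumptions.

```python
import re
from collections import Counter

# Assumption: a word is a run of letters, optionally joined by internal hyphens.
# Other in-word characters (e.g. apostrophes) would need adding to the pattern.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")

def count_words(text):
    """Count each distinct word; 'Word' and 'word' are counted separately."""
    return Counter(WORD_RE.findall(text))

def to_tsv(counts):
    """Return 'count<TAB>word' lines sorted by word in case-sensitive order."""
    return "\n".join(f"{n}\t{word}" for word, n in sorted(counts.items()))

if __name__ == "__main__":
    # Hypothetical sample text, not taken from the translation.
    print(to_tsv(count_words("The well-known word, the Word.")))
```

Python's default string sort compares code points, so capitalised words come before lower-case ones, which is one possible reading of "case-sensitive" sorting.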

This may assist with proofreading the text. Browse the file to look for any misspelled words.

The analysed data comes from verse and paragraph text and excludes section headings.
Cross-references and all USFM tags were removed first.
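The clean-up described above could be sketched in Python roughly as follows. It is a simplification and not the TextPipe filter itself: it assumes cross-references are written as `\x ... \x*`, section headings as `\s`, `\s1`, `\s2` lines, and that every other backslash marker can simply be dropped. The function name `strip_usfm` is hypothetical.

```python
import re

# Cross-reference notes: everything from "\x " up to the closing "\x*".
XREF_RE = re.compile(r"\\x\s.*?\\x\*", re.DOTALL)
# Section heading lines such as "\s Heading" or "\s1 Heading".
HEADING_RE = re.compile(r"^\\s\d?(?:[ \t].*)?$", re.MULTILINE)
# Chapter and verse markers together with their number, e.g. "\c 1" or "\v 12".
NUMBERED_RE = re.compile(r"\\[cv]\s+\S+\s?")
# Any remaining marker, opening or closing, e.g. "\p", "\q1", "\wj", "\wj*".
MARKER_RE = re.compile(r"\\\+?[a-z]+\d*\*?")

def strip_usfm(text):
    """Return the running text only, with whitespace collapsed to single spaces."""
    text = XREF_RE.sub("", text)
    text = HEADING_RE.sub("", text)
    text = NUMBERED_RE.sub("", text)
    text = MARKER_RE.sub("", text)
    return " ".join(text.split())

if __name__ == "__main__":
    # Hypothetical USFM fragment for illustration.
    sample = (
        "\\c 1\n"
        "\\s1 The Creation\n"
        "\\p\n"
        "\\v 1 In the beginning\\x - \\xo 1.1 \\xt Jn 1.1\\x* God created\n"
    )
    print(strip_usfm(sample))
```

Footnotes and other note types are not handled specially here; only their markers are dropped, so their content would still be counted unless a similar pattern is added for them.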

merged.words.count.txt.zip

Answer 1 · 2017-09-12T18:32:53.000Z

The analysis can be readily repeated in the future, should the need arise after further corrections.

Answer 2 · 2017-09-13T07:24:08.000Z

This is helpful indeed, also for checking the spelling of certain names.

Answer 3 · 2017-09-13T12:15:27.000Z

Updated after your recent commits.

merged.words.count.txt.zip

Answer 4 · 2017-09-13T12:21:02.000Z

Here is a derived file in which the third tab-delimited field contains each word reversed.
When the file is opened in Excel, the data can be sorted on column C to list the words in column B in rhyming order, i.e. words with similar endings are grouped together.
This technique can sometimes be fruitful in finding further spelling anomalies.
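The same trick can be tried without Excel. The sketch below assumes each input line is `count<TAB>word`, as implied by the column description above; the function names are made up for illustration.

```python
def add_reversed_column(lines):
    """Append a third tab field holding the word spelled backwards."""
    rows = []
    for line in lines:
        count, word = line.rstrip("\n").split("\t")
        rows.append(f"{count}\t{word}\t{word[::-1]}")
    return rows

def rhyming_order(rows):
    """Sort rows on the reversed word so that similar endings sit together."""
    return sorted(rows, key=lambda row: row.split("\t")[2])

if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the attached file.
    rows = add_reversed_column(["3\tsinging", "1\tcat", "2\tring"])
    for row in rhyming_order(rows):
        print(row)
```

Sorting on the reversed spelling compares words from their last letter backwards, which is why words sharing a suffix end up next to each other.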

merged.words.count.rev.txt.zip

Answer 5 · 2017-09-25T12:54:08.000Z

Following the merge that removed all cross-references,
I have just rerun the filters to regenerate the counted word list.

merged.words.count.rev.txt.zip