This repo provides simple operations on a corpus or a text file. It covers basic operations such as word counts, word frequencies, alphabet counts, alphabetic word frequencies, punctuation counts, punctuation frequencies, sentence counts, and the lengths of the longest and shortest sentences as well as the average sentence length, among other operations on a text file or corpus.
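As a rough illustration of two of these operations, the sketch below counts word frequencies and sentence lengths with the standard library. The function names `count_word_frequencies` and `sentence_lengths`, the regular expressions, and the sample text are assumptions made for this example and may differ from how the repo implements them.

```python
import re
from collections import Counter


def count_word_frequencies(text):
    """Hypothetical helper: count how often each lowercased word occurs."""
    # Assumption: a "word" is a run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def sentence_lengths(text):
    """Hypothetical helper: number of words in each sentence."""
    # Assumption: sentences end with '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


sample = "The cat sat. The dog barked loudly! Why?"
freq = count_word_frequencies(sample)
lengths = sentence_lengths(sample)

print("total words:", sum(freq.values()))
print("most common:", freq.most_common(1))
print("sentences:", len(lengths))
print("longest:", max(lengths), "shortest:", min(lengths))
print("average:", sum(lengths) / len(lengths))
```

Splitting on punctuation is a deliberate simplification: abbreviations such as "e.g." would be counted as sentence boundaries, so a real corpus may need a more careful tokenizer.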
The corpus used can be found here
All results are stored as sorted lists and dictionaries in separate CSV files in the Results
folder. Unit tests check each function individually to confirm that it gives accurate results. The execution time of each function is also reported, as follows:
- 'total_words' 0.00 ms
- 'total_alphabet_and_punctuation' 1162.16 ms
- 'word_frequency' 165.80 ms
- 'alphabet_and_punctuation_frequencies' 352.30 ms
- 'alphabet_and_punctuation_frequencies' 11.15 ms
- 'alphabetic_word_frequencies' 132.92 ms
- 'starting_and_ending_with_vowel' 106.04 ms
- 'total_sentences' 0.01 ms
- 'length_of_sentences' 9.96 ms
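Timings in this format can be produced with a small decorator. The following is a minimal sketch, assuming a hypothetical decorator named `timed` and a simplified stand-in for `total_words`; the repo's actual timing code is not shown here and may work differently.

```python
import time
from functools import wraps


def timed(func):
    """Hypothetical decorator: print the wall-clock time of each call in ms."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # !r adds the quotes around the function name, as in the list above.
        print(f"{func.__name__!r} {elapsed_ms:.2f} ms")
        return result
    return wrapper


@timed
def total_words(text):
    # Simplified stand-in: count whitespace-separated tokens.
    return len(text.split())


count = total_words("a small example corpus")
```

With two decimal places, any call faster than about 0.005 ms is displayed as 0.00 ms, so a zero in the list does not mean the function did no work.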
For unit testing, the execution time for each method is given as follows:
- 'total_words' 0.00 ms
- 'total_alphabet_and_punctuation' 0.06 ms
- 'word_frequency' 0.01 ms
- 'alphabet_and_punctuation_frequencies' 0.03 ms
- 'alphabet_and_punctuation_frequencies' 0.00 ms
- 'alphabetic_word_frequencies' 0.01 ms
- 'starting_and_ending_with_vowel' 0.01 ms
- 'total_sentences' 0.00 ms
- 'length_of_sentences' 0.01 ms
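The unit test timings are much smaller, which would be consistent with the tests running on short fixed inputs rather than the full corpus. A test of that kind might look like the sketch below, which uses the standard `unittest` module. The function `count_sentences` and the test class are hypothetical stand-ins, not the repo's own code.

```python
import re
import unittest


def count_sentences(text):
    """Hypothetical stand-in: count sentences ending in '.', '!' or '?'."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


class CountSentencesTest(unittest.TestCase):
    def test_counts_three_sentences(self):
        self.assertEqual(count_sentences("One. Two! Three?"), 3)

    def test_empty_text_has_no_sentences(self):
        self.assertEqual(count_sentences(""), 0)


# Run the test case explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountSentencesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Small hand-checked inputs like these make it easy to verify the expected value by eye.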