/NLP_task

This repo performs simple operations on a corpus or text file. It covers word counts, word frequencies, alphabet counts, alphabetic word frequencies, punctuation counts, punctuation frequencies, sentence counts, and the lengths of the longest, shortest, and average sentence.

Primary language: Python

Task1

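A few of the operations listed above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the function names `word_frequency` and `length_of_sentences` are borrowed from the timing list in this README, `punctuation_frequencies` is a hypothetical name, and the tokenisation rules (a regex for words, a naive split on `.`, `!` and `?` for sentences) are assumptions.

```python
import re
import string
from collections import Counter


def word_frequency(text):
    """Return (word, count) pairs, most frequent first (case-insensitive)."""
    # Assumption: a "word" is a run of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def punctuation_frequencies(text):
    """Count each ASCII punctuation character in the text."""
    return Counter(ch for ch in text if ch in string.punctuation)


def length_of_sentences(text):
    """Return longest, shortest and average sentence length in words."""
    # Assumption: sentences end at '.', '!' or '?'. Abbreviations such as
    # "e.g." would be split incorrectly by this naive rule.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        return {"longest": 0, "shortest": 0, "average": 0.0}
    return {
        "longest": max(lengths),
        "shortest": min(lengths),
        "average": sum(lengths) / len(lengths),
    }
```

A real corpus would probably need a more careful sentence splitter, but the structure of the calculation stays the same.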

Data

The corpus used can be found here.

Results

All results are stored as sorted lists and dictionaries in separate CSV files in the Results folder. Unit tests check each function individually to confirm that it returns accurate results. The execution time of each function on the corpus is as follows.

  • 'total_words' 0.00 ms
  • 'total_alphabet_and_punctuation' 1162.16 ms
  • 'word_frequency' 165.80 ms
  • 'alphabet_and_punctuation_frequencies' 352.30 ms
  • 'alphabet_and_punctuation_frequencies' 11.15 ms
  • 'alphabetic_word_frequencies' 132.92 ms
  • 'starting_and_ending_with_vowel' 106.04 ms
  • 'total_sentences' 0.01 ms
  • 'length_of_sentences' 9.96 ms
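Timings in the format shown above could be produced with a small decorator. The sketch below is an assumption about how such measurements might be taken, not the repo's confirmed implementation: the decorator name `timeit` is hypothetical, and the body of `total_words` is a placeholder.

```python
import time
from functools import wraps


def timeit(func):
    """Print the wrapped function's name and run time in milliseconds."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # !r adds the quotes around the name, matching the list format.
        print(f"{func.__name__!r} {elapsed_ms:.2f} ms")
        return result
    return wrapper


@timeit
def total_words(text):
    # Placeholder body: counts whitespace-separated tokens.
    return len(text.split())
```

Calling `total_words("a b c")` returns 3 and prints one line containing the function name and the elapsed time. A single run gives only a rough figure; averaging over several runs would be more reliable.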

For unit testing, the execution time for each method is given as follows:

  • 'total_words' 0.00 ms
  • 'total_alphabet_and_punctuation' 0.06 ms
  • 'word_frequency' 0.01 ms
  • 'alphabet_and_punctuation_frequencies' 0.03 ms
  • 'alphabet_and_punctuation_frequencies' 0.00 ms
  • 'alphabetic_word_frequencies' 0.01 ms
  • 'starting_and_ending_with_vowel' 0.01 ms
  • 'total_sentences' 0.00 ms
  • 'length_of_sentences' 0.01 ms
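A unit test for one of these functions might look like the sketch below. It uses the standard `unittest` module; the body of `total_sentences`, the class name `TestTotalSentences`, and the test inputs are illustrative assumptions rather than the repo's actual tests.

```python
import re
import unittest


def total_sentences(text):
    # Assumed behaviour: count non-empty segments ending in '.', '!' or '?'.
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


class TestTotalSentences(unittest.TestCase):
    def test_mixed_terminators(self):
        self.assertEqual(total_sentences("Hi. How are you? Fine!"), 3)

    def test_empty_text(self):
        self.assertEqual(total_sentences(""), 0)
```

Saved in a test module, this can be run with `python -m unittest`. The inputs are tiny, which would be consistent with the much smaller execution times reported for the unit tests than for the full corpus.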