
This script creates a list of unique words from Persian text. Words can be sorted by frequency or in alphabetical order. This is a new project, so there could be major bugs in the code.

Primary language: Python

persian-word-extractor

This script creates a list of unique words from Persian text. Words can be sorted by how often they appear in the source.txt file or in alphabetical order. Words with accent marks are excluded from the results. This is a new project, so there could be major bugs in the code.

Features:

  • sort by frequency or alphabetical order

  • extract words from source.txt or online links
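The extraction and sorting described above might look roughly like the sketch below. This is not the code in main.py: the function name `extract_words`, the `sort_by` parameter, and the character ranges used to recognise Persian words and accent marks are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: "accent marks" means the Arabic-script diacritics
# U+064B..U+0652 plus the superscript alef U+0670.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

# Assumption: a word is a run of Arabic-script letters, diacritics,
# or the zero-width non-joiner (U+200C) used inside Persian words.
# Digits and punctuation fall outside these ranges.
WORD = re.compile(r"[\u0621-\u0652\u0670-\u06D3\u200C]+")


def extract_words(text, sort_by="frequency"):
    """Return the unique words in text, skipping any word with a diacritic."""
    counts = Counter()
    for token in WORD.findall(text):
        token = token.strip("\u200c")  # drop stray ZWNJ at the edges
        if not token or DIACRITICS.search(token):
            continue
        counts[token] += 1

    if sort_by == "alphabetical":
        # Code point order, which only approximates Persian alphabetical order.
        return sorted(counts)
    # Most frequent first; ties keep the order of first appearance.
    return [word for word, _ in counts.most_common()]
```

Calling `extract_words(text)` returns the words by descending frequency, and `extract_words(text, sort_by="alphabetical")` returns them in code point order. A proper Persian collation would need a locale-aware sort key, which this sketch leaves out.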

How to use:

  1. Create a file named 'source.txt' in the root directory and paste the source text inside.
  2. Run 'main.py'.
  3. Follow the CLI instructions.
  4. Results will be written to 'output.txt' in the root directory.
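For anyone tweaking the code, the file flow in the steps above can be pictured with the following minimal sketch. It is a stand-in, not the actual main.py: the function name `write_word_list` is hypothetical, the plain whitespace split is a placeholder for the real word extraction, and the one-word-per-line output format is an assumption.

```python
from collections import Counter
from pathlib import Path


def write_word_list(source_path="source.txt", output_path="output.txt"):
    """Read the source text, count its words, and write the unique words out."""
    # Read as UTF-8 explicitly so Persian text loads the same on every platform.
    text = Path(source_path).read_text(encoding="utf-8")

    # Placeholder tokenisation: split on whitespace only.
    counts = Counter(text.split())

    # Most frequent words first.
    words = [word for word, _ in counts.most_common()]

    # Assumed output format: one word per line.
    Path(output_path).write_text("\n".join(words) + "\n", encoding="utf-8")
    return len(words)
```

Reading and writing with an explicit `encoding="utf-8"` matters here because the platform default encoding, notably on Windows, can fail on Persian characters.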

Feel free to tweak the code to suit your needs.

How did I use it?

I ran this script on a large body of Persian text to extract words for contribution to Monkeytype. I added the "Persian 1k" & "Persian 5k" tests. My first open-source contribution!!