WordFreq

Count word frequencies in XML, HTML and TXT files and store them in an SQLite3 database.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

WordFreq

This script extracts the words from one or more XML, HTML or TXT files, counts how often each word occurs, and stores the words and their frequencies in an SQLite3 database.
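The general idea can be sketched as follows. This is only an illustration, not the code of wordfreq.py: the tokenization rule (lowercasing and splitting on `\w+`), the table name `words`, and the column names `word` and `freq` are all assumptions made for the example.

```python
import re
import sqlite3
from collections import Counter


def count_words(text):
    """Return a Counter of the word forms found in text."""
    # Assumption: words are lowercased and matched with \w+ (letters,
    # digits, underscores). The real script may tokenize differently.
    return Counter(re.findall(r"\w+", text.lower()))


def store_counts(conn, counts):
    """Add the counts to a hypothetical 'words' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "word TEXT PRIMARY KEY, freq INTEGER NOT NULL)"
    )
    for word, n in counts.items():
        # Create the row with frequency 0 if the word is new ...
        conn.execute(
            "INSERT OR IGNORE INTO words (word, freq) VALUES (?, 0)", (word,)
        )
        # ... then add this file's count to the stored frequency.
        conn.execute(
            "UPDATE words SET freq = freq + ? WHERE word = ?", (n, word)
        )
    conn.commit()


# Usage with an in-memory database, so no file is written.
conn = sqlite3.connect(":memory:")
store_counts(conn, count_words("The cat saw the cats."))
rows = conn.execute(
    "SELECT word, freq FROM words ORDER BY freq DESC, word"
).fetchall()
print(rows)  # [('the', 2), ('cat', 1), ('cats', 1), ('saw', 1)]
```

Note that "cat" and "cats" are counted separately, since word forms rather than lemmas are stored.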

How to use

The input files must be XML, HTML or TXT files. Python 3 is required. Run the script with the input files as arguments:

python3 wordfreq.py file1 file2 file3...

The script produces a database file ('wordfreq.db') containing every word form (not lemmas) and its frequency.
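The table and column names used inside 'wordfreq.db' are not documented here, so one way to explore the result is to ask SQLite for its schema first. The helper below is a small sketch for that purpose; the function name `describe_database` is made up for this example and is not part of the script.

```python
import sqlite3


def describe_database(db_path):
    """Return {table name: [column names]} for an SQLite database file."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is its name.
            info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [col[1] for col in info]
        return schema
    finally:
        conn.close()
```

Calling `describe_database("wordfreq.db")` after a run shows which tables and columns to query for the words and their frequencies.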

License

GNU General Public License v3.0. See the LICENSE file.