
Language: Python · License: MIT

English-Persian Tokenizer

Overview

The English-Persian Tokenizer is a simple Python program that classifies input words as either English or Persian. It uses a deterministic finite automaton (DFA) to perform the classification, making it a handy tool for distinguishing English and Persian words within a text.
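The DFA approach can be sketched roughly as follows. The state names, transition table, and Unicode range below are illustrative assumptions for this sketch, not the repository's actual implementation:

```python
# Sketch of a DFA that classifies a whole word as English or Persian.
# The Unicode range and state layout are assumptions, not the repo's code.

PERSIAN_RANGE = range(0x0600, 0x0700)  # Arabic/Persian Unicode block (assumption)

def char_class(ch):
    """Map a character to one of the DFA's input symbols."""
    if ch.isascii() and ch.isalpha():
        return "en"
    if ord(ch) in PERSIAN_RANGE:
        return "fa"
    return "other"

# States: start, english, persian; any unlisted transition rejects.
TRANSITIONS = {
    ("start", "en"): "english",
    ("start", "fa"): "persian",
    ("english", "en"): "english",
    ("persian", "fa"): "persian",
}

def classify(word):
    """Run the DFA over a word; return 'english', 'persian', or 'reject'."""
    state = "start"
    for ch in word:
        state = TRANSITIONS.get((state, char_class(ch)), "reject")
        if state == "reject":
            break
    return state if state in ("english", "persian") else "reject"
```

Because every transition is determined by the current state and the next character's class, each word is classified in a single left-to-right pass; mixed-script words fall into the reject state.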

Features

  • Tokenizes input text into English and Persian words.
  • Uses a DFA for efficient classification.
  • Easily customizable for additional languages or character sets.
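As a rough illustration of the tokenization step above, the following sketch splits text into words and tags each one by script. The `tokenize` and `classify_language` helpers are hypothetical names, and the real code may handle punctuation and mixed-script words differently:

```python
import re

def classify_language(word):
    """Tag a word as 'persian' or 'english' by its first letter (assumption:
    Persian words start with a character in the U+0600-U+06FF block)."""
    return "persian" if "\u0600" <= word[0] <= "\u06ff" else "english"

def tokenize(text):
    """Split text into word tokens and pair each with its language tag."""
    words = re.findall(r"\w+", text)  # \w matches Unicode word characters
    return [(w, classify_language(w)) for w in words]
```

For example, `tokenize("hello سلام")` would yield one English and one Persian token. Customizing for another language amounts to adding its character range to the classifier.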

Usage

  1. Clone or download this repository to your local machine.

  2. Ensure you have Python installed (Python 3 is recommended).

  3. Open a terminal and navigate to the repository's directory.

  4. Run the tokenizer.py script, passing the text you want to classify as a command-line argument:

    python tokenizer.py "Your input text here."
    

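A hypothetical entry point matching the command above might look like the sketch below; the repository's actual tokenizer.py may be structured differently:

```python
# Illustrative command-line entry point (assumption, not the repo's code).
import sys

def is_persian(word):
    """Rough check: first character in the Arabic/Persian Unicode block."""
    return "\u0600" <= word[0] <= "\u06ff"

def main(argv):
    if len(argv) < 2:
        print('usage: python tokenizer.py "Your input text here."')
        return 1
    # Split the argument on whitespace and tag each word.
    for word in argv[1].split():
        print(f"{word}\t{'persian' if is_persian(word) else 'english'}")
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Quoting the text ensures the shell passes it as a single argument, so `argv[1]` contains the whole input string.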
Thank you for using the English-Persian Tokenizer!