

This project is temporarily archived.

Word to Number i18n

Convert number words from different languages to numbers with a Python, CSharp or Java API, e.g. three hundred and forty two to 342, vingt-et-un to 21, or две целых три десятых to 2.3. Below are the installation, usage and other details of this module.

Fun with numbers

  • French: We can calculate! So 4*20+10 is 90 and this is normal (quatre-vingt-dix).
  • German: We have our own rules. Every number below one million is written as one word (though it is often written incorrectly), and for 21 we say one-and-twenty (einundzwanzig), not twenty one.
  • Spanish: Less is more. We need no separate word for 1.000.000.000; one simply says mil millones (for example tres mil millones).

Supported natural languages

  • English
  • French
  • Portuguese
  • Russian
  • Slovak
  • Spanish
  • Persian (Python implementation only)

Request new language

Follow these steps:

1. check whether the data directory already contains a text file named with the ISO-639-1 code of your language
2. if not found, check for the ISO-639-3 code
3. if not found, create a new file named with the ISO-639-1/3 code

Example

You want to convert NLP CARD tokens to numeric values for Lower Sorbian. German (de) is not the right language. You do not find an ISO-639-1 code for Lower Sorbian, and you do not find a file for its ISO-639-3 code dsb. So you create a new file number_system_dsb.txt with UTF-8 encoding:

null=0
jaden=1
dwa=2
tśi=3
styri=4
pěś=5
šesć=6
sedym=7
wósym=8
źewjeś=9
źaseś=10

Every config file needs the values from zero to nine, and every supported language needs a config file.
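The zero-to-nine requirement can be checked mechanically. The following is a minimal sketch, not part of word2number-i18n: the function name missing_base_values is hypothetical, and it assumes the key=value format shown in the example above, with # starting a comment line.

```python
def missing_base_values(config_text):
    """Return the values from 0 to 9 that no number word in the config maps to."""
    found = set()
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Skip prefixed entries (key contains ':') and non-numeric values.
        if ":" in key or not value.isdecimal():
            continue
        found.add(int(value))
    return [digit for digit in range(10) if digit not in found]


lower_sorbian = """
null=0
jaden=1
dwa=2
tśi=3
styri=4
pěś=5
šesć=6
sedym=7
wósym=8
źewjeś=9
źaseś=10
"""

print(missing_base_values(lower_sorbian))  # []
```

An empty list means every value from 0 to 9 has a word; anything else lists the values that still need an entry.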

In addition, add every other single number word with its value. For the extensions you also need, in the same configuration file, a value for point plus any additional replacement words and measure words. These entries differ from the number words: for a number word the key is the localized word, whereas here the key is used internally and is partly or fully predefined. Feel free to copy an existing language config.

Example:

# 0-9
# much more
twenty=20
thousand=1000
# special
point=komma
replace:what=with
replace:две=два
replace:dozen=twelve
replace:gran=thousand
measure:aLotOf=1000
measure:namesAreLikeIceOnMadeira=1000000
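To make the four kinds of entries in this example concrete, here is a minimal sketch of a parser that separates them. It is an illustration only: the function name parse_number_system and the returned dictionary layout are assumptions, not the library's actual loader.

```python
def parse_number_system(text):
    """Split config text into number words, replacements, measures and the point word."""
    config = {"numbers": {}, "replace": {}, "measure": {}, "point": None}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.startswith("replace:"):
            # replace:<word in text>=<word to use instead>
            config["replace"][key[len("replace:"):]] = value
        elif key.startswith("measure:"):
            # measure:<free comment name>=<numeric value acting as multiplier>
            config["measure"][key[len("measure:"):]] = int(value)
        elif key == "point":
            # point=<localized word for the decimal separator>
            config["point"] = value
        else:
            # <localized number word>=<numeric value>
            config["numbers"][key] = int(value)
    return config


sample = """
# much more
twenty=20
thousand=1000
# special
point=komma
replace:gran=thousand
measure:aLotOf=1000
"""

parsed = parse_number_system(sample)
print(parsed["numbers"])   # {'twenty': 20, 'thousand': 1000}
print(parsed["replace"])   # {'gran': 'thousand'}
print(parsed["measure"])   # {'aLotOf': 1000}
print(parsed["point"])     # komma
```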

The point entry is essential for working with decimal values; its value is the word for the decimal separator in your language. The measure entries are the multipliers for the numbers. A measure extends the numeric value with a label, much like a tag in NLP. So thousand is the name for the numeric value, gran is a synonym for thousand, and thousand is a measure word.

So given twenty nice gran, the following happens:

  1. word2number-i18n registers the initial value 1000 as a measure. The name aLotOf is just your comment.
  2. word2number-i18n looks for replace entries and changes the text to twenty nice thousand.
  3. word2number-i18n discards all non-number words, so your text is now twenty thousand.
  4. word2number-i18n finds that 1000 is a measure and that its localized name is thousand. As a result, thousand acts here as a multiplier for the numeric value before it.
  5. word2number-i18n takes the numeric value 20 for twenty and applies the multiplier that follows, giving you the result 20000.

Internally, many more checks, especially for special cases, are performed.
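The five steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the library's algorithm: the function name convert and its dictionary parameters are hypothetical, decimal handling is left out, and only words registered as measures act as multipliers.

```python
def convert(text, numbers, replacements, measures):
    """Convert a phrase to a number using replace, filter and measure steps."""
    # Step 1: register the measure values (the names are only comments).
    measure_values = set(measures.values())
    # Step 2: apply the replace entries word by word.
    words = [replacements.get(word, word) for word in text.lower().split()]
    # Step 3: drop every word that is not a known number word.
    words = [word for word in words if word in numbers]
    # Steps 4 and 5: a measure multiplies the value collected before it.
    total = 0
    current = 0
    for word in words:
        value = numbers[word]
        if value in measure_values:
            total += (current or 1) * value
            current = 0
        else:
            current += value
    return total + current


numbers = {"two": 2, "three": 3, "twenty": 20, "thousand": 1000}
replacements = {"gran": "thousand"}
measures = {"aLotOf": 1000}

print(convert("twenty nice gran", numbers, replacements, measures))  # 20000
```

A measure without a preceding value is treated as multiplying 1, so thousand alone yields 1000 in this sketch.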

Supported programming languages

Python

Please note that this Python implementation is similar to the Java and CSharp implementations.

Installation

Please ensure that you have updated pip to the latest version before installing word2number-i18n.

You can install the module from the Python Package Index using the command below.

    pip3 install word2number-i18n 

Installation from source

On macOS

    # git clone https://github.com/bastie/w2ni18n.git w2n
    # python3 setup.py install

Make sure you install all requirements given in requirements.txt

    pip3 install -r requirements.txt

Usage

Add word2number-i18n to the requirements.txt of your project. Then import the module using the code below.

from word2numberi18n import w2n

Then you can use the word_to_num method to convert a number-word to numeric digits, as shown below.

print(w2n.word_to_num("two million three thousand nine hundred and eighty four"))
2003984
print(w2n.word_to_num('two point three')) 
2.3
print(w2n.word_to_num('one hundred thirty-five')) 
135
print(w2n.word_to_num('million million'))
Error: Redundant number! Please enter a valid number word (eg. two million twenty three thousand and forty nine)
None
print(w2n.word_to_num('blah'))
Error: No valid number words found! Please enter a valid number word (eg. two million twenty three thousand and forty nine)
None

i18n

word2number looks for your specific language in this order:

1. the given language parameter, and if not found
2. the environment variable w2n.lang with an ISO lang code like en, hi, de, and if not found
3. locale.getdefaultlocale(), and if not found
4. the environment variable "LANGUAGE", and if not found
5. fallback to English
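The lookup order above can be sketched as a small pure function. This is an illustration only: resolve_language, normalize and the available parameter are hypothetical names, and reducing values such as fr_CA or de:en to their leading language code is an assumption about how such values would be matched, not confirmed library behavior.

```python
import re


def normalize(code):
    """Reduce values such as 'fr_CA', 'en_US.UTF-8' or 'de:en' to the leading code."""
    return re.split(r"[_.:@-]", code)[0].lower()


def resolve_language(available, lang_param=None, environ=None, system_locale=None):
    """Pick the first candidate, in lookup order, that has a config file."""
    environ = {} if environ is None else environ
    candidates = (
        lang_param,               # 1. explicit parameter
        environ.get("w2n.lang"),  # 2. environment variable w2n.lang
        system_locale,            # 3. system locale, supplied by the caller
        environ.get("LANGUAGE"),  # 4. environment variable LANGUAGE
    )
    for candidate in candidates:
        if candidate and normalize(candidate) in available:
            return normalize(candidate)
    return "en"                   # 5. fallback to English


available = {"en", "fr", "ru"}
print(resolve_language(available, lang_param="fr"))                                   # fr
print(resolve_language(available, environ={"w2n.lang": "ru"}, system_locale="fr_CA"))  # ru
print(resolve_language(available, system_locale="fr_CA"))                             # fr
print(resolve_language(available, environ={"LANGUAGE": "de:en"}))                     # en
```

In real use the caller could pass os.environ and the system locale name; they are parameters here so the order can be tried without touching the process environment.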

As a result, you can use it in classic procedural Python (see the example unit_testing_ru.py):

import os

os.environ['w2n.lang'] = 'ru'
w2n.word_to_num('две целых три десятых')

It also works in object-oriented Python (see the example unit_testing_fr.py):

instance = w2n.W2N(lang_param="fr")
instance.word_to_num('trente-et-un')

Place your language-specific dictionary file, with the ISO lang code in its name, in the data directory.

Develop package

    # python3 -m reuse lint
    # python3 -m flake8 | grep -v ":80: E501"
    #
    # python3 setup.py sdist bdist_wheel
    # python3 -m twine check dist/*
    # python3 -m twine upload dist/*

Make sure you install all requirements given in development.txt

    pip3 install -r development.txt

Java

Please note that this Java implementation is similar to the Python and CSharp implementations.

Installation

Download the latest version of word2number from GitHub.

Installation from source

On macOS

    # git clone https://github.com/bastie/w2ni18n.git w2n
    # ./w2n/java/src/build.sh

i18n

word2number looks for your specific language in this order:

1. the ISO lang code, like fr or de, passed as a parameter when you construct your object instance, and if not given
2. the environment variable (not property) w2n.lang with an ISO lang code like en, hi, de, and if not found
3. java.util.Locale.getDefault(), and if null
4. the environment variable "LANGUAGE", and if not found
5. fallback to English

Place your language-specific dictionary file, with the ISO lang code in its name, in the data directory.

Usage

Add word2number-i18n to the module-info.java of your project.

requires word2number;

Then import the class using the below code.

import word2number.W2N;

Then you can use the wordToNum method to convert a number-word to numeric digits, as shown below.

    Locale.setDefault(Locale.CANADA);
    W2N english = new W2N();
    System.out.println(english.wordToNum("three hundred fifty"));

    Locale.setDefault(Locale.CANADA_FRENCH);
    W2N french = new W2N();
    System.out.println(french.wordToNum("vingt et un"));
    
    System.out.println(english.wordToNum("three point one four"));

As a result it prints:

350
21
3.14

Develop package

Call the build.sh script and use the new w2ni18n-VERSION-.jar file

CSharp

Please note that this CSharp implementation is similar to the Java and Python implementations.

Installation

Download the latest version of word2number from NuGet, where the package is called w2ni18n.

Installation from source

On macOS

    # git clone https://github.com/bastie/w2ni18n.git w2n
    # cd ./w2n/csharp
    # ./build.sh

i18n

word2number looks for your specific language in this order:

1. the ISO lang code, like es, fr or pt, passed as a parameter when you construct your W2N instance, and if not given
2. the environment variable (not property) w2n.lang with an ISO lang code like en, hi, de, and if not found
3. the default system locale (the CSharp counterpart of java.util.Locale.getDefault()), and if null
4. the environment variable "LANGUAGE", and if not found
5. fallback to English

Place your language-specific dictionary file, with the ISO lang code in its name, in the data directory.

Usage

dotnet add MyNextProject.csproj package w2ni18n

Then import the namespace using the below code.

using word2number;

Then you can use the wordToNum method to convert a number-word to numeric digits, as shown below.

    Environment.SetEnvironmentVariable("w2n.lang", "en");
    W2N english = new W2N();
    Console.WriteLine(english.wordToNum("one billion two million twenty three thousand and forty nine point two three six nine"));

As a result it prints:

1002023049.2369

Develop package

Call the build.sh script and use the new W2N.dll file

Bugs/Errors

  • The German language needs a more specific algorithm.
  • The French language needs an update of the property point=virgule. This is included in the source but not in the releases.

w2n fixed

  • Added a regex to fix the comma bug (fixed by jnelson16)
  • Fixed the floating point conversion bug
  • Also accepts numeric values, because it is more intuitive to handle str:112 the same as int:112

Thanks

Thanks to the word2number i18n coders and contributors.

License

The MIT License (MIT)

Copyright (c) 2016 Akshay Nagpal

Copyright (c) 2020-2021 Sebastian Ritter

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.