google/cld3

Cannot detect regional variations/dialects of a language with gcld3


I am trying to detect regional variations of a language using gcld3. Below is the code I have tried so far:

import gcld3

def detect_language_with_region(text):
    # Create a language detector object
    detector = gcld3.NNetLanguageIdentifier(min_num_bytes=0, max_num_bytes=1000)

    # Detect the language
    result = detector.FindLanguage(text)

    # Use the detected language code only if the detector marks it as reliable
    detected_language = result.language if result.is_reliable else "undetermined"

    return detected_language

# Example usage

text = "This is a sample text in English."
detected_language = detect_language_with_region(text)
print("Detected language:", detected_language)


The output of this code is as below:

Detected language: en

I want to detect regional variations/dialects such as "en-US", "en-GB", "en-AU", etc., according to country/region.
Is it possible to detect such dialects with gcld3?
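
To make the desired output concrete: if gcld3 only reports the base language, the kind of result I am after could be approximated by a post-processing step applied once the detector returns "en". The sketch below is only an illustration under that assumption. The function `guess_english_region` and the `REGION_MARKERS` word lists are hypothetical names I made up for this example, not part of gcld3, and a handful of spelling markers is far too small to be reliable on real text.

```python
import re

# Hypothetical marker words per region (assumption: illustrative only,
# a usable list would need to be much larger and include "en-AU" etc.).
REGION_MARKERS = {
    "en-GB": {"colour", "favourite", "centre", "organise", "lorry"},
    "en-US": {"color", "favorite", "center", "organize", "truck"},
}


def guess_english_region(text, default="en"):
    """Return a region-tagged code if marker words favour one region, else `default`."""
    words = set(re.findall(r"[a-z]+", text.lower()))

    # Count how many marker words of each region occur in the text.
    scores = {tag: len(words & markers) for tag, markers in REGION_MARKERS.items()}
    top = max(scores.values())

    # Fall back to the bare language code when nothing matches or regions tie.
    winners = [tag for tag, score in scores.items() if score == top]
    if top == 0 or len(winners) > 1:
        return default
    return winners[0]


print(guess_english_region("My favourite colour is blue."))
print(guess_english_region("This is a sample text in English."))
```

In this sketch the first call yields "en-GB" and the second falls back to "en", because the sample sentence contains no marker words. I would prefer something built into gcld3 over a hand-written heuristic like this.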

Please help with this. Any suggestions are welcome.