Automated benchmarking for language-optimized LLMs. Evaluates both grammatical accuracy and semantic closeness of translations.
Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).