/TutorialCrossLingualEmbeddings

Introduction to cross-lingual word-embeddings at Wikimania 2019


Word embeddings allow machines to measure the semantic distance between a pair of words or sentences. This is done by converting each string (word or sentence) into a vector, making it possible to perform mathematical operations on those strings. For example, it is possible to measure the distance between //cat// and //dog//, which should be smaller (they are both animals) than the distance between //cat// and //car//.
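As a minimal sketch of this idea, the snippet below computes cosine similarity between tiny toy vectors. The vectors and their values are invented for illustration; real embeddings (e.g. from fastText or word2vec) have hundreds of dimensions learned from large corpora.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional vectors, made up purely for illustration.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print(cosine_similarity(vectors["cat"], vectors["dog"]))  # high
print(cosine_similarity(vectors["cat"], vectors["car"]))  # low
```

With vectors like these, "semantic distance" is simply 1 minus the cosine similarity.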

Recently, researchers have been working on making these embeddings cross-lingual, which allows measuring the distance between strings in different languages. Translations such as //cat// [en] and //gato// [es] should therefore be very similar (ideally identical) in the vector space.
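Once two languages share a vector space, translation can be approximated by nearest-neighbor search. The sketch below assumes the two toy vocabularies have already been mapped into a shared space; all vectors and words are invented for illustration.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Toy vectors, invented for illustration: we assume both languages
# have ALREADY been aligned into one shared vector space.
en_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "car": [0.1, 0.2, 0.9],
}
es_vectors = {
    "gato":  [0.88, 0.82, 0.12],
    "coche": [0.12, 0.18, 0.88],
}

def translate(word, src_vectors, tgt_vectors):
    """Return the nearest target-language word by cosine similarity."""
    u = src_vectors[word]
    return max(tgt_vectors, key=lambda w: cosine_similarity(u, tgt_vectors[w]))

print(translate("cat", en_vectors, es_vectors))  # gato
```

In practice the alignment itself is usually learned from a bilingual dictionary (for instance with an orthogonal Procrustes mapping), which is out of scope for this small sketch.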

In the research team we have been using these cross-lingual embeddings to create section alignments across different projects, and to align template parameters.

The session will be organized as follows:

**First Part: Understanding and playing with cross-lingual word-embeddings**

**Second Part: Use cases on section alignment and recommendation**

If you are just interested in using the APIs, you are welcome to come only to the second part of the session.

Materials and recommendations:

If you want to do hands-on work and try your own alignments, you will need to install some packages and download some data in advance:

If you want to know more about word-embedding alignments, check this repository.