This repository contains around 600 French and Spanish brand names (words
subdirectory), their contexts (up to 10,000) in JSI Timestamped Web corpus ( data/jsi_contexts
subdirectory) and a streamlit app to explore these data (timeline, metadata distribution and evolution, lexico-syntactic patterns).
The streamlit app is publicly available here : Streamlit app.
First check you have Python 3.8+ and Github installed on your computer. Go to the parent folder where you would like to copy the repo then clone the repository
git clone https://github.com/ecartierlipn/brand_names_exploration.git
go to brand_names_exploration
subdirectory and install the dependencies :
pip install -requirements.txt
Then run the streamlit app :
streamlit run data_exploration.py
The local installation enables to (manually) edit the word contexts before exploration (filename for each word : data/jsi_contexts/<fra|spa>_jsi_newsfeed_virt.<word>.complete_csv
).
To retrieve the JSI Timestamped web corpus contexts for each word (or other words you would like to add) and parse the results before exploration, you can use sketchengine_extract_contexts_from_wordlist.py
(you need to provide your sketchengine API key). You can add new words in word list in files in words
subdirectory, just by adding a word in a new line.
-
Linguistic data (words): Natalia Soler
-
Linguistic data (word contexts): JSI Timestamped web corpus
-
Software : Emmanuel Cartier
Check requirements.txt
file for python libraries.