[FEAT] Python scripting: format GSheet csv to json via Google API data pull
ngiangre opened this issue · 7 comments
Prerequisites
- check this box if you have completed the following:
- Reviewed the contributing guidelines and support files
- Reviewed the README file for the repository you are working in
- Searched for relevant instructions on our Discord server
- Searched the issues of the repository you are working in to make sure one was not already filed
Summary
This is a modular issue sprouting from #643
All the translations are now in the correct format, so the next step is to pull the sheets (English, Spanish, French, Italian, Dutch (Netherlands), Russian) via the Google API into the repository, so that front-end developers can reference them by their JSON keys.
The Python script src/python/pull_gsheet_data.py contains a link for creating your own Google API key. The script is just a start and needs more development.
Each translation sheet has the columns parentKey, childKey, fieldKey, value, translatedValue, and then other columns. We need a structure of { 'parentKey' : { 'childKey' : { 'fieldKey' : { 'value' : '', 'translatedValue' : '', ... } } } }. In the case of education, the childKey maps to an array instead: { 'parentKey' : { 'childKey' : [ {'value': '', ... }, {'value' : '', ... }, ... ] } }
The Date attributes are not needed - filter those out.
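To make the target shape concrete, here is a minimal sketch of the nesting step, not the script's actual implementation. It assumes the pulled sheet is available as CSV text with a header row, that the Date attributes are the columns whose name contains "date", and that education rows carry the parentKey value "education"; the function name `rows_to_nested`, the `Date Added` column, and the sample rows are all made up for illustration.

```python
import csv
import io

KEY_COLUMNS = ("parentKey", "childKey", "fieldKey")


def rows_to_nested(rows, list_parents=("education",)):
    """Nest flat sheet rows by parentKey -> childKey -> fieldKey."""
    nested = {}
    for row in rows:
        # Keep every non-key column, dropping the Date attributes.
        # Assumption: a Date attribute is any column with "date" in its name.
        fields = {
            col: val
            for col, val in row.items()
            if col not in KEY_COLUMNS and "date" not in col.lower()
        }
        parent = nested.setdefault(row["parentKey"], {})
        if row["parentKey"] in list_parents:
            # Education-style rows: collect an array of entries per childKey.
            parent.setdefault(row["childKey"], []).append(fields)
        else:
            parent.setdefault(row["childKey"], {})[row["fieldKey"]] = fields
    return nested


# Hypothetical sample data standing in for one pulled sheet.
sample_csv = """parentKey,childKey,fieldKey,value,translatedValue,Date Added
home,header,title,Welcome,Bienvenido,2020-05-01
education,facts,fact1,Wash hands,Lave las manos,2020-05-02
education,facts,fact2,Drink water,Beba agua,2020-05-03
"""

nested = rows_to_nested(csv.DictReader(io.StringIO(sample_csv)))
```

With this sample, `nested["home"]["header"]["title"]` is a dict of the remaining columns, while `nested["education"]["facts"]` is a list with one entry per row.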
The resulting JSON files should go into public/locales/, though for now I put them in docs/content so as not to mess things up.
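A possible way to write the files, as a sketch only: the `<language code>/translation.json` layout under the output directory is an assumption (the issue does not fix the layout), and `write_locale` is a hypothetical helper name. The example writes into a temporary directory so it does not touch the repository.

```python
import json
import tempfile
from pathlib import Path


def write_locale(nested, lang_code, out_dir):
    """Write one language's nested dict to <out_dir>/<lang_code>/translation.json."""
    target = Path(out_dir) / lang_code / "translation.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-Latin text (e.g. Russian) readable in the file.
        json.dump(nested, fh, ensure_ascii=False, indent=2)
    return target


# Demo with a made-up entry, written to a throwaway directory.
demo = {"home": {"title": {"value": "Hello", "translatedValue": "Привет"}}}
with tempfile.TemporaryDirectory() as tmp:
    path = write_locale(demo, "ru", tmp)
    written = path.read_text(encoding="utf-8")
```

Passing `out_dir="../../docs/content/"` or the final public/locales/ path would be the only change needed to switch targets.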
Here's some extra code I already started using that might be helpful:
```python
# Init sheet names and output dir
data_model_sheet_name = "Data Model"
education_sheet_name = "Education"
health_sheet_name = "Health"
translation_sheets_regex = " - Master Sheet"
translation_sheets_not_regex = "OLD"
languages_to_pull = ['English', 'Dutch (Netherlands)', 'Spanish', 'Italian', 'French', 'Russian']
out_dir = "../../docs/content/"

# Set dictionary to connect languages to two-letter abbreviations
# https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
# Note: the key here is 'Dutch' while languages_to_pull uses 'Dutch (Netherlands)',
# so the name has to be normalised before the lookup.
language_letters_dict = {'English': 'en', 'Dutch': 'nl', 'Spanish': 'es',
                         'Italian': 'it', 'French': 'fr', 'Russian': 'ru'}
```
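As a rough illustration of how these settings could be combined, the sketch below maps a sheet title to its two-letter code. It assumes titles look like "Spanish - Master Sheet", which the issue does not confirm; `language_code_for` and the example titles are hypothetical. It also strips a regional qualifier such as "(Netherlands)" so that the lookup finds the 'Dutch' key.

```python
import re

translation_sheets_regex = " - Master Sheet"
translation_sheets_not_regex = "OLD"
language_letters_dict = {'English': 'en', 'Dutch': 'nl', 'Spanish': 'es',
                         'Italian': 'it', 'French': 'fr', 'Russian': 'ru'}


def language_code_for(sheet_title):
    """Return the two-letter code for a translation sheet title, or None to skip it."""
    if not re.search(translation_sheets_regex, sheet_title):
        return None  # not a translation sheet (e.g. "Data Model")
    if re.search(translation_sheets_not_regex, sheet_title):
        return None  # outdated copy of a sheet
    # Assumption: the language name is everything before " - Master Sheet".
    language = re.split(translation_sheets_regex, sheet_title)[0].strip()
    # Drop a regional qualifier such as "(Netherlands)" before the lookup.
    base_language = language.split("(")[0].strip()
    return language_letters_dict.get(base_language)


# Hypothetical sheet titles for illustration.
titles = ["Spanish - Master Sheet", "Dutch (Netherlands) - Master Sheet",
          "OLD Spanish - Master Sheet", "Data Model"]
codes = [language_code_for(title) for title in titles]
```

Sheets that return `None` can simply be skipped when looping over the spreadsheet's worksheets.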
Motivation
We need translations.json files in locales to support other languages.
Possible Alternatives
Support English only and hard-code the strings.
Additional Context
Please comment here for more detail or to work through fixing the issue. You can ask @ngiangre for assistance on Python scripting.
I'm happy to remind myself how to work with Python!
go ahead @pavel-ilin!
This still needs work!!
My bad - didn't think about how the "fixes" keyword would close this!
No worries haha, I'm barely keeping up. ISSUE IS BACK OPEN! We need to create nested JSON files from the CSV files pulled via the Google API!
Some progress on this. Here's a preview
and an example where an array of values would be favorable:
I posted a translation.json in the #engineering channel on Discord if y'all want to see the full JSON.
Let me know if this looks good and would be workable! @AdhamAH @pavel-ilin @SomeMoosery
There has been tremendous work done on this by @kristianr on Discord - thank you!!
We have one more step: converting key strings longer than 20 characters into shorter strings using common NLP filters (stemming, removing stop words, etc.).
The goal is to make a short, representative key string for the education facts and quizzes. This would be a medium-priority issue and should be easy to integrate into the current algorithm that @nickg and @kristianr have on Discord.
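One way the shortening step could look, as a standard-library sketch rather than the algorithm shared on Discord: the stop-word list is a tiny hand-picked sample, `crude_stem` is a naive suffix stripper standing in for a real stemmer (for example a Porter stemmer from an NLP library), and the underscore-joined output format is an assumption. All names here are hypothetical.

```python
import re

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
              "in", "is", "it", "of", "on", "or", "the", "to", "with", "your"}
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix; a naive stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def shorten_key(text, max_len=20):
    """Return text unchanged if short enough, else a stemmed, stop-word-free key."""
    if len(text) <= max_len:
        return text
    words = re.findall(r"[a-z0-9]+", text.lower())
    stems = [crude_stem(w) for w in words if w not in STOP_WORDS]
    key = ""
    for stem in stems:
        candidate = f"{key}_{stem}" if key else stem
        if len(candidate) > max_len:
            break  # adding another word would exceed the limit
        key = candidate
    # Fall back to plain truncation if no stem fits.
    return key or "_".join(words)[:max_len] or text[:max_len]


short = shorten_key("Wash your hands often with soap and water")
```

Two different long strings can collapse to the same short key, so the caller would still need a uniqueness check (for example appending a counter) before using the result as a JSON key.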
Can someone work on this?