This is a model of the project, designed to simplify the study of foreign languages. For now, just a sketch of the integration of Spring Boot 2 and JavaFX 8.
Use either pom.xml
or build.gradle
to import project.
TODO read TRN TODO start to read M1 in TRN. Try to identify all TRN with only one M1
https://mvnrepository.com/artifact/org.apache.commons
Слова для заучивания и словарный запас - добавить колонок
*** Memorization algorithm (simplified ANKI):
watchscript - Mark as Known update current view
v Ask to recall x meanings of word v Increase meanings if need LearnWordMeaningsController
TODO how to know number of meanings for each word??? En_Ru_Muller_(18Mb).7z, [m3] or Google http://wiki.fu-lab.ru/index.php/DSL#.D0.98.D1.81.D0.BF.D0.BE.D0.BB.D1.8C.D0.B7.D0.BE.D0.B2.D0.B0.D0.BD.D0.B8.D0.B5_.D1.81.D0.BE.D0.B7.D0.B4.D0.B0.D0.BD.D0.BD.D1.8B.D1.85_.D1.81.D0.BB.D0.BE.D0.B2.D0.B0.D1.80.D0.B5.D0.B9_.D0.B2_Abbyy_Lingvo http://lingvo.helpmax.net/ru/%d0%b2%d0%be%d0%bf%d1%80%d0%be%d1%81%d1%8b-%d0%b8-%d0%b7%d0%b0%d1%82%d1%80%d1%83%d0%b4%d0%bd%d0%b5%d0%bd%d0%b8%d1%8f/dsl-compiler/%d0%ba%d0%be%d0%bc%d0%b0%d0%bd%d0%b4%d1%8b-dsl/ https://lingvoboard.ru/store/html/DSLReference_HTML/dict.html LingvoUniversalEnRu
TODO work with archives, unpack all sources directly
DSL - prepare all classes, domains, ...
Split by words. Starts from "". not from "/t" [m1]: main block [trn] [m2][c brown]: part of speech [m2][ex][c teal]: examples [m3][c saddlebrown]: meanings
trim ['], [/'], [lang id=1033], [/lang]
[m1]: [c lightslategray]{{t}}[ded]{{/t}}[/c] [p]a[/p]
[m2] [c brown]1.[/c] [p]a[/p]
Вытащить:
- название
- транскрипция [
- аббревиатура (verb)
- переводы
- примеры ]
https://ru.glosbe.com/en/ru/easy%20money
TODO get scripts from YouTube || allow to upload them save use
TODO 100% get all meanings from google translate
one base with places (???), frequency (google), level (gse), ... calculate place, level for unknown words/phrases mark them if approximate frequencies for phrases (google)
v UserWordBank - also store wordLevel v Refactor RepeatWordsController. v Use filter by level v Show all/filtered runtime v 20 - global value
ShowCards: v Refactor v Center buttons v TODO count
Copy to LearnWordsController
- v LevelsTemplate::todo attach, test + Controller + Object for selected levels
- RepeatWordsController - beta v2
- LearnWordsController
TODO learn idea from VocabHunter
Идея такая:
- v UserWordBank - единственная база.
- v Добавить статус слова (enum)
- v Шаблон показа карточек
- v Два окна: повторять или учить новые. В первом случае фильтр
- v Ограничение 20
ANKI //TODO import cards and learn words Vanilla SQLite + files https://github.com/ankidroid/Anki-Android/wiki/Database-Structure https://decks.fandom.com/wiki/Anki_APKG_format_documentation https://github.com/dae/anki https://play.google.com/store/apps/details?id=io.lingvist.android&hl=ru
https://controlsfx.bitbucket.io/ http://fxexperience.com/ https://vocabhunter.github.io/
https://www.youtube.com/watch?time_continue=68&v=UHGWre0Nq4g https://ru.wikipedia.org/wiki/Mnemosyne https://github.com/helloworld1/AnyMemo https://4pda.ru/forum/index.php?showtopic=182901 https://www.supermemo.com/en/apps
TODO Mueller24 is truncated. Either 7, or import from En_Ru_Muller_(18Mb).7z Other alternatives: Fora dictionaries, but need import
Hunspell - best solution for search word worms https://github.com/LibreOffice/dictionaries https://extensions.libreoffice.org/extensions/english-dictionaries https://extensions.libreoffice.org/extensions/russian-dictionary-pack
- Splash screen (mock-up)
- Dashboard screen (mock-up)
- WordBank screen (mock-up, w/o filters)
- VideoList screen (mock-up)
- WatchScript screen (mock-up)
- WatchVideo screen (mock-up)
- Dictionary screen (mock-up)
Lemma Importer. Problem writing:
write Verb % 400 100 0.96
@ @ write 109 100 0.95
@ @ writes 26 99 0.89
@ @ writing 63 100 0.95
@ @ written 103 100 0.96
@ @ wrote 99 100 0.90
writing NoC % 64 100 0.92
@ @ writing 53 100 0.92
@ @ writings 11 89 0.89
//TODO in the future Actually we need another approach -> convert normal words to variants. For clean results, need to study again https://www.english.com/gse/teacher-toolkit/user/vocabulary?page=518&sort=vocabulary;asc&gseRange=10;90&audience=GL (has grammatical categories)
//TODO need to identify and translate phrases
Need ID
TODO precache/cache translations in WatchScriptController
-
v Allow to see different dictionaries, change by ComboBox
-
WatchScript: Tab with table & words frequency & level & sort (as in WordBank)
-
Templates to separate folder
-
v Users WordBank (randomly data for now)
-
v WatchScript: connect to it
https://github.com/akuznetsov/russianmorphology https://mvnrepository.com/artifact/org.apache.lucene.morphology
TODO sourcerer -
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA' --prune-empty --tag-name-filter cat -- --all
git push origin --force --all
TODO sourcerer - followers, following 2/2 -> 0/0 :(
-
v 2 DataSources, 2 files
-
v Grab site
-
v Raw to WL, analyze
-
v Word::frequency
-
v Word::place
-
v Word::level
-
Match levels & wordbanks from different sources
-
Video - find the most matched videos (level, unknown words)
-
v English::Russian
-
v Dictionary importer
#TODO bugs
Shut down sound from browser when window is closed
Google https://www.thefreedictionary.com/appetitively Many others. Investigate lingualeo google chrome plugin
LeoLingo
https://www.readbeyond.it/aeneas/
https://stackoverflow.com/questions/46598539/how-to-synchronize-mp3-audio-with-text https://stackoverflow.com/questions/40206029/android-sync-audio-with-text https://stackoverflow.com/questions/13422673/looking-for-a-working-example-of-addtimedtextsource-for-adding-subtitle-to-a-vid https://en.wikipedia.org/wiki/LRC_(file_format)
https://github.com/synalp/jtrans
https://stackoverflow.com/questions/6970013/getting-current-youtube-video-time
https://github.com/IonicaBizau/text-to-speech
https://github.com/pettarin/forced-alignment-tools
https://sourceforge.net/projects/cmusphinx/files/sphinx4/5prealpha/ https://github.com/cmusphinx/sphinx4
http://labbcat.sourceforge.net/
https://www.geeksforgeeks.org/converting-text-speech-java/
https://github.com/marcoagpinto/aoo-mozilla-en-dict https://github.com/LibreOffice/dictionaries https://en.wiktionary.org/wiki/recruit
https://www.universeofmemory.com/how-many-words-you-should-know/
https://www.universeofmemory.com/how-many-words-you-should-know/ https://apps.ankiweb.net/
https://puzzle-english.com/vocabulary https://puzzle-english.com/vocabulary/6737816?r=b096eea74f Полноценная платформа для изучения инглиша. Довольно круто. Много моих идей уже реализовано
https://www.english-corpora.org/coca/ https://www.corpusdata.org/purchase.asp Полная база стоит денег, но можно искать, смотреть тест где используется
https://www.victoria.ac.nz/lals/resources/academicwordlist База академических слов
http://web.archive.org/web/20070214114211/http://www.dcs.shef.ac.uk/research/ilash/Moby/
http://www.manythings.org/vocabulary/lists/a/ Слова, разнесённые по категориям
http://gen.lib.rus.ec/search.php?&req=Wordlist&phrase=1&view=simple&column=def&sort=def&sortmode=ASC&page=2 Cutting Edge 3 Edition Elementary Wordlist
https://www.wordfrequency.info/purchase.asp Бесплатно 5000 слов. Платно - все банки.
http://martinweisser.org/corpora_site/word_lists.html
http://www.bmanuel.org/clr2_et.html#Moby_Shakespeare http://www.bmanuel.org/clr/clr2_et.html
http://iteslj.org/links/ESL/Vocabulary/Lists/ http://iteslj.org/links/ESL/Vocabulary/Lists/p2.html
https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists#English Частота слов
http://www.lexically.net/downloads/e_lemma.zip Примерно 12000 слов (нормальная форма -> связи) book -> books,booking,booked
http://ucrel.lancs.ac.uk/bncfreq/ http://ucrel.lancs.ac.uk/bncfreq/flists.html Слова и их формы, так же частота, тип. Очень здорово book NoC % 374 100 0.95 @ @ book 243 100 0.95 @ @ books 131 100 0.94
http://storage.googleapis.com/books/ngrams/books/datasetsv2.html Тут есть распределение слов по годам (Google Books), громадные банки слов для многих распространенных языков. Очень много шлака, но можно вычислить популярность слов. Так же можно видеть фразы до 5 слов.
https://github.com/hackerb9/gwordlist Обработанные списки, шлака почти нет.
http://norvig.com/google-books-common-words.txt Ещё список с частотами
https://github.com/first20hours/google-10000-english 10000-20000 слов из того же банка, есть фильтры по матерным словам
http://number27.org/assets/misc/words.txt (the, 6.510891, 0), примерно 87000 слов
http://www.kilgarriff.co.uk/bnc-readme.html BNC database and word frequency lists. Adam Kilgarriff 4907 1222 worldwide adv: ~6000 слов (без мн. чисел)
13073 books nn2 2186 70 books nn2-vvz 64 3 books np0 3 10 books vvz 10 - все слова, указаны их состояния, частота но без связи с основным словом
TODO very cool https://www.english.com/gse/teacher-toolkit/user/vocabulary?page=1&sort=gse;asc&gseRange=10;90&audience=GL Просто очень крутой сайтъ. Уважаю. Слова, частоты, уровни, даже проценты.