Katherine Kairis, kak275@pitt.edu, 10/3/2017
Visitors log: https://github.com/Data-Science-for-Linguists/Shared-Repo/blob/master/todo10_visitors_log/visitors_log_Katherine.md
In this project, I compare the speech of native and non-native speakers of English. I use two corpora: the Vienna-Oxford International Corpus of English (VOICE), which contains conversations between mostly non-native English speakers, and the spoken files of the British National Corpus (BNC), which contain conversations between native English speakers. I compare bigrams and hesitations across speakers in VOICE and the BNC. I also make the same comparisons across three L1 groups within VOICE: Germanic L1s, Romance L1s, and Slavic L1s.
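As a rough illustration of the kind of bigram comparison described above, here is a minimal sketch. It is not the code from the notebooks: the helper names (`bigram_counts`, `relative_freq`) and the sample utterances are hypothetical, and tokenization is simplified to lowercasing and splitting on whitespace.

```python
from collections import Counter

def bigram_counts(utterances):
    """Count adjacent word pairs within each utterance.

    Pairs are never formed across utterance boundaries.
    """
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts

def relative_freq(counts):
    """Convert raw counts to relative frequencies so corpora of different sizes are comparable."""
    total = sum(counts.values())
    return {bigram: n / total for bigram, n in counts.items()} if total else {}

# Toy stand-ins for the two corpora (assumption: NOT real VOICE or BNC data).
voice_sample = ["i think we can start", "i think it is okay"]
bnc_sample = ["i think we should go", "you know what i mean"]

voice_freq = relative_freq(bigram_counts(voice_sample))
bnc_freq = relative_freq(bigram_counts(bnc_sample))

# Bigrams that occur in both samples, with their relative frequency in each.
shared = sorted(set(voice_freq) & set(bnc_freq))
for bigram in shared:
    print(bigram, round(voice_freq[bigram], 3), round(bnc_freq[bigram], 3))
```

Using relative rather than raw frequencies matters here because the two corpora differ in size; the same normalization idea would apply to hesitation counts.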
- Vienna-Oxford International Corpus of English (VOICE) -- https://www.univie.ac.at/voice/
- British National Corpus (BNC) -- http://www.natcorp.ox.ac.uk
- Data(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/tree/master/data) - contains data for VOICE and BNC sample
- Images(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/tree/master/images) - contains graphs from the analyses
- BNC_data.ipynb(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/BNC_data.ipynb) - code for processing BNC data
- BNC_data.md(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/BNC_data.md) - markdown version of BNC_data.ipynb
- LICENSE.md(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/LICENSE.md) - contains license info for VOICE and BNC
- LICENSE_notes.md(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/LICENSE_notes.md) - contains justifications for sharing data
- VOICE_data.ipynb(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/VOICE_data.ipynb) - code for processing VOICE data
- VOICE_data.md(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/VOICE_data.md) - markdown version of VOICE_data.ipynb
- analysis-L1s.ipynb(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/analysis-L1s.ipynb) - code for analyzing specific L1s in VOICE
- analysis-L1s.md(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/analysis-L1s.md) - markdown version of analysis-L1s.ipynb
- analysis-bigrams.ipynb(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/analysis-bigrams.ipynb) - code for analyzing bigrams in VOICE and BNC
- analysis-bigrams.md(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/analysis-bigrams.md) - markdown version of analysis-bigrams.ipynb
- analysis-hesitations.ipynb(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/analysis-hesitations.ipynb) - code for analyzing hesitations in VOICE and BNC
- analysis-hesitations.md(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/analysis-hesitations.md) - markdown version of analysis-hesitations.ipynb
- exploring_VOICE.ipynb(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/exploring_VOICE.ipynb) - preliminary data processing for VOICE
- final_report.md(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/final_report.md) - final report
- first_progress_report.ipynb(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/first_progress_report.ipynb) - first project report
- presentation_notes.pdf(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/presentation_notes.pdf) - slides from presentation
- progress_report.md(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/progress_report.md) - progress reports updated throughout the term
- project_plan.md(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/project_plan.md) - contains plan for the project
- second_progress_report.ipynb(https://github.com/Data-Science-for-Linguists/Native_and_Non-native_English/blob/master/second_progress_report.ipynb) - second project report