/linguistic_variation_data_analyses

Data engineering and analysis of phonetic linguistic interviews with undergraduate students at the State University of Londrina, Brazil.

Primary language: Jupyter Notebook

Linguistic Variation Data Analysis

This repository contains all the data engineering and analysis from a graduate research project on sociolinguistic variation.

ABOUT THE RESEARCH

"This research aims to analyze the linguistic and identity properties of undergraduate students at the State University of Londrina, employing the theoretical and methodological principles of Variationist Sociolinguistics. Additionally, it seeks to describe the linguistic variety of these students, considering phonetic and phonological aspects, as well as the implications of these phenomena in interaction within a geographically diverse environment where different linguistic variants are present. To achieve the proposed objectives, data were collected from students in their first and final years of undergraduate programs. These data were stratified to analyze the linguistic patterns of academic discourse. Understanding these patterns could offer insights into the social dynamics within academic communities, shedding light on how language functions as a tool for identity construction and social interaction." - Ana Paula Silva

PHENOMENA

"The four most productive were selected for this analysis: palatalization of /t/ and /d/; rhotic variants; /s/ as a plural marker; and deletion of /d/ in the gerund.

For the palatalization of /t/ (tia ~ tʃia) and /d/ (dia ~ dʒia), the analysis focused on the context following alveolar stops, considering whether they were followed by /i/. It was hypothesized that when the speaker chooses to pronounce /i/, they would also use [dʒ], and with the vowel /e/, they would prefer [d]. Additionally, the position in the word—beginning, middle, or end—was also examined.

Regarding the deletion of /d/ in the gerund (cantando ~ cantan∅o), the goal was to verify the consistent presence of this process in the academic discourse, analyzing it based on social variables.

To investigate /s/ as a plural marker (vamos ~ vamo∅), two linguistic analysis criteria were established. Following Scherre (1978; 1988), the elimination of plural markers generally occurs only in determiners or elements in the first position. Therefore, the position of the word in the phrase was chosen as the first criterion. For the second criterion, the following context was examined to test the hypothesis that /s/ initiating the next word would favor the presence of /s/ as a plural marker for the previous word (a[s] saias).

Rhotic variants, on the other hand, were required to be in the coda position (partir), medial, or final. It's important to note that final syllable coda contexts with resyllabification and the occurrence of the tap variant as a result of /r/ ceasing to be a coda and becoming an onset, as in "mar aberto," were excluded from the analysis. The occurrence of deletion (comer ~ come∅) was also examined to test the hypothesis of a gradation in rhotic deletion. The analysis categorized words based on their size and the position of /r/ in the word: beginning, middle, or end. Furthermore, words were divided into two categories: verb and noun." - Ana Paula Silva

DATA ENGINEERING

The data folder contains all the .docx files with the phonetic transcriptions of the interviews used for this research. In the data_structure folder, the data set for each phonetic phenomenon is created and organized in a separate Jupyter notebook. Each notebook extracts the .docx files into text files, then uses re, Python's regular expression library, to build expressions that identify the words exhibiting each phenomenon. Combined with the pandas library, this makes it possible to build organized DataFrames that are later used for the data analysis. The end results for each phenomenon are shown below. Note that every word in each table carries data about the interviewed subject: subject number, undergraduate major, year of college (either first or last) and sex.
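As a rough illustration of the regex step described above, the sketch below tags gerund tokens in a line of text and attaches the speaker's metadata to each one. The pattern, the function name `extract_gerunds`, the field names and the sample sentence are all assumptions made for this example and are not taken from the notebooks. It also assumes orthographic-style text and is deliberately naive: it will produce false positives on non-gerund words such as "quando".

```python
import re

# Hypothetical pattern: a word ending in -ando/-endo/-indo (the /d/ is kept)
# or in -ano/-eno/-ino (the /d/ is deleted). Group 2 captures the optional "d".
GERUND = re.compile(r"\b(\w+[aei]n)(d?)o\b")


def extract_gerunds(text, subject):
    """Return one row (a dict) per gerund-like token found in `text`.

    `subject` is a dict with the speaker's metadata; its keys are
    illustrative names, not the column names used in the notebooks.
    """
    rows = []
    for match in GERUND.finditer(text.lower()):
        row = {
            "word": match.group(0),
            "d_deleted": match.group(2) == "",  # no "d" captured means deletion
        }
        row.update(subject)
        rows.append(row)
    return rows


# Invented sample sentence and metadata, for illustration only.
sample_subject = {"subject": 1, "major": "Letras", "year": "first", "sex": "F"}
sample_rows = extract_gerunds("eu tava cantano e ela tava comendo", sample_subject)
for row in sample_rows:
    print(row)
```

A list of dicts like `sample_rows` can be passed directly to `pandas.DataFrame(...)` to obtain a table with one row per word.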

PALATALIZATION

image

DELETION OF /d/ IN GERUNDS

image

/s/ AS A PLURAL MARKER

image

RHOTIC VARIANTS

image

DATA ANALYSIS

The analysis of each of the data sets shown above was performed in the same Jupyter notebooks. Here are the most interesting results found for each phenomenon.

PALATALIZATION

Below are the counts of each combination of palatalization and the accompanying target vowel, with their percentages.

image
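A table of this kind can be produced by counting each combination of values and dividing by the total. The sketch below shows one way to do that with the standard library. The function name `count_combinations`, the field names and the four sample tokens are invented for illustration and do not reflect the real data or the pandas code used in the notebooks.

```python
from collections import Counter


def count_combinations(rows, keys):
    """Count each combination of the given keys and its share of all rows.

    Returns a list of (combination, count, percent) tuples, most frequent first.
    """
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    total = sum(counts.values())
    return [
        (combo, n, round(100 * n / total, 1))
        for combo, n in counts.most_common()
    ]


# Toy tokens, invented for this example only.
toy_tokens = [
    {"variant": "tʃ", "vowel": "i"},
    {"variant": "t", "vowel": "e"},
    {"variant": "dʒ", "vowel": "i"},
    {"variant": "tʃ", "vowel": "i"},
]

for combo, n, pct in count_combinations(toy_tokens, ["variant", "vowel"]):
    print(combo, n, f"{pct}%")
```

Adding a third key, for example a hypothetical `"position"` field, would give the finer breakdown by position within the word.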

Next are the counts of each combination of palatalization, accompanying target vowel and the position (alone, beginning, middle or ending) where the palatalization is found within the word. The percentages of these occurrences are also included.

image

DELETION OF /d/ IN GERUNDS

The table below shows the counts of words that did and did not have the /d/ deleted in the gerund, with their respective percentages.

image

Next is the same breakdown, now divided by undergraduate major, year of college and sex. Each division is shown in a separate table.

image

image

image
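Splitting a result by a social variable amounts to computing the deletion rate inside each group rather than over the whole data set. The sketch below illustrates that idea under stated assumptions: the function name `deletion_rate_by`, the field names and the three sample rows are hypothetical, and the notebooks themselves rely on pandas rather than on this standard-library version.

```python
from collections import Counter


def deletion_rate_by(rows, variable):
    """Return {group: percent of tokens with deletion} for one social variable."""
    totals = Counter()
    deleted = Counter()
    for row in rows:
        group = row[variable]
        totals[group] += 1
        if row["deleted"]:
            deleted[group] += 1
    return {group: round(100 * deleted[group] / totals[group], 1) for group in totals}


# Toy rows, invented for this example only.
toy_rows = [
    {"word": "cantano", "deleted": True, "year": "first", "sex": "F"},
    {"word": "comendo", "deleted": False, "year": "first", "sex": "M"},
    {"word": "falano", "deleted": True, "year": "last", "sex": "F"},
]

print(deletion_rate_by(toy_rows, "year"))
print(deletion_rate_by(toy_rows, "sex"))
```

Calling the function once per variable (major, year, sex) gives one table per division, matching the layout used here.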

/s/ AS A PLURAL MARKER

The table below shows the counts of plural words in which the /s/ was kept or deleted, with their percentages.

image

Next is the same breakdown, separated by undergraduate major, year of college and sex.

image

image

image

RHOTIC VARIANTS

The table below shows the counts of words with rhotic variants and their percentages.

image

Next are the count and percentage tables, respectively, for the cases where the /r/ was deleted, split by whether the word is a noun or a verb. Count:

image

Percentage:

image

Next are the count and percentage tables for the cases where the /r/ was deleted, split by word size: big (more than 8 characters), medium (between 5 and 7 characters) or small (less than 5 characters). Count:

image

Percentage:

image
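The size classes above can be expressed as a small helper that maps a word to its category by character count. This is only a sketch of the stated thresholds: the function name `size_category` is hypothetical, and because the ranges as written do not cover words of exactly 8 characters, the sketch labels those as "unclassified" rather than guessing which class they belong to.

```python
def size_category(word):
    """Classify a word by its length using the thresholds stated in the text."""
    length = len(word)
    if length > 8:
        return "big"
    if 5 <= length <= 7:
        return "medium"
    if length < 5:
        return "small"
    # Exactly 8 characters: not covered by the stated ranges (assumption).
    return "unclassified"


for example in ["mar", "comer", "carregar", "compreender"]:
    print(example, len(example), size_category(example))
```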

Finally, the tables below show the cases where the /r/ is deleted, now divided by undergraduate major, year of college, sex and the position within the word where the deletion occurs.

image

image

image

image

CONTACT

DATA ENGINEERING AND ANALYSIS - Douglas Sanini

RESEARCH, DATA COLLECTION, ANALYSIS - Ana Paula Silva