A program for counting the occurrence of male and female Swedish names in a text file.
Try the live web version: genuskollen.se
A series of functions for counting gendered Swedish names in text. Import the gendercounter.py script along with the tsv files containing names and frequencies:
female250.tsv
male250.tsv
The first two dictionaries list female and male names respectively, along with the frequency of occurrence according to Statistics Sweden (2015). Also, the following pronouns can be counted.
- hen/henom/hens
- hon/henne/hennes
- han/honom/hans
import gendercounter # Just import as usual.
example = """Karl och Lisa promenerar på gatan och
tycker om Ove och Stina och Bertil.
Hon, han, hen, henom. Hon hens"""
text = gendercounter.from_string(example)
text.genderfrequency()
{'Men': 3, 'Women': 2}
text.pronounfrequency()
{'han': 1, 'hans': 0, 'hen': 1, 'henne': 0, 'hennes': 0, 'henom': 1, 'hens': 1, 'hon': 2, 'honom': 0}
# Returns names and the number of people with that name in Sweden.
text.names()
{'Men': {'Bertil': '68261', 'Karl': '209908', 'Ove': '33731'}, 'Women': {'Lisa': '31611', 'Stina': '19071'}}
textfile = gendercounter.from_textfile("testtext.txt")
textfile.genderfrequency()
{'Men': 15, 'Women': 7}
gendercounter.from_textfile("testtext2.txt").genderfrequency()
{'Men': 43, 'Women': 29}
from os import listdir
for file in listdir('.'):
if file.endswith('txt'):
print(file)
for k, v in gendercounter.from_textfile(file).genderfrequency().items():
print(k, v)
for k, v in gendercounter.from_textfile(file).pronounfrequency().items():
print(k, v)
print("-" * 10)
testtext2.txt Men 43 Women 29 han 1 henom 0 hennes 0 henne 0 hon 0 hen 0 honom 0 hens 0 hans 0
testtext.txt Men 15 Women 7 han 1 henom 2 hennes 2 henne 2 hon 2 hen 2 honom 0 hens 2 hans 0
If you work with large datasets, check out the concurrency.ipynb
notebook for an example of concurrent computing.
- Some names are also frequent Swedish words, for example "De".
- Uncommon names (less than 250 occurrences) were excluded.
- Some names are gender neutral ("Charlie", "Mario", "Alex" etc.)
- With the pronoun counter, the words "Han" and "Hans" can also be Swedish male names.
- ??? (Please let me know if you find other sources of error)
- "De"
- "Del"
- SexMachine (English language, written in Python)