rOpenGov/finpar

Gender variable?

briatte opened this issue · 8 comments

Quick question – I'm looking at these pages (MP biographies on the Finnish Parliament website), and I can't find a way to impute gender from the information provided.

I wrote some rough imputation code, based on first names and photos, but some first names might fit both males and females.

Needless to say, I don't speak Suomi. Is there an easy way to impute gender from first names? Did someone bother to do this for MPs from recent parliamentary sessions (35-36)?

I'm asking in relation to this repo, which builds legislative cosponsorship networks from the last two sessions of the Finnish Parliament.

Thanks in advance for any pointers!

> Needless to say, I don't speak Suomi. Is there an easy way to impute gender from first names?

None that I know of, but then again this is really not my field of expertise. Gender information would be a useful addition to the main kansanmuisti database; we'll look into this and see if there's a quick fix.

> Did someone bother to do this for MPs from recent parliamentary sessions (35-36)?

No, at least I'm not aware of it.

OK, here it is: https://www.avoindata.fi/data/fi/organization/vaestorekisterikeskus (dataset "Väestötietojärjestelmän suomalaisten nimiaineistot", Finnish name data from the Population Information System). It should be usable after some preprocessing. This would be quite generally useful; perhaps we should consider contributing it to the "gender" package, or writing a function in sorvi that downloads the data and converts it into our desired format (name -> sex mappings).
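
To make the "name -> sex mappings" idea concrete, here is a minimal sketch in R. It assumes the preprocessed data can be reduced to one row per first name with a male and a female count; that layout, the column names, and the numbers are all invented for illustration and are not the actual format of the VRK files.

```r
# Toy counts per first name and sex; the numbers are invented for illustration
# and the layout is an assumption, not the actual format of the VRK files.
counts <- data.frame(
  name   = c("Kari", "Sisko", "Ola"),
  male   = c(980, 0, 60),
  female = c(20, 500, 40),
  stringsAsFactors = FALSE
)

# Majority gender, the share of that gender, and the total count per name
counts$total       <- counts$male + counts$female
counts$gender      <- ifelse(counts$male >= counts$female, "male", "female")
counts$probability <- pmax(counts$male, counts$female) / counts$total

counts[, c("name", "gender", "probability", "total")]
```

Keeping the probability next to the majority gender makes it possible to flag ambiguous names later instead of forcing a guess.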

I just learned that there is a service called http://genderize.io/ that "determines the gender of a first name". E.g.

http://api.genderize.io/?name=joona&language_id=fi
http://api.genderize.io/?name=leo&language_id=fi
http://api.genderize.io/?name=fran%C3%A7ois&language_id=fr
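
For reference, query URLs like the ones above could be built in R roughly as follows. This is only a sketch: `genderize_url` is a hypothetical helper, not part of any package, and the `name` and `language_id` parameters are simply taken from the example URLs. No request is sent.

```r
# Hypothetical helper: build a genderize.io query URL like the ones above.
genderize_url <- function(name, language_id = NULL) {
  # Percent-encode the name so non-ASCII characters survive in the URL
  query <- paste0("name=", utils::URLencode(name, reserved = TRUE))
  if (!is.null(language_id)) {
    query <- paste0(query, "&language_id=", language_id)
  }
  paste0("http://api.genderize.io/?", query)
}

genderize_url("joona", "fi")
# "http://api.genderize.io/?name=joona&language_id=fi"
```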

What's even better, rOpenSci has an R package, genderizeR, for the API. I will need to test this on the list of current MPs.

Super! I was only aware of the "gender" R package.

Great! I have other use for this as well.

On Mon, Feb 9, 2015 at 11:12 AM, Leo Lahti notifications@github.com wrote:

> Super! I was only aware of the "gender" R package.


This rocks!!

Here's a test on 830 Finnish MPs. It seems to work really well: only 11 names (about 1.3%) are not "genderized". Problem solved.

Thanks again, this is a really great pick.

Neat! Since this information is generally useful and so far unavailable, I did a manual check of the names based on your gist. The following female names are categorically classified as male:

sisko

The following male names are categorically classified as female:

assar
aulis
eeli
eino
jani
janne
kari
lauri
ola
pauli
viljami

The checked and corrected CSV of Finnish MPs can be found in finpar/inst/extadat/mp_genders.csv (column correct_gender). The corrected names (as produced by findGivenNames()) can be found in finpar/inst/extadat/name_genders.csv (column correct_gender).
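
A minimal sketch of how the corrections listed above could be applied in R. The `pred` data frame and its gender values are toy data, not real API output, and this is not necessarily how the CSV files were produced; only the `correct_gender` column name comes from the files mentioned above.

```r
# Toy predictions; the gender values here are illustrative, not real API output.
pred <- data.frame(
  name   = c("sisko", "eino", "kari", "leo"),
  gender = c("male", "female", "female", "male"),
  stringsAsFactors = FALSE
)

# Names reported above as misclassified
wrongly_male   <- c("sisko")
wrongly_female <- c("assar", "aulis", "eeli", "eino", "jani", "janne",
                    "kari", "lauri", "ola", "pauli", "viljami")

# Start from the predicted value, then override the known errors
pred$correct_gender <- pred$gender
pred$correct_gender[pred$name %in% wrongly_male]   <- "female"
pred$correct_gender[pred$name %in% wrongly_female] <- "male"

pred
```

Keeping the original `gender` column alongside `correct_gender` makes it easy to see which rows were overridden.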

I added the Finnish name-gender mappings derived from the Finnish population register (VRK) to the fennica R package. The table contains 11230 first names (according to VRK), with gender, gender probability, and total count for each name. This may or may not be more comprehensive than the genderizeR data for Finnish, but the free genderize.io API is limited to 1000 queries a day, so a local table can help when you need results right away.

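# Install the fennica package from GitHub (requires devtools)
library(devtools)
install_github("ropengov/fennica")
library(fennica)
# Table of Finnish first names with gender, gender probability, and counts
tab <- get_gender_fi()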
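
Once a name table is available, imputing gender is a simple lookup. The sketch below uses a toy stand-in table rather than the real output of get_gender_fi(); the column names `name` and `gender` are assumptions and may differ from what the function actually returns, and `impute_gender` is a hypothetical helper.

```r
# Toy stand-in for a name-gender table; the column names are assumptions
# and may differ from what get_gender_fi() actually returns.
tab_toy <- data.frame(
  name   = c("Leo", "Joona", "Sisko"),
  gender = c("male", "male", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up gender by first name, ignoring case.
impute_gender <- function(first_names, tab) {
  idx <- match(tolower(first_names), tolower(tab$name))
  tab$gender[idx]  # NA where the name is not in the table
}

impute_gender(c("leo", "SISKO", "Xyz"), tab_toy)
# "male" "female" NA
```

Names that are missing from the table come back as NA, so they can be listed and checked by hand.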