Country stats
Where does the data in countryStats.csv come from, and what does it mean? I've been adding a few additional countries, and in some cases the same name appears for more than one country. I get a failure when a country has no entry in this file, so I'd like to update it as well.
This was the number of Stack Overflow participants from each country back when we worked on genderComputer. genderComputer.py uses this information during the initialisation phase:
'''Distribution of StackOverflow users per different countries'''
fd = open(os.path.join(self.dataPath, 'countryStats.csv'), 'r')
reader = csv.reader(fd, delimiter=';', dialect=csv.excel)
self.countryStats = {}
total = 0.0
for row in reader:
    country = row[0]
    numUsers = float(row[1])
    total += numUsers
    self.countryStats[country] = numUsers
for country in self.countryStats.keys():
    self.countryStats[country] = self.countryStats[country] / total
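For anyone adding rows to the file, here is a minimal standalone sketch of the same normalisation, so the effect of a new entry can be checked outside genderComputer. The function name `load_country_shares` and the sample numbers are made up for illustration; they are not part of genderComputer or the real countryStats.csv.

```python
import csv
import io


def load_country_shares(fd):
    """Read 'country;count' rows and return each country's share of the total."""
    reader = csv.reader(fd, delimiter=';')
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = float(row[1])
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no users counted, cannot normalise")
    # Divide every count by the grand total so the shares sum to 1.
    return {country: n / total for country, n in counts.items()}


# Hypothetical sample data, NOT the real contents of countryStats.csv.
sample = io.StringIO("USA;600\nIndia;300\nBelgium;100\n")
shares = load_country_shares(sample)
```

Because every value is divided by the grand total, adding a country changes the share of all the others slightly, but the absolute count you enter only matters relative to the rest.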
Thanks! I had some idea of how it was used, but I was curious where the numbers came from. My understanding is that if a country is not explicitly specified and a name appears in the lists for multiple countries, then the country with the larger share of users will break the tie, correct?
I am not sure, but I guess that it comes from applying the countryNameManager to the Stack Overflow data.
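The tie-breaking behaviour asked about above is not confirmed in this thread. If it does work that way, it might look roughly like the sketch below, which also shows one way to avoid the failure on a country with no entry: treat its share as 0.0 instead of indexing the dictionary directly. `break_tie` and its arguments are hypothetical names, not genderComputer's actual API.

```python
def break_tie(candidates, country_shares):
    """Return the gender guess from the country with the largest user share.

    candidates maps country -> gender guess for a single first name.
    Countries missing from country_shares count as 0.0 rather than
    raising KeyError (an assumption, not confirmed library behaviour).
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: country_shares.get(c, 0.0))
    return candidates[best]


# Hypothetical shares and name lists, for illustration only.
shares = {"USA": 0.6, "Italy": 0.1}
guess = break_tie(
    {"USA": "female", "Italy": "male", "Atlantis": "male"},
    shares,
)
```

With these made-up numbers the USA entry wins because it has the largest share, and the country without an entry never wins a tie against a listed one.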