Country stats

Question

Where does the data in countryStats.csv come from, and what does it mean? I've been adding a few additional countries, and in some cases names overlap between countries. I also get a failure when a country has no entry in this file, so I'd like to update it as well.

Answer 1 · 2020-09-22T21:04:08.000Z

This was the number of Stack Overflow participants from the particular countries back when we worked on genderComputer. This information is used in genderComputer.py during the initialisation phase:

'''Distribution of StackOverflow users per different countries'''
fd = open(os.path.join(self.dataPath, 'countryStats.csv'), 'r')
reader = csv.reader(fd, delimiter=';', dialect=csv.excel)
self.countryStats = {}
total = 0.0
for row in reader:
    country = row[0]
    numUsers = float(row[1])
    total += numUsers
    self.countryStats[country] = numUsers
for country in self.countryStats.keys():
    self.countryStats[country] = self.countryStats[country] / total
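
For anyone who wants to try the normalisation outside the class, here is a minimal self-contained sketch of the same idea. The function name load_country_stats and the sample rows are made up for illustration; they are not part of genderComputer and not the real contents of countryStats.csv. It only assumes the file holds semicolon-separated "country;count" rows.

```python
import csv
import io


def load_country_stats(fd):
    """Read 'country;count' rows and return each country's share of the total."""
    reader = csv.reader(fd, delimiter=';')
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = float(row[1])
    total = sum(counts.values())
    # Divide every count by the total so the shares sum to 1
    return {country: n / total for country, n in counts.items()}


# Made-up sample rows standing in for countryStats.csv (assumption)
sample = io.StringIO("USA;600\nIndia;300\nBelgium;100\n")
stats = load_country_stats(sample)
```

In this sketch a country that is absent from the file has no entry in the dictionary, so stats[country] raises KeyError, while stats.get(country, 0.0) does not.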

Answer 2 · 2020-09-22T23:37:44.000Z

Thanks! I had some idea of how it was used, but I was curious where the numbers came from. My understanding is that if a country is not explicitly specified and a name appears in the lists for multiple countries, then the country with the larger share in countryStats.csv will break the tie, correct?
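
To make the question concrete, here is a hypothetical sketch of the tie-breaking I have in mind. The function name break_tie and the shares are invented for illustration; this is not code from genderComputer.py and may not match what it actually does.

```python
def break_tie(candidate_countries, country_stats):
    """Pick the candidate country with the largest share.

    Hypothetical helper: countries missing from country_stats count as 0.
    """
    return max(candidate_countries, key=lambda c: country_stats.get(c, 0.0))


# Made-up normalised shares (assumption, not real data)
country_stats = {"USA": 0.6, "India": 0.3, "Belgium": 0.1}

# A name found in the lists for both India and Belgium
chosen = break_tie(["India", "Belgium"], country_stats)
```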

Answer 3 · 2020-09-23T07:49:57.000Z

I am not sure, but I guess that the numbers come from applying the countryNameManager to the Stack Overflow data.