smcgivern/cricket-query

Annoyance: players with the same name

Closed this issue · 0 comments

https://sean.mcgivern.me.uk/cricket-query/help/#players-with-the-same-name

Players do not have a unique identifier beyond their name. For most players this is OK as their names are unique in their field, at least when combined with the team they play for. However, there are some who are not unique even with that qualifier. For instance, there are two JP Duminys who played for South Africa: Jacobus Petrus Duminy and Jean-Paul Duminy.

Often these players can be further disambiguated by also selecting their debut date (MIN(start_date)), but that is not particularly convenient.
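A minimal sketch of what selecting the debut alongside a grouped result could look like, using Python's built-in `sqlite3`. Only `start_date` is named in this issue; the `innings` table, the other column names, and the sample rows are made up for illustration and may not match the real database.

```python
import sqlite3

# Hypothetical schema: only start_date comes from the issue text; the table
# name and the other columns are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE innings (player TEXT, team TEXT, start_date TEXT, runs INTEGER)"
)
conn.executemany(
    "INSERT INTO innings VALUES (?, ?, ?, ?)",
    [
        ("J Doe", "Team X", "1928-01-01", 10),
        ("J Doe", "Team X", "1929-02-01", 20),
        ("J Doe", "Team Y", "2008-03-01", 30),
        ("J Doe", "Team Y", "2009-04-01", 40),
    ],
)

# Add the earliest start_date in each group as a "debut" column.
rows = conn.execute(
    """
    SELECT player, team, MIN(start_date) AS debut, SUM(runs) AS total_runs
    FROM innings
    GROUP BY player, team
    ORDER BY debut
    """
).fetchall()

for row in rows:
    print(row)
```

Rows from namesakes that fall into the same group are still merged, and the debut shown is then the earlier of the two, which is probably part of why this only works "often".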

We could fix this in https://github.com/obrasier/cricketstats by also collecting the player's URL from Statsguru (a numeric ID might be preferable, as it would be more likely to survive changes to Cricinfo's URL scheme).
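As a rough sketch of the numeric-ID idea: if the player URL contains a numeric ID, it could be pulled out and stored instead of the full URL. The function name `numeric_player_id` and both example URLs are hypothetical; they are not confirmed Statsguru or Cricinfo URL formats, and this is not code from cricketstats.

```python
import re
from urllib.parse import urlparse


def numeric_player_id(url):
    """Return the last run of digits in the URL path as an int, or None."""
    digits = re.findall(r"\d+", urlparse(url).path)
    return int(digits[-1]) if digits else None


# Two made-up URL shapes for the same (made-up) player ID.
old_style = numeric_player_id("https://example.com/player/12345.html")
new_style = numeric_player_id("https://example.com/cricketers/j-doe-12345")
no_id = numeric_player_id("https://example.com/players/")

print(old_style, new_style, no_id)
```

Under that assumption, both URL shapes give the same ID, which is the property that would make the ID a better key than the URL itself.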

An initial hack might be to do this as post-processing, using the debut-date idea above: in https://github.com/smcgivern/cricket-query/blob/main/scripts/create-db, generate a player sequence ID based on the (player, MIN(start_date)) pair. But that will not necessarily be as stable as an ID taken from Statsguru.
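A sketch of how such a sequence ID might be assigned, and one way it could turn out unstable. The helper `assign_player_ids`, the ordering (debut first, then name), and the sample pairs are all assumptions for illustration; this is not what `scripts/create-db` does today.

```python
def assign_player_ids(pairs):
    """Number distinct (player, debut) pairs from 1, ordered by debut then name.

    Debuts are ISO date strings (YYYY-MM-DD), so string order is date order.
    """
    ordered = sorted(set(pairs), key=lambda pair: (pair[1], pair[0]))
    return {pair: seq for seq, pair in enumerate(ordered, start=1)}


# Made-up (player, MIN(start_date)) pairs, including two namesakes.
pairs = [
    ("J Doe", "2008-03-01"),
    ("J Doe", "1928-01-01"),
    ("A Nother", "1950-06-10"),
]
ids = assign_player_ids(pairs)

# If a later import adds a player with an earlier debut, every ID shifts.
ids_after = assign_player_ids(pairs + [("E Arly", "1900-01-01")])

print(ids[("J Doe", "1928-01-01")], ids_after[("J Doe", "1928-01-01")])
```

With this particular ordering, the two namesakes get distinct IDs, but any newly added earlier debut renumbers everyone after it.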