djcunningham0/multielo

Use simpler data structure for Tracker.player_df

djcunningham0 opened this issue · 1 comment

The players in the Tracker object don't really need to be stored in a pandas dataframe. A list or set would work just as well, since the dataframe is only used to hold the players in the tracker. This will require refactoring most of the methods in the Tracker class, but the changes should be straightforward.
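As a rough illustration of the idea, here is a minimal sketch of a tracker that keeps its players in a plain list. The names SketchPlayer, SketchTracker, get_player, and add_player, and the default rating of 1000, are all hypothetical and chosen for this example; they are not the actual multielo classes or methods.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchPlayer:
    """Hypothetical stand-in for a player record (not the multielo Player class)."""
    player_id: str
    rating: float = 1000.0  # assumed default rating, for illustration only


@dataclass
class SketchTracker:
    """Hypothetical tracker that stores players in a list instead of a dataframe."""
    players: List[SketchPlayer] = field(default_factory=list)

    def get_player(self, player_id: str) -> Optional[SketchPlayer]:
        # Linear scan over the list; a dict keyed by ID would give O(1) lookups.
        for player in self.players:
            if player.player_id == player_id:
                return player
        return None

    def add_player(self, player_id: str, rating: float = 1000.0) -> SketchPlayer:
        # Return the existing player if already tracked, so IDs stay unique.
        existing = self.get_player(player_id)
        if existing is not None:
            return existing
        player = SketchPlayer(player_id, rating)
        self.players.append(player)
        return player


tracker = SketchTracker()
tracker.add_player("alice")
tracker.add_player("bob", rating=1200.0)
tracker.add_player("alice")  # already tracked, so no duplicate is created
```

A list preserves insertion order and allows players to be mutable objects. A set would require hashable players, and a dict keyed by player ID would be a reasonable alternative if lookups become a bottleneck.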

This change would probably improve performance by removing the overhead of creating a pandas dataframe. The improvement would only be noticeable on very large datasets; the calculations are fast enough that performance is not an issue for small-scale applications.

Note: This would be a breaking change to the Tracker API. The player_df attribute would be replaced by a list or set, so any code that accesses that attribute directly would break. This is probably not common.
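For code that does depend on a tabular view of the players, one possible workaround is to rebuild the rows from the player collection on demand. The sketch below assumes each player exposes player_id and rating attributes; SketchPlayer, players_to_rows, and the column names are hypothetical and not part of the multielo API.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class SketchPlayer:
    """Hypothetical player record; attribute names are assumptions."""
    player_id: str
    rating: float


def players_to_rows(players: List[SketchPlayer]) -> List[Dict[str, Union[str, float]]]:
    # Build one dict per player, preserving the order of the input list.
    return [{"player_id": p.player_id, "rating": p.rating} for p in players]


players = [SketchPlayer("alice", 1010.0), SketchPlayer("bob", 990.0)]
rows = players_to_rows(players)
# If pandas is installed, pandas.DataFrame(rows) would turn these rows into a dataframe.
```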

Done in the develop branch. Replaced Tracker.player_df (a pandas dataframe) with Tracker.players (a list of Player objects). In testing, this change made Tracker.process_data about twice as fast.