In-Memory cache fills up indefinitely by default
halcy opened this issue · 4 comments
Right now, the in-memory cache only ever grows. When pulling a lot of data from the API, this starts to be a problem. I'd propose either explicitly documenting this behavior and how to reset the cache, or (ideally) giving the cache a maximum size / maximum number of stored objects, with some eviction policy (LRU?) and a sane default value to start off with (though picking that default well might be hard).
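For illustration, something like the following bounded, LRU-evicting store is what I have in mind. This is a plain-Python sketch of the proposed policy, not this library's Cache:

from collections import OrderedDict

class BoundedLRUCache:
    # Illustrative sketch of the proposed eviction policy, not Cassiopeia's Cache.
    def __init__(self, max_entries=10000):
        self._max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        value = self._data[key]       # raises KeyError on a miss
        self._data.move_to_end(key)   # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry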
Hey - been planning on updating the caching stuff eventually to have configurable expiration periods for each type + some eviction policy. Haven't gotten around to it yet. For now you can just set the datastore to a new instance of Cache to clear it.
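Roughly like this (a sketch only: pipeline._sinks is a private attribute, so its exact name and type may differ between versions):

import cassiopeia as cass
from cassiopeia.datastores.cache import Cache

# Assumption: the pipeline keeps its data sinks in a private _sinks collection.
# Swap the in-memory Cache sink for a fresh, empty instance to drop everything.
pipeline = cass.configuration.settings.pipeline
pipeline._sinks = [Cache() if isinstance(sink, Cache) else sink for sink in pipeline._sinks]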
After almost two years we finally get to close this issue. This was fixed / added over a series of recent commits.
Some details:
The data sinks now support expiration timeouts! I.e., you can specify how long you want data to "live" in the cache or diskstore. If a piece of data that has expired is accessed, it will be removed from the data sink and the data will be refreshed. Currently, data is not removed from the data sink until it is accessed, so it's up to you as the user to remove "old" data if it's taking up too much space / memory. You can do this via one of two new methods on the settings (available via cass.configuration.settings): 1) .clear_sinks, which removes everything from all data sinks, and 2) .expire_sinks, which removes all expired data from all data sinks. You can specify how long you want data to live in a data sink in your settings.
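For example:

import cassiopeia as cass

# Remove only entries whose configured lifetime has passed.
cass.configuration.settings.expire_sinks()

# Or remove everything from every data sink (cache, diskstore, ...).
cass.configuration.settings.clear_sinks()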
Hi @jjmaldonis, I have encountered the same issue, but even after calling .clear_sinks and .expire_sinks the problem persists. Within about 5 minutes, memory usage grows from roughly 70 MB to 120 MB.
Here is some code I am using to reproduce the issue; I have removed all other parts that could cause memory leaks:
# Imports reconstructed for completeness; derive_position is a helper defined elsewhere in my code.
import cassiopeia as cass
from cassiopeia import Summoner
from cassiopeia.data import Queue, Side
from datetime import datetime, timedelta

def crawl_region(region):
    print('Starting crawling {}'.format(region))
    last_index = 0
    summoners = []
    challenger_league = cass.get_challenger_league(queue=Queue.ranked_solo_fives, region=region)
    master_league = cass.get_master_league(queue=Queue.ranked_solo_fives, region=region)
    for s in challenger_league:
        summoners.append(int(s.summoner.id))
    for s in master_league:
        summoners.append(int(s.summoner.id))
    while last_index < len(summoners):
        try:
            summoner = Summoner(id=summoners[last_index], region=region)
            last_index += 1
            end = datetime.now()
            start = end - timedelta(days=7)
            match_history = cass.get_match_history(summoner,
                                                   queues={Queue.ranked_solo_fives, Queue.blind_fives,
                                                           Queue.ranked_flex_fives, Queue.normal_draft_fives},
                                                   begin_time=start, end_time=end, region=region)
            for match in match_history:
                participants = [participant for team in match.teams for participant in team.participants]
                for p in participants:
                    side = 0 if p.team.side == Side.red else 1
                    role = derive_position(p.timeline.lane, p.timeline.role)
            cass.configuration.settings.clear_sinks()
            cass.configuration.settings.expire_sinks()
        except:
            cass.configuration.settings.clear_sinks()
            cass.configuration.settings.expire_sinks()
To get this code to work I had to wrap part of the expire_sinks method in a try/except block, namely:
for sink in self.pipeline._sinks:
    for type in types:
        try:
            sink.expire(type)
        except:
            pass
This was done because I got the following error:
File "C:\ProgramData\Anaconda3\lib\site-packages\cassiopeia\datastores\cache.py", line 137, in expire
self._cache.expire(type)
AttributeError: 'Cache' object has no attribute 'expire'
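A more targeted guard inside Cache.expire would presumably also work (a sketch based on the traceback above; the real method signature may differ):

def expire(self, type=None):
    # Only delegate if the backing store actually implements expire;
    # the AttributeError above shows that it does not in this version.
    if hasattr(self._cache, "expire"):
        self._cache.expire(type)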
Even after digging through the library code and setting the default expiration in cache.py to 0, the issue persists.
Versions used:
cassiopeia: 3.0.15
merakicommons: 1.0.2
python: 3.6 (anaconda)
Thanks for the detailed bug report. We'll get a new release out asap to fix this.