bug: rrule is very slow
Closed this issue · 13 comments
rrule is very slow when evaluating 7 days of events for a 1.7 MB iCal file with ~80 recurring events in the calendar:
for dayidx in range(7):
    recurring_ical_events.of(calendar).at((day.year, day.month, day.day))
I get the following profile on an i5 CPU:
ncalls tottime percall cumtime percall filename:lineno(function)
635 5.837 0.009 5.837 0.009 {method 'read' of '_ssl._SSLSocket' objects} // google calendar download 5.8sec
966170/963055 5.584 0.000 10.756 0.000 rrule.py:774(_iter) // RRULE eval 5.6s
925050 2.324 0.000 2.453 0.000 rrule.py:1276(ddayset) // RRULE eval 2.3s
35101 1.395 0.000 14.259 0.000 rrule.py:1381(_iter) // RRULE eval 1.4s
1965581/1033825 0.844 0.000 12.126 0.000 {built-in method builtins.next}
933067 0.798 0.000 0.798 0.000 {built-in method combine}
71890 0.771 0.000 2.094 0.000 parser.py:321(parts)
933058 0.619 0.000 0.619 0.000 {built-in method fromordinal}
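For anyone who wants to reproduce a table like the one above, here is a minimal sketch using the standard library's cProfile and pstats. The `profile_call` helper and the `busy_work` stand-in are hypothetical names for illustration; the real workload would be the calendar query from the report.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, report sorted by tottime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(15)  # top 15 rows by own time
    return result, buffer.getvalue()


def busy_work(n):
    # Stand-in workload; replace with the calendar query being measured.
    return sum(i * i for i in range(n))


result, report = profile_call(busy_work, 10_000)
print(report)
```

Sorting by `tottime` puts the functions that spend the most time in their own body first, which is how the rows above appear to be ordered.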
I have many old/ended recurring items.
Is ";UNTIL=" checked before iterating RRULE period?
Hi, could you provide the ICS file and the script so people trying this talk about the same results as you do?
This module uses dateutil. I wonder whether this is also relevant for them.
Hi, sorry I only have my private calendar.
Others also report that rrule can be very slow:
https://stackoverflow.com/questions/1336824/python-dateutil-rrule-is-incredibly-slow
Can we prefilter on the ;UNTIL= parameter before we pass anything over to rrule?
Otherwise, if I have a biweekly task from 2010, it will have to crawl over 10 years of occurrences if it's not smart enough.
Reading the question, it seems that using the between function might be fast.
Also, using rrule.between() to get dates within a given interval is very fast.
Currently, we use the iteration, see
Maybe using between() would speed it up?
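For reference, a minimal sketch of what calling dateutil's `rrule.between()` looks like for a long-running rule. The rule string, start date, and window are made-up values for illustration, and whether this is actually faster than plain iteration for a given calendar would need to be measured.

```python
from datetime import datetime

from dateutil.rrule import rrulestr

# A biweekly rule that started years before the queried window (made-up data).
rule = rrulestr(
    "FREQ=WEEKLY;INTERVAL=2;BYDAY=TH",
    dtstart=datetime(2010, 1, 7, 9, 0),
)

window_start = datetime(2019, 10, 1)
window_end = datetime(2019, 10, 31)

# Returns a list of the occurrences strictly between the two datetimes.
occurrences = rule.between(window_start, window_end)
for occurrence in occurrences:
    print(occurrence)
```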
rrule only:
- .at: 9.128 sec
- .between: 9.094 sec
With prefiltering I would expect about 0.1 sec.
How would prefiltering work?
Also, by between(), I meant rrule.between, not this module's between function.
E.g., for a rule such as
FREQ=WEEKLY;UNTIL=20191023;BYDAY=TH;WKST=SU
the UNTIL part is already parsed in the code:
rule_list = rule_string.split(";UNTIL=")
rule_list[1]
so the check could be (pseudocode):
if UNTIL >= datetime.now():
    pass event/line over to rrule
else:
    ignore event
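The idea above could be sketched roughly as follows. This is only an illustration under assumptions: `rule_has_ended` and `UNTIL_PATTERN` are hypothetical names, not part of recurring_ical_events or dateutil; time zones are ignored; and the comparison is made against the start of the queried range rather than `datetime.now()`, so that queries into the past still see rules that were active then.

```python
import re
from datetime import datetime

# Hypothetical helper, not part of recurring_ical_events or dateutil.
UNTIL_PATTERN = re.compile(
    r"(?:^|[;:])UNTIL=(\d{8})(?:T(\d{6})Z?)?(?:;|$)", re.IGNORECASE
)


def rule_has_ended(rule_string, query_start):
    """Return True if the rule's UNTIL lies before query_start.

    Rules without UNTIL never end by this test and are always kept.
    Time zones are ignored: UNTIL is read as a naive datetime.
    """
    match = UNTIL_PATTERN.search(rule_string)
    if match is None:
        return False
    date_part, time_part = match.groups()
    # A date-only UNTIL covers the whole day, so treat it as 23:59:59.
    until = datetime.strptime(date_part + (time_part or "235959"), "%Y%m%d%H%M%S")
    return until < query_start


rules = [
    "FREQ=WEEKLY;UNTIL=20191023;BYDAY=TH;WKST=SU",  # ended in 2019
    "FREQ=WEEKLY;INTERVAL=2;BYDAY=MO",  # no UNTIL: open-ended
]
query_start = datetime(2020, 1, 1)
active = [rule for rule in rules if not rule_has_ended(rule, query_start)]
print(active)
```

Only rules that survive the filter would then be handed to rrule, so ended series cost a single string match instead of a full iteration.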
I think there are some optimizations which can be made:
- change the UNTIL parameter in the string
- use the rrule.between() function instead of plain iteration (inc=True should be tested)
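On the `inc` point, a small sketch of how the flag changes the result of dateutil's `rrule.between()` when occurrences fall exactly on the boundaries. The rule and dates are made up for illustration.

```python
from datetime import datetime

from dateutil.rrule import rrulestr

# Three daily occurrences: Jan 1, Jan 2 and Jan 3, 2020 (made-up data).
rule = rrulestr("FREQ=DAILY;COUNT=3", dtstart=datetime(2020, 1, 1))

start = datetime(2020, 1, 1)
end = datetime(2020, 1, 3)

# Default inc=False: occurrences equal to start or end are left out.
exclusive = rule.between(start, end)
# inc=True: occurrences equal to start or end are included.
inclusive = rule.between(start, end, inc=True)

print(exclusive)  # only Jan 2
print(inclusive)  # Jan 1, Jan 2 and Jan 3
```

So a query for a whole day or range would likely need `inc=True` on at least one side, or events starting exactly at the boundary could be dropped.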
Also, having a test event would be great. Is it possible that you identify the event which takes so long and post it here with the code which takes long? This way, we can really optimize - at the moment, I am still not sure how to properly address it.
If you would like to contribute code, you can also start by adding a (failing) test and creating a pull request; see the issue template.
Please find the test case attached; querying 28 days takes 31 sec.
ncalls tottime percall cumtime percall filename:lineno(function)
206871 2.193 0.000 4.556 0.000 rrule.py:774(_iter) <-- slowest 2sec
133784 2.119 0.000 12.167 0.000 recurring_ical_events.py:131(__init__) <-- slow 2sec
581423 2.023 0.000 2.023 0.000 {method 'replace' of 'datetime.datetime' objects} <-- slow 2sec
238440 1.611 0.000 7.950 0.000 rrule.py:1381(_iter)
612392 1.128 0.000 1.965 0.000 caselessdict.py:56(get)
133784 1.071 0.000 2.799 0.000 recurring_ical_events.py:197(make_all_dates_comparable)
238440 1.015 0.000 10.964 0.000 recurring_ical_events.py:228(__iter__)
878548 0.833 0.000 2.481 0.000 recurring_ical_events.py:45(convert_to_datetime)
34264 0.826 0.000 1.111 0.000 rrule.py:426(__init__)
Isn't this issue solved now?