Improve time of parsing larger .ics files
The ICS file of my personal Google calendar, which includes entries dating back to 2012, is approximately 680 KB and contains nearly 1700 events. It takes approximately 14 seconds for python-ics to parse all entries, which considerably delays the startup of calcure.
It would be great if calcure gained the ability to optionally filter entries and discard those that are too far from today's date.
This is a very basic (and likely buggy) implementation which only considers entries from the current year:
# loaders.py
import datetime

def read_file(self, path):
    ...
    with open(path, 'r', encoding="utf-8") as file:
        lines = self.read_lines(file)
    lines = self.filter_lines(lines.splitlines())
    return lines

def filter_lines(self, lines):
    """Keep only VEVENT blocks whose DTSTART line mentions the current year."""
    current_year = str(datetime.datetime.today().year)
    event_lines = []
    out_lines = []
    in_event = False
    it = iter(lines)
    for line in it:
        # Compare for equality: `line in 'BEGIN:VEVENT'` would also match empty lines.
        if line.strip() == 'BEGIN:VEVENT':
            in_event = True
            event_lines.append(line)
            continue
        if not in_event:
            # Lines outside of any event (calendar header, footer) pass through.
            out_lines.append(line)
            continue
        if line.startswith('DTSTART') and current_year not in line:
            # Event is not from the current year: drop it and skip to its end.
            event_lines = []
            in_event = False
            for x in it:
                if 'END:VEVENT' in x:
                    break
            continue
        event_lines.append(line)
        if line.strip() == 'END:VEVENT':
            out_lines.extend(event_lines)
            event_lines = []
            in_event = False
    return '\n'.join(out_lines)
In my case it resulted in a greatly reduced file size and parse time:
# import ics; ics.Calendar(open('file').read())
Time to read 679 kB file: 12.71 s
Time to read 62 kB file: 0.91 s
Calcure loading time:
original file: 13.2 s
reduced file: 1.25 s
no external ics file read: 0.3 ms
Limiting my view to only events from the current year is a price I am willing to pay if the program starts in about a second.
It may also be worth mentioning that perl is able to process the ICS file significantly faster:
use iCal::Parser;
my $parser=iCal::Parser->new();
my $hash=$parser->parse(shift);
# original file: 5.4 s
# reduced file: 0.4 s
Hi, sorry for the delay. I think your idea looks pretty good. I'll implement it with some user parameters to control the range. Feel free to make a PR with this snippet.
About perl, it's cool, but I'd prefer not to introduce an additional dependency to improve a niche feature, so let's keep it in python.
p.s. I do wish the range feature was implemented directly in the pyics library, though.
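A range-based variant of the snippet above could look roughly like the following. This is only a sketch: the function names and the `days_before` / `days_after` parameters are made up for illustration and are not part of calcure. It works on the raw text before any ICS library sees it, and it does not understand recurrence rules, so an old event with an RRULE that still repeats today would be dropped.

```python
import datetime


def dtstart_date(line):
    """Return the date encoded in a DTSTART line, or None if it cannot be read."""
    # Handles e.g. 'DTSTART:20230224T100000Z' and 'DTSTART;VALUE=DATE:20230224'.
    _, _, value = line.rpartition(':')
    try:
        return datetime.datetime.strptime(value.strip()[:8], '%Y%m%d').date()
    except ValueError:
        return None


def filter_events(text, days_before=365, days_after=365, today=None):
    """Drop VEVENT blocks whose DTSTART falls outside the given window around today."""
    today = today or datetime.date.today()
    earliest = today - datetime.timedelta(days=days_before)
    latest = today + datetime.timedelta(days=days_after)
    out = []
    event = None  # lines of the VEVENT currently being read, if any
    keep = True
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == 'BEGIN:VEVENT':
            event, keep = [line], True
            continue
        if event is None:
            out.append(line)  # outside of any VEVENT: pass through unchanged
            continue
        event.append(line)
        if stripped.startswith('DTSTART'):
            start = dtstart_date(stripped)
            # Events with an unreadable start date are kept to be safe.
            keep = start is None or earliest <= start <= latest
        elif stripped == 'END:VEVENT':
            if keep:
                out.extend(event)
            event = None
    return '\n'.join(out)
```

The two parameters would map naturally onto user settings, and passing `today` explicitly makes the function easy to test.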
> About perl, it's cool, but I'd prefer not to introduce an additional dependency to improve in a niche feature, let's keep it in python.
The reason I mentioned perl was that perhaps it would be better to look for other tools providing a way to filter events (if there are any), ideally by multiple criteria, and then optionally ask calcure to run them after load.
Today I made a (hopefully) interesting discovery: first I configured vdirsyncer to sync the data from the Google Calendar .ics file to a local vdir format. The original 680 kB .ics file was converted into 1681 individual .ics files stored in a single directory, and the combined size of all files grew to 2.7 MB (some extra lines/sections were added).
Then I installed khal and ran ikhal to start an interactive session. The first start took 4.67 seconds (which is quite decent already), but since khal uses sqlite for caching (?), every subsequent run took under a second.
To me it seems there's definitely a lot of room for improvement.
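For illustration, the general idea of such a cache can be sketched with the standard library alone. This is not how khal does it (I have not checked its implementation); `load_cached`, the table layout and the `cache.sqlite` file name are all invented for this example. The parsed result is stored next to the file's modification time, and the slow parse is repeated only when that time changes.

```python
import json
import os
import sqlite3


def load_cached(path, parse, cache_path='cache.sqlite'):
    """Return parse(text) for the file at `path`, reusing the stored result
    while the file's modification time is unchanged.

    `parse` must return something JSON-serializable (lists, dicts, strings).
    """
    mtime = os.path.getmtime(path)
    con = sqlite3.connect(cache_path)
    try:
        con.execute(
            'CREATE TABLE IF NOT EXISTS cache '
            '(path TEXT PRIMARY KEY, mtime REAL, payload TEXT)'
        )
        row = con.execute(
            'SELECT mtime, payload FROM cache WHERE path = ?', (path,)
        ).fetchone()
        if row is not None and row[0] == mtime:
            return json.loads(row[1])  # cache hit: skip the slow parse
        with open(path, encoding='utf-8') as f:
            result = parse(f.read())
        con.execute(
            'INSERT OR REPLACE INTO cache VALUES (?, ?, ?)',
            (path, mtime, json.dumps(result)),
        )
        con.commit()
        return result
    finally:
        con.close()
```

Only the first start after a calendar change pays the full parsing cost, which would match the behaviour observed above.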
As I was digging through the khal code I noticed that, unlike calcure, they're using icalendar (not ics). I decided to make a simple comparison and the differences are stunning.
First, the code that loads the same original (~680 kB) file from the previous examples:
import timeit

import icalendar
import ics

def ics_load():
    cal = ics.Calendar(data)
    print(f'Number of events loaded: {len(cal.events)}')

def ical_load():
    cal = icalendar.Calendar.from_ical(data)
    events = []
    for component in cal.walk():
        if component.name == 'VEVENT':
            event_name = component.get('summary')
            event_start = component.get('dtstart').dt
            events.append(f'Event: {event_name} {event_start}')
    print(f'Number of events loaded: {len(events)}')

with open('basic.ics') as f:
    data = f.read()

# Each loader is run exactly once on the same in-memory data.
ics_time = timeit.timeit("ics_load()", globals=globals(), number=1)
ical_time = timeit.timeit("ical_load()", globals=globals(), number=1)
print(f'ICS load: {ics_time}')
print(f'ICAL load: {ical_time}')
And here is the result of a run:
Number of events loaded: 1695
Number of events loaded: 1695
ICS load: 12.023302923000301
ICAL load: 0.4829574479999792
As a matter of fact, I found the numbers too good to be true, but on the other hand I do see the event names and times. I would very much welcome your opinion.
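One way to sanity-check numbers like these without trusting either library is to count the events in the raw text and to time each loader more than once. The two helpers below are hypothetical and use only the standard library; they are a sketch, not part of the benchmark above.

```python
import timeit


def count_vevents(text):
    """Count VEVENT blocks by looking for their opening lines in the raw text."""
    return sum(1 for line in text.splitlines() if line.strip() == 'BEGIN:VEVENT')


def best_time(func, repeat=3):
    """Call func() `repeat` times and return the fastest wall-clock time in seconds."""
    return min(timeit.repeat(func, number=1, repeat=repeat))
```

If `count_vevents(data)` agrees with the count both loaders print, and `best_time(ics_load)` and `best_time(ical_load)` stay in the same ballpark as the single runs, the difference is unlikely to be a measurement artifact.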
Wow, that's interesting! Initially I went with the ics library because I got a working version quicker and the syntax is cleaner, but indeed this library has its issues and clearly the loading time is too long. Basically, we only need to parse the following fields in loaders.py:
event.name
event.all_day
event.begin.year
event.begin.month
event.begin.day
task.name
task.priority
task.due.year
task.due.month
task.due.day
So if it is possible with the icalendar library, we might switch to it. That would solve this issue without creating filters.
> So if it is possible with icalendar library, we might switch to it. That would solve this issue without creating filters.
All of that seems to be supported by icalendar:
# if component.name == 'VEVENT'..
event.name         # str(component.get('summary'))
event.all_day      # component.get('dtstart').params.get('VALUE') == 'DATE'
event.begin.year   # component.get('dtstart').dt.year
event.begin.month  # component.get('dtstart').dt.month
event.begin.day    # component.get('dtstart').dt.day
# if component.name == 'VTODO'..
task.name          # str(component.get('summary'))
task.priority      # component.get('priority')
task.due.year      # component.get('due').dt.year
task.due.month     # component.get('due').dt.month
task.due.day       # component.get('due').dt.day
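One detail the mapping glosses over is that DUE and PRIORITY are optional in a VTODO, so `component.get('due').dt` fails on a task without a deadline. A defensive version of the extraction might look like the sketch below. The `EventFields` / `TaskFields` containers and both function names are invented for illustration; the code only assumes that a component offers `.get()` and that date properties expose `.dt`, as used above, and it assumes every VEVENT has a DTSTART.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventFields:
    name: str
    all_day: bool
    begin: datetime.date


@dataclass
class TaskFields:
    name: str
    priority: Optional[int]
    due: Optional[datetime.date]


def _as_date(value):
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime.datetime):
        return value.date()
    return value


def event_fields(component):
    """Extract name, all-day flag and start date from a VEVENT-like component."""
    start = component.get('dtstart').dt
    return EventFields(
        name=str(component.get('summary', '')),
        # A plain date (rather than a datetime) is treated as an all-day event.
        all_day=not isinstance(start, datetime.datetime),
        begin=_as_date(start),
    )


def task_fields(component):
    """Extract name, priority and due date from a VTODO-like component."""
    due = component.get('due')
    priority = component.get('priority')
    return TaskFields(
        name=str(component.get('summary', '')),
        priority=None if priority is None else int(priority),
        due=None if due is None else _as_date(due.dt),
    )
```

With icalendar installed, the components could presumably come from `icalendar.Calendar.from_ical(data).walk('VEVENT')` and `.walk('VTODO')`. Here the all-day check relies on the type of `.dt` instead of the `VALUE` parameter, which is a design choice of this sketch rather than something required by the library.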
Made an experimental branch with event parsing handled by icalendar; startup performance-wise it looks quite promising. I am not currently using any todo (.ics with VTODO) items. @anufrievroman Could you perhaps add a few to the repo itself? Thanks!
Here is a little example of a tasks.ics file with a few tasks following the Nextcloud standard:
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Sabre//Sabre VObject 4.4.2//EN
PRODID:-//Nextcloud Tasks v0.14.2
PRODID:-//Nextcloud Tasks v0.14.5
BEGIN:VTODO
UID:6cb1fd92-2eb5-43a1-a9d2-a3bbf9dc7b7c
CREATED:20230219T100637
LAST-MODIFIED:20230219T100708
DTSTAMP:20230219T100708
SUMMARY:Task with deadline from nextcloud
DUE;VALUE=DATE:20230224
END:VTODO
BEGIN:VTODO
UID:7eb4a1e2-4dd3-4629-8be7-b1c84f5db465
CREATED:20220116T203728
LAST-MODIFIED:20230218T161231
DTSTAMP:20230218T161231
SUMMARY:Unimportant task from nextcloud
PRIORITY:6
PERCENT-COMPLETE:18
STATUS:IN-PROCESS
END:VTODO
BEGIN:VTODO
UID:bc2f8f98-44ba-4003-ae23-b27b77facf78
CREATED:20230218T154114
LAST-MODIFIED:20230218T161138
DTSTAMP:20230218T161138
SUMMARY:Normal task from nextcloud
STATUS:NEEDS-ACTION
END:VTODO
BEGIN:VTODO
UID:c4dcf921-e819-4a4b-b331-0f017c0df558
CREATED:20230218T154053
LAST-MODIFIED:20230218T161154
DTSTAMP:20230218T161154
SUMMARY:Cancelled task from nextcloud
STATUS:CANCELLED
END:VTODO
BEGIN:VTODO
UID:e5f32ad6-efe0-4e7c-8c8c-ea64afada253
CREATED:20220116T203724
LAST-MODIFIED:20230218T161147
DTSTAMP:20230218T161147
SUMMARY:Completed task from nextcloud
STATUS:COMPLETED
PERCENT-COMPLETE:100
PRIORITY:2
COMPLETED:20230218T154018
END:VTODO
BEGIN:VTODO
UID:e66dec63-f3d3-4f47-b0da-e7ba362195e6
CREATED:20230218T154154
LAST-MODIFIED:20230218T161111
DTSTAMP:20230218T161111
SUMMARY:Important task from nextcloud
PRIORITY:4
END:VTODO
END:VCALENDAR