google/timesketch

Slow page loads for sketches with high datasource count

mbartle-sf opened this issue

Describe the bug
If a sketch contains more than a few dozen datasources, requests to /api/v1/sketches/<sketch_id> start to slow down, because the server issues a separate database query for each datasource while compiling information about the datasources related to the sketch. This is exacerbated by #3052, when dozens of timelines must also be loaded and added to the response. Consider removing the datasources from the sketch response and loading them on demand instead.
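The suggestion above could look roughly like the following. This is a minimal sketch that uses an in-memory SQLite database and a simplified, hypothetical schema (`sketch`, `timeline`, `datasource` tables); it is not Timesketch's actual models or API, and `sketch_summary` / `timeline_datasources` are illustrative names. The sketch payload carries only a per-timeline datasource count from one aggregated query, so its cost does not grow with the number of datasources; the datasource rows come from a separate, paginated call made only when the UI needs them.

```python
import sqlite3
from typing import Any, Dict, List

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE sketch (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE timeline (id INTEGER PRIMARY KEY, sketch_id INTEGER, name TEXT);
CREATE TABLE datasource (id INTEGER PRIMARY KEY, timeline_id INTEGER, status TEXT);
"""


def sketch_summary(conn: sqlite3.Connection, sketch_id: int) -> Dict[str, Any]:
    """Slim sketch payload: timelines plus a datasource count, no datasource rows.

    Always two queries, regardless of how many datasources exist.
    """
    name = conn.execute("SELECT name FROM sketch WHERE id = ?", (sketch_id,)).fetchone()[0]
    rows = conn.execute(
        """
        SELECT t.id, t.name, COUNT(d.id)
        FROM timeline t LEFT JOIN datasource d ON d.timeline_id = t.id
        WHERE t.sketch_id = ?
        GROUP BY t.id, t.name
        ORDER BY t.id
        """,
        (sketch_id,),
    )
    timelines = [{"id": tid, "name": tname, "datasource_count": count} for tid, tname, count in rows]
    return {"id": sketch_id, "name": name, "timelines": timelines}


def timeline_datasources(
    conn: sqlite3.Connection, timeline_id: int, limit: int = 100, offset: int = 0
) -> List[Dict[str, Any]]:
    """Fetch one page of datasource details, only when they are actually requested."""
    rows = conn.execute(
        "SELECT id, status FROM datasource WHERE timeline_id = ? ORDER BY id LIMIT ? OFFSET ?",
        (timeline_id, limit, offset),
    ).fetchall()
    return [{"id": ds_id, "status": status} for ds_id, status in rows]


# Usage: one timeline with 1000 datasources, as in the reproduction script.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO sketch VALUES (1, 'demo')")
conn.execute("INSERT INTO timeline VALUES (1, 1, 'uploads')")
conn.executemany("INSERT INTO datasource (timeline_id, status) VALUES (?, ?)", [(1, "ready")] * 1000)
conn.commit()

summary = sketch_summary(conn, 1)
first_page = timeline_datasources(conn, 1, limit=10)
```

The trade-off is an extra request when a user opens datasource details, in exchange for a sketch load whose query count no longer scales with the number of uploads.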

To Reproduce
Use the following script to produce 1000 datasources in a sketch.

from timesketch_api_client import client as timesketch_client
from timesketch_import_client import importer


def upload_n_events(sketch, n):
    # Each ImportStreamer context creates its own datasource, so this
    # produces n datasources in the sketch, one event each.
    for i in range(n):
        entry = {"message": i, "datetime": "1970-01-01T00:00:00.000Z", "timestamp_desc": "test"}
        with importer.ImportStreamer() as streamer:
            streamer.set_sketch(sketch)
            streamer.set_timeline_name('uploads')
            streamer.add_dict(entry)


def main():
    client = timesketch_client.TimesketchApi(host_uri='http://127.0.0.1:5000', username='dev', password='dev')
    sketch = client.get_sketch(1)
    upload_n_events(sketch, 1000)


if __name__ == "__main__":
    main()

Then load the sketch. With Postgres on the same machine, the request to /api/v1/sketches/<id> takes a couple of seconds. With the database on a remote server, it takes much longer, approaching minutes.
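Why the distance to the database matters so much: if loading the sketch issues on the order of 1,000 queries one after another (one per datasource created by the script above), each query pays a full network round trip before the next can start. A back-of-the-envelope estimate, using hypothetical round-trip times rather than measurements from this setup:

```python
def estimated_wait_seconds(num_queries: int, round_trip_ms: float) -> float:
    """Lower bound on time spent waiting on sequential queries.

    Counts only network round trips; ignores query execution and
    serialization time, which only add to the total.
    """
    return num_queries * round_trip_ms / 1000.0


# Hypothetical round-trip times, chosen for illustration only.
local_db = estimated_wait_seconds(1000, round_trip_ms=0.5)    # 0.5 s
remote_db = estimated_wait_seconds(1000, round_trip_ms=50.0)  # 50.0 s
single_query_remote = estimated_wait_seconds(1, round_trip_ms=50.0)  # about 0.05 s
```

Under these assumptions the per-query latency, not the amount of data, dominates: the same rows fetched in a single query would cost one round trip instead of a thousand.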

If you enable Postgres statement logging, you can see that Timesketch issues one SELECT query per object related to the sketch, i.e., 1000 queries for 1000 datasources (plus the Timeline and Sketch queries).
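This is the classic "N+1 query" pattern. The sketch below reproduces it with Python's built-in `sqlite3` module standing in for Postgres and a hypothetical two-table schema (not Timesketch's real models); `Connection.set_trace_callback` plays the role of Postgres statement logging. Loading each datasource individually issues one SELECT per row, while a single joined query returns the same rows in one statement. If the relationships are SQLAlchemy ORM relationships, the usual equivalent fix would be eager loading (for example `selectinload`) or a dedicated query, but where the fix belongs in Timesketch is an assumption, not something this issue confirms.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE timeline (id INTEGER PRIMARY KEY, sketch_id INTEGER, name TEXT);
CREATE TABLE datasource (id INTEGER PRIMARY KEY, timeline_id INTEGER, status TEXT);
"""

JOINED_IDS = (
    "SELECT d.id FROM datasource d JOIN timeline t ON d.timeline_id = t.id "
    "WHERE t.sketch_id = ? ORDER BY d.id"
)


def load_one_by_one(conn: sqlite3.Connection, sketch_id: int) -> List[Tuple]:
    """N+1 pattern: one query for the ids, then one query per datasource."""
    ids = [row[0] for row in conn.execute(JOINED_IDS, (sketch_id,))]
    return [
        conn.execute(
            "SELECT id, timeline_id, status FROM datasource WHERE id = ?", (ds_id,)
        ).fetchone()
        for ds_id in ids
    ]


def load_batched(conn: sqlite3.Connection, sketch_id: int) -> List[Tuple]:
    """Same rows, fetched with a single joined query."""
    return conn.execute(
        "SELECT d.id, d.timeline_id, d.status FROM datasource d "
        "JOIN timeline t ON d.timeline_id = t.id "
        "WHERE t.sketch_id = ? ORDER BY d.id",
        (sketch_id,),
    ).fetchall()


def count_selects(
    conn: sqlite3.Connection,
    loader: Callable[[sqlite3.Connection, int], List[Tuple]],
    sketch_id: int,
) -> Tuple[List[Tuple], int]:
    """Run a loader and count the SELECT statements it executed."""
    statements: List[str] = []
    conn.set_trace_callback(statements.append)
    try:
        rows = loader(conn, sketch_id)
    finally:
        conn.set_trace_callback(None)
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return rows, len(selects)


# One sketch with one timeline holding 1000 datasources.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO timeline VALUES (1, 1, 'uploads')")
conn.executemany(
    "INSERT INTO datasource (timeline_id, status) VALUES (?, ?)", [(1, "ready")] * 1000
)
conn.commit()

slow_rows, slow_queries = count_selects(conn, load_one_by_one, sketch_id=1)  # 1001 SELECTs
fast_rows, fast_queries = count_selects(conn, load_batched, sketch_id=1)     # 1 SELECT
```

Both loaders return identical rows; only the number of statements sent to the database differs, and that number is what multiplies the per-query latency.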

Expected behavior
The sketch loads near-instantly with a local database, or within a couple of seconds with the database on a remote server.

Desktop

  • OS: macOS Sonoma 14.4.1
  • Browser: Firefox
  • Version: 124.0.2 (64-bit)

Additional context
We prefer to upload large timelines to our Timesketch server in batches to keep request sizes reasonable, which is how we end up with hundreds or thousands of datasources.