snowplow/snowplow-python-tracker

Change default method to POST in emitter

matus-tomlein opened this issue · 3 comments

As we have been doing on other trackers, we should also move to POST as the default method for making requests to the Collector.

I think there is one thing to consider before we do that. With GET, the behaviour is intuitive: an event is sent right after it is tracked. If we move to POST with the default buffer size of 10, users might get confused because the tracker won't send any events if they track fewer than 10 events, unless they call flush(). Also, since we only have an in-memory event queue, some events may get lost if users are not aware of the buffering. I can see several options for making this experience less confusing for users:

  1. Set the default buffer size to 1
  2. Document better that one needs to call flush() when they want to see events sent to the Collector, and also before closing their app
  3. Add a timer to the emitter that automatically flushes the events after some time (similar to the mobile trackers, but I don't think this pattern is suitable for server-side apps)

I think that we should go with option 1 as that is the most intuitive when users are starting with the tracker. We should also document better how to make use of the buffer and how to call flush(). Opening this for discussion in case there are other opinions.
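To make the buffering behaviour discussed here concrete, below is a minimal sketch of an in-memory buffered emitter. The `BufferedEmitter` class, its `input` method, and the `demo` helper are hypothetical names used only for illustration; this is not the tracker's actual emitter implementation.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class BufferedEmitter:
    """Hypothetical stand-in for an emitter; not the tracker's real class."""

    def __init__(self, send: Callable[[List[Event]], None], buffer_size: int = 10) -> None:
        self.send = send                # callback that would perform the POST
        self.buffer_size = buffer_size  # events to collect before sending
        self.buffer: List[Event] = []   # in-memory only: lost if the process exits

    def input(self, event: Event) -> None:
        """Queue an event and send a batch once the buffer is full."""
        self.buffer.append(event)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is currently buffered, if anything."""
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.send(batch)


def demo(buffer_size: int, n_events: int) -> List[int]:
    """Return the sizes of the batches sent before any explicit flush."""
    sent: List[int] = []
    emitter = BufferedEmitter(lambda batch: sent.append(len(batch)), buffer_size)
    for i in range(n_events):
        emitter.input({"e": "pv", "n": str(i)})
    return sent


print(demo(10, 3))  # nothing is sent until flush() is called
print(demo(1, 3))   # every event is sent as soon as it is tracked
```

With a buffer size of 10, tracking three events sends nothing until flush() is called, which is the confusing case described above. With a buffer size of 1, each event goes out immediately, matching the GET behaviour.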

Is there a reason we have a different buffer size (is this the same as the batch size in the Java tracker?) in the different server-side trackers?

It makes sense to me to have a default of 1, but I'm also not 100% clear on the benefits of the buffer (presumably it stops too many requests being sent and slowing things down?).

Yeah, we should have the same buffer/batch size in all server-side trackers. When we made POST the default in the Java tracker, we didn't change the batch size; we kept it at 50. So 50 events have to be tracked, or flush called, before a request is made. I guess if it's documented well, it's probably not a problem...

Right, the benefit of having a larger buffer size is that you make fewer requests over the network to the Collector, which is more efficient: there are fewer network connections to establish and manage on the client. If a lot of events are being tracked, sending each one in its own request might exhaust the client's resources and potentially reach the limit on the number of open network connections. But increasing the buffer size is an optimization strategy, which is why I think we should start with 1 and let users optimize it when they need to.
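As a rough illustration of the trade-off, the number of requests needed for a given number of events is the event count divided by the buffer size, rounded up. The `requests_needed` helper below is a hypothetical name, and the sketch assumes a final flush sends any partially filled buffer.

```python
import math


def requests_needed(n_events: int, buffer_size: int) -> int:
    """Number of POST requests needed to send n_events, counting a final
    flush for any partially filled buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    return math.ceil(n_events / buffer_size)


# Compare the buffer sizes mentioned in this thread for 1000 events.
for size in (1, 10, 50):
    print(size, requests_needed(1000, size))
# 1 1000
# 10 100
# 50 20
```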

Just noting here that we decided to keep the default buffer size of 10 for POST requests, since a buffer size of 1 is not a suitable setting for server-side apps (it would cause too many network connections on busy servers) and would be a dangerous default.