segmentio/analytics-python

The analytics lib in Databricks does not deliver 100% of calls

gevored opened this issue · 7 comments

Hello

I am iterating over a table using Python in Databricks and sending an identify and a track call to a Python source in Segment, but I noticed that not every row/call arrives in Segment. I don't get any error when I execute analytics.identify() or analytics.track(), only a message on the console: "analytics-python queue is full".
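For reference, the loop looks roughly like this (a simplified sketch; the table and column names below are placeholders):

```python
import analytics

analytics.write_key = "MY_WRITE_KEY"

def on_error(error, items):
    # never fires for me; I only see "analytics-python queue is full" on the console
    print("analytics-python error:", error)

analytics.on_error = on_error

# iterate over the Databricks table; "events", "user_id", "col_1" are placeholders
for row in spark.table("events").collect():
    analytics.identify(row["user_id"], {"email": row["email"]})
    analytics.track(row["user_id"], "Row Exported", {"col_1": row["col_1"]})

analytics.flush()  # wait for the background consumer thread to drain the queue
```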

In my tests the volume of data is 50k rows with 8 columns (properties).

Only 37k are arriving in Segment.
Flow rate: 5.8k/min

Our goal is to send around 10 MM calls per day from our data lake.
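For what it's worth, these are the client settings that look related to the drops and to throughput, assuming they are exposed at module level in the installed version; I'm not sure they are the right knobs or values (just a sketch):

```python
import analytics

analytics.write_key = "MY_WRITE_KEY"

# the default queue size is 10000; when the producer outruns the background
# consumer thread, new events are dropped and "queue is full" is logged
analytics.max_queue_size = 100000

# sync_mode sends every call synchronously instead of queueing it in a
# background thread: slower per call, but nothing is silently dropped
# analytics.sync_mode = True

analytics.track("some_user_id", "Row Exported", {"col_1": 1})
analytics.flush()  # block until the queue is drained before the job ends
```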

My questions:

  1. Is there a way to guarantee that every call arrives in Segment?
  2. Is there a way to increase the flow rate?

Thanks

@gevored Is this in a prod env or are you using the debugger in app.segment.com?

Prod env; I am using this lib to send calls directly to Segment.

Thank you for that clarification. Let us review and get back to you shortly.

@gevored Without asking for other specific info, I can only assume that you are looking at the Segment debugger to validate the messages you are sending? If that is the case, the debugger by design does not record every single entry sent to it.

Yeah, the debugger only has a sample of the data, but I actually compared through the Schema tab of the data source, before starting and after finishing.

In parallel I am using an HTTP POST, and that solution is delivering 100% of the calls, but I expected this Python lib to give a better result.
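The HTTP approach I am comparing against is essentially a direct POST to the Segment HTTP Tracking API, roughly like this (simplified sketch):

```python
import requests

WRITE_KEY = "MY_WRITE_KEY"

resp = requests.post(
    "https://api.segment.io/v1/track",
    auth=(WRITE_KEY, ""),  # basic auth: write key as username, empty password
    json={
        "userId": "some_user_id",
        "event": "Row Exported",
        "properties": {"col_1": 1},
    },
)
resp.raise_for_status()  # each row either succeeds or raises, so nothing is dropped silently
```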

@gevored Can you enable logging, verify that the records being sent are accurate, and let us know the results?
https://segment.com/docs/connections/sources/catalog/libraries/server/python/#logging
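For example, something along these lines (a sketch; the logger name can differ between versions):

```python
import logging
import analytics

logging.basicConfig(level=logging.DEBUG)  # print log records to the console/driver log
logging.getLogger("segment").setLevel(logging.DEBUG)  # library logger ("analytics" in older versions)

analytics.debug = True  # ask the client to log at debug level
```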

Thank you,

Closing this issue. Please reopen if you keep seeing this behavior; we would like to get some logs to understand the issue.