unytics/airbyte_serverless

The catalog file is not written in full by the time the connector starts

zxqfd555-pw opened this issue · 3 comments

Hi!

We're using airbyte-serverless in the Pathway framework as a connector to airbyte sources.
Recently we've run into an issue with the internally serialized catalog file not being JSON-readable. We're using an airbyte github connector, but that doesn't seem to be an important detail.

I've analyzed the stack trace we've got and found a suspicious place there:

ValueError: Could not read json file /mnt/temp/catalog.json: Expecting ':' delimiter: line 1 column 8192 (char 8191).

So the code tries to read the catalog file, which is created with json.dump, but fails at character 8192 (out of ~65K chars; I dumped the catalog locally to estimate the expected size). That offset looks like the end of a filesystem block/chunk.

My guess is that the opened file is not closed right away, which leaves an indeterminate window during which the file is not fully written. In some rare unlucky cases, the airbyte connector's docker image starts before the write has finished.

If so, explicitly closing the file or using a context manager should help here. Could you please look into the issue and confirm or reject my assumptions? I can send a PR with the proposed fix if that helps.
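
To illustrate what I mean, here is a minimal sketch of the context-manager variant. The function name `write_catalog` and its arguments are hypothetical and only for illustration; this is not the library's actual code, just the pattern I would expect to avoid the partially written file:

```python
import json


def write_catalog(catalog: dict, path: str) -> None:
    """Serialize the catalog and make sure the file is closed before returning.

    Hypothetical helper for illustration, not the library's real function.
    """
    # A pattern like `json.dump(catalog, open(path, "w"))` leaves closing
    # (and therefore flushing Python's write buffer) to whenever the file
    # object happens to be finalized. The `with` block closes the file
    # deterministically, so the buffered tail of the JSON is handed to the
    # OS before anything else gets a chance to read the file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(catalog, f)
```

With this, any process started after `write_catalog` returns should see the complete JSON document rather than a prefix cut at a buffer boundary.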

Thank you in advance!

Thanks a lot Sergey for opening this issue.

Explicitly closing the file being written is a good idea in any case.

If you open a PR on this, I'll merge it.

@unytics I've created a PR with a little fix for that: #6.

Hi Sergey,

I finally merged the PR.
Hope it solved your problem.

If it did not, please reopen the issue.

Cheers,