apify/apify-cli

`actor:push-data` doesn't work in the Apify Platform

netmilk opened this issue · 4 comments

I found out that the `apify actor:push-data` command works locally, but doesn't work when I `apify push` it to the platform and then `apify run` it. I tried my best to replicate the remote environment locally by exporting the same environment variables that I logged in the Actor run on the platform, and then it fails even locally.

I also found that the problem is that the CLI never picks the `ActorStorage` but always uses the `MemoryStorage`, because this condition is always true. When I overrode it by setting the `forceCloud` variable to `true`, it worked as expected. I assume none of the `apify actor:*` commands actually work on the Apify Platform, because they all use this code as well.

What would be the best course of action here? Would you accept a PR for something along the lines of `forceCloud = Boolean(Number(process.env.APIFY_IS_AT_HOME))`?
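A quick sketch of how the proposed expression behaves, assuming the platform sets `APIFY_IS_AT_HOME` to `1` (the helper name `shouldForceCloud` is hypothetical and not part of the CLI):

```typescript
// Hypothetical helper wrapping the proposed expression. It assumes the
// Apify Platform sets APIFY_IS_AT_HOME to "1" inside Actor runs.
function shouldForceCloud(
    env: Record<string, string | undefined> = process.env,
): boolean {
    // Number(undefined) is NaN and Number("0") is 0, both falsy.
    // Number("1") is 1, which is truthy.
    return Boolean(Number(env.APIFY_IS_AT_HOME));
}

console.log(shouldForceCloud({ APIFY_IS_AT_HOME: '1' })); // true
console.log(shouldForceCloud({})); // false
console.log(shouldForceCloud({ APIFY_IS_AT_HOME: '0' })); // false
console.log(shouldForceCloud({ APIFY_IS_AT_HOME: 'true' })); // false
```

Because of the `Number()` coercion, only numeric values enable it: a value like `true` becomes `NaN` and therefore `false`.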

My example is as simple as this: `curl http://httpbin.org/ip | apify actor:push-data`, and it fails with the following error message in the Apify Platform run:

2024-04-18T14:24:08.937Z Error: Dataset with id: GpWJMhs1EcaSl0bDo does not exist.
2024-04-18T14:24:08.940Z     at DatasetClient.throwOnNonExisting (/usr/local/lib/node_modules/apify-cli/node_modules/@crawlee/memory-storage/resource-clients/common/base-client.js:15:15)
2024-04-18T14:24:08.942Z     at DatasetClient.pushItems (/usr/local/lib/node_modules/apify-cli/node_modules/@crawlee/memory-storage/resource-clients/dataset.js:158:18)
2024-04-18T14:24:08.944Z     at async PushDataCommand.run (/usr/local/lib/node_modules/apify-cli/src/commands/actor/push-data.js:25:9)
2024-04-18T14:24:08.946Z     at async PushDataCommand._run (/usr/local/lib/node_modules/apify-cli/node_modules/@oclif/command/lib/command.js:43:20)
2024-04-18T14:24:08.948Z     at async Config.runCommand (/usr/local/lib/node_modules/apify-cli/node_modules/@oclif/config/lib/config.js:173:24)
2024-04-18T14:24:08.950Z     at async Main.run (/usr/local/lib/node_modules/apify-cli/node_modules/@oclif/command/lib/main.js:28:9)
2024-04-18T14:24:08.952Z     at async Main._run (/usr/local/lib/node_modules/apify-cli/node_modules/@oclif/command/lib/command.js:43:20)
2024-04-18T14:24:08.954Z     at async /usr/local/lib/node_modules/apify-cli/src/bin/run:7:9

For the record, I'm following up on my question on Discord.

> Would you accept a PR for something along the lines of `forceCloud = Boolean(Number(process.env.APIFY_IS_AT_HOME))`

Makes sense to me, but I would first take a closer look at why this is needed. As I remember it, `forceCloud` was meant only for local development, to enforce usage of the Apify API. The storage folders shouldn't be present on the platform at all, but I recall that `apify run` creates them; or maybe you had them locally and they ended up in the Docker image (but I can see you have this line commented out, so that would be quite weird on its own).

cc @vladfrangu

Thank you for your swift response @B4nan.

I report success! It works, although building the entire `apify-cli` fork repo in the Actor Dockerfile makes the Actor build painfully slow and the resulting container bloated.

But hurray! Will you help me merge this upstream, please? Writing a test case is the next step for me now, I guess. How would you approach testing this?


It could be just a unit test: set the env var, call the `getApifyStorageClient()` method, and check which prototype it returns. Such a test could go here.
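The unit-test idea can be sketched as follows. `MemoryStorageStub`, `ActorStorageStub`, and `getStorageClientSketch()` are hypothetical stand-ins; a real test would import `getApifyStorageClient()` from the CLI sources and check against the actual storage classes, whose exact names and signatures are not confirmed here.

```typescript
import { strict as assert } from 'node:assert';

// Hypothetical stand-ins for the real storage clients. The CLI itself uses
// MemoryStorage from @crawlee/memory-storage and a platform-backed client.
class MemoryStorageStub {}
class ActorStorageStub {}

// Hypothetical stand-in for getApifyStorageClient() with the proposed check.
// The real function's signature and branching may differ.
function getStorageClientSketch(): MemoryStorageStub | ActorStorageStub {
    const forceCloud = Boolean(Number(process.env.APIFY_IS_AT_HOME));
    return forceCloud ? new ActorStorageStub() : new MemoryStorageStub();
}

// The unit test: set the env var, call the factory, check the prototype.
const previous = process.env.APIFY_IS_AT_HOME;
try {
    process.env.APIFY_IS_AT_HOME = '1';
    assert.ok(getStorageClientSketch() instanceof ActorStorageStub);

    delete process.env.APIFY_IS_AT_HOME;
    assert.ok(getStorageClientSketch() instanceof MemoryStorageStub);
} finally {
    // Restore the environment so other tests are unaffected.
    if (previous === undefined) {
        delete process.env.APIFY_IS_AT_HOME;
    } else {
        process.env.APIFY_IS_AT_HOME = previous;
    }
}
```

Restoring `process.env` in `finally` matters because all tests running in the same process share one environment object.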

Alternatively, it could be a command test that covers this end to end. In that case, take a look at the `secrets` namespace tests; we'd add another one for the `actor` namespace (as we apparently don't have such tests at the moment). But I am not sure how to assert this easily, since there are no classes in the background whose prototypes you could spy on (my favorite way to assert things in integration tests).
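For comparison, the prototype-spying style can be sketched without any test framework: wrap a method on a class prototype, let the code under test create its own instance, and inspect the recorded calls afterwards. `spyOnPrototypeMethod`, `DatasetStoreStub`, and `pushDataSketch` are all hypothetical and only illustrate the pattern; as noted above, the current command code has no such class to hook into.

```typescript
import { strict as assert } from 'node:assert';

type AnyFn = (...args: any[]) => any;

// Minimal prototype spy: replaces a method on a class prototype with a wrapper
// that records each call's arguments and then delegates to the original.
// Every instance sees the wrapper, including instances created internally by
// the code under test.
function spyOnPrototypeMethod<T extends object>(proto: T, name: keyof T & string) {
    const target = proto as unknown as Record<string, AnyFn>;
    const original = target[name];
    const calls: unknown[][] = [];
    target[name] = function (this: unknown, ...args: unknown[]) {
        calls.push(args);
        return original.apply(this, args);
    };
    return {
        calls,
        restore: () => {
            target[name] = original;
        },
    };
}

// Hypothetical dependency that a command instantiates on its own.
class DatasetStoreStub {
    async pushItems(_items: unknown[]): Promise<void> {
        // A real client would persist the items here.
    }
}

// Hypothetical command body: the test cannot inject a mock into it.
async function pushDataSketch(stdin: string): Promise<void> {
    const store = new DatasetStoreStub();
    await store.pushItems([JSON.parse(stdin)]);
}

async function main(): Promise<void> {
    const spy = spyOnPrototypeMethod(DatasetStoreStub.prototype, 'pushItems');
    try {
        // Sample input shaped like the httpbin.org/ip response; the IP is made up.
        await pushDataSketch('{"origin":"203.0.113.7"}');
        assert.equal(spy.calls.length, 1);
        assert.deepEqual(spy.calls[0], [[{ origin: '203.0.113.7' }]]);
    } finally {
        spy.restore();
    }
}

main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
});
```

Test frameworks such as Vitest and Jest offer the same pattern through `vi.spyOn` and `jest.spyOn`, with `mockRestore()` to undo the spy.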


Btw, you don't need `npm install` here; we use yarn, and `yarn install` is enough. This surely adds some overhead, as npm is pretty slow and you are fetching all the deps twice (the second time with yarn).

Also, the code in master is for the upcoming v1 release. Once the fix lands there, we can cherry-pick it to v0.19 if we want to (or maybe just send a separate PR, as the code there is quite different: v1 is a TS rewrite with significant dependency bumps, so it won't be very cherry-pick friendly). We plan to ship v1 within the next month.