`actor:push-data` doesn't work in the Apify Platform
netmilk opened this issue · 4 comments
I found out that the `apify actor:push-data` command works locally, but doesn't work when I `apify push` it to the platform and then `apify run` it. I tried my best to replicate the remote environment locally by exporting the same environment variables that were logged in the Actor Run on the platform - and it fails even locally.
I also found that the problem is that the CLI never picks the `ActorStorage`, but always uses the `MemoryStorage`, because this condition is always true. I overrode it by setting the `forceCloud` variable to `true` and it works as expected. I assume none of the `apify actor:*` commands actually work on the Apify Platform, because they use this code as well.
What would be the best course of action here? Would you accept a PR for something along the lines of `forceCloud = Boolean(Number(process.env.APIFY_IS_AT_HOME))`?
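For clarity, here is a rough sketch of the behaviour I have in mind - this is not the actual CLI source, just an illustration of the intended selection logic (`getStorageClient` is a made-up name; the client classes come from `apify-client` and `@crawlee/memory-storage`):

```ts
import { MemoryStorage } from '@crawlee/memory-storage';
import { ApifyClient } from 'apify-client';

// On the Apify Platform the APIFY_IS_AT_HOME env var is set to "1";
// locally it is unset, so Boolean(Number(undefined)) evaluates to false.
export function getStorageClient() {
    const forceCloud = Boolean(Number(process.env.APIFY_IS_AT_HOME));
    if (forceCloud) {
        // Running on the platform: talk to the Apify API directly.
        return new ApifyClient({ token: process.env.APIFY_TOKEN });
    }
    // Running locally: keep the MemoryStorage emulation.
    return new MemoryStorage();
}
```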
My example is as simple as this: `curl http://httpbin.org/ip | apify actor:push-data` - and it fails with the following error message in the Apify Platform Run:
```
2024-04-18T14:24:08.937Z Error: Dataset with id: GpWJMhs1EcaSl0bDo does not exist.
2024-04-18T14:24:08.940Z at DatasetClient.throwOnNonExisting (/usr/local/lib/node_modules/apify-cli/node_modules/@crawlee/memory-storage/resource-clients/common/base-client.js:15:15)
2024-04-18T14:24:08.942Z at DatasetClient.pushItems (/usr/local/lib/node_modules/apify-cli/node_modules/@crawlee/memory-storage/resource-clients/dataset.js:158:18)
2024-04-18T14:24:08.944Z at async PushDataCommand.run (/usr/local/lib/node_modules/apify-cli/src/commands/actor/push-data.js:25:9)
2024-04-18T14:24:08.946Z at async PushDataCommand._run (/usr/local/lib/node_modules/apify-cli/node_modules/@oclif/command/lib/command.js:43:20)
2024-04-18T14:24:08.948Z at async Config.runCommand (/usr/local/lib/node_modules/apify-cli/node_modules/@oclif/config/lib/config.js:173:24)
2024-04-18T14:24:08.950Z at async Main.run (/usr/local/lib/node_modules/apify-cli/node_modules/@oclif/command/lib/main.js:28:9)
2024-04-18T14:24:08.952Z at async Main._run (/usr/local/lib/node_modules/apify-cli/node_modules/@oclif/command/lib/command.js:43:20)
2024-04-18T14:24:08.954Z at async /usr/local/lib/node_modules/apify-cli/src/bin/run:7:9
```
For the record, I'm following up on my question on Discord.
> Would you accept a PR for something along the lines of `forceCloud = Boolean(Number(process.env.APIFY_IS_AT_HOME))`?
Makes sense to me, but I would first take a closer look at why this is needed. The `forceCloud` flag (as I remember it) was meant only for local development - to enforce usage of the Apify API. The storage folders shouldn't be present on the platform at all, but I recall that `apify run` creates them - or maybe you had them locally and they ended up in the Docker image (but I can see you have this line commented out, so that would be quite weird on its own).
cc @vladfrangu
Thank you for your swift response @B4nan.
I report success! It works, although building the entire `apify-cli` fork repo in the Actor Dockerfile makes the Actor Build painfully slow and the resulting container bloated. But hurray! Will you help me with merging this upstream, please? Writing a test case is the next step for me now, I guess. How would you approach testing this?
It could be just a unit test: you could set the env var and call the `getApifyStorageClient()` method to see what prototype it returns. Such a test could go here.
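Something along these lines, sketched with a vitest-style runner (the runner choice, the import path, and the sync signature of the helper are assumptions, not a description of the current suite):

```ts
import { afterEach, expect, it } from 'vitest';
import { ApifyClient } from 'apify-client';
import { MemoryStorage } from '@crawlee/memory-storage';

// Illustrative import path - adjust to wherever the helper lives.
import { getApifyStorageClient } from '../src/lib/actor';

afterEach(() => {
    delete process.env.APIFY_IS_AT_HOME;
});

it('uses the Apify API client when running on the platform', () => {
    process.env.APIFY_IS_AT_HOME = '1';
    expect(getApifyStorageClient()).toBeInstanceOf(ApifyClient);
});

it('falls back to MemoryStorage locally', () => {
    expect(getApifyStorageClient()).toBeInstanceOf(MemoryStorage);
});
```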
Alternatively, it could be a command test to test this e2e; in that case, take a look at the secrets namespace tests - we'd have another one for the actor namespace (as we apparently don't have such tests at the moment). But I am not sure how to assert this easily, since there are no classes in the background whose prototypes you could spy on (my favorite way to assert things in integration tests).
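One option that comes to mind: instead of spying on a prototype, intercept the HTTP call itself. A rough sketch, assuming `@oclif/test` with its nock integration (the endpoint path, the item argument, and the env handling are my assumptions, not how the suite is currently wired):

```ts
import { test } from '@oclif/test';

test
    .env({
        APIFY_IS_AT_HOME: '1',
        APIFY_TOKEN: 'dummy-token',
        APIFY_DEFAULT_DATASET_ID: 'dummy-dataset-id',
    })
    // The mocked scope is verified after the test, so the test fails
    // unless the command actually hits the real API endpoint.
    .nock('https://api.apify.com', (api) =>
        api.post('/v2/datasets/dummy-dataset-id/items').query(true).reply(201),
    )
    .command(['actor:push-data', '{"foo":"bar"}'])
    .it('pushes through the Apify API when APIFY_IS_AT_HOME is set');
```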
Btw, you don't need `npm install` here - we use yarn, and `yarn install` is enough. The npm step will surely add some overhead, as npm is pretty slow and you are fetching all the deps twice (the second time with yarn).
Also, the code in master is for the upcoming v1 release; once the fix lands there, we can cherry-pick it to v0.19 if we want to (or maybe just send a separate PR, as the code in there is quite different - v1 is a TS rewrite with significant dependency bumps, so it won't be very cherry-pick friendly). We plan to ship v1 in the following month.