zerostep-ai/zerostep

Flaky actions

SmartALB opened this issue · 1 comment

Given that page (snippet):

[screenshot of the table]

and given that test:

import { ai } from "@zerostep/playwright";

test("ai test", async ({ freeSlotCountPage }) => {
  const options = {
    debug: true, // If true, debugging information is returned from the ai() call.
  };

  await ai(
    "Click in table on plus icon button of lowest entry",
    { page: freeSlotCountPage, test },
    options
  );

  await ai(
    "Click in table on plus icon button of displayed location",
    { page: freeSlotCountPage, test },
    options
  );

  await freeSlotCountPage.waitForTimeout(5000);
});

Apart from the fact that I expected the first action to click the plus icon of the bottom entry ('24.12.2023'), the next action is unfortunately rarely successful.

Run a:
Log:

> ws send: {"type":"task-start","packageVersion":"v0.1.5","taskId":"018c808d-508a-7a10-99ee-40370c259251","task":"Click in table on plus icon button of entry with next date","snapshot":{"dom":"{\"documents\":[{\"documentURL\":0,\"title\":1,\"baseURL\":0,\"conte
< ws recv: {"type":"command-request","taskId":"018c808d-508a-7a10-99ee-40370c259251","index":0,"name":"clickElement","arguments":{"id":"616"}}
> ws send: {"type":"command-response","packageVersion":"v0.1.5","taskId":"018c808d-508a-7a10-99ee-40370c259251","index":0,"result":"null"}
< ws recv: {"type":"task-complete","taskId":"018c808d-508a-7a10-99ee-40370c259251","model":"gpt-3.5-turbo","wasSuccessful":true,"result":{"actions":["Click"]}}
> ws send: {"type":"task-start","packageVersion":"v0.1.5","taskId":"018c808d-625a-7c54-b445-69984f0f3c5c","task":"click in grid on plus icon button of location alfa","snapshot":{"dom":"{\"documents\":[{\"documentURL\":0,\"title\":1,\"baseURL\":0,\"contentLangua
< ws recv: {"type":"command-request","taskId":"018c808d-625a-7c54-b445-69984f0f3c5c","index":1,"name":"clickElement","arguments":{"id":"748"}}
> ws send: {"type":"command-response","packageVersion":"v0.1.5","taskId":"018c808d-625a-7c54-b445-69984f0f3c5c","index":1,"result":"null"}
< ws recv: {"type":"task-complete","taskId":"018c808d-625a-7c54-b445-69984f0f3c5c","model":"gpt-3.5-turbo","wasSuccessful":true,"result":{"actions":["Click"]}}

Run b:
Log:

> ws send: {"type":"task-start","packageVersion":"v0.1.5","taskId":"018c808b-2d7b-7f29-a6b7-0a7c464c7c9e","task":"Click in table on plus icon button of entry with last date","snapshot":{"dom":"{\"documents\":[{\"documentURL\":0,\"title\":1,\"baseURL\":0,\"conte
< ws recv: {"type":"command-request","taskId":"018c808b-2d7b-7f29-a6b7-0a7c464c7c9e","index":0,"name":"clickElement","arguments":{"id":"879"}}
> ws send: {"type":"command-response","packageVersion":"v0.1.5","taskId":"018c808b-2d7b-7f29-a6b7-0a7c464c7c9e","index":0,"result":"null"}
< ws recv: {"type":"task-complete","taskId":"018c808b-2d7b-7f29-a6b7-0a7c464c7c9e","model":"gpt-3.5-turbo","wasSuccessful":true,"result":{"actions":["Click"]}}
> ws send: {"type":"task-start","packageVersion":"v0.1.5","taskId":"018c808b-3f87-75fe-b284-ffa6277adda4","task":"click in grid on plus icon button of location alfa","snapshot":{"dom":"{\"documents\":[{\"documentURL\":0,\"title\":1,\"baseURL\":0,\"contentLangua
< ws recv: {"type":"command-request","taskId":"018c808b-3f87-75fe-b284-ffa6277adda4","index":1,"name":"clickElement","arguments":{"id":"895"}}
> ws send: {"type":"command-response","packageVersion":"v0.1.5","taskId":"018c808b-3f87-75fe-b284-ffa6277adda4","index":1,"result":"null"}
< ws recv: {"type":"task-complete","taskId":"018c808b-3f87-75fe-b284-ffa6277adda4","model":"gpt-3.5-turbo","wasSuccessful":true,"result":{"actions":["Click"]}}

Even though it appears that nothing is clicked by the second action, the test is still marked as 'passed'.

Unfortunately, I am not receiving any debug output anywhere, even though I have set debug to true in the options.

Edit: Here is another run, with both commands doing what is expected:

...
  await ai(
    "Click in table on plus icon button of entry with next date",
    { page: freeSlotCountPage, test },
    options
  );

  await ai(
    "click in grid on plus icon button of location alfa",
    { page: freeSlotCountPage, test },
    options
  );
...

Run c:

> ws send: {"type":"task-start","packageVersion":"v0.1.5","taskId":"018c8093-d19b-7f54-afe9-b0b4ccafc0d1","task":"Click in table on plus icon button of entry with next date","snapshot":{"dom":"{\"documents\":[{\"documentURL\":0,\"title\":1,\"baseURL\":0,\"conte
< ws recv: {"type":"command-request","taskId":"018c8093-d19b-7f54-afe9-b0b4ccafc0d1","index":0,"name":"clickElement","arguments":{"id":"621"}}
> ws send: {"type":"command-response","packageVersion":"v0.1.5","taskId":"018c8093-d19b-7f54-afe9-b0b4ccafc0d1","index":0,"result":"null"}
< ws recv: {"type":"task-complete","taskId":"018c8093-d19b-7f54-afe9-b0b4ccafc0d1","model":"gpt-3.5-turbo","wasSuccessful":true,"result":{"actions":["Click"]}}
> ws send: {"type":"task-start","packageVersion":"v0.1.5","taskId":"018c8093-e530-7a74-af74-1323bf018403","task":"click in grid on plus icon button of location alfa","snapshot":{"dom":"{\"documents\":[{\"documentURL\":0,\"title\":1,\"baseURL\":0,\"contentLangua
< ws recv: {"type":"command-request","taskId":"018c8093-e530-7a74-af74-1323bf018403","index":1,"name":"clickElement","arguments":{"id":"638"}}
> ws send: {"type":"command-response","packageVersion":"v0.1.5","taskId":"018c8093-e530-7a74-af74-1323bf018403","index":1,"result":"null"}
< ws recv: {"type":"task-complete","taskId":"018c8093-e530-7a74-af74-1323bf018403","model":"gpt-3.5-turbo","wasSuccessful":true,"result":{"actions":["Click"]}}

Hey @SmartALB - thanks for opening an issue here.

You can think of the AI here as acting like a skilled manual tester. Consequently, a good prompt to ZeroStep is an instruction that you could give to a handful of skilled manual testers and have each of them take the same action. A few of your prompts don't meet that bar, IMO:

Click in table on plus icon button of lowest entry

Since this is a list of dates, I am unsure what "lowest entry" means. Does it mean the earliest date? (That's what the AI clicks on.) Does it mean the last date in the table? (I assume that's what you expected the AI to click on.) I don't think you'd get consensus if you asked a few skilled testers what they were supposed to do with this prompt.

Click in table on plus icon button of displayed location

The AI is likely confused by what "displayed location" means; it seems to infer some specific location on the page. Can you refer to it explicitly, as you did in your last example, "click in grid on plus icon button of location alfa"? A prompt like "In the table, click the plus icon for the location that's displayed beneath the first date" might yield better results, since it would be clearer to a manual tester exactly what action you want performed.

Click in table on plus icon button of entry with next date

The AI doesn't have any context on what previous actions were performed, so referring to the "next date" is likely what's confusing it. I would try to be more explicit here.

click in grid on plus icon button of location alfa

I would expect the AI to know what to do for this one, but again I think the phrasing here could be improved. "In the grid, click on the plus icon for the location 'Alfa'" should yield better results.
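Putting these suggestions together, the original test might look something like the sketch below. This is illustrative only: it assumes the same `freeSlotCountPage` fixture from your report, and the date '24.12.2023' is taken from your screenshot description; the ZeroStep service still decides what to click, so this can't be verified outside a live browser run.

```ts
import { ai } from "@zerostep/playwright";

test("ai test", async ({ freeSlotCountPage }) => {
  const options = { debug: true };

  // Name the target row explicitly instead of "lowest entry", so that
  // any "skilled manual tester" would pick the same row.
  await ai(
    "In the table, click the plus icon of the entry dated '24.12.2023'",
    { page: freeSlotCountPage, test },
    options
  );

  // Name the location explicitly instead of "displayed location".
  await ai(
    "In the grid, click on the plus icon for the location 'Alfa'",
    { page: freeSlotCountPage, test },
    options
  );
});
```

Each prompt now identifies its target by a concrete value visible on the page, rather than by position relative to a previous action.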

Please try incorporating these improvements into your prompts and let us know how it goes!