INPUT.json keeps being overridden after 0.19.5
tugkan opened this issue · 16 comments
Hey there,
After version 0.19.5, apify-cli keeps overriding INPUT.json no matter what we try. It prefills values from INPUT_SCHEMA.json and overrides the input at execution time. Once the execution is done, it reverts the input.
INPUT.json location: ./storage/key_value_stores/default/INPUT.json
INPUT_SCHEMA.json location: ./INPUT_SCHEMA.json
The commands we tried:
apify run
apify run -p
apify run -p --input-file
Project is running locally. Package versions:
```
"apify": "^3.2.0",
"@crawlee/cheerio": "^3.5.4",
"@crawlee/core": "^3.9.2",
"@crawlee/playwright": "^3.5.4",
"@crawlee/utils": "^3.9.2",
```
This is intended in order to have a valid, prefilled input file. Is it causing you issues?
Hey @vladfrangu! Yes. We prefill everything just to show users all the possible options. However, most of the input fields are optional, which lets users change the logic of the actors. For example, multiple actors work in the following way:
- The optional field startUrls holds the URLs that users enter.
- The optional field search is for users who don't want to enter URLs themselves.
- The optional field maxItems lets users limit their requests. If the number is not set, the actor fetches without limit.
In this case, both startUrls and search get prefilled immediately, which causes problems on our end. Also, maxItems is overridden during development.
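To make the problem concrete, here is a minimal sketch of how an actor with these optional fields might decide what to do. The `ActorInput` type, the `planRun` helper, and the sample search term are hypothetical and not taken from any real actor; the only point is that an absent field carries meaning.

```typescript
// Hypothetical input shape: every field is optional, and absence is meaningful.
interface ActorInput {
  startUrls?: string[];
  search?: string;
  maxItems?: number;
}

interface RunPlan {
  urls: string[];
  searchTerm: string | null;
  limit: number;
}

// Decide what to crawl from the fields that are actually present.
function planRun(input: ActorInput): RunPlan {
  return {
    urls: input.startUrls ?? [],
    searchTerm: input.search ?? null,
    // A missing maxItems means "no limit".
    limit: input.maxItems ?? Number.POSITIVE_INFINITY,
  };
}

// What the user wrote in INPUT.json: a search term only, no limit.
const writtenInput: ActorInput = { search: "henderson" };

// The same input after schema prefills were added for the missing fields.
const prefilledInput: ActorInput = {
  startUrls: ["https://www.realtor.com/realestateandhomes-search/Las-Vegas_NV"],
  maxItems: 5,
  ...writtenInput,
};

console.log(planRun(writtenInput));
console.log(planRun(prefilledInput));
```

With the prefills added, the run gains a start URL and a limit of 5 that the user never asked for, even though their own search value is preserved.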
Can you share your input schema here please? Just so I can test stuff out locally and see what to do
Although what should we do in such cases? Throw an error if a prefilled optional value is missing in the existing input json? Stick to the prefilled values?
We should merge the values, preferring those from users, only adding the defaults/prefills where they are missing.
> We should merge the values, preferring those from users, only adding the defaults/prefills where they are missing.
@B4nan This is actually what it currently does. However, we do not want the values to be prefilled at all because they might not exist in the INPUT.json. Possibly introducing a flag to disable prefilling would be perfect.
@vladfrangu an input schema from a production actor is below. As you can see, the search, mode, and startUrls fields can each be used on their own. If all of them are provided, the actor uses both the search and the start URLs.
{
  "title": "Realtor Scraper",
  "description": "An actor that scrapes property details from realtor.com",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "URLs to start with. It should be listing, agent detail or property detail URLs",
      "prefill": [
        "https://www.realtor.com/realestateandhomes-search/Las-Vegas_NV",
        "https://www.realtor.com/realestateandhomes-detail/8209-Spring-Arts-Ave_Las-Vegas_NV_89129_M18560-54834",
        "https://www.realtor.com/realestateagents/5a28883df695ab0010dfe28b"
      ],
      "editor": "stringList"
    },
    "includeFloorplans": {
      "title": "Include floor plans",
      "type": "boolean",
      "description": "Fetch the floor plans of the properties when available.",
      "default": false
    },
    "maxItems": {
      "title": "Maximum number of properties",
      "type": "integer",
      "description": "Maximum number of properties that you want as output. Default is all",
      "editor": "number",
      "prefill": 5
    },
    "endPage": {
      "title": "Listing end page",
      "type": "integer",
      "description": "The page number that you want to end with. By default there is no end page.",
      "editor": "number",
      "prefill": 1
    },
    "search": {
      "title": "Search terms",
      "type": "string",
      "description": "Terms for fulltext search in Realtor page",
      "editor": "textfield",
      "prefill": "las vegas",
      "sectionCaption": "More Options (Search & Mode)"
    },
    "mode": {
      "title": "Search modes (Required only when search is presented)",
      "type": "string",
      "description": "Modes of actor while search keyword is presented.",
      "enum": ["BUY", "RENT", "SOLD"],
      "default": "BUY",
      "enumTitles": ["Search for buy", "Search for rent", "Search for sold"]
    },
    "extendOutputFunction": {
      "title": "Extend output function",
      "type": "string",
      "nullable": true,
      "description": "Function that takes a JQuery handle ($) as argument and returns data that will be merged with the default output",
      "prefill": "($) => { return {} }",
      "editor": "javascript",
      "sectionCaption": "Advanced Options"
    },
    "proxy": {
      "title": "Proxy configuration",
      "type": "object",
      "description": "Select proxies to be used by your crawler.",
      "prefill": { "useApifyProxy": true },
      "editor": "proxy"
    }
  },
  "required": ["proxy"]
}
> However, we do not want the values to be prefilled at all because they might not exist in the INPUT.json.
I believe the platform behaves exactly like that too, or not? If you run via API, you will get the prefills/defaults for missing props too, right? (maybe just defaults, I think that was the difference between prefill and a default)
@B4nan As far as I know, the API only applies the defaults. prefill is only for the UI; if you remove the property in the UI, nothing is put in its place.
Hmm, right, maybe we should do the same in the CLI too.
Yeah definitely, we'll get some feedback from the rest of the company on how to approach this. We will either ignore the prefills by default, or make it configurable.
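The two options mentioned here (ignore prefills by default, or make it configurable) can be sketched roughly as follows. This is an illustration only, not the apify-cli implementation: `resolveInput` and the `applyPrefills` option are hypothetical names, and the schema is a cut-down version of the one posted above.

```typescript
// Minimal slice of an input schema: only the keys this sketch reads.
interface SchemaProperty {
  default?: unknown;
  prefill?: unknown;
}

interface InputSchema {
  properties: Record<string, SchemaProperty>;
}

type Input = Record<string, unknown>;

// Hypothetical option; not an existing apify-cli flag.
interface ResolveOptions {
  applyPrefills: boolean;
}

// Fill in missing fields only; values the user provided always win.
function resolveInput(schema: InputSchema, userInput: Input, options: ResolveOptions): Input {
  const resolved: Input = { ...userInput };
  for (const [key, property] of Object.entries(schema.properties)) {
    if (key in resolved) continue; // never override a user-provided value
    if (property.default !== undefined) {
      resolved[key] = property.default;
    } else if (options.applyPrefills && property.prefill !== undefined) {
      resolved[key] = property.prefill;
    }
  }
  return resolved;
}

const realtorSchema: InputSchema = {
  properties: {
    maxItems: { prefill: 5 },
    search: { prefill: "las vegas" },
    mode: { default: "BUY" },
    includeFloorplans: { default: false },
  },
};

const onlySearch: Input = { search: "henderson" };

console.log(resolveInput(realtorSchema, onlySearch, { applyPrefills: false }));
console.log(resolveInput(realtorSchema, onlySearch, { applyPrefills: true }));
```

With `applyPrefills: false`, the result contains the user's `search` plus the `mode` and `includeFloorplans` defaults, and `maxItems` stays absent. With `applyPrefills: true`, `maxItems: 5` is added as well, which is the behaviour reported in this issue.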
Consensus reached, we will PR a fix for this (no more prefilling prefilled values in input.json). Ty for reporting
Would be nice to generate the initial INPUT.json based on the prefills though. E.g. when a user uses apify create
I think apify create creates a default INPUT_SCHEMA.json which doesn't have any properties with prefill or default in it. To build on @mnmkng's idea, I'd suggest that if INPUT.json has not been created locally, the CLI generates it from the prefill values, which would be ideal.
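A rough sketch of that suggestion, assuming the CLI has already parsed the schema and knows whether an INPUT.json exists. The helper names `prefillsFromSchema` and `initialInputToWrite` are hypothetical, and file handling is left out so the logic stays self-contained.

```typescript
interface PrefillProperty {
  prefill?: unknown;
}

interface PrefillSchema {
  properties: Record<string, PrefillProperty>;
}

type InputJson = Record<string, unknown>;

// Collect only the fields that declare a prefill.
function prefillsFromSchema(schema: PrefillSchema): InputJson {
  const input: InputJson = {};
  for (const [key, property] of Object.entries(schema.properties)) {
    if (property.prefill !== undefined) input[key] = property.prefill;
  }
  return input;
}

// Returns the content to write for a new INPUT.json, or null when a file
// already exists and must be left untouched. `existing` is the parsed
// content of the current INPUT.json, or undefined if there is no file.
function initialInputToWrite(schema: PrefillSchema, existing: InputJson | undefined): InputJson | null {
  return existing === undefined ? prefillsFromSchema(schema) : null;
}

const sampleSchema: PrefillSchema = {
  properties: {
    maxItems: { prefill: 5 },
    search: { prefill: "las vegas" },
    mode: {},
  },
};

console.log(initialInputToWrite(sampleSchema, undefined));
console.log(initialInputToWrite(sampleSchema, { search: "henderson" }));
```

The first call produces an input built from the prefills; the second returns null, so an input file the user already edited is never touched.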
That depends on the template. E.g. see here the prefill for startUrl
https://github.com/apify/actor-templates/blob/master/templates/js-crawlee-puppeteer-chrome/.actor/input_schema.json
Create should already be handling that request, and if not, please submit another issue for it.