datawhores/OF-Scraper

Feature Request: Execute command on scraping completion (not for each user)

Closed this issue · 8 comments

I see that there's already the capability to run a command after scraping each model, but as far as I can tell there is no option to run a given command after scraping finishes altogether. I guess it's only really useful in daemon mode (otherwise, you could just run ofscraper with && otherCommand to achieve this).

It would be super helpful if this could be added, so that I can write up a quick bash script to automatically trigger Stash to rescan my scraped dir upon scraping completion.
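For the non-daemon case, the && chaining mentioned above can also be sketched in Python. This is only an illustration: run_then is a hypothetical helper, and the command lists are placeholders to be swapped for the real ofscraper invocation and the rescan script.

```python
import subprocess
import sys


def run_then(first_cmd, second_cmd):
    """Run first_cmd; run second_cmd only if first_cmd exits with status 0.

    Mirrors the shell idiom `first && second` and returns the exit status
    of the last command that actually ran.
    """
    first = subprocess.run(first_cmd)
    if first.returncode != 0:
        return first.returncode
    return subprocess.run(second_cmd).returncode


# Placeholder commands (assumptions): replace these with the real scraper
# invocation and the script that triggers the Stash rescan.
SCRAPE_CMD = [sys.executable, "-c", "print('scrape finished')"]
RESCAN_CMD = [sys.executable, "-c", "print('rescan triggered')"]
```

Calling run_then(SCRAPE_CMD, RESCAN_CMD) runs the second command only when the first succeeds. This does not help in daemon mode, where the scraper never exits between runs, which is the reason for the request.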

You can have Stash scan the folder for each user, because the arguments passed to the given command are:
1 - username
2 - user-id
3 - media
4 - posts
as can be seen in lines 59-67 of ofscraper/download/download.py:

    subprocess.run(
        [
            settings.get_post_download_script(),
            username,
            model_id,
            mediadump,
            postdump,
        ]
    )

With that in hand, your script could call something like stashapi.stashapp.metadata_scan with a one-element list containing the path to that user's directory:
https://github.com/stg-annon/stashapi/blob/b1580be2afdbfe15b7d051311c1012dd81c158c2/stashapp.py#L294

Or even better, like I am starting to work on over in the of-scraper-post subproject of my of-tools repo, have the post download script trigger not just the scan, but the full generate command, and then (not complete in my script as of writing this) even put the scraped metadata directly into Stash.
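As a rough sketch of the per-user approach, a post download script could take the four positional arguments listed above and ask Stash to scan only that user's directory. To stay dependency-free, this swaps in a raw GraphQL request instead of the stashapi call linked above. Everything marked as an assumption is not from ofscraper: the Stash URL, the SCRAPE_ROOT layout of one folder per username, and the lack of API key handling. The metadataScan mutation and ScanMetadataInput type are what I understand Stash's GraphQL schema to expose, so check them against your Stash version.

```python
import json
import os
import urllib.request

# Assumptions: adjust both to match your own setup.
STASH_GRAPHQL_URL = "http://localhost:9999/graphql"
SCRAPE_ROOT = "/data/ofscraper"  # hypothetical: one sub-folder per username

SCAN_MUTATION = (
    "mutation Scan($input: ScanMetadataInput!) { metadataScan(input: $input) }"
)


def build_scan_request(username, root=SCRAPE_ROOT):
    """Build the GraphQL payload that scans a single user's directory."""
    user_dir = os.path.join(root, username)
    return {"query": SCAN_MUTATION, "variables": {"input": {"paths": [user_dir]}}}


def send_scan_request(payload, url=STASH_GRAPHQL_URL):
    """POST the payload to Stash and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def main(argv):
    """argv[1:5] are username, model_id, mediadump, postdump (see list above)."""
    username, _model_id, _mediadump, _postdump = argv[1:5]
    return send_scan_request(build_scan_request(username))
```

To use it as an actual script, call main(sys.argv) at the bottom of the file and point the post download script setting at it. Only the username is used here; the media and post dumps are ignored.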

This has been added, but it doesn't receive as much data as the post_download_script, just a simple JSON:

    out_dict = {
        "users": users,
        "dir_format": config_data.get_dirformat(),
        "file_format": config_data.get_fileformat(),
        "metadata": config_data.get_metadata(),
    }
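A script on the receiving end could decode that JSON along these lines. This is a sketch under assumptions: the thread does not say how the JSON reaches the script (argument, stdin, or file), so the function just takes the raw text, and it assumes "users" is a list. summarize_payload is a hypothetical name.

```python
import json


def summarize_payload(raw_json):
    """Decode the post_script JSON and pull out the four documented keys.

    Assumes "users" is a list; its element type is not specified in the
    thread, so only its length is used here.
    """
    data = json.loads(raw_json)
    users = data.get("users") or []
    return {
        "user_count": len(users),
        "dir_format": data.get("dir_format"),
        "file_format": data.get("file_format"),
        "metadata": data.get("metadata"),
    }
```

For a "scan when everything is done" use case, the payload contents barely matter; the call itself is the trigger, and the decoded fields are only useful if the script wants to narrow the scan.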

Just for clarity, this was added in 3.11.1, and the option is post_script, which can be in any of three locations in the config file:
post_script
advanced_options.post_script
script_options.post_script

    @wrapper.config_reader
    def get_post_script(config=None):
        if config is False:
            return constants.POST_SCRIPT_DEFAULT
        val = None
        if config.get("post_script") is not None:
            val = config.get("post_script")
        elif config.get("advanced_options", {}).get("post_script") is not None:
            val = config.get("advanced_options", {}).get("post_script")
        elif config.get("script_options", {}).get("post_script") is not None:
            val = config.get("script_options", {}).get("post_script")
        return val if val is not None else constants_attr.getattr("POST_SCRIPT_DEFAULT")

The confusing bit is that the config updater moved post_download_script and post_script into

    "scripts": {
        "post_download_script": "",
        "post_script": ""
    },

so the configured scripts don't seem to be picked up from there.

so after adding

    elif config.get("scripts", {}).get("post_download_script") is not None:
        val = config.get("scripts", {}).get("post_download_script")

into get_post_download_script
and

    elif config.get("scripts", {}).get("post_script") is not None:
        val = config.get("scripts", {}).get("post_script")

into get_post_script, the scripts started being called. However, the format of the passed arguments is now completely different, and on top of that it looks like post_script is only called via ofscraper.final.final.final() from ofscraper.commands.commands.scraper.manager.execute.runner(), not in daemon mode.
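The repeated elif chains could be collapsed into one lookup that covers all four locations, including the new "scripts" section. This is a standalone sketch, not ofscraper's code: find_script is a hypothetical helper, it works on a plain dict, and it skips the config_reader wrapper and the constants module.

```python
def find_script(config, key, default=""):
    """Return the first non-None value for key across the known locations.

    Search order: top level, advanced_options, script_options, scripts.
    A falsy config (None, False, {}) yields the default.
    """
    if not config:
        return default
    for section in (None, "advanced_options", "script_options", "scripts"):
        source = config if section is None else config.get(section)
        if not isinstance(source, dict):
            continue  # section missing or not a mapping
        value = source.get(key)
        if value is not None:
            return value
    return default
```

With this, find_script(config, "post_script") and find_script(config, "post_download_script") share the same search order, so a new location only has to be added once.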

Some of that is just for backwards compatibility.
The function there is responsible for finding where in the config the desired data is located.

The final format will always follow the schema found in

ofscraper/utils/config/schema.py

So you can always put it in any of those three locations, but it will always end up in the same place once the config finishes processing.

One of those locations is actually for the prompt menu, so that a flattened config can be processed before formatting. Otherwise I would have to write into the prompt menu where to update the config.

I realized there were a lot of places where it wasn't being called,
so I redid it for 3.11.2.

Nice, thanks! In my particular case I don't really need much info, I just need it to trigger a script upon scraping completion so I can trigger a scan in my stash instance that holds this content.

I haven't tried this yet but I will as soon as I get a chance, and I'll report back. Thanks for adding it!

Closing because the feature was already added