datawhores/OF-Scraper

Post User Process not running consistently

Describe the bug

With both 3.11.1 and 3.11.2, I am running into two issues with the new post* scripts. First, the scripts are not picked up from the config file unless I add extra code.

In get_post_download_script:

def get_post_download_script(config=None):
    if config is False:
        return constants.POST_DOWNLOAD_SCRIPT_DEFAULT
    val = None
    if config.get("post_download_script") is not None:
        val = config.get("post_download_script")
    elif config.get("advanced_options", {}).get("post_download_script") is not None:
        val = config.get("advanced_options", {}).get("post_download_script")
    elif config.get("script_options", {}).get("post_download_script") is not None:
        val = config.get("script_options", {}).get("post_download_script")
    return val if val is not None else constants_attr.getattr("POST_DOWNLOAD_SCRIPT_DEFAULT")

I need to add:

    elif config.get("scripts", {}).get("post_download_script") is not None:
        val = config.get("scripts", {}).get("post_download_script")

and in get_post_script:

def get_post_script(config=None):
    if config is False:
        return constants.POST_SCRIPT_DEFAULT
    val = None
    if config.get("post_script") is not None:
        val = config.get("post_script")
    elif config.get("advanced_options", {}).get("post_script") is not None:
        val = config.get("advanced_options", {}).get("post_script")
    elif config.get("script_options", {}).get("post_script") is not None:
        val = config.get("script_options", {}).get("post_script")
    return val if val is not None else constants_attr.getattr("POST_SCRIPT_DEFAULT")

I need to add:

    elif config.get("scripts", {}).get("post_script") is not None:
        val = config.get("scripts", {}).get("post_script")

But even then, the script doesn't run in 3.11.2
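
Both getters repeat the same chain of lookups, so one way to keep the sections in sync is a small helper that walks a list of section names. This is only a sketch: `lookup_script` and `SECTIONS` are hypothetical names, not part of OF-Scraper, and the default is passed in as an argument rather than read from the project's constants.

```python
# Hypothetical helper; the names here are illustrative, not OF-Scraper's API.
SECTIONS = ("advanced_options", "script_options", "scripts")


def lookup_script(config, key, default=None):
    """Return the first non-None value for key, checking the top level first."""
    if not config:
        return default
    # A top-level key wins, matching the order of the original if/elif chain.
    if config.get(key) is not None:
        return config[key]
    for section in SECTIONS:
        # "or {}" guards against a section that is present but set to null.
        value = (config.get(section) or {}).get(key)
        if value is not None:
            return value
    return default


config = {"scripts": {"post_script": "/path/to/post-loop.sh"}}
print(lookup_script(config, "post_script"))
print(lookup_script(config, "post_download_script", default=""))
```

With a helper like this, supporting another section would mean adding one entry to the tuple instead of another `elif` branch in each getter.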

To Reproduce

Run ofscraper -u ALL -l DEBUG -p STATS -o all,labels -a download -d 120 -ts -up -st expired

Expected behavior

After every user the post_download_script command should fire, and at the end of the loop the post_script should fire.

Screenshots/Logs

With 3.11.2, the error I get for every performer is:

 2024-08-08 18:25:50:[level.inner:11]  expected str, bytes or os.PathLike object, not int
 2024-08-08 18:25:50:[level.inner:11]  Traceback (most recent call last):
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final_user.py", line 18, in post_user_process
    run(
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/system/subprocess.py", line 9, in run
    t=subprocess.run(*args, stdout=subprocess.PIPE,
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1885, in _execute_child
    self.pid = _fork_exec(
               ^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not int

Config

{
    "main_profile": "main_profile",
    "metadata": "{save_location}/meta/OnlyFans/{model_username}/Metadata",
    "discord": "",
    "file_options": {
        "save_location": "/Volumes/FileAccess/OnlyFans/",
        "dir_format": "sites/OnlyFans/{model_username}/{responsetype}/{value}/{mediatype}/",
        "file_format": "{date}-{filename}.{ext}",
        "textlength": 0,
        "space_replacer": " ",
        "date": "YYYY-MM-DD_HH-mm",
        "text_type_default": "letter",
        "truncation_default": true
    },
    "download_options": {
        "filter": [
            "Images",
            "Audios",
            "Videos",
            "Text"
        ],
        "auto_resume": false,
        "system_free_min": 0,
        "max_post_count": 0
    },
    "binary_options": {
        "ffmpeg": "/opt/homebrew/bin/ffmpeg"
    },
    "cdm_options": {
        "private-key": null,
        "client-id": null,
        "key-mode-default": "keydb",
        "keydb_api": "{redacted}"
    },
    "performance_options": {
        "download_sems": 6,
        "thread_count": 2,
        "download_limit": 0
    },
    "content_filter_options": {
        "block_ads": false,
        "file_size_max": 0,
        "file_size_min": 0,
        "length_max": null,
        "length_min": null
    },
    "advanced_options": {
        "code-execution": true,
        "dynamic-mode-default": "datawhores",
        "backend": "aio",
        "downloadbars": true,
        "cache-mode": "json",
        "appendlog": false,
        "custom_values": {
            "OLD_DEVIINT": "https://raw.githubusercontent.com/datawhores/onlyfans-dynamic-rules/new/dynamicRules.json",
            "XAGLER": "https://raw.githubusercontent.com/xagler/dynamic-rules/main/onlyfans.json",
            "RAFA": "https://raw.githubusercontent.com/rafa-9/dynamic-rules/main/rules.json",
            "DIGITALCRIMINALS": "https://raw.githubusercontent.com/DATAHOARDERS/dynamic-rules/main/onlyfans.json",
            "DATAWHORES": "https://raw.githubusercontent.com/datawhores/onlyfans-dynamic-rules/main/dynamicRules.json",
            "DEVIINT": "https://raw.githubusercontent.com/rafa-9/dynamic-rules/main/rules.json",
            "MAXFILE_SEMAPHORE": 10,
            "SHOW_AVATAR": false,
            "import": "exec('import ofscraper.filters.models.selector as selector23')",
            "list": "exec('modelObjs=C)')",
            "model_price": "'fallback' if len(modelObjs)==0 else 'Paid' if modelObjs[0].final_current_price>0 else 'Free'"
        },
        "sanitize_text": false,
        "temp_dir": null,
        "remove_hash_match": true,
        "infinite_loop_action_mode": false,
        "enable_auto_after": true,
        "default_user_list": "main",
        "default_black_list": ""
    },
    "scripts": {
        "post_download_script": "/Users/your_username/Development/of-scraper-post/post-user.sh",
        "post_script": "/Users/your_username/Development/of-scraper-post/post-loop.sh"
    },
    "responsetype": {
        "timeline": "Posts",
        "message": "Messages",
        "archived": "Archived",
        "paid": "Messages",
        "stories": "Stories",
        "highlights": "Stories",
        "profile": "Profile",
        "pinned": "Posts",
        "streams": "Streams"
    },
    "overwrites": {
        "audios": {},
        "videos": {},
        "images": {},
        "text": {
            "file_format": "{date}-{post_id}.{ext}"
        }
    }
}

System Info

  • OS: macOS 14.5 (M1)
  • pipx
  • python 3.12

Additional context

This happens on multiple OF accounts; here are some examples: couple_of_perverts, lilithinlatexxx, rubberdoll, lola-saint, sophie_x_elodie, trainingj, tightlacedchaos, doe-eyes-official

I think this is because model_id is passed as an int and needs to be converted to a string first.
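
A minimal reproduction of that TypeError, independent of OF-Scraper: `subprocess.run` expects every element of the argument list to be a string, bytes, or path-like object. The `model_id` value below is a stand-in, and the child process is just the current Python interpreter echoing its first argument.

```python
import subprocess
import sys

model_id = 12345  # stand-in value; the real ID comes from the scraper

# Child process that simply echoes its first argument back.
cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

try:
    # Passing the int directly fails before the child is ever started.
    subprocess.run(cmd + [model_id], capture_output=True, text=True)
    failed = False
except TypeError as exc:
    failed = True
    print(f"int argument rejected: {exc}")

# Converting to str first lets the call go through.
result = subprocess.run(
    cmd + [str(model_id)], capture_output=True, text=True, check=True
)
print(result.stdout.strip())
```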

I forced model_id to string in final_user.py:

        run(
            [
                settings.get_post_download_script(),
                username,
                str(model_id),
                json.dumps(media_dump),
                json.dumps(post_dump),
                json.dumps(master_dump),
            ]
        )

and that did change the error message:

 2024-08-08 20:47:57:[final_user.post_user_process:13]  Running post script for lilithinlatexxx
 2024-08-08 20:47:58:[level.inner:11]  [Errno 7] Argument list too long: '/Users/your_username/Development/of-scraper-post/post-user.sh'
 2024-08-08 20:47:58:[level.inner:11]  Traceback (most recent call last):
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final_user.py", line 24, in post_user_process
    run(
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/system/subprocess.py", line 9, in run
    t=subprocess.run(*args, stdout=subprocess.PIPE,
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: '/Users/your_username/Development/of-scraper-post/post-user.sh'

just to make sure

getconf ARG_MAX
1048576

Yeah I've never had to worry about this
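
For a rough idea of whether a payload will fit, the argument strings and the environment can be summed and compared against the limit, since ARG_MAX covers both. This is an approximation under stated assumptions: `argv_size` and `fits_in_argv` are hypothetical helpers, and the real accounting differs slightly between systems (pointer overhead, for example).

```python
import os


def argv_size(args, env=None):
    """Approximate bytes used by argv plus the environment (NUL terminators included)."""
    env = os.environ if env is None else env
    size = sum(len(os.fsencode(a)) + 1 for a in args)
    # Each environment entry is stored as "KEY=VALUE\0".
    size += sum(len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items())
    return size


def fits_in_argv(args, limit=None, env=None):
    """Return True if the estimated size stays under the limit."""
    if limit is None:
        # Same value that `getconf ARG_MAX` reports on Unix-like systems.
        limit = os.sysconf("SC_ARG_MAX")
    return argv_size(args, env) < limit


# A 2 MB JSON string as a single argument cannot fit under a 1048576-byte limit.
big_json = "x" * 2_000_000
print(fits_in_argv(["post-user.sh", big_json], limit=1_048_576, env={}))
```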

I have a partial solution, but some of the data is still too long.

Also, I hadn't gotten the post script to actually fire earlier. I just came back to find this waiting:

 2024-08-09 01:27:34:[final_script.final_script:27]  Running post script
 2024-08-09 01:27:34:[level.inner:11]  Object of type Model is not JSON serializable
 2024-08-09 01:27:34:[level.inner:11]  Traceback (most recent call last):
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/run.py", line 88, in daemon_run_helper
    job_func()
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/context/exit.py", line 92, in inner
    raise E
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/utils/context/exit.py", line 85, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/commands/managers/scraper.py", line 47, in runner
    final(normal_data , scrape_paid_data ,user_first_data,userdata)
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final.py", line 17, in final
    final_script(users or [])
  File "/Users/your_username/.local/pipx/venvs/ofscraper/lib/python3.12/site-packages/ofscraper/runner/close/final/final_script.py", line 42, in final_script
    json.dumps(out_dict)
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Model is not JSON serializable

Found what to blame:

for ele in users:
    if isinstance(ele, Model):
        data.append(ele.model)
    elif isinstance(ele, dict):
        data.append(ele)
out_dict = {
    "users": users,
    "dir_format": config_data.get_dirformat(),
    "file_format": config_data.get_fileformat(),
    "metadata": config_data.get_metadata(),
}

You create a variable data and append Model.model for each ele, but then pass users to out_dict instead of data.
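
A sketch of the corrected logic, with the unwrapped list going into the dictionary. The `Model` class here is a stand-in with only a `.model` attribute, `build_out_dict` is a hypothetical wrapper, and the format strings are passed in as plain arguments instead of coming from `config_data`.

```python
import json


class Model:
    """Stand-in for the scraper's Model class; only the .model dict matters here."""

    def __init__(self, model):
        self.model = model


def build_out_dict(users, dir_format, file_format, metadata):
    data = []
    for ele in users:
        if isinstance(ele, Model):
            data.append(ele.model)  # unwrap to the plain dict
        elif isinstance(ele, dict):
            data.append(ele)
    # Pass the unwrapped list, not the original users list.
    return {
        "users": data,
        "dir_format": dir_format,
        "file_format": file_format,
        "metadata": metadata,
    }


users = [Model({"username": "alice"}), {"username": "bob"}]
payload = json.dumps(build_out_dict(users, "{model_username}/", "{filename}.{ext}", "meta/"))
print(payload)
```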

Yeah that only works for that one since the amount of data is small

for the download script my other solution won't work for larger creators

The user will have to read and process the data in their script.

I think the only possibility is to redirect the data with >, and then the user would have to read the input.

Update: I think the solution is to write a single JSON document to a temporary file and pass that path to the script.

I will fix the post_script

For the post_download_script, I made this change:

        master_dump=json.dumps({"username":username,"model_id":model_id,"media":media,"posts":posts})
        with tempfile.NamedTemporaryFile() as f:
          with open(f.name, "w") as g:
              g.write(master_dump)
          run([settings.get_post_download_script(),f.name])
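
The same idea as a self-contained sketch, including the receiving side. `run_with_payload` is a hypothetical helper, not OF-Scraper code, and the "script" is a one-line Python reader standing in for a user's shell script. The file is created with `delete=False` and closed before the child starts, so the data is flushed to disk and the approach does not depend on reopening a file that is still held open.

```python
import json
import os
import subprocess
import sys
import tempfile


def run_with_payload(command, payload):
    """Write payload as JSON to a temp file and pass its path as the last argument."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as f:
        json.dump(payload, f)
        path = f.name
    try:
        # Only the short path goes on the command line, so ARG_MAX is not an issue.
        return subprocess.run(
            command + [path], capture_output=True, text=True, check=True
        )
    finally:
        os.unlink(path)  # clean up once the script has finished


# Stand-in for a user script: read the JSON file named by its first argument.
reader = [
    sys.executable,
    "-c",
    "import json, sys; print(json.load(open(sys.argv[1]))['username'])",
]
payload = {"username": "example_user", "model_id": 12345, "media": [], "posts": []}
result = run_with_payload(reader, payload)
print(result.stdout.strip())
```

A user script would then treat its first argument as the path to a JSON file and parse it with whatever tool it prefers.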

I think the post_script will be okay, but just to be safe and to keep things in sync, I will do the same for that as well.

So far it has been working on my system.
I've been testing with --post-script cat and --download-script cat to make sure the output is shown on the console.

Tested in

  • manual mode
  • check mode
  • normal downloading

It looks like some of the work between 3.11.2 and 3.11.6 changed final_script.py in a way that causes a crash:

 2024-08-21 22:49:16:[final_script.final_script:31]  Running post script
 2024-08-21 22:49:16:[level.inner:11]  unhashable type: 'dict'
 2024-08-21 22:49:16:[level.inner:11]  Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/your_username/utils/run.py", line 88, in daemon_run_helper
    job_func()
  File "/venv/lib/python3.11/site-packages/your_username/utils/context/exit.py", line 92, in inner
    raise E
  File "/venv/lib/python3.11/site-packages/your_username/utils/context/exit.py", line 85, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/your_username/commands/managers/scraper.py", line 50, in runner
    final(normal_data, scrape_paid_data, user_first_data, userdata)
  File "/venv/lib/python3.11/site-packages/your_username/runner/close/final/final.py", line 20, in final
    final_script(userdata or [])
  File "/venv/lib/python3.11/site-packages/your_username/runner/close/final/final_script.py", line 36, in final_script
    data = value
    ~~~~^^^^^
TypeError: unhashable type: 'dict'

+1 on this, I'm seeing the same issue on 3.11.6

should be fixed

In which release? 3.11.7? Could you please generate the package for that version if so? I can't pull the docker image right now