Flowko/website-shot

Would it be possible to auto install plugins like uBlock Origin into the Website-Shot browser?

Leopere opened this issue · 13 comments

Would it be feasible to have the scraper auto download and install uBlock Origin? Since it seems you're using Chrome, it would be slick if it could be added so that annoyances like email sign-up modals and those absurdly useless cookie consent notifications that the EU forced on the world get blocked.

I know that Google's new insane regime decided that we're not going to have effective ad blocking going forward. That said, it would still be great to use uBlock because it has the Annoyances list.


hi sure thing, i'll see what i can do about it when i have some free time

@Leopere can u send me that link, i would like to test against it

Firstly this is the URL you asked for.
https://www.advisory.com/daily-briefing/2022/06/02/covid-transmission

Thanks heaps for all of the help! I think this will drastically improve how this all works. One of the things I've noticed from ArchiveBox (technically a competitor app) is that they ship a Pi-hole instance as an option in their docker-compose.yml example, in case someone wants to use DNS blackholing to stop adverts from loading while snapshots are being taken. https://github.com/ArchiveBox/ArchiveBox/blob/dev/docker-compose.yml#L50-L74

It might be possible to stitch that in here. If you don't hate the idea, I can make a ticket for it as well.

version: "3.9"
services:
  webshot:
    image: flowko1/website-shot:latest
    # dns:                                  # uncomment to use pihole below for ad/tracker blocking during archiving
    #   - pihole
    volumes:
      - /REDACTED/data:/usr/src/website-shot/screenshots
    deploy:
      replicas: 1
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.webshot.tls=true"
        - "traefik.http.services.webshot.loadbalancer.server.port=3000"
        - "traefik.http.routers.webshot.rule=Host(`webshot.redacted.com`)"
        - "traefik.http.routers.webshot.entrypoints=websecure"
        - "traefik.http.routers.webshot.tls.certresolver=letsencryptresolver"
        - "traefik.http.routers.webshot.service=webshot"
        - "traefik.docker.network=traefik"
        - 'traefik.http.routers.webshot.middlewares=authelia@docker'
    networks:
      - traefik
#    environment:
#      - USER_AGENT="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"

    ### Example: To run pihole in order to block ad/tracker requests during archiving,
    # uncomment this block and set up pihole using its admin interface

    # pihole:
    #   image: pihole/pihole:latest
    #   ports:
    #     - 80:80       # uncomment to access the admin HTTP interface on http://localhost:80
    #   environment:
    #     WEBPASSWORD: 'set a secure password here or it will be random'
    #   volumes:
    #     - ./data/pihole:/etc/pihole
    #     - ./data/dnsmasq:/etc/dnsmasq.d

hi, thanks for taking the time to test this out. yeah, im totally okay with adding that in. its just that my original idea for this repo was a tool that only generates screenshots, i never thought it would be helpful for some people or even used for archiving, but yeah, here we are

as for PiHole i need to learn more about it, but if u can open a ticket for it specifically that would be great, also if u can add steps/ways to test it, that would be helpful as well

Yeah I can probably even just whip up a docker-compose.yml for testing it with just default ports.

I think that this tool has been pretty reliable for my uses so far. I like the fact that it screenshots stuff so well; it seems like the right tool compared to more complex alternatives. It would be pretty slick if that other ticket with the envvar settings could be tweaked. I'll update that ticket as well.

sorry, which ticket? is it the one u created above, or this one: #43

so about the fix for this, i didnt go the uBlock extension way, i used an ad blocker made for the tool im using to spin up the browser, and it works great for ur case

if u can pull the latest version, and test it by adding BLOCK_ADS to ur compose file, that would be great.

i'll redo most of the stuff i added recently, and probably redo the whole thing, i have some ideas that i wanna start implementing, just that i had no time to do so

Just to be clear

BLOCK_ADS=true

For public reference and my own sanity.
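As a minimal sketch of what that looks like in a compose file (the service name and image are taken from the compose example earlier in this thread; the exact values the app accepts for BLOCK_ADS are not confirmed here, so `true` is an assumption):

```yaml
services:
  webshot:
    image: flowko1/website-shot:latest
    environment:
      # List form passes the value through as a plain string.
      # Assumption: the app treats "true" as enabled; this thread
      # does not confirm which values are accepted.
      - BLOCK_ADS=true
```

If the mapping form is used instead (`BLOCK_ADS: "true"`), the value should be quoted so YAML does not parse it as a boolean.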

I'm not sure how this function works in the code, but I'm guessing == 1 will accept other boolean equivalents like true. For now I'll test with 1, however.

It seems to mostly work. I still get that weird annoyance in the bottom right, but I think it won't be blocked even with the uBlock Origin annoyances list. Also, you've set this envvar as a default within the Dockerfile, which is sufficient to use it by default.

I'm curious about what sorts of things you'll want to remake. I hope the app won't be broken going forward; it's a really nice tool as it is.

it'll be mostly quality-of-life changes. I'll update the stack I'm using to a new version, as well as the design, plus remove some of the unused stuff in the code and add some docs, mostly for my sanity so i can fix stuff easily, and for anyone wanting to add stuff. plus I'm thinking of adding DB support to this, so u can store ur screenshots and view them, like a history, or even share them by linking to a specific id, plus what u suggested here #36, and whatever i think of adding next

I have noticed that when switching from Image capture to PDF capture or back again, it clears the URL bar, which is kind of annoying.

noted, ill create a ticket for that, and fix asap