philbot9/youtube-comment-scraper

YouTube comment scraper deactivated

Yakabuff opened this issue · 24 comments

502 bad gateway

@Yakabuff Thanks, there are issues with retrieving the video info and the scraper stopped working. I have taken the site down so I don't get flooded with error reports from users 😉

While I work on a solution you can use https://github.com/philbot9/youtube-comment-scraper-cli locally.


Not everyone is a programmer, and not everyone has Linux. If it were an exe, no problem. This is a desperate situation: many comments disappear from YT. I only found your service last month. It was a great help!

@sevecose Neither programming skills nor Linux are required to run the CLI. As per the installation instructions in the README, you do have to have Node.js with npm installed (and I would suggest PowerShell on Windows).

If that’s not an option for you, keep an eye on this GitHub issue. When I have the time to fix this problem and the site is back up, I will close this issue.

The site is back up for the short term!

http://ytcomments.klostermann.ca

Depending on usage it may go down again. I'm exploring options to find a more permanent solution.

Unfortunately, I had to disable the scraper again.

It looks like we have reached critical mass. Due to the high number of comments being scraped by users, YouTube has blocked the server IP. I'm looking at alternative options such as proxy servers or something like AWS Lambda.

Until I have found a solution the scraper will remain deactivated. As an alternative, users can use this project to run the scraper locally: https://github.com/philbot9/youtube-comment-scraper-cli
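One reason the local CLI sidesteps the block is that a single user generates far fewer requests than a shared server does. If you script the CLI (or any scraping loop) yourself, spacing out requests helps you stay under the radar. Here is a minimal throttle sketch in Node.js; the 2-second delay is an arbitrary assumption, not a documented YouTube rate limit:

```javascript
// Sketch: space out work items so a local scraping script doesn't
// hammer the remote server. The 2-second default delay is an
// arbitrary assumption, not a documented YouTube rate limit.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function throttledMap(items, fn, delayMs = 2000) {
  const results = [];
  for (const item of items) {
    results.push(await fn(item)); // process one item at a time
    await sleep(delayMs);         // pause before the next request
  }
  return results;
}
```

You would pass each video ID (or URL) as an item and wrap your scraping call in `fn`; the sequential loop guarantees only one request is in flight at a time.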


What volume of requests triggered the ban?

@everythinginitsrightplace Please post issues with the CLI on that repo: https://github.com/philbot9/youtube-comment-scraper-cli

This has actually come up before though: philbot9/youtube-comment-scraper-cli#19

Hello. Does anyone have an idea of how soon the site will be back? I tried installing Node.js, but I'm pretty sure it's not my cup of tea. If anyone could help me set it up, keeping in mind that I'm a total noob at programming, I would really appreciate it :)

I tried installing Node.js but I'm pretty sure it's not my cup of tea.

I feel your pain; I had it going once for a while, but even I couldn't get it working again under Windows.

If you're stuck on Windows, you might have better luck trying things within Windows Subsystem for Linux. I got things running on pure Linux. If you go the Linux route, then my notes might help:
https://blog.spiralofhope.com/?p=45279

If you have no idea what you're doing with Windows Subsystem for Linux, I have notes here:
https://blog.spiralofhope.com/?p=39613

I can update my stuff if you have any breakthroughs, but I'm not in a position to mentor for this problem since it's solved for me.

Thanks for this scraper script! It works very well, even when a YT video has many comments. Last year I used your online version, but your script also works fine locally, so I use this terminal method since the online version is disabled due to the described issue (which I discovered just recently).

It's remarkable what we can do with all that comment info from certain YT videos, and your script helps in still finding good info!

But I wonder: should YT ever change their comment page (layout/navigation) code, your script might no longer function properly? Do you track their changes? Can I keep using the script in the future? I'm thinking of writing a program/script which rebuilds an HTML/JS page from the JSON/CSV comment data, and maybe even creates a PDF of all comments & replies of a certain YT video. This way searching for text is easy; I often found stunning info and links. An HTML/JS version could even sort all comments by 'newest' and hide replies.

So, are you still working on this script and keeping it updated against any future YT code changes?
About the "disabling issue", I hope you'll solve it.

But I wonder: should YT ever change their comment page (layout/navigation) code, your script might no longer function properly? Do you track their changes? Can I keep using the script in the future?

The author is alive and the project is active. Maybe it can be updated, maybe it can't. Maybe YouTube will make it harder in the future. Nobody knows.

I'm thinking of writing a program/script which rebuilds an HTML/JS page from the JSON/CSV comment data, and maybe even creates a PDF of all comments & replies of a certain YT video. This way searching for text is easy.

Maybe the author could implement a way to dump only certain fields into the file and make it much easier for you.

About the "disabling issue", I hope you'll solve it.

The author said this is a YouTube server IP block, so there is nothing he can do.

rboye commented

So sad to see the tool get disabled. For now I had to switch to this one as a backup solution: https://seobots.io/bots/youtube-comment-scraper; it works in a similar way.

I'd bet it's just a matter of time until that website also gets blacklisted.

Using philbot9's command-line scraper is working great for me, since I won't overuse an IP the way a website/service would.

NetLab is a Danish national infrastructure for research use of archived web content. In order to support our target group, I have just created a tutorial - it went live last night. People here may also find it useful.

The tutorial covers Windows and Mac (the latter with thanks to my colleagues for details and feedback).

I believe that one key issue for those having trouble with Philip's excellent script may be the need to use admin privileges in order to get everything running correctly.

The tutorial may be found on this page:
http://www.netlab.dk/services/tools-and-tutorials/youtube-comment-scraper/

I hope it will prove helpful to some.

Thank you @AsgerH for taking the time to create this tutorial. I have added a link to the youtube-comment-scraper-cli.

You are very welcome, Philip. I'm happy to see that you added it to your repository.

On a side note, thank you for handling the strange attack. You handled it exactly as I asked GitHub support to, by deleting it all.

Could the client browser request the pages from YouTube and then send them to the server for data extraction? Then YouTube wouldn't see a single IP making a lot of requests. I know a first obstacle would be bypassing CORS policies in Chrome (but I've seen that's possible with some flags on the executable).

Another approach could be to use this as a Chrome extension; if you browse to the web page that contains the video, I guess you could extract the HTML without CORS issues.

I'm sure somebody has already thought of this, just want to know your opinion.

@M-Y-bit This is an issue tracker, not a support forum.

If you are having problems with the youtube-comment-scraper-cli, please refer to the information in that repository: https://github.com/philbot9/youtube-comment-scraper-cli

There is no further information or support available.

@andrscyv Thanks for your suggestion.

I considered a completely client-side solution at one point (fetch and parse), but as you point out, CORS makes this impossible. YouTube does not include any CORS headers in their responses, so we can't fetch the data client-side.
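To make the constraint concrete, here is a toy model of the browser's CORS decision (not real browser code, just the core rule): a cross-origin response is readable by page scripts only if the server opts in via the `Access-Control-Allow-Origin` header, which YouTube's responses lack.

```javascript
// Toy model of the browser's CORS check for a simple cross-origin
// request: the response body is exposed to page scripts only if the
// server opted in via the Access-Control-Allow-Origin header.
function isReadableCrossOrigin(responseHeaders, requestOrigin) {
  const acao = responseHeaders['access-control-allow-origin'];
  return acao === '*' || acao === requestOrigin;
}

// A response with no CORS header (like YouTube's) is not readable:
console.log(isReadableCrossOrigin({}, 'https://example.com')); // false
```

This is why flipping flags on a local Chrome executable "works" for one user but can never be a basis for a public website: the opt-in has to come from YouTube's servers, not from the client.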

A browser extension might work, but that's not a path I'd like to go down. I would like to support most browsers and maintaining several extensions would be cumbersome as browser platforms change.

Hi, while trying the local CLI I am facing this error. Can I know why this occurs?
API response does not contain a "content_html" field

@Yakabuff Thanks, there are issues with retrieving the video info and the scraper stopped working. I have taken the site down so I don't get flooded with error reports from users

While I work on a solution you can use https://github.com/philbot9/youtube-comment-scraper-cli locally.

While using this, I encountered "API response does not contain a "content_html" field".
Please let me know why this happens and what the solution is. Thank you for your script.

Previously, youtube-comment-scraper-cli worked correctly. Unfortunately, it gives an error now.
That error is "API response does not contain a "content_html" field". @philbot9, can you help me solve this error?

@PiyumithaNirman

Previously, youtube-comment-scraper-cli worked correctly. Unfortunately, it gives an error now.
That error is "API response does not contain a "content_html" field". @philbot9, can you help me solve this error?

If you have an issue with that other program, then you should check its issues list for your problem. I think this is the one you should subscribe to:

philbot9/youtube-comment-scraper-cli#47

pke commented

Getting 404s only. Has the YT download API URL changed?