joetats/youtube_search

Weird empty result response

Closed this issue · 14 comments

On Python 3.7

>>> from youtube_search import YoutubeSearch
>>> serp = 'asereje'
>>> YoutubeSearch(serp, max_results=1).to_dict()
[{'title': 'Las Ketchup - The Ketchup Song (Asereje) (Spanish Version) (Official Video)', 'link': '/watch?v=V0PisGe66mY', 'id': 'V0PisGe66mY'}]
>>> YoutubeSearch(serp, max_results=1).to_dict()
[]
>>> YoutubeSearch(serp, max_results=1).to_dict()
[{'title': 'Las Ketchup - The Ketchup Song (Asereje) (Spanish Version) (Official Video)', 'link': '/watch?v=V0PisGe66mY', 'id': 'V0PisGe66mY'}]

Just so I'm understanding correctly: is this a case where the same search string returns different results on repeated calls? I'm trying to figure out how to reproduce this.

I was writing my own Python script for scraping search results and found that YouTube sometimes returns HTML without the list of results, just the navbar. Perhaps you are having the same problem?

In the URL used in the YoutubeSearch.search method you have "pbj" as a query param. What's that for?

That's interesting! I wonder if it has something to do with the user-agent that makes the request, and whether that needs to be set.

I added the pbj query param because all my browser-based searches had it. I'm not sure if it's needed or not, but I wanted to make the request look as 'browser-like' as possible (without setting other headers).

I tried setting the user-agent. If I set it to a mobile device, the scraper stopped working completely, since YouTube always returns HTML with just a loader.

Then I tried an IE user agent. It seems to return empty HTML less often, but the problem is still there.

What I ended up doing was just retrying the request and checking whether the text of the page contains an "ol" tag, so it doesn't need to parse the whole HTML with Beautiful Soup each time. I was thinking about using Selenium to execute the JavaScript of the empty HTML, but I don't know if that would be too slow for a real-time search API.
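For anyone who wants to try the retry approach described above, here is a minimal sketch. The function name `fetch_with_retry` and its parameters are made up for illustration; it takes any callable that returns the page text, so it is not tied to a particular HTTP library and is not the code from the original script.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], str],
    max_attempts: int = 5,
    delay: float = 0.0,
) -> Optional[str]:
    """Call fetch() until the returned text seems to contain a results list.

    Hypothetical helper: `fetch` is any zero-argument callable returning the
    page text, e.g. `lambda: requests.get(url).text`.
    """
    for attempt in range(max_attempts):
        html = fetch()
        # Cheap substring check for an <ol> tag instead of parsing the page.
        if "<ol" in html:
            return html
        # Optionally wait before the next attempt (not after the last one).
        if delay and attempt < max_attempts - 1:
            time.sleep(delay)
    # Every attempt came back without a results list.
    return None
```

The substring check is deliberately crude: it only tells you the page is worth parsing, so the full parse still has to handle a page whose list turns out to be empty.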

I see. Let me set up a test case and run it a bit overnight and see if I can get some stats on performance for different methods.

I have the same problem. If you solve it, please let me know. Thanks for sharing this good script with us.

In the URL used in the YoutubeSearch.search method you have "pbj" as a query param. What's that for?

When I try it in the browser with pbj, it returns JSON instead of HTML.

I also have the same problem, it seems that a JSON file only containing {"reload":"now"} is returned most of the time.
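A small sketch of how that case could be detected before trying to parse results. The helper name `is_reload_response` is hypothetical, and it only recognizes the exact {"reload":"now"} payload mentioned above; any other body is treated as a normal response.

```python
import json


def is_reload_response(body: str) -> bool:
    """Return True if the body is the JSON object {"reload": "now"}.

    Hypothetical helper; anything that is not valid JSON (e.g. an HTML page)
    or has a different shape counts as a normal response.
    """
    try:
        data = json.loads(body)
    except ValueError:
        # Not JSON at all, so it cannot be the reload payload.
        return False
    return isinstance(data, dict) and data.get("reload") == "now"
```

A caller could use this to decide whether to retry the request instead of returning an empty result list.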

For some reason it started acting like this for me today too. It only responds with empty results, no matter the query.

nm17 commented

Same thing too, please fix @joetats

I'm seeing these issues on my end as well. It looks like YouTube changed their front end, which changes how we'll have to scrape the data. We might have to get a bit creative, but this is top priority right now.

Hey guys, I've had the same issues and tried a new approach (no web scraping) in the script I was working on. It is working right now; you can check it here: https://github.com/andrscyv/fast_youtube_search

I found the endpoint that YouTube uses to load the results of a search. It's actually the same URL, but it needs other headers. It returns JSON with a very weird structure, although it is possible to recover the results from it.

nm17 commented

@joetats maybe you can add those headers to your code?

YouTube now randomly returns one of two types of responses, one embedded in JavaScript and the other as HTML, so we can no longer rely on scraping with BeautifulSoup alone.

I have separately written another script from scratch (without any third-party library), and it works with both types of responses.

Feel free to use:
https://github.com/alexmercerind/youtube-search-python

It has a lot more information in its search results.

It works OK by pulling out that "initial data" field, so I'm going to close this for now. I've updated the version to 1.0.0, as the new returned fields will break the old interface.
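For reference, a rough sketch of what pulling out an embedded "initial data" field can look like. This is not the code from the 1.0.0 release. It assumes the field shows up in the page text as the name `ytInitialData` followed by a JSON object; the function name and the `marker` parameter are hypothetical.

```python
import json
from typing import Any, Optional


def extract_initial_data(html: str, marker: str = "ytInitialData") -> Optional[Any]:
    """Return the JSON object that follows `marker` in the page text.

    Assumption: the page contains something like
    `window["ytInitialData"] = {...};` and the first "{" after the marker
    starts the JSON object.
    """
    start = html.find(marker)
    if start == -1:
        return None
    brace = html.find("{", start)
    if brace == -1:
        return None
    try:
        # raw_decode parses one JSON value starting at `brace` and ignores
        # whatever follows it (e.g. the trailing ";</script>").
        data, _end = json.JSONDecoder().raw_decode(html, brace)
    except ValueError:
        return None
    return data
```

Using `raw_decode` avoids having to guess where the object ends with a regex or string split, which tends to break when the JSON itself contains braces or semicolons.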