get-pytube/pytube3

404 when trying to iterate through streams provided by YouTube/Video object

Issue closed · 3 comments

Some videos have streams that return a 404 when certain attributes (such as filesize) are accessed on the stream.

For instance, this video: https://www.youtube.com/watch?v=3ejPqYn1gOY

python.exe C:/Scripts/YouTubeDownload/qt_gui.py
get video id
3ejPqYn1gOY
Worker 0: Loading video
Worker 0: 33532:  thread starting...
loading streams
Worker 0: Loading stream 0
Worker 0: Loading streams for video 0
Worker 0: Loaded video: checkra1n on Raspberry PI
Worker 0: Loaded stream 0
Worker 0: Loading stream 1
Worker 0: Loading stream 2
Worker 0: Loaded stream 1
Worker 0: Loading stream 3
Worker 0: Loaded stream 2
Traceback (most recent call last):
  File "C:\Scripts\YouTubeDownload\qt_assets\tabs\downloader.py", line 139, in load_streams
    f'Res: {stream.resolution}, FPS: {stream.fps}, '
  File "C:\Scripts\YouTubeDownload\venv\lib\site-packages\pytube\streams.py", line 143, in filesize
    headers = request.head(self.url)
  File "C:\Scripts\YouTubeDownload\venv\lib\site-packages\pytube\request.py", line 57, in head
    response_headers = _execute_request(url, method="HEAD").info()
  File "C:\Scripts\YouTubeDownload\venv\lib\site-packages\pytube\request.py", line 19, in _execute_request
    return urlopen(request)  # nosec
  File "C:\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
<class 'urllib.error.HTTPError'> HTTP Error 404: Not Found <traceback object at 0x0000026452469748>
>>> from pytube import YouTube
>>> yt = YouTube('https://www.youtube.com/watch?v=3ejPqYn1gOY&feature=youtu.be')
>>> for stream in yt.streams.all():
...   print(
...     f'Codec: {stream.audio_codec}, '
...     f'ABR: {stream.abr}, '
...     f'File Type: {stream.mime_type.split("/")[1]}, '
...     f'Size: {stream.filesize // 1024} KB'
...   )
...
Codec: mp4a.40.2, ABR: 96kbps, File Type: mp4, Size: 606 KB
Codec: mp4a.40.2, ABR: 192kbps, File Type: mp4, Size: 1919 KB
Codec: None, ABR: None, File Type: mp4, Size: 1714 KB
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Scripts\YouTubeDownload\venv\lib\site-packages\pytube\streams.py", line 143, in filesize
    headers = request.head(self.url)
  File "C:\Scripts\YouTubeDownload\venv\lib\site-packages\pytube\request.py", line 57, in head
    response_headers = _execute_request(url, method="HEAD").info()
  File "C:\Scripts\YouTubeDownload\venv\lib\site-packages\pytube\request.py", line 19, in _execute_request
    return urlopen(request)  # nosec
  File "C:\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

@jslay88 I'll take a look... it's happening because the filesize property has to be retrieved from a network call.
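
Since the traceback shows `filesize` going through `request.head(self.url)`, a per-stream guard is one possible stopgap until a fix lands. This is only a minimal sketch: it assumes nothing more than a `filesize` property that may raise `urllib.error.HTTPError`. The names `safe_filesize` and `FakeStream` are hypothetical and not part of pytube; `FakeStream` exists only so the example runs offline.

```python
from urllib.error import HTTPError


def safe_filesize(stream):
    """Return stream.filesize in bytes, or None if the lookup raises an HTTP error."""
    try:
        # On a real stream this property access triggers a network request.
        return stream.filesize
    except HTTPError:
        # Covers the 404 from the traceback (and any other HTTP status error).
        return None


class FakeStream:
    """Offline stand-in for a stream object (hypothetical, illustration only)."""

    def __init__(self, size=None, status=None):
        self._size = size
        self._status = status

    @property
    def filesize(self):
        if self._status is not None:
            raise HTTPError("https://example.invalid", self._status, "Not Found", None, None)
        return self._size


for s in (FakeStream(size=620544), FakeStream(status=404)):
    size = safe_filesize(s)
    print("unavailable" if size is None else f"{size // 1024} KB")
```

With real streams, the same helper could wrap the `stream.filesize` access inside the loop from the report, so one bad stream no longer aborts the whole listing.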

  • Looks like @movraxrsp and @swiftyy-mage did some digging into this issue a while back: pytube#543
  • I took what they found and pushed a PR here: https://github.com/hbmartin/pytube3/pull/48
  • The above PR lets you filter these streams out (see example in link)
  • Also, when listing sizes for many streams, please use the new filesize_approx property, which is very accurate and avoids the HTTP call overhead of filesize
  • I will continue investigating whether these URLs can be decrypted so that they resolve without an error
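
The advice about `filesize_approx` could be applied roughly as follows. This is a sketch under assumptions: it takes the property name from the comment above, treats it as possibly missing on older installs, and falls back to `filesize` only in that case. The helper name `listing_size` is hypothetical, and the `SimpleNamespace` objects are offline stand-ins for real streams.

```python
from types import SimpleNamespace
from urllib.error import HTTPError


def listing_size(stream):
    """Return a size in bytes suitable for listings, or None if it cannot be determined."""
    # Prefer the approximate size, which is described above as avoiding the HTTP call.
    approx = getattr(stream, "filesize_approx", None)
    if approx is not None:
        return approx
    try:
        # Fallback for versions without filesize_approx; may hit the network.
        return stream.filesize
    except HTTPError:
        return None


# Offline stand-ins: one with the new property, one with only filesize.
streams = [
    SimpleNamespace(filesize_approx=1755136),
    SimpleNamespace(filesize=1965056),
]
for s in streams:
    size = listing_size(s)
    print("unavailable" if size is None else f"{size // 1024} KB")
```

Returning None instead of raising keeps the listing loop simple; callers can decide whether to hide such streams or show them as unavailable.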

Sweet. Will take a look probably tomorrow. Pretty slammed with work at the moment. Thanks for your efforts.