JuanBindez/pytubefix

pytubefix.exceptions.RegexMatchError, 3.1-rc1

Closed this issue · 2 comments

Describe the bug
Getting a RegexMatchError when extracting the video URL

--> python3
Python 3.11.5 (main, Sep  7 2023, 00:00:00) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pytubefix
>>> YOUTUBE_URL = str('https://www.youtube.com/@pbsspacetime/videos')
>>> x = pytubefix.Channel(YOUTUBE_URL)
>>> for index, VIDEO in enumerate(x.video_urls, start=1):
...     yt = pytubefix.YouTube(str(VIDEO), use_oauth=True, allow_oauth_cache=True)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/projects/mytube/venv/lib64/python3.11/site-packages/pytubefix/__main__.py", line 96, in __init__
    self.video_id = extract.video_id(url)
                    ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/projects/mytube/venv/lib64/python3.11/site-packages/pytubefix/extract.py", line 133, in video_id
    return regex_search(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*", url, group=1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/projects/mytube/venv/lib64/python3.11/site-packages/pytubefix/helpers.py", line 129, in regex_search
    raise RegexMatchError(caller="regex_search", pattern=pattern)
pytubefix.exceptions.RegexMatchError: regex_search: could not find match for (?:v=|\/)([0-9A-Za-z_-]{11}).*
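
For reference, the pattern shown in the traceback can be tried on its own with Python's standard re module. This is only a minimal sketch: find_video_id is a hypothetical helper and not part of pytubefix, it returns None where pytubefix raises RegexMatchError, and "abcdefghijk" is a made-up placeholder ID. It shows that the pattern needs "v=" or "/" followed by 11 ID characters, so a string without such a sequence (the channel URL itself, for example) gives no match.

```python
import re

# Pattern copied from the traceback above (pytubefix/extract.py, video_id).
VIDEO_ID_PATTERN = r"(?:v=|\/)([0-9A-Za-z_-]{11}).*"


def find_video_id(text):
    """Return the 11-character ID the pattern captures, or None if nothing matches.

    Hypothetical helper for illustration; pytubefix raises RegexMatchError instead.
    """
    match = re.search(VIDEO_ID_PATTERN, text)
    return match.group(1) if match else None


# "abcdefghijk" is a placeholder ID, not a real video.
print(find_video_id("https://www.youtube.com/watch?v=abcdefghijk"))
# The channel URL contains no "v=" or "/" followed by 11 ID characters.
print(find_video_id("https://www.youtube.com/@pbsspacetime/videos"))
```

Printing str(VIDEO) just before the pytubefix.YouTube(...) call would show which kind of string is actually being handed to this pattern.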

Expected behavior
I expect it to parse the URL normally.
It seems able to do this normally with a statically typed URL, but extracting the URL from a channel name seems to be an issue.

Desktop (please complete the following information):

  • OS: Red Hat Enterprise Linux release 9.3 (Plow)
  • Python Version 3.11.5
  • pytubefix 3.1-rc1

Additional context
Testing a release candidate

The video_urls method is inherited from the Playlist class. Try using .videos instead; it returns YouTube objects.

If you just need the url of each video, try this:

from pytubefix import Channel

YOUTUBE_URL = 'https://www.youtube.com/@pbsspacetime/videos'
x = Channel(YOUTUBE_URL)
# .videos yields YouTube objects; watch_url holds each video's URL
for index, VIDEO in enumerate(x.videos, start=1):
    print(VIDEO.watch_url)

Ultimately, that worked. However, this code has been working since its inception and only became an issue after I upgraded to 3.0.0, so something changed. This was the change that worked for me:

    # Iterate through the playlist
    for index, VID in enumerate(x.videos, start=1):

        VIDEO = VID.watch_url

I had to keep VIDEO intact because other functions depend on it holding the video URL; otherwise, the VIDEO variable just contained the YouTube object.
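
If other code needs VIDEO to be a URL string no matter which attribute produced it, one option is a small normalizing helper. This is a sketch under assumptions: as_watch_url is a hypothetical name, not a pytubefix function, and FakeVideo is only a stand-in for a YouTube object so the example runs without network access. The only thing it relies on from the thread is that the objects yielded by .videos expose watch_url.

```python
def as_watch_url(item):
    """Return a URL string for either a plain string or an object with .watch_url.

    Hypothetical helper, not part of pytubefix.
    """
    if isinstance(item, str):
        return item
    return item.watch_url


class FakeVideo:
    """Stand-in for a YouTube object; only the watch_url attribute is modeled."""

    def __init__(self, watch_url):
        self.watch_url = watch_url


# "abcdefghijk" is a placeholder ID, not a real video.
url = "https://youtube.com/watch?v=abcdefghijk"
print(as_watch_url(url))
print(as_watch_url(FakeVideo(url)))
```

With that in place, the loop body could read VIDEO = as_watch_url(VID), and the rest of the code keeps receiving a string.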