karlicoss/HPI

my.media.youtube returning []

mdroidian opened this issue · 4 comments

I've set up HPI on Windows (WSL2, Ubuntu 20.04), but for the life of me I can't get my.media.youtube to work.

python3 -c 'import my.media.youtube as yt; print(yt.watched())'
It just returns []

Also:
python3 -c 'import my.media.youtube as yt; print(yt.stats())'
{'watched': {'count': 0, 'warning': 'THE ITERABLE RETURNED NO DATA'}}

python3 -c 'import my.google.takeout.paths as tk; print(tk.get_last_takeout())' does return the Takeout zip file, and that file does contain \Takeout\My Activity\YouTube\My Activity.html.
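A quick way to see the exact member names inside a Takeout archive is to list them with the standard-library zipfile module. The sketch below is illustrative: `members_matching` is a hypothetical helper, and the demo builds a small in-memory archive instead of reading a real Takeout zip (for a real one you would pass its file path to `zipfile.ZipFile`).

```python
import io
import zipfile
from typing import List


def members_matching(zf: zipfile.ZipFile, needle: str) -> List[str]:
    """Return archive member names that contain `needle` (case-sensitive)."""
    return [name for name in zf.namelist() if needle in name]


# Demo: a tiny in-memory archive standing in for a real Takeout zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Takeout/My Activity/YouTube/My Activity.html", "<html></html>")
    zf.writestr("Takeout/My Activity/Search/My Activity.html", "<html></html>")

with zipfile.ZipFile(buf) as zf:
    for name in members_matching(zf, "YouTube"):
        print(name)  # Takeout/My Activity/YouTube/My Activity.html
```

Printing the names verbatim makes small differences (spaces, capitalization) visible, which Explorer-style paths with backslashes can hide.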

hpi doctor my.google.takeout.paths
Comes back all green

but
hpi modules --all lists [disabled: marked explicitly (via __NOT_HPI_MODULE__)] for my.google.takeout.paths

I'm not sure where to go from here!
Any ideas what I could try next? 😎

Thanks for the details!

python3 -c 'import my.media.youtube as yt; print(yt.stats())'

Just FYI, this is equivalent to hpi doctor my.media.youtube.

[disabled: marked explicitly (via __NOT_HPI_MODULE__)]

Yeah, that's expected, since my.google.takeout.paths is an auxiliary module, not a 'user-facing' one.

Regarding the actual problem: hmm. My hunch is that there may be some issue with the path within the zip archive on Windows, i.e. this bit:

HPI/my/media/youtube.py, lines 21 to 25 in 8abe665:

path = 'Takeout/My Activity/YouTube/MyActivity.html'  # looks like this one doesn't have retention? so enough to use the last
# TODO YouTube/history/watch-history.html, also YouTube/history/watch-history.json
last = get_last_takeout(path=path)
if last is None:
    return []
Although judging by the zipfile code, it always uses forward slashes internally, so this should be portable: https://github.com/python/cpython/blob/a4e7d5f750e06e31a80a83c2af02b1a40cecd0ff/Lib/zipfile.py#L355-L356

Can you put a breakpoint there, or add a print(last) statement, to see whether it actually finds the file?
Or maybe just running python3 -c "import my.google.takeout.paths as P; print(P.get_last_takeout(path='Takeout/My Activity/YouTube/MyActivity.html'))" would work too.
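When comparing a path as Windows Explorer displays it (backslashes, leading separator) with the string passed to get_last_takeout, it helps to normalize it to zip-member form first: forward slashes and no leading slash. This is a minimal sketch using pathlib; `to_member_name` is a hypothetical helper, not part of HPI.

```python
from pathlib import PureWindowsPath


def to_member_name(windows_path: str) -> str:
    """Convert a Windows-style path (as shown in Explorer) to a zip member name.

    Zip archives store member names with forward slashes and no leading slash.
    """
    return PureWindowsPath(windows_path).as_posix().lstrip("/")


shown = r"\Takeout\My Activity\YouTube\My Activity.html"
print(to_member_name(shown))  # Takeout/My Activity/YouTube/My Activity.html
```

The normalized string can then be compared character by character with the hardcoded path in youtube.py.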

Yup, those returned None
But I figured it out! In my takeout the file is My Activity.html, but on line 21 of youtube.py it's MyActivity.html.

Bleh, can't believe I missed that! 🙈
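Member lookups in a zip archive are exact string comparisons, so a single missing space is enough to make the lookup come back empty. One way to catch this kind of near miss is to ask difflib for the closest member names. This is a sketch: `suggest_members` is hypothetical, and the member list is illustrative rather than taken from a real archive.

```python
import difflib
from typing import List


def suggest_members(expected: str, names: List[str]) -> List[str]:
    """Return member names that look similar to `expected`, best match first."""
    return difflib.get_close_matches(expected, names, n=3, cutoff=0.8)


# Illustrative member names; a real list would come from ZipFile.namelist().
names = [
    "Takeout/My Activity/YouTube/My Activity.html",
    "Takeout/My Activity/Search/My Activity.html",
    "Takeout/YouTube/history/watch-history.html",
]
expected = "Takeout/My Activity/YouTube/MyActivity.html"

print(expected in names)  # False: exact match fails
print(suggest_members(expected, names)[0])  # Takeout/My Activity/YouTube/My Activity.html
```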

Oh wow... I checked mine, and it's definitely MyActivity.html. Dammit :)
I guess the easiest 'proper' fix would be to add another get_last_takeout call under that if condition, with My Activity.html, to give it a chance to find the other file name.
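The fallback idea above generalizes to "try each known spelling until one is found". Below is a minimal sketch, assuming (as the thread shows) that get_last_takeout(path=...) returns None when the file is missing. `first_found`, `YOUTUBE_ACTIVITY_PATHS` and `fake_lookup` are hypothetical names. Since the code presumably also needs to know which spelling matched when it opens the file afterwards, the helper returns the matching path together with the lookup result.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def first_found(
    lookup: Callable[[str], Optional[T]], candidates: Iterable[str]
) -> Optional[Tuple[str, T]]:
    """Try each candidate path in order.

    Returns (path, result) for the first candidate whose lookup is not None,
    or None if no candidate matched.
    """
    for path in candidates:
        result = lookup(path)
        if result is not None:
            return path, result
    return None


# Both spellings reported in this thread.
YOUTUBE_ACTIVITY_PATHS = [
    "Takeout/My Activity/YouTube/MyActivity.html",
    "Takeout/My Activity/YouTube/My Activity.html",
]


def fake_lookup(path: str) -> Optional[str]:
    """Hypothetical stand-in for get_last_takeout(path=...): only one spelling 'exists'."""
    return "takeout-example.zip" if path.endswith("/My Activity.html") else None


print(first_found(fake_lookup, YOUTUBE_ACTIVITY_PATHS))
# ('Takeout/My Activity/YouTube/My Activity.html', 'takeout-example.zip')
```

In youtube.py this might look like `found = first_found(lambda p: get_last_takeout(path=p), YOUTUBE_ACTIVITY_PATHS)` followed by `if found is None: return []`, though the exact integration depends on how the rest of the function uses `path`.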
All of this is somewhat related to karlicoss/kompress#10: if there were a nice uniform interface, it would be possible to use something like glob('*Activity.html') instead.
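Until such an interface exists, a glob-like match can be approximated by filtering ZipFile.namelist() with the standard fnmatch module. One caveat worth showing: fnmatch's `*` also matches `/`, so a bare `*Activity.html` would pick up every product's activity file, not just YouTube's. This sketch uses a hypothetical `glob_members` helper and illustrative member names.

```python
import fnmatch
from typing import List


def glob_members(names: List[str], pattern: str) -> List[str]:
    """Filter zip member names with a case-sensitive shell-style pattern.

    Note: fnmatch's `*` also matches '/', so keep the directory part in the pattern.
    """
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


names = [
    "Takeout/My Activity/YouTube/My Activity.html",
    "Takeout/My Activity/Search/My Activity.html",
]

print(glob_members(names, "*Activity.html"))  # both files match: too broad
print(glob_members(names, "Takeout/My Activity/YouTube/*Activity.html"))
# ['Takeout/My Activity/YouTube/My Activity.html']
```

The narrower pattern also matches the MyActivity.html spelling, so it covers both variants reported in this thread.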