The CSV files generated from these lists are almost empty (more details inside)
champorado86 opened this issue · 3 comments
Hi, first off, hats off to you for creating this scraper. I'm trying to build a dataset from the Letterboxd users I follow and this has been a timesaver. I was just manually scraping LOL :)
I ran into an issue with this link and this link: both of them return almost empty CSV files. I've tried other links before with a similar option and they came out OK. The first one should have returned a CSV with 87 titles and the second one 145 titles. What got generated instead are 1 KB CSV files with only 1 title each. I also noticed there was no "Written to xxx-film.csv" notification either.
I'd like to understand what's causing the issues for these 2 particular links. Again, thank you for creating this scraper tool and I hope you have other development plans for this in the future.
Hi! Glad the project is of use to you 😄
I had a quick look and this bug seems to be a very specific case where the first film in the list is unreleased (e.g. https://letterboxd.com/film/furiosa-a-mad-max-saga/ and https://letterboxd.com/film/wicked-2024/ in your case). Because it is unreleased, there are no official ratings and the histogram stats page https://letterboxd.com/csi/film/{title}/rating-histogram/ is empty.
Scraping is done correctly, but the program crashes during writeout because no ½, ★, ★½, etc. columns were created for the unreleased film. Moreover, because the writeout function extracts the CSV/JSON header from the first film entry, this only happens when the unreleased film is the first entry in the list. Congrats on finding this very specific bug!
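For anyone curious, here is a minimal sketch of that failure mode. It assumes the writeout behaves roughly like Python's `csv.DictWriter` with the header taken from the first row; the function names, the `Film_title` column, and the sample rows are made up for illustration and are not the project's actual code. The second function shows one possible way to avoid the crash (building the header from every row), which is not necessarily the fix that was applied.

```python
import csv
import io

# Hypothetical rows: the first film is unreleased, so it has no rating columns.
films = [
    {"Film_title": "Unreleased film"},
    {"Film_title": "Released film", "½": 10, "★": 25},
]

def write_first_row_header(rows, fh):
    """Take the header from the first row only (the assumed buggy behaviour)."""
    writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    for row in rows:
        # DictWriter raises ValueError when a row has keys missing from the header.
        writer.writerow(row)

def write_union_header(rows, fh):
    """Build the header from every row's keys, in first-seen order."""
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    # restval fills the columns a row does not have with an empty string.
    writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)

broken = io.StringIO()
try:
    write_first_row_header(films, broken)
    crashed = False
except ValueError:
    crashed = True

fixed = io.StringIO()
write_union_header(films, fixed)
```

In this sketch the first writer stops after the header and the first film, which would match the 1-title CSV files and the missing "Written to" message reported above.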
Anyway, this was a clumsy oversight on my part: I did not realise that you could add unreleased films to lists (TIL 😃). I have added a fix to the program and your lists should now scrape correctly.
Thank you! That was fast :)