The CSV files generated from these lists are almost empty (more details inside)

Question

champorado86 opened this issue 9 months ago · 3 comments

Hi, first off, hats off to you for creating this scraper. I'm trying to build a dataset from the Letterboxd users I follow and this has been a timesaver. Before this I was just scraping manually LOL :)

I ran into an issue with this link and this link: both of them return almost empty CSV files. I've tried other links before with a similar option and they came out OK. The first one should have returned a CSV with 87 titles and the second one 145 titles. What got generated instead are 1 KB CSV files with only 1 title each. I also noticed there was no "Written to xxx-film.csv" notification either.

I'd like to understand what's causing the issues for these 2 particular links. Again, thank you for creating this scraper tool and I hope you have other development plans for this in the future.

Answer 1 · 2024-04-23T21:18:21.000Z

Hi! Glad the project is of use to you 😄

I had a quick look and this bug seems to be a very specific case where the first film in the list is unreleased (e.g. https://letterboxd.com/film/furiosa-a-mad-max-saga/ and https://letterboxd.com/film/wicked-2024/ in your case). Because it is unreleased, there are no official ratings and the histogram stats page https://letterboxd.com/csi/film/{title}/rating-histogram/ is empty.

Scraping is done correctly, but the program crashes during writeout because no ½, ★, ★½, etc. columns were created for the unreleased film. Moreover, because the writeout function extracts the CSV/JSON header from the first film entry, the crash only happens when the unreleased film is the first entry in the list. Congrats on finding this very specific bug!
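For anyone curious, here is a minimal sketch of this failure mode and one possible way around it. This is not the scraper's actual code: the function names, the `Film_title` column, and the rating counts are all made up for illustration, and it assumes the rows are plain dicts written with Python's `csv.DictWriter`. Taking the header from the first row fails as soon as a later row has rating columns the first row lacks, whereas collecting the header from every row avoids that.

```python
import csv
import io


def naive_write(rows, out):
    # Hypothetical reconstruction of the bug: the header comes from the first row only.
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)  # raises ValueError on a row with unknown keys


def collect_fieldnames(rows):
    """Return every key that appears in any row, in first-seen order."""
    fieldnames = []
    seen = set()
    for row in rows:
        for key in row:
            if key not in seen:
                seen.add(key)
                fieldnames.append(key)
    return fieldnames


def write_films_csv(rows, out):
    # One possible fix: build the header from all rows and leave missing cells blank.
    writer = csv.DictWriter(out, fieldnames=collect_fieldnames(rows), restval="")
    writer.writeheader()
    writer.writerows(rows)


# Made-up example data: the first film is unreleased, so it has no rating columns.
films = [
    {"Film_title": "Furiosa: A Mad Max Saga"},
    {"Film_title": "Mad Max: Fury Road", "½": 100, "★": 200},
]

try:
    naive_write(films, io.StringIO())
except ValueError as err:
    print("naive writeout failed:", err)

buffer = io.StringIO()
write_films_csv(films, buffer)
print(buffer.getvalue())
```

In this sketch the naive version still writes the header and the first film before it raises, which would be consistent with a tiny CSV holding a single title and no completion message.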

Anyway, this was a clumsy oversight on my part: I did not realise that you could add unreleased films to lists (TIL 😃). I have added a fix to the program and your lists should now scrape correctly.

Answer 2 · 2024-04-23T21:51:57.000Z

I've added the fix in v2.1.0 so I'll close this issue for now.

Answer 3 · 2024-04-24T00:12:28.000Z

Thank you! That was fast :)