minamotorin/twint

The scraper skips some tweets when using --until

minamotorin opened this issue · 4 comments

I found that this part of the code in url.py

    if "win" in platform:
        return f'\"{date.split()[0]}\"'

sometimes makes the scraper skip some tweets when using --until "%Y-%m-%d %H:%M:%S" on Windows. The scrape starts some hours before the specified time. Removing these lines seems to give better results.

Originally posted by @Tortar in #8 (comment)
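To illustrate why the time part gets lost, here is a minimal sketch of the quoted branch in isolation. The function name `windows_branch` and the explicit `platform` argument are made up for this example (in url.py the check presumably reads the platform string from `sys`); only the two quoted lines come from the actual code.

```python
from typing import Optional


def windows_branch(date: str, platform: str) -> Optional[str]:
    """Hypothetical stand-in for the quoted check in url.py.

    Returns None when the branch is not taken.
    """
    if "win" in platform:
        # split() breaks on whitespace, so only the date part survives
        # and any "%H:%M:%S" portion is discarded.
        return f'"{date.split()[0]}"'
    return None


until = "2020-11-09 9:00:00"
print(windows_branch(until, "win32"))  # time of day is gone
print(windows_branch(until, "linux"))  # branch not taken
```

With the time dropped, a value such as "2020-11-09 9:00:00" is treated the same as "2020-11-09", which would be consistent with the scrape starting some hours away from the requested time.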

@Tortar
The code you pointed out was originally added as a workaround for twintproject#597.
When I fixed twintproject#1136, I changed that code (188e521).
But I didn't remove the code because I can't rule out the possibility that twintproject#597 would recur.

If someone helps debug on Windows and twintproject#597 turns out to be resolved, I will be able to remove the code.

I will work on it. For now, --until seems to work fine, while --since has some problems.

Actually, on Windows it works perfectly without

    if "win" in platform:
        return f'\"{date.split()[0]}\"'

I tried

twint -s pineapple --until "2020-11-09 9:00:00" --since "2020-11-09 1:00:00"

and

twint -s pineapple --until "2020-11-09" --since "2020-11-08"

and they work as expected, stopping at the right point. So, I think deleting those lines could be an improvement.
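For reference, the two value shapes used in the commands above can both be parsed with the standard library. This is only a sketch under the assumption that the dates are handled with `datetime.strptime`; the helper name `parse_bound` is hypothetical and is not taken from twint.

```python
from datetime import datetime


def parse_bound(date: str) -> datetime:
    """Hypothetical helper: accept "YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DD"."""
    try:
        # Full timestamp, as in --until "2020-11-09 9:00:00"
        return datetime.strptime(date, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Date only, as in --until "2020-11-09" (midnight is implied)
        return datetime.strptime(date, "%Y-%m-%d")


since = parse_bound("2020-11-09 1:00:00")
until = parse_bound("2020-11-09 9:00:00")
print(until - since)  # width of the requested window
```

Keeping the time of day is what lets the first command cover an eight-hour window instead of falling back to whole days.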

I removed the code. 0b07327

@Tortar
Thanks for your help!