hm-seclab/YAFRA

Timestamp RSS and Twitter

DerAlexmeister opened this issue · 3 comments

Add a timestamp to the RSS feed and Twitter data that is written to the Kafka topics.
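A minimal sketch of what this could look like, assuming each scraped item is a plain dict that gets serialized to JSON before being produced to a topic. The field name `scraped_at` and both helper functions are hypothetical, not existing YAFRA code, and the actual producer call is left out on purpose.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def with_scrape_timestamp(item: dict, now: Optional[datetime] = None) -> dict:
    """Return a copy of `item` with a UTC ISO-8601 `scraped_at` field.

    `scraped_at` is an assumed field name; `now` can be injected for testing.
    """
    now = now or datetime.now(timezone.utc)
    stamped = dict(item)  # copy, so the caller's dict is not mutated
    stamped["scraped_at"] = now.astimezone(timezone.utc).isoformat()
    return stamped


def to_message_value(item: dict) -> bytes:
    """Serialize the item to UTF-8 JSON bytes, usable as a message value."""
    return json.dumps(item, sort_keys=True).encode("utf-8")
```

Using UTC and ISO 8601 keeps timestamps from the RSS and Twitter scrapers comparable regardless of where the scrapers run.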

p2h5 commented

If I append the time at which an event was scraped, there is a good chance that we will get the same event twice.
For example, if we scrape the last thirty tweets of a Twitter account every day, chances are high that we get the same tweet on both days, but with different timestamps.
Maybe we can check by name whether such an event already exists and discard the duplicate when we scrape one?

We should discuss this.
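As a starting point for that discussion, here is a rough sketch of the duplicate check. It assumes each event is a dict with a `source` and a `name`, and optionally a source-provided `id` (for example a tweet ID). All field and function names are hypothetical. The important part is that the scrape timestamp is not part of the key, so the same tweet scraped on two days maps to the same key.

```python
import hashlib
from typing import Iterable, List, Optional, Set


def event_key(event: dict) -> str:
    """Build a stable key that ignores the scrape timestamp.

    Prefers a source-provided `id`; falls back to a hash of the name.
    """
    if event.get("id") is not None:
        return f'{event["source"]}:{event["id"]}'
    digest = hashlib.sha256(event["name"].encode("utf-8")).hexdigest()
    return f'{event["source"]}:{digest}'


def drop_duplicates(events: Iterable[dict], seen: Optional[Set[str]] = None) -> List[dict]:
    """Return only events whose key is not yet in `seen`, updating `seen`."""
    seen = set() if seen is None else seen
    fresh = []
    for event in events:
        key = event_key(event)
        if key in seen:
            continue  # already scraped earlier, skip the duplicate
        seen.add(key)
        fresh.append(event)
    return fresh
```

In this sketch `seen` lives in memory; for daily scrapes it would have to be persisted somewhere between runs, which is one of the points still open.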

Would event sourcing be an option here?
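To make the idea concrete: with event sourcing, every scrape would be appended to a log as its own event, duplicates included, and the current view would be derived by replaying that log. The sketch below is only an illustration under that assumption; `ItemScraped` and `replay` are hypothetical names, and the log is a plain list rather than a Kafka topic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ItemScraped:
    key: str         # stable identity of the item, e.g. "twitter:<tweet id>"
    scraped_at: str  # ISO-8601 UTC string; same format sorts chronologically


def replay(log: Iterable[ItemScraped]) -> Dict[str, dict]:
    """Fold the append-only log into one entry per item."""
    state: Dict[str, dict] = {}
    for ev in log:
        entry = state.setdefault(
            ev.key,
            {"first_seen": ev.scraped_at, "last_seen": ev.scraped_at, "times_seen": 0},
        )
        entry["first_seen"] = min(entry["first_seen"], ev.scraped_at)
        entry["last_seen"] = max(entry["last_seen"], ev.scraped_at)
        entry["times_seen"] += 1
    return state
```

With this approach the repeated scrape is not a problem to be removed but extra information: the same tweet seen on two days simply yields one entry with a `first_seen` and a later `last_seen`.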