Example counting prevalence of tweeted images
lintool opened this issue · 1 comment
lintool commented
This example works with Warcbase (on rho):
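```scala
// In spark-shell, enter this with :paste so the chained calls below are read as one expression; `sc` is the shell's SparkContext.
import org.warcbase.spark.matchbox._
import org.warcbase.spark.matchbox.TweetUtils._
import org.warcbase.spark.rdd.RecordRDD._
import org.json4s._
import org.json4s.jackson.JsonMethods._
// Load the deduplicated tweet JSON as an RDD of parsed tweets.
val tweets = RecordLoader.loadTweets("/mnt/vol1/data_sets/elxn42/ruest-white/elxn42-tweets-combined-deduplicated.json", sc)
// Collect the string values of every "media_url_https" field, then count how often each image URL occurs.
val counts = tweets.flatMap(tweet => tweet \\ "media_url_https" \ classOf[JString])
  .countItems()
  .collect()
```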
Results:
```
counts: Array[(org.json4s.JString#Values, Int)] = Array((https://pbs.twimg.com/media/CRvL6hnVEAE_mvv.jpg,11558), (https://pbs.twimg.com/ext_tw_video_thumb/635933769208193025/pu/img/ZrrpFszwfGfdUZuR.jpg,8876), (https://pbs.twimg.com/media/CRj91ZqUcAAr4KS.jpg,7896), (https://pbs.twimg.com/media/CRqFEyCWEAAj9VK.jpg,6258), (https://pbs.twimg.com/media/CRDXt1CU8AAoiWA.jpg,6122), (https://pbs.twimg.com/media/CRn4WnhWEAAmaSB.jpg,5776), (https://pbs.twimg.com/media/CRpE6D6UEAA_8zB.png,5430), (https://pbs.tw...
```
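To make the two steps of the pipeline concrete, here is a minimal sketch in plain Scala, without Spark or json4s: a recursive search for every string stored under `media_url_https` (similar in spirit to the json4s `\\` search), followed by a frequency count sorted by descending count, which is how the results above appear to be ordered. The nested `Map`/`List` tweet model, the `MediaUrlCounts` object with its `findAll` and `countItems` helpers, and the example.com URLs are all hypothetical; this is not Warcbase's `countItems` implementation, which operates on Spark RDDs.

```scala
// Hypothetical sketch: tweets are modeled as nested Maps/Lists instead of json4s JValues,
// and counting uses plain collections instead of Spark RDDs.
object MediaUrlCounts {
  // Recursively collect every String value stored under `key`, at any depth.
  def findAll(json: Any, key: String): List[String] = json match {
    case m: Map[_, _] =>
      m.toList.flatMap { case (k, v) =>
        val here = v match {
          case s: String if k == key => List(s)
          case _                     => Nil
        }
        here ++ findAll(v, key)
      }
    case l: List[_] => l.flatMap(findAll(_, key))
    case _          => Nil
  }

  // Count occurrences of each item and sort by descending count.
  def countItems(items: Seq[String]): Seq[(String, Int)] =
    items.groupBy(identity).map { case (k, v) => (k, v.size) }.toSeq.sortBy(-_._2)

  def main(args: Array[String]): Unit = {
    // Made-up tweets: one image is shared twice, another once, and one tweet has no media.
    val tweets: List[Any] = List(
      Map("id" -> 1, "entities" -> Map("media" -> List(Map("media_url_https" -> "https://example.com/a.jpg")))),
      Map("id" -> 2, "entities" -> Map("media" -> List(
        Map("media_url_https" -> "https://example.com/a.jpg"),
        Map("media_url_https" -> "https://example.com/b.jpg")))),
      Map("id" -> 3, "text" -> "no media")
    )
    val counts = countItems(tweets.flatMap(t => findAll(t, "media_url_https")))
    counts.foreach(println) // prints (https://example.com/a.jpg,2) then (https://example.com/b.jpg,1)
  }
}
```

Modeling the search as "find all matches, keep only strings" mirrors the `\\ "media_url_https" \ classOf[JString]` step, while `countItems` here only reproduces the count-and-sort behavior visible in the results, not Spark's distributed execution.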
jrwiebe commented
Added to docs. Closing.