
Continuously updated Sina Weibo Public Opinion Datasets

Continuously updated Sina Weibo Public Opinion Datasets (only for research)

Sina Weibo is China's largest public social media platform. The latest and most popular social events are usually reported and discussed on Weibo soon after they happen. It is therefore of great value to build a real-time, full-scale Weibo public opinion dataset.

At present, given specified keywords and a specified period, there are two kinds of methods for constructing Weibo tweet datasets: (1) using the advanced search API provided by Weibo; (2) traversing all Weibo users, collecting all their tweets in the specified period, and then filtering the tweets by the specified keywords.

However, with the first kind of method, the Weibo search API limits a single search to at most 1000 tweets, which makes it difficult to build large-scale datasets. The second kind of method can build large-scale datasets with almost no omissions, but traversing billions of Weibo users takes a very long time and a large amount of bandwidth. In addition, many Weibo users are inactive, and traversing their homepages is wasteful because they may not have posted any tweets in the specified period.

To alleviate these limitations, we propose a novel method for constructing Weibo tweet datasets that builds large-scale datasets efficiently. Specifically, we first build and dynamically maintain a high-quality pool of active Weibo users (only a small fraction of all users). We then traverse only these users and collect all of their tweets that contain the specified keywords within the specified period.

Weibo Active User Pool

Starting from initial seed users and expanding continuously through social relationships, we first build a Weibo user pool of more than 250 million users. The active user pool is then selected from this user pool according to the following 4 rules:
  • Number of follows: > 50
  • Number of fans: > 50
  • Number of tweets: > 50
  • Most recent post: within the last 30 days
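As an illustration, the four rules can be written as a single predicate over a user profile. This is a minimal sketch under assumptions: the `UserProfile` fields and the `is_active` function are hypothetical names, and "recent post < 30 days" is read as "the latest post is less than 30 days old at the time of checking".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class UserProfile:
    """Hypothetical profile record; field names are assumptions."""
    follows_count: int
    fans_count: int
    tweets_count: int
    last_post_at: datetime


def is_active(
    user: UserProfile,
    now: datetime,
    min_count: int = 50,
    max_idle: timedelta = timedelta(days=30),
) -> bool:
    """Return True only if all four activity rules hold."""
    return (
        user.follows_count > min_count
        and user.fans_count > min_count
        and user.tweets_count > min_count
        and now - user.last_post_at < max_idle
    )


now = datetime(2020, 12, 30)
active_user = UserProfile(120, 300, 800, datetime(2020, 12, 20))
idle_user = UserProfile(120, 300, 800, datetime(2020, 10, 1))
print(is_active(active_user, now), is_active(idle_user, now))
```

Passing `now` explicitly keeps the check reproducible and makes it easy to re-evaluate the pool as it is maintained over time.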

Finally, we build a Weibo active user pool of 20 million users, accounting for about 8% of the users in the Weibo user pool.

Weibo Public Opinion Datasets

⚠️ Note that all of the following datasets have been desensitized.


Paper: Weibo-COV: A Large-Scale COVID-19 Social Media Dataset from Weibo


Weibo-COV V2

Compared with Weibo-COV, Weibo-COV V2 covers a longer time span, contains more data, and uses a more refined keyword filtering method. We have also released the desensitized pool of 20 million active Weibo users to support broader and deeper research.

  • Time Period: 2019-12-01 00:00 - 2020-12-30 23:59 (GMT+8)
  • Keywords: Common keywords plus month-specific keywords. For each month, all original tweets from that month are filtered using the common keywords together with that month's specific keywords.
  • Amount: 65,175,112 tweets filtered from 2,615,185,101 original tweets by keywords.
  • Sample: Weibo-COV V2
  • Sample: 20 million Weibo active user pool
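The month-specific keyword filtering listed above might look like the following sketch. The function names are hypothetical, and the keyword strings are placeholders chosen for illustration; they are not the keywords actually used for Weibo-COV V2.

```python
from datetime import datetime
from typing import Dict, Set


def keywords_for(
    created_at: datetime,
    common: Set[str],
    monthly: Dict[str, Set[str]],
) -> Set[str]:
    """Union of the common keywords and the keywords of the tweet's month."""
    month_key = created_at.strftime("%Y-%m")
    return common | monthly.get(month_key, set())


def passes_filter(
    text: str,
    created_at: datetime,
    common: Set[str],
    monthly: Dict[str, Set[str]],
) -> bool:
    """Keep a tweet if it contains any keyword valid for its month."""
    return any(k in text for k in keywords_for(created_at, common, monthly))


# Placeholder keywords (assumptions), not the dataset's real keyword lists.
COMMON = {"common-term"}
MONTHLY = {
    "2020-01": {"january-term"},
    "2020-02": {"february-term"},
}

print(passes_filter("a january-term post", datetime(2020, 1, 15), COMMON, MONTHLY))
print(passes_filter("a january-term post", datetime(2020, 2, 15), COMMON, MONTHLY))
```

Keying the monthly keyword sets by a "YYYY-MM" string means a keyword counts only in the month it was selected for, while the common set applies throughout the whole period.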


Weibo-COV

  • Time Period: 2019-12-01 00:00 - 2020-04-30 23:59 (GMT+8)
  • Keywords: A total of 179 selected keywords.
  • Amount: 40,893,832 tweets filtered from 692,792,816 original tweets by keywords. In addition, we release all original tweets that carry a GEO tag, without keyword filtering, covering 45,901,994 tweets.
  • Sample:
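To put the amounts in perspective, the share of original tweets retained by keyword filtering can be computed directly from the figures listed for the two releases. The helper name `kept_share` is only illustrative.

```python
def kept_share(filtered: int, original: int) -> float:
    """Fraction of original tweets that survive keyword filtering."""
    return filtered / original


# Figures taken from the "Amount" entries of the two releases.
share_through_2020_12 = kept_share(65_175_112, 2_615_185_101)
share_through_2020_04 = kept_share(40_893_832, 692_792_816)

print(f"2019-12 to 2020-12: {share_through_2020_12:.2%}")
print(f"2019-12 to 2020-04: {share_through_2020_04:.2%}")
```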

If you have ideas about social media computing or public opinion analysis and would like to discuss them, feel free to email me: huyong@bit.edu.cn