This program uses IP geolocation to find the locations of the repliers to a PTT post.
It fetches the PTT post and draws the statistics as a pie chart.
python3 genReplyPieChart.py INPUT
Note that this script does not provide any way to change its configuration. It should be easy enough to get the results from findLocation.main([...parameters...]) and call pyplot yourself.
When calling findLocation.main, the return value is a dictionary with the following keys:
- 'ip_record_ratio': a floating-point number giving the ratio of recorded IPs.
- 'poster': the poster dictionary. It has keys like 'title', 'city', etc.
- 'foreign_push': all the foreign pushes.
- 'taiwan_push': all the Taiwan pushes.
- 'taiwan_push_acc' and 'foreign_push_acc': the number of pushes for each country/city.
- 'foreign_push_id': the IDs of all the foreign pushes.
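As a rough sketch of the "call pyplot yourself" route, the code below turns such a result dictionary into a pie chart. It assumes that 'taiwan_push_acc' maps a city name to a push count, which is a guess at the value type, so check the real return value first. `pie_inputs` and `draw_pie` are hypothetical helper names, and `example` is a hand-made stand-in, not real program output.

```python
def pie_inputs(acc, min_share=0.0):
    """Turn a {name: count} mapping into (labels, sizes), largest first.

    Entries whose share of the total is below min_share are merged
    into a single 'other' slice.
    """
    total = sum(acc.values())
    labels, sizes, other = [], [], 0
    for name, count in sorted(acc.items(), key=lambda kv: kv[1], reverse=True):
        if total and count / total < min_share:
            other += count
        else:
            labels.append(name)
            sizes.append(count)
    if other:
        labels.append("other")
        sizes.append(other)
    return labels, sizes


def draw_pie(result, out_path):
    """Save a pie chart of result['taiwan_push_acc'] to out_path."""
    # Imported lazily so pie_inputs works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    from matplotlib import pyplot as plt

    labels, sizes = pie_inputs(result["taiwan_push_acc"], min_share=0.05)
    fig, ax = plt.subplots()
    ax.pie(sizes, labels=labels, autopct="%1.1f%%")
    ax.set_title(result["poster"]["title"])
    fig.savefig(out_path)
    plt.close(fig)


# Hand-made stand-in for the dictionary returned by findLocation.main.
example = {
    "poster": {"title": "demo"},
    "taiwan_push_acc": {"Taipei": 6, "Tainan": 3, "Hualien": 1},
}
print(pie_inputs(example["taiwan_push_acc"]))
```

Calling `draw_pie(example, "demo.png")` would write the chart to a file of your choice.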
It is important to include a Chinese font, since pyplot does not support Chinese by default. The program uses '思源宋體' (Source Han Serif) by default, so please download it from 思源宋體 and place the 'regular' one in the root directory (it should be named 'NotoSerifCJKtc-Regular.otf'). The font file name is hard coded in genReplyPieChart.
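If you draw the plot yourself, the font has to be handed to pyplot explicitly. The following is a minimal sketch of one way to do that with matplotlib's `FontProperties`; it is not necessarily how genReplyPieChart does it, and `resolve_font` and `cjk_font` are hypothetical helper names.

```python
from pathlib import Path

FONT_NAME = "NotoSerifCJKtc-Regular.otf"  # file name given in this README


def resolve_font(root="."):
    """Return the path of the font file inside root, or raise with a hint."""
    path = Path(root) / FONT_NAME
    if not path.is_file():
        raise FileNotFoundError(
            f"{FONT_NAME} not found in {Path(root).resolve()}; "
            "download the regular weight and place it there"
        )
    return path


def cjk_font(root="."):
    """Build a matplotlib FontProperties object from the font file."""
    # Imported lazily so resolve_font works without matplotlib installed.
    from matplotlib.font_manager import FontProperties
    return FontProperties(fname=str(resolve_font(root)))


# Usage with pyplot (the font is passed to each text element):
#   font = cjk_font()
#   ax.set_title("標題", fontproperties=font)
#   ax.pie(sizes, labels=labels, textprops={"fontproperties": font})
```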
The output plot will be stored in 'gen'.
Or just print the statistics to stdout or wherever you like.
python3 findLocation.py [-h] [-k KEYFILE] [-db DATABASE] [-shp SHAPEFILE] [-qt QUERYTIMES] [-o OUTPUT | -silent] INPUT
Use -h for more details.
Users need to list the API services they want to use, along with any other required information, in a file. By default it is called "key.txt", but this can be changed via '-k'.
Each line describes one service, formatted as follows:
SERVICE_NAME [other required information when making requests, like key]
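To make the line format concrete, here is a small sketch of a parser for such a file. It assumes fields are separated by whitespace, which this README does not state, and the service names in `sample` are placeholders, not names the program actually supports.

```python
def parse_key_file(text):
    """Parse 'SERVICE_NAME [extra fields...]' lines into (name, fields) pairs."""
    services = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # First token is the service name, the rest is whatever it needs.
        services.append((parts[0], parts[1:]))
    return services


# Placeholder content; real service names and keys will differ.
sample = "service_a MY_KEY\nservice_b\n\nservice_c user secret\n"
for name, extra in parse_key_file(sample):
    print(name, extra)
```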
I will assume you are using the free trial of every API, so each API service has usage limits. Also, all limits are hard coded, so if the services change their limits...
Supported services:
Input is the URL of a web PTT post. For example: Gossiping Board November 2018 Chat page.
These packages can be installed through pip3.
- pandas - Reads and parses the CSV file and puts it into SQLite.
- beautifulsoup4 - Web scraping.
- fiona - Reads the shapefile.
- shapely - Administrative area matching.
The user has to provide the shapefile of Taiwan's administrative areas. The shapefile can be found on the Taiwan government open data platform. Choose the download that contains the SHP file. Extract all the files and keep them together in the same directory.
By default, the shapefile is expected at "./shp/COUNTY_MOI_1070516.shp", but this can be changed with the optional argument '-shp' by passing the filename of the .shp file.
If the -shp option is used, the program will automatically work out the paths to all the other files.
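Since the other files are located relative to the .shp file, it can help to check that they really sit next to it before running the program. The sketch below does that for the three mandatory parts of a shapefile (.shp, .shx, .dbf); `missing_shapefile_parts` is a hypothetical helper and not part of this project.

```python
from pathlib import Path

REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")  # mandatory parts of a shapefile


def missing_shapefile_parts(shp_path):
    """Return the names of required sibling files missing next to shp_path."""
    base = Path(shp_path)
    return [
        base.with_suffix(suffix).name
        for suffix in REQUIRED_SUFFIXES
        if not base.with_suffix(suffix).is_file()
    ]


# An empty list means all required files were found.
print(missing_shapefile_parts("./shp/COUNTY_MOI_1070516.shp"))
```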
The user should place the IP2Location database in the project.
It is only used to filter out IPs that are not in Taiwan.
The database must also include longitude and latitude information.
Please download the CSV version, since I don't know what that bin file is.
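To illustrate the kind of filtering involved, here is a sketch of a range lookup over CSV rows using only the standard library (the program itself loads the CSV into sqlite with pandas, so this is not its implementation). It assumes an eight-column layout without a header row: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude. Check the file you download against that. The two rows in `SAMPLE` are made up and use documentation-only address ranges.

```python
import bisect
import csv
import io
import ipaddress

# Made-up rows in the assumed layout; not real IP2Location data.
SAMPLE = (
    '"3221225984","3221226239","TW","Taiwan","Taipei","Taipei","25.04","121.53"\n'
    '"3325256704","3325256959","US","United States","California","San Jose","37.33","-121.89"\n'
)


def load_ranges(csv_file):
    """Read rows as (ip_from, ip_to, country_code, lat, lon), sorted by ip_from."""
    rows = [
        (int(r[0]), int(r[1]), r[2], float(r[6]), float(r[7]))
        for r in csv.reader(csv_file)
    ]
    rows.sort()
    return rows


def lookup(rows, starts, ip):
    """Return the row whose range covers the IPv4 address ip, or None."""
    n = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(starts, n) - 1  # last range starting at or before n
    if i >= 0 and rows[i][0] <= n <= rows[i][1]:
        return rows[i]
    return None


def is_taiwan(rows, starts, ip):
    """True if ip falls in a range whose country code is 'TW'."""
    row = lookup(rows, starts, ip)
    return row is not None and row[2] == "TW"


rows = load_ranges(io.StringIO(SAMPLE))
starts = [r[0] for r in rows]  # built once, reused for every lookup
for ip in ("192.0.2.10", "198.51.100.1", "203.0.113.5"):
    print(ip, is_taiwan(rows, starts, ip))
```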
This site or product includes IP2Location LITE data available from http://www.ip2location.com.