Pinned Repositories
add_corporate_information_daily_of_china
Newly added business registration records from the past few days for 31 provinces in mainland China, plus some other company data: over 1,000,000 records in total, including company name, registered address, unified social credit code, province, city, registration date, business scope, responsible person, email, registered capital, and company type.
aistudio-doc2vec-for-investigative-journalism
How Quartz used AI to help reporters search the Mauritius Leaks
aistudio-dochate-public
Learning text classification for journalists through DocHate tips
aistudio-fbdb
aistudio-searching-data-dumps-with-use
Searching large heterogeneous data dumps with Universal Sentence Encoder
aistudio-workshops
Workshops created by the Quartz AI Studio
awesome-iptv
A curated list of resources related to IPTV
bad-data-guide
An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
Crawling-Infrastructure
Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.
crypto-cyoa
The complete guide to investing in cryptocurrencies
BorderlessData's Repositories
BorderlessData/add_corporate_information_daily_of_china
Newly added business registration records from the past few days for 31 provinces in mainland China, plus some other company data: over 1,000,000 records in total, including company name, registered address, unified social credit code, province, city, registration date, business scope, responsible person, email, registered capital, and company type.
BorderlessData/aistudio-doc2vec-for-investigative-journalism
How Quartz used AI to help reporters search the Mauritius Leaks
BorderlessData/aistudio-dochate-public
Learning text classification for journalists through DocHate tips
BorderlessData/aistudio-fbdb
BorderlessData/aistudio-searching-data-dumps-with-use
Searching large heterogeneous data dumps with Universal Sentence Encoder
BorderlessData/aistudio-workshops
Workshops created by the Quartz AI Studio
BorderlessData/awesome-iptv
A curated list of resources related to IPTV
BorderlessData/bad-data-guide
An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
BorderlessData/Crawling-Infrastructure
Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.
BorderlessData/datadonkey
DataDonkey handles XML, CSV and Excel files
BorderlessData/german-gov-domains
An incomplete listing of German government domains
BorderlessData/GlobaLeaks
GlobaLeaks - The Open-Source Whistleblowing Software
BorderlessData/government.github.com
Gather, curate, and feature stories of public servants and civic hackers using GitHub as part of their open government innovations
BorderlessData/govt-urls
Most government websites end in .gov or .mil, but many do not. This repo contains USA.gov's list of public government domains and URLs that don't end in .gov or .mil.
BorderlessData/hstspreload.com
An API to determine if a domain is included in HSTS preload lists.
BorderlessData/infosechiring.com
Open jobs and job seekers in the information security field.
BorderlessData/iptv
Collection of 8000+ publicly available IPTV channels from all over the world
BorderlessData/nomenklatura
Data deduplication tool
BorderlessData/pol-ad-dashboard
Political Ad Dashboard
BorderlessData/proxy_pool
Proxy IP pool for Python web crawlers (proxy pool)
BorderlessData/qccspider
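Crawler for Qichacha (企查查) company information. Scrapes the companies newly added each day in the Qichacha app and supports daily incremental crawls of company data, business registration data, and more.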
BorderlessData/quackbot
BorderlessData/salesforce-ssrf
BorderlessData/Save-to-the-Wayback-Machine
Browser extension for quickly saving web pages to the Internet Archive's Wayback Machine.
BorderlessData/scrapy-wayback-machine
A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
BorderlessData/terraform-aws-dynamic-subnets
Terraform module for public and private subnets provisioning in existing VPC
BorderlessData/wayback-machine-chrome
A web browser extension for Chrome, Firefox, Edge, and Safari 14.
BorderlessData/wayback-machine-downloader
Download an entire website from the Wayback Machine.
BorderlessData/wayback-machine-scraper
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
BorderlessData/waybackurls
Fetch all the URLs that the Wayback Machine knows about for a domain