spbmetro

Extract information from http://t.me/s/spbmetro using NLP techniques

Primary language: Jupyter Notebook

Goal

I want to parse the @spbmetro channel and build a text classifier that extracts the following information from each message:

  • which subway station was opened / closed
  • when it happened
  • what was the reason

With this data it will be possible to notify citizens more efficiently, for example by predicting closures in 2GIS / Yandex.Maps while building A-B routes, or simply to make a nice infographic.
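As a rough illustration of the goal, here is a minimal keyword-based baseline, not the classifier itself. The station subset, the keyword stems, the `Event` class, the `extract_event` function and the sample message are all assumptions made up for this sketch. The real station list would come from metro.json and the real messages from history.json.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical subset of station names; the full list would come from metro.json.
STATIONS = ["Невский проспект", "Гостиный двор"]

# Assumed Russian keyword stems for the two event types.
KEYWORDS = {"closed": "закрыт", "opened": "открыт"}

# Matches a 24-hour clock time such as 7:05 or 18:45.
TIME_RE = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")


@dataclass
class Event:
    station: Optional[str]
    status: Optional[str]
    time: Optional[str]


def extract_event(text: str) -> Event:
    """Pick the first known station, status keyword and time found in the text."""
    lowered = text.lower()
    station = next((s for s in STATIONS if s.lower() in lowered), None)
    status = next((k for k, stem in KEYWORDS.items() if stem in lowered), None)
    match = TIME_RE.search(text)
    return Event(station, status, match.group(0) if match else None)


# Invented sample message, not taken from the channel.
sample = "Станция «Гостиный двор» закрыта на вход и выход в 18:45."
event = extract_event(sample)
print(event)
```

This baseline only matches station names in their dictionary form and does not extract the reason at all, which is where a trained classifier would be needed.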

About

  • metro.json — names of subway stations, lines, colors, transfers
  • export.py — script that exports channel messages from Telegram
  • history.json — dumped messages, manually refined by me
  • validate_time.py — used to validate my manual refinements
  • structure.py — data objects with typings, describing subway and history
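To give an idea of what typed data objects for the subway and the message history could look like, here is a small sketch. The class names, field names and the JSON layout are assumptions for illustration only. They are not copied from structure.py, metro.json or history.json, and the example date is invented.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Station:
    name: str
    line: str
    transfers: List[str] = field(default_factory=list)


@dataclass
class HistoryRecord:
    station: str
    status: str  # assumed values: "opened" or "closed"
    time: datetime
    reason: Optional[str] = None


def record_from_dict(raw: dict) -> HistoryRecord:
    """Build a record from a dict with an ISO 8601 'time' string (assumed layout)."""
    return HistoryRecord(
        station=raw["station"],
        status=raw["status"],
        time=datetime.fromisoformat(raw["time"]),
        reason=raw.get("reason"),
    )


# Invented example entry, not real channel data.
record = record_from_dict(
    {"station": "Гостиный двор", "status": "closed", "time": "2019-05-01T18:45:00"}
)
print(record)
```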

Run this project via PyCharm; I haven't figured out how to run it from the CLI yet.

Usage

Create a new Telegram application on the official website, read the Pyrogram docs, and add your credentials to the Pyrogram configuration:

[pyrogram]
api_id = 356428
api_hash = ...
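Before running the export, it can help to check that the section parses and that both values are set. The sketch below uses only the standard-library `configparser`. The `load_credentials` helper and the placeholder hash are assumptions for illustration, and in practice the text would be read from whichever file your Pyrogram version expects.

```python
import configparser

# Same layout as the section above; the api_hash value is a placeholder.
SAMPLE = """
[pyrogram]
api_id = 356428
api_hash = 0123456789abcdef0123456789abcdef
"""


def load_credentials(text: str):
    """Parse the [pyrogram] section and return (api_id, api_hash)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["pyrogram"]
    api_id = section.getint("api_id")  # None if the option is missing
    api_hash = section.get("api_hash", "").strip()
    if api_id is None or not api_hash:
        raise ValueError("api_id and api_hash must both be set")
    return api_id, api_hash


api_id, api_hash = load_credentials(SAMPLE)
print(api_id, api_hash)
```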