- Intro / Motivation 😕
- Data retrieval
- How do we export our dialog/chat
- Each message contains fields:
- Real world view VS Extracted data view
- Authors
Nowadays, advertising is an integral part of doing business.
It is no longer a question of whether one needs to promote their business or not.
The question now is how and where to promote one's product.
There is a wide variety of advertising methods and platforms.
According to Instamber: "telegram marketing can be a productive method to promote your business, as you have millions of active Telegram users around the globe."
Nevertheless, the methods described in such articles assume that entrepreneurs will either spend a lot of time developing their own channel or pay to have their ads distributed.
So we propose an alternative approach to promoting a business via Telegram.
Let us assume that we have to promote paid CS courses and materials, such as books and site subscriptions, via Telegram at no cost. In other words, we cannot pay popular channels for ads and have no time to spare for developing a channel of our own. So, naturally, we would choose thematic groups for promoting our products. For example, we can take a Python programming group.
It is the right choice because the majority of this group's members are interested in CS. Everything seems perfect for our purpose, but such groups usually have a low tolerance for spam, and ads are considered spam. So we need to embed the advertisement in ordinary-looking messages and send them in a way that will not make admins suspect us of spamming or advertising.
How to embed an advertisement in a message is a matter of neither probability theory nor statistics.
Let us assume that we have come up with a way to do that. Nevertheless, our plan might have drawbacks. For example, admins can be overly suspicious and uncover our scheme if we send too many messages with hidden advertisements. However, that is not a big problem for us: we can send one message a week without risking being kicked from the group.
So an interesting question arises: on what day of the week and at what time should we send hidden ads to reach as many group members as possible?
This is where statistics may come in handy.
For this mini-research, we use data that we retrieved ourselves from a relatively large (~10k members) Telegram channel.
The channel specializes in programming, specifically Python, and is mostly in Russian. Because it is a tech chat, it has many daily active users, which is exactly what we need.
One of the team members has been working on a private project that processes messages in Telegram chats, dialogs, and channels.
We use that small tool to download all the data we need.
It is written in Python and depends mainly on the telethon package.
We also generated API keys, which let us access any message in any chat, channel, or dialog, provided our account is a member of it.
In our case, these are the messages in a specific chat, "Python" (we could just as well use any other chat, channel, or dialog).
A snippet of the actual data retrieval function:

    import os

    import pandas as pd

    # `msg_handler` and `config` are defined elsewhere in the project.


    async def download_dialog(client, id, MSG_LIMIT):
        # Resolve the chat/channel/dialog and fetch up to MSG_LIMIT messages.
        try:
            tg_entity = await client.get_entity(id)
            messages = await client.get_messages(tg_entity, limit=MSG_LIMIT)
        except ValueError:
            errmsg = f"No such ID found: #{id}"
            raise ValueError(errmsg)

        # Flatten every message into a plain dict of the fields we need.
        dialog = []
        for m in messages:
            msg_attrs = msg_handler(m)
            dialog.append(
                {
                    "id": m.id,
                    "date": m.date,
                    "from_id": m.from_id,
                    "to_id": msg_attrs["to_id"],
                    "fwd_from": m.fwd_from,
                    "message": msg_attrs["message"],
                    "type": msg_attrs["type"],
                    "duration": msg_attrs["duration"],
                }
            )

        # Save the dialog as <id>.csv (to_csv also writes the DataFrame index).
        dialog_file_path = os.path.join(config["dialogs_data_folder"], f"{str(id)}.csv")
        df = pd.DataFrame(dialog)
        df.to_csv(dialog_file_path)
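The retrieval function relies on a `msg_handler` helper that is not shown in this article. Below is a minimal sketch of what such a helper might look like; it is an illustrative guess, not the project's actual code. It assumes the message object exposes telethon-style attributes (`sticker`, `voice`, `video`, `file.duration`, `to_id`, `message`), and the `"text"` fallback type is likewise an assumption based on the extracted data.

```python
from types import SimpleNamespace


def msg_handler(m):
    """Hypothetical helper: classify a message and extract the extra fields."""
    # Assumption: `m` has telethon-style attributes. getattr() keeps the
    # sketch usable with any object that provides only some of them.
    if getattr(m, "sticker", None):
        msg_type = "sticker"
    elif getattr(m, "voice", None):
        msg_type = "voice"
    elif getattr(m, "video", None):
        msg_type = "video"
    else:
        msg_type = "text"

    # Only voice and video messages carry a meaningful duration.
    duration = None
    if msg_type in ("voice", "video"):
        duration = getattr(getattr(m, "file", None), "duration", None)

    return {
        "to_id": getattr(m, "to_id", None),
        "message": getattr(m, "message", "") or "",
        "type": msg_type,
        "duration": duration,
    }


# Stand-in objects are used here instead of real Telegram messages.
text_msg = SimpleNamespace(message="hello", to_id=42)
voice_msg = SimpleNamespace(
    message="", to_id=42, voice=object(), file=SimpleNamespace(duration=7)
)
print(msg_handler(text_msg))
print(msg_handler(voice_msg))
```

Returning a plain dict keeps the helper independent of the Telegram client, so it can be exercised with simple stand-in objects as above.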
Let's first take a look at the data we have. In the end, our dataset contains every message in chronological order.
- `id`: the id of the message
- `message`: the actual message
- `date`: the precise date and time
- `from_id`: the id of the user who sent the message
- `to_id`: the id of the user to whom the message was sent
- `type`: the type of the message (e.g. text, sticker, video, voice)
- `duration`: the duration of the message, if its type is video or voice
- `fwd_from`: the id of the user from whom the message was forwarded
You can view raw data here link
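With these fields, the question of when the group is most active can be approached by counting messages per weekday and hour. The following is a minimal sketch using only the standard library; the function name `activity_by_slot` and the tiny inline sample are assumptions made for illustration, and the real export would be read from the .csv file instead. Note that the exported timestamps are in UTC, so they may need converting to the audience's local time.

```python
import csv
import io
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def activity_by_slot(csv_file):
    """Count messages per (weekday, hour) slot in an exported dialog CSV."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Dates look like "2021-01-04 16:46:16+00:00" (UTC).
        ts = datetime.fromisoformat(row["date"])
        counts[(WEEKDAYS[ts.weekday()], ts.hour)] += 1
    return counts


# Illustrative sample; a real run would pass open("<dialog id>.csv").
sample = io.StringIO(
    "id,date,type\n"
    "1,2021-01-04 16:39:07+00:00,text\n"
    "2,2021-01-04 16:46:16+00:00,text\n"
    "3,2021-01-05 09:10:00+00:00,text\n"
)
counts = activity_by_slot(sample)
print(counts.most_common(1))  # [(('Mon', 16), 2)]
```

The busiest slots in `counts` are natural candidates for the one message a week we can afford to send.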
The same message string, "Сваггер схемы, прото файлы вполне могут быть" (roughly, "Swagger schemas or proto files could well do"), as it appears in the Telegram application and in the extracted .csv file.
id | date | from_id | to_id | fwd_from | message | type | duration |
---|---|---|---|---|---|---|---|
... | ... | ... | ... | ... | ... | ... | ... |
112 | 2021-01-04 16:46:16+00:00 | PeerUser(user_id=123109378) | PeerChannel(channel_id=1007166727) | | Спасибо. В таком случае немного раздражает в каждом сервисе писать сериализацию/десереализацию | text | |
116 | 2021-01-04 16:39:07+00:00 | PeerUser(user_id=214334796) | PeerChannel(channel_id=1007166727) | | А зачем общие дто, если бд разные, и языки могут быть разные? Общие .proto файлы или схемы разве что | text | |
115 | 2021-01-04 16:40:29+00:00 | PeerUser(user_id=123109378) | PeerChannel(channel_id=1007166727) | | Ну возможно, если обобщить мой вопрос то: как описывают и следят за контрактами на уровне сервисов | text | |
114 | 2021-01-04 16:41:21+00:00 | PeerUser(user_id=214334796) | PeerChannel(channel_id=1007166727) | | Документацией, end to end тестами, общими схемами | text | |
113 | 2021-01-04 16:43:04+00:00 | PeerUser(user_id=43022119) | PeerChannel(channel_id=1007166727) | | Сваггер схемы, прото файлы вполне могут быть | text | |
... | ... | ... | ... | ... | ... | ... | ... |
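In the extracted file, `from_id` and `to_id` are stored as strings such as `PeerUser(user_id=123109378)`, so they need to be parsed before users can be grouped or counted. A minimal sketch of one way to do that with a regular expression follows; the helper name `parse_peer` is hypothetical and not part of the original tool.

```python
import re

# Matches the string form of telethon peers as they appear in the CSV.
PEER_RE = re.compile(r"^Peer(User|Chat|Channel)\((?:user|chat|channel)_id=(\d+)\)$")


def parse_peer(value):
    """Return (kind, numeric id) for a peer string, or None if it does not match."""
    match = PEER_RE.match(value.strip())
    if match is None:
        return None
    kind, raw_id = match.groups()
    return kind.lower(), int(raw_id)


print(parse_peer("PeerUser(user_id=123109378)"))         # ('user', 123109378)
print(parse_peer("PeerChannel(channel_id=1007166727)"))  # ('channel', 1007166727)
print(parse_peer(""))                                    # None
```

Returning `None` for unrecognized or empty cells makes it easy to skip rows where a field such as `fwd_from` is blank.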