Clone the project into a directory of your choice
git clone https://github.com/sarismet/codeway-sarismet
Go to the project directory
cd codeway-sarismet
Make sure npm is already installed on your machine. To check whether it is:
node -v
If this does not print a version number, install Node.js from the official website.
After that, install the dependencies
npm i express
npm i @google-cloud/pubsub
npm i @google-cloud/bigquery
or
npm install
To use the Google Cloud services, we have to log in to our account. To log in, you can follow these instructions
We must create the dataset, table, topic, and subscription before running our servers. Pass their names as arguments:
node init.js topic_test sub_test dataset_test table_test
After this command we will have a BigQuery dataset and table to store our data, plus a topic and a subscription on that topic so the main server can communicate with the server that inserts the data.
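The contents of init.js are not shown here, so the following is only a sketch of what it might do, based on the resources described above. The argument order matches the command, and the table schema is an assumption inferred from the fields of the insert endpoint; the clients passed to `initResources` would be instances from @google-cloud/pubsub and @google-cloud/bigquery.

```javascript
// Hypothetical sketch of init.js; the real script may differ.

// Pure helper: map the positional CLI arguments to resource names,
// matching `node init.js topic_test sub_test dataset_test table_test`.
function parseArgs(argv) {
  const [topic, subscription, dataset, table] = argv.slice(2);
  return { topic, subscription, dataset, table };
}

// `pubsub` and `bigquery` would be client instances created with
// `new PubSub()` and `new BigQuery()` from the Google Cloud packages.
async function initResources(pubsub, bigquery, names) {
  // Create the topic and a subscription on it for server-to-server messaging.
  const [topic] = await pubsub.createTopic(names.topic);
  await topic.createSubscription(names.subscription);

  // Create the dataset and the table that will store the events. The schema
  // is an assumption based on the fields accepted by the insert endpoint.
  const [dataset] = await bigquery.createDataset(names.dataset);
  await dataset.createTable(names.table, {
    schema:
      'type:string, session_id:string, event_name:string, event_time:integer, ' +
      'page:string, country:string, region:string, city:string, user_id:string',
  });
}
```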
We have two different services. The one under the directory named main-server handles API requests: if the request is a GET, it analyzes the data by executing queries; if it is a POST, it forwards the request data to our second service. The communication between them goes through Google Pub/Sub. The other service, under the path named info-register-server, inserts the data received from main-server into the BigQuery table.
To run the main server
cd main-server
node index.js topic_test dataset_test table_test
To run the info register server
cd info-register-server
node index.js sub_test dataset_test table_test
- Since there are two separate servers, after sending a POST request to the main server you need to wait for the server responsible for insertion to finish before sending an analyze request. The first request can take a little longer than the others due to the SSL handshake.
- Daily average session duration can be 0. We compute a session's duration by subtracting the first event time from the last one, and when the two are equal the result is 0. This can drag down the average session duration: consider the case where the average session duration is 500; if a session with a different id and a single event arrives, the new average drops below 500, because the denominator increases while the numerator stays the same.
- In the server that inserts the data coming from the main server, we run the insertion mechanism every second.
- We receive the POST request from a user or another server and forward the data, without processing it, to the server running under info-register-server. Since the main server does not have to wait for that server, it stays available for the next POST request. Besides, because we use Google Cloud's Pub/Sub, our service can keep running under heavy traffic.
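The session-duration arithmetic in the notes above can be illustrated with a small helper (illustrative only, not the server's actual code):

```javascript
// Illustration of the average-session-duration note; not the server's code.
// A session's duration is its last event time minus its first event time,
// so a session with a single event contributes a duration of 0.
function sessionDuration(eventTimes) {
  return Math.max(...eventTimes) - Math.min(...eventTimes);
}

function averageSessionDuration(sessions) {
  const total = sessions.reduce((sum, s) => sum + sessionDuration(s), 0);
  return total / sessions.length;
}

// One session lasting 500 seconds gives an average of 500; adding a
// single-event session (duration 0) pulls the average down to 250.
const avgBefore = averageSessionDuration([[0, 500]]);        // 500
const avgAfter = averageSessionDuration([[0, 500], [1000]]); // 250
```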
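The once-per-second insertion mentioned in the notes could be structured roughly like the sketch below. This is an assumption about the mechanism, not the actual info-register-server code; in practice `insertRows` would wrap a BigQuery `table.insert(rows)` call.

```javascript
// Sketch of a once-per-second batch insert; an assumed structure, not the
// actual info-register-server code.
class InsertBuffer {
  constructor(insertRows) {
    this.rows = [];
    this.insertRows = insertRows; // e.g. rows => table.insert(rows) in BigQuery
  }

  add(row) {
    this.rows.push(row);
  }

  // Flush whatever accumulated since the last tick; returns the batch size.
  async flush() {
    if (this.rows.length === 0) return 0;
    const batch = this.rows;
    this.rows = [];
    await this.insertRows(batch);
    return batch.length;
  }

  // Run the flush on a fixed interval, e.g. every second.
  start(intervalMs = 1000) {
    this.timer = setInterval(() => this.flush(), intervalMs);
  }

  stop() {
    clearInterval(this.timer);
  }
}
```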
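The forwarding described in the last note could look like this handler sketch. It is an illustration, not the actual main-server code; in the real server `publish` would wrap `topic.publishMessage()` from @google-cloud/pubsub and the handler would be mounted on an Express route.

```javascript
// Sketch of the main server's POST handler: hand the body off to Pub/Sub
// and respond immediately, without waiting for the insertion server.
// `publish` is injected; in the real server it would wrap
// topic.publishMessage() from @google-cloud/pubsub.
function makeInsertHandler(publish) {
  return async (req, res) => {
    // Forward the body without processing it; once the message is handed
    // off, the main server is free to accept the next POST request.
    const messageId = await publish(Buffer.from(JSON.stringify(req.body)));
    res.status(200).send(`Message ${messageId} sent.`);
  };
}
```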
Returns a basic analysis of the data in BigQuery.
Request
GET http://localhost:8080/analyze
Parameter | Type | Description |
---|---|---|
- | - | - |
Response
Body | Type | Description |
---|---|---|
total_users | int | total user count |
daily_stats | array | daily stats information |
{
"total_users": 2,
"daily_stats": [
{
"date": "9/7/2021",
"average_session_duration": 32,
"active_user_count": 3,
"new_user_count": 1
},
{
"date": "8/7/2021",
"average_session_duration": 24,
"active_user_count": 2,
"new_user_count": 2
}
]
}
Request
POST http://localhost:8080/insert
Body | Type | Description |
---|---|---|
type | string | type of event |
session_id | string | session id |
event_name | string | event name |
event_time | int | event time |
page | string | page location |
country | string | country code |
region | string | region |
city | string | city |
user_id | string | user id |
{
"type": "event",
"session_id": "9FDA743232C2-AB57-483240-87D0-64324772B5A2",
"event_name": "click",
"event_time": 1589627711,
"page": "main",
"country": "TR",
"region": "Marmara",
"city": "Istanbul",
"user_id": "Uu1qJzlfrxYxOSsds5z2321kfAbmSA5pF3"
}
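To make the expected body explicit, here is a small validation helper. It is hypothetical; the actual server may not validate requests this way, and the field list is taken from the table above.

```javascript
// Hypothetical helper listing the fields the insert endpoint expects;
// the actual server may not validate requests like this.
const REQUIRED_FIELDS = [
  'type', 'session_id', 'event_name', 'event_time',
  'page', 'country', 'region', 'city', 'user_id',
];

// Return the names of any required fields missing from a request body.
function missingFields(body) {
  return REQUIRED_FIELDS.filter(field => !(field in body));
}
```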
Response
Body | Type | Description |
---|---|---|
Message | String | The message id, which is sent to the server running under the info-register-server directory |
Message {messageID} sent.
I left a Dockerfile for the deployment process; however, you need to log in to Google Cloud inside that container. When I built and ran it, the error message was "Unable to detect a Project Id in the current environment.".
Server: Node.js, Express, @google-cloud/pubsub, @google-cloud/bigquery
I used Node.js and Express for the server side. We have two different servers and I needed to establish communication between them, which Google Cloud's Pub/Sub provides. To store the data I used Google Cloud's BigQuery.
- The analysis report plots the analysis for each date, but the dates are not in order: as you scroll down, you can see the analysis for 07/07/2021 earlier than the analysis for 05/07/2021. I should have sorted the list before responding.
- We execute five different queries, one for each field of the analysis. We could speed this up by writing two more complex queries that extract all the data from BigQuery at once.
- We could use Redis to cache the data so that we do not have to execute queries for dates we already have in Redis.
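The caching idea above could be sketched as a cache-aside wrapper. A plain Map stands in for Redis here; with the real thing, the get/set calls would go to a Redis client, and stats for past dates could be cached indefinitely since they no longer change.

```javascript
// Cache-aside sketch for the Redis idea; a Map stands in for Redis.
// `queryStatsForDate` would run the BigQuery queries for one date.
function makeCachedStats(queryStatsForDate, cache = new Map()) {
  return async function statsForDate(date) {
    if (cache.has(date)) return cache.get(date); // cache hit: no query runs
    const stats = await queryStatsForDate(date); // cache miss: run the query
    cache.set(date, stats);
    return stats;
  };
}
```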
I have used many pieces of code from the README tutorials of the repositories below.