A quick model fitting example
Fits a model to sample web activity training data. It also demonstrates how to use Euclidean distance to detect anomalies in a test dataset.
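Here is a minimal sketch of the Euclidean-distance idea. It assumes the "model" is simply the element-wise mean (centroid) of equal-length training vectors, such as one vector of totals per day. A test vector is flagged when its distance to the centroid exceeds the largest distance seen in training. The function names fit_centroid and find_anomalies and this threshold rule are hypothetical, not taken from the original script.

```python
import math
from typing import List, Sequence, Tuple


def fit_centroid(training: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Hypothetical 'model': element-wise mean of the training vectors, plus a
    threshold equal to the largest training distance from that mean."""
    if not training:
        raise ValueError("need at least one training vector")
    n = len(training[0])
    if any(len(v) != n for v in training):
        raise ValueError("all vectors must have the same length")
    centroid = [sum(v[i] for v in training) / len(training) for i in range(n)]
    # Assumed threshold rule: anything farther than the worst training vector is anomalous.
    threshold = max(math.dist(v, centroid) for v in training)
    return centroid, threshold


def find_anomalies(test: Sequence[Sequence[float]], centroid: Sequence[float],
                   threshold: float) -> List[int]:
    """Return indices of test vectors whose Euclidean distance to the centroid exceeds threshold."""
    return [i for i, v in enumerate(test) if math.dist(v, centroid) > threshold]
```

Using the maximum training distance as the threshold is the simplest choice. If the training data may itself contain outliers, a common alternative is the mean distance plus k standard deviations.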
The Python script reads in CSV files with the following header: date,free,pro,ent,platform,total, where each date value identifies one of the five-minute intervals our data is stored in. My logic only looks at the total column. I didn't investigate differences in usage across plan types, but that would be very interesting to look at.
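Here is a minimal sketch of reading such a file and keeping only the total column with Python's standard csv module. The function name read_totals is hypothetical. The sketch also assumes that total holds integer counts and that the header must match exactly.

```python
import csv
from typing import Iterable, List, Tuple

# Header as described above; the exact-match check is an assumption of this sketch.
EXPECTED_HEADER = ["date", "free", "pro", "ent", "platform", "total"]


def read_totals(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse CSV lines with the header above and return (date, total) pairs.

    The free, pro, ent and platform columns are parsed but ignored, since
    only the total column is used."""
    reader = csv.DictReader(lines)
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return [(row["date"], int(row["total"])) for row in reader]


# Usage (hypothetical file name):
# with open("training.csv", newline="") as f:
#     totals = read_totals(f)
```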
So, for example, a CSV file could look like this:
The total column is the number of unique users who visited the web platform in that five-minute span. This aggregation was generated using Spark.
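The real aggregation was done in Spark, so the plain-Python version below only illustrates what each total represents. In Spark, one way to get the same shape is a group-by over a five-minute window with a distinct count of user IDs. The sketch assumes raw events arrive as hypothetical (timestamp, user_id) pairs and that each five-minute interval starts on a multiple of five minutes.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set, Tuple


def floor_to_bucket(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its five-minute interval."""
    return ts - timedelta(minutes=ts.minute % 5, seconds=ts.second,
                          microseconds=ts.microsecond)


def unique_users_per_bucket(events: Iterable[Tuple[datetime, str]]) -> Dict[datetime, int]:
    """Count distinct user IDs per five-minute bucket, ordered by bucket start."""
    users: Dict[datetime, Set[str]] = defaultdict(set)
    for ts, user_id in events:
        users[floor_to_bucket(ts)].add(user_id)
    return {bucket: len(ids) for bucket, ids in sorted(users.items())}
```

A user who visits several times within one interval is counted once, which matches the "unique users" definition of total.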