Single endpoint for POST
requests: http://<ip_address_to_be_provided>/analyze
-
Request:
{ "TASK" : "get_items_map", "CONTEXT_ID" : <str: mandatory>, # the id of the map / context }
Response if the map is available:
{ "SUCCESS" : true, "DATAFRAME_MAP" : <str>, #the table that provides ML-generated information about the products "INTERESTS_CENTERS" : <list[float, float, int]> # [[x1,y1,cluster1], [x2,y2,cluster2], ...] }
You can download an example response from: https://www.dropbox.com/s/n51sxn2gi0x6fog/response_example.json?dl=0
Response if the map is not generated yet:
{ "SUCCESS" : false, "REASON" : "Try again later. The map is not generated yet (maybe there is a trraining in progress)" }
Understanding the response:
-
The string returned for
"DATAFRAME_MAP"
represents the json codification of a table with x columns and y rows. All needed to recreate the table is to generate the dictionary,dct = json.loads(response["DATAFRAME_MAP"])
and then use a library that handles dataframes (in python the best way to do it is usingpandas
:df = pandas.DataFrame(dct))
. You can download a csv file that is generated based onresponse["DATAFRAME_MAP"]
from: https://www.dropbox.com/s/jz9euj2lw6oi3tt/decoded_dataframe_example.csv?dl=0The head of the table looks as follows:
ItemId IDE Freq InterestId X Y 0 545535 0 26079 -1 -3.14 11.37 1 398648 1 20894 0 -39.36 8.45 2 563633 2 15816 1 9.70 21.34 3 576732 3 14902 -1 37.50 24.44 4 406083 4 12334 -1 -16.86 8.51 5 602874 5 10537 -1 37.44 24.51 6 589207 6 9816 -1 31.49 13.40 7 258901 7 9507 2 -59.15 -20.47 8 486656 8 9500 -1 37.44 32.09 9 527499 9 9138 9 33.00 15.15 10 533559 10 9117 -1 -7.31 -43.69 11 524657 11 8949 -1 25.81 27.62 12 599979 12 8623 3 35.00 46.06 13 651356 13 8516 -1 13.96 25.45 14 466968 14 8056 -1 36.96 26.99 15 213201 15 7944 4 37.13 30.41 16 616607 16 7915 -1 39.56 20.04 17 259150 17 7634 -1 -56.05 5.88 18 589279 18 7560 77 36.03 31.60 19 538942 19 7553 5 35.04 15.56
The tail of the table looks as follows:
ItemId IDE Freq InterestId X Y 142937 732521 142937 1 -1 NaN NaN 142938 803789 142938 1 -1 NaN NaN 142939 801273 142939 1 -1 NaN NaN 142940 785036 142940 1 -1 NaN NaN 142941 690413 142941 1 -1 NaN NaN 142942 375936 142942 1 -1 NaN NaN 142943 780657 142943 1 -1 NaN NaN 142944 803929 142944 1 -1 NaN NaN 142945 801901 142945 1 -1 NaN NaN 142946 786783 142946 1 -1 NaN NaN
Properties of the table:
- the number of rows is equal to the number of the unique items/products in the provided training dataset
- The column
ItemId
contains the items/products skus provided in the training dataset - The column
IDE
is an internal continous id in range[0, nr_products)
given based on the frequency (the higher the frequencyFreq
, the lower theIDE
) - The column
Freq
is the frequency of the item in transactions (an item with frequency 1 was ever bought just once) - The column
InterestId
is the id of the need/interest/category/cluster in which the model places the item. Based on this field the points are colored on the map. IfInterestId
is -1, it means that the model did not handled to place the item in a certain category (whether because there are very few transactions involving that item, or because the pattern cannot be identified well). - The columns
X
andY
are the items coordinates on the 2D map. The NaN values are for the items that cannot be displayed on the map (always these are the ones with the lowest frequencies). Even they cannot be displayed on the map, they are taken into consideration when we compute similarity, complementarity or other metrics
-
For each generated interest/cluster, the model will compute also the center in terms of x and y coordinates. Therefore, the value for
“INTERESTS_CENTERS“
looks as follows:[ [1.2455, 2.83844, 0], [2.6724, 1.24341, 1], ... [31.8912, 67.12134, 1788] ]
The single purpose of this field is when the center of each cluster should be drawn on the map. Otherwise, it can be discarded.
Properties of the list:
- The number of elements in the list is equal to the number of the generated clusters (i.e.
len(response["INTERESTS_CENTERS"]) == df['InterestId'].max()+1
) - Each element in the list is a list with 3 objects: x, y, and id of the cluster.
- The ids of the clusters (found on the 3rd object of each list element) are identical to the ones found in the table on the
"InterestId"
column
- The number of elements in the list is equal to the number of the generated clusters (i.e.
Info:
From our experience, the easiest way to integrate the map in a website is using bokeh: https://docs.bokeh.org/en/latest/docs/user_guide/bokehjs.html
Code snippet for rendering a map in python:
from bokeh.plotting import figure from bokeh.models import ColumnDataSource, HoverTool,\ BoxZoomTool, ResetTool, WheelZoomTool, PanTool,\ Arrow, VeeHead, CustomJS, Label, LassoSelectTool, Slider from bokeh.models.widgets import TableColumn, DataTable, TextEditor,\ Button, MultiSelect HEIGHT = ... WIDTH = ... source_data = dict(x=[], y=[], color=[], name=[], item_id=[], interest_id=[]) source_data["x"] = df["X"].values source_data["y"] = df["Y"].values source_data["color"] = compute_colors(df["InterestId"].values) source_data["name"] = get_names(df["ItemId"].values) source_data["item_id"] = df["ItemId"].values source_data["interest_id"] = df["InterestId"].values tooltips = [("Nume", "@name"), ("ItemId", "@item_id"), ("Category", "@interest_id")] source = ColumnDataSource(data=source_data) p = figure( plot_height=HEIGHT, plot_width=WIDTH, title="", tools=[BoxZoomTool(), WheelZoomTool(), ResetTool(), PanTool(), LassoSelectTool()] ) p_scatter = p.scatter( x="x", y="y", source=source, radius = 0.25, color="color", line_color = None ) hover = HoverTool(tooltips=tooltips, renderers=[p_scatter]) p.add_tools(hover) ...
-
-
-
Even though this calculus can be done based on the provided table, we also provide this functionality via API.
TODO
-
For training new maps, training data is needed (transactions). We will want to train new maps when new filters are generated (e.g. the map of the products for the Valentine’s day season, the map of the products for location x, etc.). The filtered transactions should be dumped in a database view in order to fetch the data using a simple query "SELECT * FROM <view>"
.
After the filtered data is dumped in the view, you can make an API request (POST http://<ip_address_to_be_provided>/product_explorer
)
Request:
{
"TASK" : "train_map",
"CONTEXT_ID" : <str: mandatory>, # the id of the map / context
"TABLE_INP" : <str: mandatory>, # the name of the table / view in the database where the filtered transactional data is stored
"TABLE_OUT" : <str: mandatory>, # the name of the table / view in the database where the map will be saved after training
"HOUR" : <int: optional [default 19]>, # the hour when this particular training will be scheduled
"MINUTE" : <int: optional [default 0]> # the minute when this particular training will be scheduled
}
Observation: The fields "HOUR"
and "MINUTE"
are used to schedule the training. By default any training is scheduled to start in the same day when the request was sent at 16:00. Attention! The time of the machines were the training is performed is usually GMT.
Response:
{
"message" : "success"
}