Word-Embedding-Recommendation

As the big data platform of the Nanhai District government collects more data tables with increasingly complex attributes, users who rely on a simple search algorithm retrieve less accurate results, which limits the platform's practical use. To address this problem and make full use of data intelligence, this paper designs data table recommendation algorithms based on the data table subscription records of government departments and on trained keyword embeddings. First, we built a ternary heterogeneous information network whose elements are subscription departments, tables, and source departments. We then ran a random walk algorithm on this network to generate node sequences, and from these we obtained trained user department embeddings and data table embeddings based on the subscription records. In addition, we used keywords (data table names) and the attributes of the corresponding data tables as input layers to construct a classification neural network, from which we obtained trained keyword embeddings. Experiments showed that these word embeddings effectively retain the original semantics. We then proposed new collaborative filtering algorithms based on the user department embeddings and data table embeddings. Experiments showed that the recommendation algorithms with embeddings as input performed better than traditional recommendation algorithms. Finally, we built a recommendation system combining the keyword embeddings with the proposed collaborative filtering algorithms. The experimental results showed that this model can make full use of the information in the government dataset to provide user departments with highly relevant data tables.
This work trains professional terminology from the government big data platform into word embeddings, which offers a new approach to training embeddings for professional terminology. The resulting model provides personalized recommendations for different departments and keywords.

Keywords: Heterogeneous information network; Random walk; Collaborative filtering; Word embeddings; Subscription records
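
The network construction and random walk steps described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record format, the department and table names, the uniform (unbiased) walk, and the helper names `build_graph` and `random_walks` are hypothetical and are not taken from the project's code. The node sequences produced this way would then be fed to an embedding model, a step this sketch leaves out.

```python
import random
from collections import defaultdict

# Hypothetical subscription records (illustrative names only):
# (subscribing department, data table, source department).
RECORDS = [
    ("dept_education", "table_school_enrollment", "dept_statistics"),
    ("dept_education", "table_population", "dept_public_security"),
    ("dept_health", "table_population", "dept_public_security"),
    ("dept_health", "table_hospital_beds", "dept_statistics"),
]


def build_graph(records):
    """Build an undirected graph with three node types.

    Each node is a (type, name) tuple so that a department acting as
    both subscriber and source is kept as two distinct nodes.
    """
    graph = defaultdict(set)
    for subscriber, table, source in records:
        sub_node = ("subscriber", subscriber)
        table_node = ("table", table)
        src_node = ("source", source)
        # Tables link to the departments that subscribe to them
        # and to the department that provides them.
        graph[sub_node].add(table_node)
        graph[table_node].add(sub_node)
        graph[table_node].add(src_node)
        graph[src_node].add(table_node)
    # Sorted neighbour lists keep the walks reproducible for a fixed seed.
    return {node: sorted(neighbours) for node, neighbours in graph.items()}


def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Generate uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    graph = build_graph(RECORDS)
    for walk in random_walks(graph)[:3]:
        print(" -> ".join(name for _, name in walk))
```

Because departments only connect to tables in this sketch, every walk alternates between a table node and a department node, so each table appears in the same context windows as the departments that subscribe to or supply it.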