This repository contains the code for the paper "An Integrated Approach for Improving Brand Consistency of Web Content: Modeling, Analysis and Recommendation", accepted at ACM Transactions on the Web.
Authors: Soumyadeep Roy, Shamik Sural, Niyati Chhaya, Anandhavelu Natarajan, Niloy Ganguly
This codebase contains all the code and data files corresponding to the "Sentence Ranking Tool" section of the paper. We provide detailed instructions to reproduce the complete set of results.
-
MT-large dataset (29 GB): Available on Zenodo. It contains both the raw and the cleaned versions of the MT-large data, divided into static and dynamic pages.
-
The human-annotated brand personality data and MT-high are not made available due to permission issues. This work is part of a collaboration with Big Data Experience Lab, Adobe Research, Bangalore, India.
-
The human-annotated data for the "Sentence Ranking Tool" is available in the "data" directory.
All resource files related to the "Sentence Ranking Tool" section of the paper are under the "src" directory. Each subdirectory contains a description of its files and their functionality. The code is written in both Python and R, and we also provide the data files that the code requires.