Data Science Project: The data was collected from different sources and contains irrelevant or incorrect information, so the “derm.csv” dataset requires preprocessing before it can be mined. I applied the following operations to it:

1. Data Cleaning: The data is cleaned using imputation and KNN. The results achieved with both mechanisms are shown.
2. Noise Removal: The data may contain incorrect values, known as noise. I applied two smoothing methods, then performed entropy-based discretization, and show the results of each operation.
3. Data Normalization: Normalization is a necessary step before applying most algorithms. The results are shown by bringing two attributes onto the same scale.
4. Cosine Similarity: I built the similarity and dissimilarity matrices for 20 of the most frequently used words in the document.

The code and further details are available at the link provided.
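As a rough illustration of the steps described above, here is a minimal pure-Python sketch. It is not the repository's code: every function name is hypothetical, mean imputation is assumed as the imputation strategy, and smoothing by bin means is assumed as one of the smoothing methods, since the description does not name them. KNN-based cleaning and entropy-based discretization are not covered here.

```python
import math


def impute_mean(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins, and
    replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current_bin = ordered[start:start + bin_size]
        bin_mean = sum(current_bin) / len(current_bin)
        smoothed.extend([bin_mean] * len(current_bin))
    return smoothed


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map it to 0.0.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrices(vectors):
    """Return (similarity, dissimilarity) matrices, where
    dissimilarity is taken as 1 - cosine similarity."""
    sim = [[cosine_similarity(u, v) for v in vectors] for u in vectors]
    dis = [[1.0 - s for s in row] for row in sim]
    return sim, dis
```

For the cosine-similarity step, each word would first need to be represented as a numeric vector (for example, its counts across sentences or documents). That representation is also an assumption here, as the description does not say how the word vectors were built.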

MIT

mirza76/Data-Science-Project
