- Data Visualization, NLP
- use text as input and analyse it with NLP methods (tokenization, punctuation removal, stop-word removal, stemming, lemmatization, etc.)
- text visualization
- Find how much the text is reduced by preprocessing
- Find how big the dictionary (vocabulary) is
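A minimal sketch of the preprocessing steps and the two measurements above (how much the text shrinks, how large the dictionary is), using only the Python standard library. The stop-word list and the suffix-stripping `crude_stem` function are hypothetical stand-ins for a full stop-word list and a real stemmer or lemmatizer (for example NLTK's `PorterStemmer` or `WordNetLemmatizer`).

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a
# full list such as NLTK's or spaCy's.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "with", "for", "on"}


def crude_stem(token: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    # Lowercase and keep runs of letters only, which drops punctuation and digits.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Remove stop words, then stem the remaining tokens.
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def reduction_stats(text: str) -> dict:
    raw_tokens = text.lower().split()  # whitespace tokens before any cleaning
    clean_tokens = preprocess(text)
    return {
        "raw_tokens": len(raw_tokens),
        "clean_tokens": len(clean_tokens),
        "reduction_pct": 100 * (1 - len(clean_tokens) / len(raw_tokens)),
        "raw_vocab": len(set(raw_tokens)),      # dictionary size before cleaning
        "clean_vocab": len(set(clean_tokens)),  # dictionary size after cleaning
    }
```

Calling `reduction_stats(abstract)` on one abstract (or on all abstracts joined together) reports token counts before and after cleaning, the percentage reduction, and the dictionary size at both stages.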
- train word2vec models to find robust networks
- measure the variation in word distances
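One way to measure the variation in word distances is to train several word2vec models (for example with different random seeds) and compare the cosine distance between the same word pair in each. A sketch, assuming each trained model is available as a mapping from word to vector; the function names are hypothetical. A small standard deviation suggests the relationship between the two words is robust across runs.

```python
import math
import statistics


def cosine_distance(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return float(1.0 - dot / norm)


def distance_variation(models, word_a: str, word_b: str) -> tuple[float, float]:
    """Mean and population standard deviation of one word pair's cosine distance.

    models: one mapping per trained word2vec run, each mapping word -> vector.
    """
    distances = [cosine_distance(m[word_a], m[word_b]) for m in models]
    return statistics.mean(distances), statistics.pstdev(distances)
```

Because the code only needs `m[word]` to return a vector, gensim `KeyedVectors` objects (`model.wv`) should also work as entries of `models`.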
- vectorize words
- sentence -> tokenize -> count frequency
- train a word2vec neural network
- visualize the results
- n-dimensional vector -> 2-dimensional vector -> visualize
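The frequency-counting and projection steps can be sketched as follows, assuming NumPy is available. `token_frequencies` counts tokens across sentences, and `pca_2d` reduces an n-dimensional embedding matrix (one row per word) to two dimensions with principal component analysis, computed here through a singular value decomposition. PCA is one choice among several; t-SNE (`sklearn.manifold.TSNE`) is a common non-linear alternative for visualizing embeddings.

```python
import re
from collections import Counter

import numpy as np


def token_frequencies(sentences) -> Counter:
    """sentence -> tokenize -> count frequency, accumulated over all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z]+", sentence.lower()))
    return counts


def pca_2d(vectors: np.ndarray) -> np.ndarray:
    """Project an (n_words, n_dims) matrix onto its first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

The two output columns can go straight into a scatter plot, with each point labelled by its word.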
- fetch all articles from PubMed matching the keywords ("antibiotic resistant")
- parse articles (title, year, abstract)
- save data as json
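A sketch of the fetch/parse/save steps using NCBI's E-utilities and the standard library. `esearch_url` builds a search URL that returns matching PMIDs as JSON; the article XML would then come from `efetch.fcgi` with `db=pubmed`, the PMIDs in `id`, and `retmode=xml`. `parse_efetch_xml` pulls the title, year, abstract and publication types out of that XML. The function names are hypothetical, and the network calls are left out so the parsing can be checked offline.

```python
import json
import urllib.parse
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, retmax: int = 100) -> str:
    """Build an esearch URL that returns matching PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(params)


def _text(element) -> str:
    # itertext() also collects text inside inline markup such as <i>...</i>.
    return "".join(element.itertext()) if element is not None else ""


def parse_efetch_xml(xml_text: str) -> list[dict]:
    """Extract PMID, title, year, abstract and publication types from efetch XML."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.iter("PubmedArticle"):
        citation = entry.find("MedlineCitation")
        article = citation.find("Article")
        records.append({
            "pmid": citation.findtext("PMID"),
            "title": _text(article.find("ArticleTitle")),
            # None when the date is given as MedlineDate instead of Year.
            "year": article.findtext("Journal/JournalIssue/PubDate/Year"),
            # Structured abstracts have several AbstractText sections; join them.
            "abstract": " ".join(_text(a) for a in article.findall("Abstract/AbstractText")),
            "publication_types": [
                pt.text for pt in article.findall("PublicationTypeList/PublicationType")
            ],
        })
    return records


def save_json(records: list[dict], path: str) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)
```

In the esearch JSON response the PMIDs should be under `["esearchresult"]["idlist"]`. Records dated with `MedlineDate` get `None` as their year, which needs handling before any analysis over time, and keeping `publication_types` in the JSON lets reviews be filtered later without refetching.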
- Read all articles and count country mentions with geotext and pycountry
- Filter by publication type and exclude reviews
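The counting and filtering steps could look like this. The project itself uses geotext and pycountry for the place-name lookup; here a hypothetical four-entry `COUNTRY_NAMES` table stands in for them so the logic is self-contained, and matching is a plain substring search (so "india" would also match "indiana"). Records are assumed to be dicts with `title`, `abstract` and `publication_types` keys.

```python
from collections import Counter

# Hypothetical stand-in for geotext/pycountry: lowercase country name ->
# ISO 3166-1 alpha-2 code.
COUNTRY_NAMES = {"germany": "DE", "india": "IN", "brazil": "BR", "united states": "US"}


def is_review(record: dict) -> bool:
    # Catches both "Review" and "Systematic Review" publication types.
    return any("review" in pt.lower() for pt in record.get("publication_types", []))


def country_counts(records: list[dict]) -> Counter:
    counts = Counter()
    for rec in records:
        if is_review(rec):
            continue  # exclude reviews, keep primary studies
        text = f"{rec.get('title', '')} {rec.get('abstract', '')}".lower()
        # Count each country at most once per article.
        found = {code for name, code in COUNTRY_NAMES.items() if name in text}
        counts.update(found)
    return counts
```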
- Then we use a word2vec model to vectorize the text and find word embeddings
- visualise data and extract insights from data
- Build an interactive geographical map that shows the number of studies per country
- Use Bokeh and seaborn to develop the visualization tool
- How much is antimicrobial resistance reported at different geographical scales over time?
- How does the emergence of AMR vary across time for different classes of antimicrobials?
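To answer the first question at more than one geographical scale, study counts can be keyed by (area, year) and rolled up from countries to regions. A sketch, assuming one `(country_code, year)` pair per study; the `REGION` mapping is a hypothetical placeholder for a real country-to-region table.

```python
from collections import Counter

# Hypothetical country -> region mapping used to roll counts up to a coarser scale.
REGION = {"DE": "Europe", "FR": "Europe", "IN": "Asia", "BR": "South America"}


def counts_over_time(pairs, scale: str = "country") -> Counter:
    """pairs: (country_code, year) tuples, one per study.

    Returns a Counter keyed by (area, year), where area is the country code or,
    for scale="region", its region ("Other" if the country is not mapped).
    """
    counts = Counter()
    for country, year in pairs:
        area = country if scale == "country" else REGION.get(country, "Other")
        counts[(area, year)] += 1
    return counts
```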
- Data preparation and Data integration
- unsupervised clustering
- simple linear regression
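Plain-Python sketches of the two methods named above: k-means for unsupervised clustering, and ordinary least squares for simple linear regression (for example, a country's publication count against its GDP). Both are illustrative only; `sklearn.cluster.KMeans` and `sklearn.linear_model.LinearRegression` provide tested implementations.

```python
import random


def kmeans(points, k: int, iterations: int = 100, seed: int = 0):
    """Plain k-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```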
- How can we find the relationship between a country's number of publications and its GDP?
- prepare input data
- read all WDI (World Development Indicators) Excel files, filter by country and merge them
- create a correlation matrix and visualize it in R (Spearman correlation test and hierarchical clustering)
- correlation between vector distance and metadata difference
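Spearman's correlation is the Pearson correlation computed on ranks, with tied values given the average of the ranks they span. A standard-library sketch follows; in R, `cor(x, y, method = "spearman")` computes the same coefficient, and `cor.test` with the same `method` adds a significance test. The same function applies to pairs of (vector distance, metadata difference) values.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```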
- create/train random forest and decision tree models
- RMSE: measures how large the model's prediction errors are; we use it because this is a regression task, not classification, and our aim is to predict numeric values.
- compute feature importance: how often each feature is used by the model to make predictions
- we want to see which parameters are most important: visualise the feature importances and how much predictive power each one carries
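RMSE is `sqrt(mean((y_true - y_pred)^2))`, expressed in the same units as the target. For feature importance, the sketch below uses permutation importance, which is a different technique from counting how often a feature is used in splits: it measures how much RMSE rises when one feature's column is shuffled. It works with any model exposed as a `predict(rows)` callable (a hypothetical interface); scikit-learn offers the same idea as `sklearn.inspection.permutation_importance`, and its random forests also expose impurity-based `feature_importances_`.

```python
import math
import random


def rmse(y_true, y_pred) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, rows, y, n_repeats: int = 5, seed: int = 0):
    """Mean increase in RMSE when each feature column is shuffled.

    predict: callable taking a list of feature rows (lists) and returning predictions.
    Larger values mean the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = rmse(y, predict(rows))
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(rows, shuffled)]
            increases.append(rmse(y, predict(permuted)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

Computing importances on held-out data rather than the training set gives a more honest picture of what the model needs for prediction.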
- find local importance using SHAP
- first apply a clustering algorithm to capture information about the clusters, then a dimensionality reduction algorithm
- Check which factors are most important for each country
- create a SHAP summary plot
- visualize local importance for selected countries
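SHAP values are Shapley values: each feature's contribution to one prediction, averaged over every order in which features could be added. The brute-force sketch below computes them exactly for a single row (for example one country), replacing absent features with a baseline such as the feature means. That single baseline is an assumption of this sketch; the `shap` library averages over a background dataset and uses faster algorithms, such as `shap.TreeExplainer` for random forests, whose `shap_values` output can be passed to `shap.summary_plot`.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values for the single prediction f(x).

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features, so this suits small examples only.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

Up to rounding, the values satisfy `sum(phi) == f(x) - f(baseline)`, so each country's prediction splits into per-feature contributions that can be plotted directly.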