/information_visualization

Collection of datasets and links that can be used to create a cohesive visualization for the module CS5346.

Information visualization

Datasets

  1. Dataset bank containing multiple datasets along with a few sample visualizations: https://data.fivethirtyeight.com/
  2. Domain-specific datasets covering earth science, the atmosphere, cloud surface area, and related areas: https://data.nasa.gov/data_visualizations.html
  3. Spotify Play Count for Billboard's 1990 Top 100: https://www.kaggle.com/zacharykauz/spotify-play-count-for-billboards-1990-top-100. Other datasets: https://www.kaggle.com/yasserh/song-popularity-dataset, https://www.kaggle.com/nenamalikah/nft-collections-by-sales-volume, https://www.kaggle.com/ahemateja19bec1025/musicgenerationdataset, https://www.kaggle.com/ektanegi/spotifydata-19212020. One idea is to collect all the historical data and build a web-publishable dashboard showing how Billboard’s top hits changed over the decades, for example the most-played songs and the evolution of music per decade [for all the music lovers!]
  4. Google Trends: https://trends.google.com/trends/explore
  5. Global fundamentals dataset for the banking and financial domain: https://data.nasdaq.com/search?query=KAUFFMAN
  6. FBI firearm background check dataset: https://github.com/BuzzFeedNews/nics-firearm-background-checks
  7. Iris Data Set: http://archive.ics.uci.edu/ml/datasets/Iris
  8. HR Data Set, Visuals & Predictions: https://www.kaggle.com/joshuaswords/awesome-hr-data-visualization-prediction
  9. Mall Customer Segmentation Data: https://www.kaggle.com/joshuaswords/data-visualization-clustering-mall-data
  10. COVID-19 Vaccination Data Visualization: https://www.kaggle.com/joshuaswords/uk-covid-19-vaccination-progress-data-vis (UK), https://data.gov.sg/dataset/covid-19-vaccination (Singapore)
  11. Student Performance Visualization: https://www.kaggle.com/joshuaswords/awesome-data-visualisation-student-results?scriptVersionId=57181038
  12. Netflix Data Visualization: https://www.kaggle.com/joshuaswords/netflix-data-visualization, https://www.kaggle.com/meetnagadia/netflix-stock-price-data-set-20022022
  13. FIFA Dataset: https://www.kaggle.com/stefanoleone992/fifa-21-complete-player-dataset This one is specific to FIFA 21; combining it with historical data could produce interesting visualizations
  14. Airline Safety Dataset: https://github.com/fivethirtyeight/data/tree/master/airline-safety
  15. USA weather history dataset: https://github.com/fivethirtyeight/data/tree/master/us-weather-history One can also try web scraping weather data
  16. USA Government Surveillance Planes Dataset: https://github.com/BuzzFeedNews/2016-04-federal-surveillance-planes
  17. Political advertisement dataset from Facebook (Meta platform): https://www.propublica.org/datastore/dataset/political-advertisements-from-facebook
  18. Wine quality dataset: http://archive.ics.uci.edu/ml/datasets/Wine+Quality
  19. Car Evaluation dataset: http://archive.ics.uci.edu/ml/datasets/Car+Evaluation, https://www.kaggle.com/ebrahimhaquebhatti/75000-used-cars-dataset-with-specifications
  20. Video game industry dataset: https://www.statista.com/topics/868/video-games/ One can generate various interesting visualizations like the ones shown here: https://studentwork.prattsi.org/infovis/visualization/visualizing-video-game-data-2007-2016-with-tableau/ [you may need to create an account with Statista to download the dataset; if any of you find better resources, please do let me know!]
  21. Social impact datasets: one can use web scraping to extract data from https://www.buzzfeed.com/, https://www.data.gov/, https://www.reddit.com/r/datasets/
  22. USA air quality dataset: https://www.epa.gov/environmental-topics/air-topics
  23. Open-source platform for the crowdsourced reporting and triaging of infrastructure-related issues: https://github.com/taarifa/TaarifaWaterpoints
  24. Global climate dataset per continent: https://en.tutiempo.net/climate
  25. UN datasets for various domains, such as Greenhouse Gas Inventory Data and World Development Indicators: http://data.un.org/Explorer.aspx?d=CLINO
  26. Electricity datasets: https://www.eia.gov/electricity/data/eia923/ (USA), https://www.singstat.gov.sg/find-data/search-by-theme/industry/energy-and-utilities/latest-data (Singapore); sample visualization example: https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=5379&context=sis_research
  27. Electricity Consumption & Occupancy (ECO) dataset: http://www.vs.inf.ethz.ch/res/show.html?what=eco-data (collected in Swiss households by ETH Zurich)
  28. USDA Food and Nutrition Service (FNS) SNAP retailer dataset: https://www.fns.usda.gov/snap-retailer-data
  29. Some open-data portals of US cities: https://data.seattle.gov/, https://data.austintexas.gov/, https://datasf.org/opendata/, https://opendata.cityofnewyork.us/
  30. Road Traffic monitoring dataset: https://www.kaggle.com/shawon10/road-traffic-video-monitoring
  31. Insect Egg Evolution Dataset: https://github.com/shchurch/Insect_Egg_Evolution
  32. Dataset on TV shows, movies, documentary series, and all other forms of content available on HBO as of 2020: https://www.kaggle.com/rishidamarla/hbo-tv-shows-documentaries-movies-as-of-2020
  33. Hollywood Theatrical Market Synopsis dataset for 1995 to 2021: https://www.kaggle.com/johnharshith/hollywood-theatrical-market-synopsis-1995-to-2021
  34. Web-scraped data from Anime-Planet covering 18,000+ anime: https://www.kaggle.com/vishalmane10/anime-dataset-2022
  35. GDP of all countries from 1960 to 2020: https://www.kaggle.com/holoong9291/gdp-of-all-countries19602020
  36. Uber fare dataset: https://www.kaggle.com/yasserh/uber-fares-dataset
  37. Forest Fire dataset: https://www.kaggle.com/balavashan/forest-fire-dataset
  38. Singapore Transport dataset: https://datamall.lta.gov.sg/content/datamall/en.html It has both static datasets (e.g. Annual Bus Population By Passenger Capacity) and dynamic datasets (e.g. Bus Routes)
  39. E-Commerce related dataset: https://imerit.net/blog/25-best-retail-sales-and-ecommerce-datasets-for-machine-learning-all-pbm/
  40. Singapore Department of Statistics (SingStat) Table Builder, which aims to deliver insightful statistics and trusted statistical services that empower decision making: https://www.tablebuilder.singstat.gov.sg/publicfacing/selectVariables.action [cleaned dataset ready for use: https://docs.google.com/spreadsheets/d/1a2uZKydzbP-vTdrXrdcGmfxTFnZSXdDS65XOpjWY0SE/edit#gid=1367742117]
  41. Police Department Incident Reports from 2018 to Present: https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-2018-to-Present/wg3w-h783
  42. COVID-19 World vaccination status progress: https://www.kaggle.com/gpreda/covid-world-vaccination-progress
  43. UCS Satellite Dataset: https://www.ucsusa.org/resources/satellite-database [Union of Concerned Scientists: independent science to solve our planet's most pressing problems; check out the sample visualizations they created at https://www.ucsusa.org/]
  44. E-commerce datasets: https://data.world/promptcloud/fashion-products-on-amazon-com [Amazon fashion product dataset], https://www.kaggle.com/c/shopee-product-detection-open [Shopee product dataset], etc. One could try building a visualization dashboard that shows, predicts, or suggests which e-commerce site to buy a given product from [Domain: search relevance in e-commerce applications]
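
To make the "how top hits changed over decades" idea from item 3 more concrete, here is a minimal sketch of grouping songs by decade before charting them. It uses only the Python standard library and an inline sample; the column names (`title`, `year`, `plays`) and the sample rows are assumptions for illustration and will not match the actual Kaggle files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the column names (title, year, plays) are
# assumptions and will differ from the actual Kaggle files.
SAMPLE_CSV = """title,year,plays
Song A,1985,120
Song B,1989,300
Song C,1994,250
Song D,1999,50
Song E,2003,400
"""


def plays_by_decade(csv_file):
    """Sum the play counts for each decade (1980, 1990, ...)."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        decade = int(row["year"]) // 10 * 10  # 1985 -> 1980
        totals[decade] += int(row["plays"])
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for decade, plays in plays_by_decade(io.StringIO(SAMPLE_CSV)).items():
        print(f"{decade}s: {plays}")
```

With a real file, one would pass `open("some_file.csv", newline="")` instead of the `io.StringIO` sample and adjust the column names; the per-decade totals can then be fed into whichever visualization tool is being used.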

Resources

  1. Basic web scraping: https://www.thepythoncode.com/article/extract-weather-data-python
  2. Data visualization tools that are worth looking into: Tableau [https://www.tableau.com/], Looker [https://looker.com/], Zoho Analytics [https://www.zoho.com/analytics/], Sisense [https://www.sisense.com/], IBM Cognos Analytics [https://www.ibm.com/au-en/products/cognos-analytics], Qlik Sense [https://www.qlik.com/us/products/qlik-sense], Domo [https://www.domo.com/business-intelligence/visualization], Microsoft Power BI [https://powerbi.microsoft.com/en-us/], Klipfolio [https://www.klipfolio.com/], SAP Analytics Cloud [https://www.sap.com/products/cloud-analytics.html]. Remember that these software tools may not be free to use [some offer a trial version you can try]. Feel free to contact the Prof/TAs about the possibility of providing licenses. A guide that might help in selecting a visualization tool: https://callminer.com/blog/data-visualization-tools-buying-guide
  3. Go-to links for understanding how to build a web-deployable dashboard: https://towardsdatascience.com/deploying-data-dashboards-automatically-reliably-and-securely-372ef802ca3c, https://topflightapps.com/ideas/how-to-create-a-dashboard-web-application/, https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/, https://help.tableau.com/current/server/en-us/web_author.htm [specifically if you are using Tableau], https://powerbi.microsoft.com/en-us/publishtoweb/ [Power BI]
  4. Learning paths: https://www.analyticsvidhya.com/learning-paths-data-science-business-analytics-business-intelligence-big-data/tableau-learning-path/ [for Tableau], https://analyticsindiamag.com/8-best-free-resources-to-learn-tableau/ [free resources to learn Tableau], https://medium.com/javarevisited/7-best-courses-to-learn-microsoft-power-bi-for-beginners-and-experienced-developers-83695c9428dc [Power BI], https://www.youtube.com/watch?v=3u7MQz1EyPY [Power BI]
  5. Visualization tips: https://blog.hubspot.com/marketing/great-data-visualization-examples, https://resagratia.com/2020/07/the-differences-between-good-data-visualization-and-bad-data-visualization-part-1/, https://towardsdatascience.com/data-visualization-101-7-steps-for-effective-visualizations-491a17d974de
  6. Learning Tableau and getting a Coursera certificate: https://www.coursera.org/learn/analytics-tableau?specialization=excel-mysql&utm_source=gg&utm_medium=sem&utm_campaign=05-ExceltomySQL-ROW&utm_content=B2C&campaignid=6558079885&adgroupid=118127838703&device=c&keyword=&matchtype=&network=g&devicemodel=&adpostion=&creativeid=507138498627&hide_mobile_promo&gclid=CjwKCAiA866PBhAYEiwANkIneJS_RfAv4PpwU4J0_QAKLLjh_OUnNsm4n5Ggm1B2b4Jxw6U2bhjHtBoCKd0QAvD_BwE [just 18 hours of lecture videos!]
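
For the basic web scraping mentioned in resource 1, the core step is usually pulling rows out of an HTML table. The sketch below shows that step using only Python's standard library `html.parser`; the HTML fragment, city names, and values are made up for illustration, and the linked tutorial may well use different (third-party) libraries.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment standing in for a downloaded weather page.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Temp (C)</th></tr>
  <tr><td>Singapore</td><td>31</td></tr>
  <tr><td>Seattle</td><td>12</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # row currently being built, if any
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a cell of an open row.
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def parse_table(html):
    """Return the table rows found in the given HTML string."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows


if __name__ == "__main__":
    for row in parse_table(SAMPLE_HTML):
        print(row)
```

For a live page, the HTML string would come from a download (for example with `urllib.request.urlopen`) instead of `SAMPLE_HTML`; check the site's terms of use before scraping it.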