A collection of nine sets of raw textual data in Spanish: the eight "Recorridos Temáticos" (thematic tours) listed below, plus one combined file. It is intended for research and educational purposes, especially for training text-mining and text-analytics skills: NLP, PCA, corpus construction, and preprocessing of unstructured data (importing, encoding, and other steps common to raw textual data, such as cleaning, applying stopwords, stemming, and data visualization).
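As a rough illustration of the preprocessing steps mentioned above (cleaning, applying stopwords, counting terms), here is a minimal Python sketch using only the standard library. The sample sentence is made up for the example and is not taken from the dataset, the stopword list is a tiny hypothetical one, and the function names are illustrative assumptions.

```python
import re
import unicodedata
from collections import Counter

# Tiny illustrative Spanish stopword list (an assumption; real work needs a fuller list).
STOPWORDS_ES = {"a", "de", "del", "el", "en", "la", "las", "los", "que", "un", "una", "y"}

def strip_accents(text):
    """Remove diacritics so that 'temático' and 'tematico' are counted together."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def tokenize(text):
    """Lowercase, strip accents, and keep only alphabetic tokens."""
    cleaned = strip_accents(text.lower())
    return re.findall(r"[a-z]+", cleaned)

def term_frequencies(text, stopwords=STOPWORDS_ES):
    """Count tokens that are not in the stopword list."""
    tokens = [t for t in tokenize(text) if t not in stopwords]
    return Counter(tokens)

# Made-up sample sentence, not a record from the dataset.
sample = "El vino y la pintura: un recorrido temático por el vino en la colección."
freqs = term_frequencies(sample)
print(freqs.most_common(3))
```

Note that stripping accents also turns "ñ" into "n", which merges some distinct Spanish words; whether that trade-off is acceptable depends on the analysis.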
CSV files, by theme:
- 1: Wine,
- 2: Love,
- 3: Gastronomy,
- 4: Sustainability,
- 5: Fashion (Moda),
- 6: Jewellery,
- 7: Leisure time,
- 8: Flowers.
Additionally, we bound all of them together into a single combined dataset with the additional column/variable 'Tema':
- fichas_8-recorridosTematicos_MThyssen_raw.csv
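The binding step can be sketched as follows: each per-theme table gets a 'Tema' value and the rows are stacked into one list. This is only an illustration in Python, not the procedure used to build the file. Apart from 'Tema', the column names (`titulo`, `texto`), the theme labels, and the in-memory sample rows are hypothetical, since the actual column layout is not described here.

```python
import csv
import io

def read_rows(handle):
    """Read a CSV file handle into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))

def bind_with_tema(tables):
    """Stack per-theme row lists into one list, tagging each row with its theme."""
    combined = []
    for tema, rows in tables.items():
        for row in rows:
            combined.append({**row, "Tema": tema})
    return combined

# In-memory stand-ins for two per-theme CSV files (hypothetical columns and content).
vino_csv = io.StringIO("titulo,texto\nObra A,Texto de ejemplo uno\n")
amor_csv = io.StringIO(
    "titulo,texto\nObra B,Texto de ejemplo dos\nObra C,Texto de ejemplo tres\n"
)

combined = bind_with_tema({"Vino": read_rows(vino_csv), "Amor": read_rows(amor_csv)})
print(len(combined), sorted({row["Tema"] for row in combined}))
```

With real files, each `io.StringIO` would be replaced by something like `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption and should be checked against the actual files.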
The data were collected from the Museo Thyssen-Bornemisza website using R and the libraries 'rvest' and 'tidyverse'.
Data source: Museo Thyssen-Bornemisza.
https://www.museothyssen.org/visita/recorridos-tematicos [Retrieved: 2020-12-01]