OndaNet
My purpose is to learn Data Analysis and Data Mining using data from Onda Rock, an Italian music portal.
Step 1: A network of the music
Each review page of Onda Rock is connected to the others through hyperlinks. I would build the network by downloading and parsing the pages with:
- Requests
- BeautifulSoup
- html5lib
- NLTK
To store and analyse the network:
- NetworkX
To plot the data:
- D3
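As a minimal sketch of Step 1, the code below extracts the links from already downloaded pages with BeautifulSoup and stores them as a directed NetworkX graph. The base URL, the "/recensioni/" path filter, and the two sample pages are assumptions made for illustration; the real site layout may differ. It uses the built-in "html.parser" backend, which could be swapped for "html5lib". In a real run the HTML would come from Requests, for example requests.get(url).text.

```python
from urllib.parse import urljoin

import networkx as nx
from bs4 import BeautifulSoup

# Hypothetical base URL and path filter; the real site layout may differ.
BASE_URL = "https://www.ondarock.it/"
REVIEW_PATH = "/recensioni/"


def review_links(page_url, html):
    """Return the absolute URLs of review pages linked from one page."""
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links against the URL of the page they appear on.
        target = urljoin(page_url, anchor["href"])
        if REVIEW_PATH in target and target != page_url:
            links.add(target)
    return links


def build_graph(pages):
    """Build a directed graph from a {url: html} mapping."""
    graph = nx.DiGraph()
    for url, html in pages.items():
        graph.add_node(url)
        for target in review_links(url, html):
            graph.add_edge(url, target)
    return graph


# Two made-up pages standing in for downloaded reviews.
pages = {
    BASE_URL + "recensioni/a.htm":
        '<a href="/recensioni/b.htm">B</a> <a href="/index.htm">home</a>',
    BASE_URL + "recensioni/b.htm":
        '<a href="a.htm">A</a>',
}
graph = build_graph(pages)
print(graph.number_of_nodes(), graph.number_of_edges())
```

A directed graph keeps the information about which review cites which; graph.to_undirected() gives the simpler view if direction turns out not to matter.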
Step 1.5: Find clusters
Are there clusters? And if so, do they follow the division by music genre?
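One possible way to approach this question, sketched below, is modularity-based community detection from NetworkX, followed by a count of the genres inside each community. The choice of greedy_modularity_communities is an assumption (other clustering methods may fit better), and the node names and the "genre" attribute are invented toy data.

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy data: two tightly linked groups joined by a single link.
# Node names and genres are invented for illustration.
graph = nx.Graph()
graph.add_edges_from([
    ("rock_a", "rock_b"), ("rock_b", "rock_c"), ("rock_a", "rock_c"),
    ("jazz_a", "jazz_b"), ("jazz_b", "jazz_c"), ("jazz_a", "jazz_c"),
    ("rock_c", "jazz_a"),
])
genres = {name: name.split("_")[0] for name in graph.nodes}
nx.set_node_attributes(graph, genres, "genre")

# Each community is a set of nodes.
communities = greedy_modularity_communities(graph)


def genre_counts(community):
    """Count how many nodes of each genre fall into one community."""
    return Counter(graph.nodes[node]["genre"] for node in community)


summary = [genre_counts(community) for community in communities]
for counts in summary:
    print(dict(counts))
```

If most communities are dominated by a single genre, the link structure follows the genre division; mixed communities would suggest the links capture something else.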
Step 2: It's better to store the data
I will choose a way to store and organize the data, for example a document database such as MongoDB or CouchDB. Every piece of information is precious, such as the ratings or the reviewer of each page.
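Whatever database is chosen, each review could be kept as one JSON-like document. The sketch below only shows a possible shape for such a document; all field names and values are hypothetical, and nothing is written to a database here. With MongoDB, a dict like this could be passed to pymongo's insert_one.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Review:
    """One review page; every field name here is an assumption."""
    url: str
    artist: str
    album: str
    rating: Optional[float] = None      # the vote given in the review
    reviewer: Optional[str] = None      # who wrote the page
    links: List[str] = field(default_factory=list)  # outgoing review links


# Invented example values.
review = Review(
    url="https://www.ondarock.it/recensioni/a.htm",
    artist="Some Artist",
    album="Some Album",
    rating=7.5,
    reviewer="Some Reviewer",
    links=["https://www.ondarock.it/recensioni/b.htm"],
)

# A plain dict is what a document store would accept.
document = asdict(review)
encoded = json.dumps(document, indent=2)
print(encoded)
```

Keeping the outgoing links inside the document means the network from Step 1 can be rebuilt from the stored data without downloading the pages again.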
Step 3: Other data analysis
- I would also use Pandas to load and analyse the data.
- Afterwards, I can look for correlations in the data, or develop a method that suggests music I don't know yet but would probably like.
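A first look at the data with Pandas might resemble the sketch below. The records, the column names, and the idea of comparing the rating with the number of incoming links are all assumptions used for illustration, not results.

```python
import pandas as pd

# Invented records; in practice they would be loaded from the stored documents.
records = [
    {"album": "a", "reviewer": "anna", "rating": 6.0, "in_links": 2},
    {"album": "b", "reviewer": "anna", "rating": 7.0, "in_links": 1},
    {"album": "c", "reviewer": "luca", "rating": 8.0, "in_links": 4},
    {"album": "d", "reviewer": "luca", "rating": 9.0, "in_links": 5},
]
frame = pd.DataFrame(records)

# Average rating given by each reviewer.
mean_by_reviewer = frame.groupby("reviewer")["rating"].mean()

# Pearson correlation between the rating and the number of incoming links.
correlation = frame["rating"].corr(frame["in_links"])

print(mean_by_reviewer)
print(round(correlation, 3))
```

A correlation close to 1 or -1 would hint at a relationship worth investigating; with real data, frame.corr(numeric_only=True) gives the same measure for every pair of numeric columns at once.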