Primary Language: Jupyter Notebook

Mar2Moon 🚀

Goal: Speech-driven sentiment analysis on podcast data

Idea: Many trends have emerged over the past year, partially due to the COVID-19 pandemic:

  1. Financial market turbulence has pushed asset prices to historical lows and all-time highs [src]. Cryptocurrencies in particular, a technology-driven category of financial assets, have largely benefited from this unexpected volatility. Bitcoin (BTC), Ethereum (ETH), Dogecoin (DOGE), and derivatives of these technologies have been affected most prominently. Another investment trend has been the emergence of meme stocks, e.g. AMC and GameStop (GME), and (put charitably) "influencer investment", e.g. Elon Musk tweeting about cryptocurrencies. Automated, social media-based sentiment analysis (e.g. the VanEck Social Sentiment ETF, BUZZ) has not yet been able to foresee these trends adequately. We believe this could be a result of inherent properties of such data: social media data is often too noisy and unreliable, because trends emerge and are promoted through virality and the platforms' algorithms rather than through sounder, higher-quality analysis.
  2. Podcasts are (still) up and coming [src]. Since podcasting has low technological barriers to entry, albeit higher than those of social media forums, there has been a surge in available podcasts. Because podcasts are typically free for consumers, the competition for users' attention has increased significantly. We believe this has made content quality easier to identify through simpler rating and popularity mechanisms.

We believe these factors could make podcasts a great source for sentiment analysis: podcasts are freely available on the internet, have quality advantages over regular social media data, and offer new technological challenges to tackle! We are interested in seeing to what extent we can extract this perceived higher quality from the data, which could perhaps be used in time series models to predict macro price changes of the affected assets over time.
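
To make the time series idea concrete, here is a minimal sketch of one possible first step. It assumes each podcast episode has already been transcribed and scored, and that we have a publication date and a sentiment score in [-1, 1] per episode. The function names, the data layout, and the choice of a simple lagged Pearson correlation are illustrative assumptions, not the project's actual pipeline.

```python
from statistics import mean


def daily_mean_sentiment(episodes):
    """Average per-episode sentiment scores by publication day.

    `episodes` is a list of (iso_date, score) pairs. Both fields are
    hypothetical; real data would come from transcribed podcast audio.
    """
    by_day = {}
    for day, score in episodes:
        by_day.setdefault(day, []).append(score)
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(sentiment, returns, lag=1):
    """Correlate sentiment on day t with the asset return on day t + lag."""
    if len(sentiment) != len(returns):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(sentiment):
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    if lag == 0:
        return pearson(sentiment, returns)
    return pearson(sentiment[:-lag], returns[lag:])
```

A correlation near zero at every lag would suggest the sentiment signal carries little predictive information for that asset, while a stable non-zero value at a positive lag would be a reason to try a proper time series model next.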