/esg-nlp

Analysing an ESG report using Natural Language Processing

Summary

Environmental, Social and Corporate Governance (ESG) refers to the three central factors used to measure the sustainability and societal impact of an investment in a company or business. These criteria help investors better gauge a company's future financial performance (return and risk).

This analysis downloads an ESG report in PDF format from the internet, extracts its text, applies NLP to that text, summarises the key ESG initiatives with word clouds and TF-IDF scores, and discovers topics by building a Latent Dirichlet Allocation (LDA) model.
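To make the TF-IDF step concrete, here is a minimal sketch in plain Python. It is not the notebook's code: the sample sentences, the stopword list and the helper names (`tokenize`, `tf_idf`, `top_terms`) are all assumptions made for illustration, and the notebook may well rely on a library implementation with a different weighting scheme.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for text extracted from report pages;
# these sentences are NOT taken from the actual report.
PAGES = [
    "We finance renewable energy and reduce carbon emissions.",
    "Our board oversees governance, ethics and risk.",
    "We invest in communities and reduce inequality.",
]

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"we", "and", "our", "in", "the", "of", "to", "a"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Count each term once per document to get its document frequency.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(score, k=3):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(score, key=lambda t: (-score[t], t))[:k]


scores = tf_idf(PAGES)
for page_number, page_scores in enumerate(scores):
    print(page_number, top_terms(page_scores))
```

Terms that appear on several pages (here "reduce") receive a lower weight than terms confined to one page, which is why TF-IDF tends to surface the initiatives that distinguish one part of a report from another.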

To keep this exercise as simple as possible, only one ESG report is used: Citibank's 2019 ESG report.

ESG is a broad topic, and different companies focus on different aspects of it depending on their business operations and culture. One could ingest more ESG reports from companies across all sectors and industries to capture a wider range of relevant ESG topics. This is left for another analysis.

Notebook

  1. https://github.com/edgetrader/esg-nlp/blob/master/notebook/esg-report-analysis.ipynb

References

  1. A data-driven approach to Environmental, Social and Governance
  2. Higher ESG ratings are generally positively correlated with valuation and profitability while negatively correlated with volatility.
  3. Topic Modeling with Gensim (Python)
  4. Citibank's 2019 ESG report
  5. Databricks - ESG Reports
  6. Databricks - Data Driven ESG Score
  7. Databricks - ESG Market Risk
  8. Topic Modeling and Latent Dirichlet Allocation (LDA) in Python
  9. Evaluate Topic Models: Latent Dirichlet Allocation (LDA)
  10. Topic modeling visualization – How to present the results of LDA models?