<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vRSfMKKgoQBXkE_HCRXP59uLEQqp094DeJc0V41IPIJFDveugrtfpZz_9dTpwTPh9Gn4hHM90OZvwk0/embed?start=false&loop=false&delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

Challenge

This repository studies the impacts of Covid-19 on Software Engineering through a Multivocal Literature Review (MLR).

Solution

Traditional literature consists of peer-reviewed academic papers. Grey literature is an increasingly important data source today, but it comes from many kinds of sources (blogs, white papers, webinars ...) and in many data formats (text, video, audio). How, then, can we collect, analyze, and reveal the value of grey literature in the age of knowledge, or the age of big data?

To address this challenge, we propose an analytics mindset and an analytics pipeline that combines Data Engineering, Machine Learning, and Dataviz to analyze data and reveal Covid-19 impacts on Software Engineering through a Multivocal Literature Review (MLR).

Steps

  1. Collection
    1. collect grey literature
    2. merge grey datasets
    3. collect traditional literature
    4. merge traditional datasets
    5. create multivocal (join grey and traditional literature)
  2. Screening
    1. screening multivocal literature (manual process)
    2. kappa statistics
    3. get pdf
    4. get fulltext from pdf
  3. Dataviz
    1. nlp grey literature
    2. nlp traditional literature
    3. nlp multivocal
    4. eda multivocal
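
As a rough illustration of two of the steps above, the sketch below joins grey and traditional records into one multivocal list (step 1.5) and computes Cohen's kappa for two reviewers' screening decisions (step 2.2). It is a minimal standard-library sketch, not the code used in this repository: the function names `build_multivocal` and `cohen_kappa`, the `title` and `source` fields, and the `include`/`exclude` labels are all assumptions made for the example.

```python
from collections import Counter


def build_multivocal(grey, traditional):
    """Join grey and traditional records, dropping duplicate titles.

    Each record is assumed to be a dict with a "title" key (hypothetical schema).
    """
    seen = set()
    merged = []
    for source, records in (("grey", grey), ("traditional", traditional)):
        for record in records:
            # Normalise the title so case and spacing differences still match.
            key = " ".join(record["title"].lower().split())
            if key in seen:
                continue
            seen.add(key)
            # Tag each kept record with the literature type it came from.
            merged.append({**record, "source": source})
    return merged


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one single identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, if two reviewers agree on three of four screening decisions and chance agreement is 0.5, `cohen_kappa` returns 0.5. Deduplicating on a normalised title is only one possible choice; a DOI or URL would be a stricter key where available.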

Benefits

Savings in time, resources, and computational effort, along with automation, are some benefits of this methodology. The methodology is general, so you can test it in other domains and create your own case studies.