freeCodeCamp/2016-new-coder-survey

Survey Datasets Transformation and Re-Configuration

evaristoc opened this issue · 8 comments

We are working on data transformation to fit the requirements for the analysis.

People who have been working on this so far:

We are currently working on @erictleung's fork but reporting through this channel to preserve history.

@erictleung identified several discrepancies, mostly related to data parsing, and fixed the ones he found. We should keep tracking this until we are sure that all corrections are complete.

These are the main tasks:

We are currently looking for the best way to encode variables and values to facilitate analysis. This covers:

  • Variable renaming and/or creation
  • Variable labelling
  • Value encoding
  • Value labelling
  • Missing value encoding and labelling
  • Metadata file management

Any change made to values or variables must also be recorded in the metadata file manager.
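As a rough illustration of how value encoding, missing-value encoding, and the metadata could stay in sync, here is a minimal sketch. It is written in Python for brevity, although the processing in this thread is mostly done in R. The variable name `is_software_dev`, the missing code `-9`, and the codebook layout are all assumptions made up for this example, not the project's actual scheme.

```python
# Hypothetical codebook: the variable name, codes, and labels are illustrative only.
MISSING_CODE = -9  # assumed sentinel for missing answers

CODEBOOK = {
    "is_software_dev": {
        "label": "Already working as a software developer",
        "values": {"No": 0, "Yes": 1},
    },
}


def encode_value(variable, raw, codebook=CODEBOOK):
    """Return the numeric code for a raw answer, or MISSING_CODE if blank or unknown."""
    if raw is None or not raw.strip():
        return MISSING_CODE
    return codebook[variable]["values"].get(raw.strip(), MISSING_CODE)


def add_value(variable, raw, code, codebook=CODEBOOK):
    """Register a new value in the codebook so the metadata reflects the change."""
    values = codebook[variable]["values"]
    if code in values.values() or code == MISSING_CODE:
        raise ValueError(f"code {code} is already used for {variable}")
    values[raw] = code


def value_labels(variable, codebook=CODEBOOK):
    """Build the code -> label mapping, including the missing-value label."""
    labels = {code: raw for raw, code in codebook[variable]["values"].items()}
    labels[MISSING_CODE] = "Missing"
    return labels
```

Keeping the codes and labels in one structure means a new value can only be encoded after it has been added to the codebook, which is one way to enforce the rule above.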

@evaristoc @erictleung Before you do this, I recommend you get a final dump of the survey. We've had several hundred additional responses since I last did a dump. I will upload the files now.

@erictleung @evaristoc OK - I've added the updated data.

Just for reference, here is the branch of my fork of the repo to clean and combine the data. I'm primarily doing the data processing in R, for those interested.

We found several inconsistencies for the answers given to:

  • Resources : Others
  • PodCasts : Others
  • Code Events : Others

We are parsing the data and trying to make those answers consistent, based on general assumptions. This also means that answers that cannot be resolved under those assumptions will be treated as missing.
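To make the idea concrete, here is a minimal sketch of how free-text "Others" answers could be normalized, with anything that does not match an assumption set aside as missing. It is in Python rather than the R used on the fork, and the alias table, the separators, and the function name `normalize_other` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical alias table: canonical name -> accepted spellings (illustrative only).
ALIASES = {
    "Stack Overflow": {"stackoverflow", "stack overflow"},
    "YouTube": {"youtube", "you tube", "youtube.com"},
}

# Reverse lookup from each accepted spelling to its canonical name.
LOOKUP = {
    alias: canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases
}


def normalize_other(raw):
    """Split a free-text answer and map each piece to a canonical name.

    Returns (matched, unmatched). Pieces in `unmatched` do not fit any
    assumption and can be treated as missing by the caller.
    """
    if raw is None:
        return [], []
    matched, unmatched = [], []
    # Assumed separators: comma, semicolon, slash.
    for piece in re.split(r"[,;/]", raw):
        # Collapse repeated whitespace and ignore case before the lookup.
        key = re.sub(r"\s+", " ", piece).strip().lower()
        if not key:
            continue
        canonical = LOOKUP.get(key)
        if canonical is None:
            unmatched.append(piece.strip())
        elif canonical not in matched:
            matched.append(canonical)
    return matched, unmatched
```

Returning the unmatched pieces separately, instead of dropping them silently, makes it possible to review how many answers end up as missing under a given set of assumptions.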

This procedure might not be perfect but it is the best we can do.

Sorry for the two recent messages (5 min ago): they were deleted and the discussion moved to #29.

Reference for questions about the data: #41