colombia-dev/data

Survey 2018

buritica opened this issue · 17 comments

We should create a new salary survey for 2018.

For context:

  • The post of the last survey can be found here
  • A copy of the actual survey can be found here

The easy way for us would be to just resend the same survey, maybe optimizing some of the questions. Ideally, though, we wouldn't change their structure much, or we'd lose the ability to relate the data to the past survey.

The better way, which I'd like us to try, is to have people opt in to authenticating the survey with GitHub OAuth. We would need a way to guarantee their anonymity (maybe by hashing their user ID or something similar) while still keeping track of their future responses, so that we can see how they progress and how the industry progresses.

This would be more work upfront, but we could reuse it later.
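Here is a minimal sketch of the hashing idea in Python. It assumes a keyed HMAC-SHA256 over the numeric GitHub user ID, which, unlike the username, stays the same if someone renames their account. It also assumes a hypothetical `respondent` column in each year's CSV. The secret key and the column name are illustrative; nothing in this thread has settled on them. As long as the same secret is reused, the same person gets the same pseudonym every year, so responses can be paired without storing whose they are:

```python
import hashlib
import hmac

# Hypothetical server-side secret; it must stay private and be reused every year,
# otherwise the same person would get a different pseudonym in each survey.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonym(github_user_id: int) -> str:
    """Deterministic, non-reversible pseudonym for a GitHub user ID (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, str(github_user_id).encode("utf-8"), hashlib.sha256).hexdigest()


def link_responses(previous: list, current: list) -> dict:
    """Pair rows from two survey years that carry the same pseudonym.

    Rows with an empty 'respondent' field (hypothetical column) are anonymous
    and are never linked.
    """
    earlier = {row["respondent"]: row for row in previous if row.get("respondent")}
    return {
        row["respondent"]: (earlier[row["respondent"]], row)
        for row in current
        if row.get("respondent") in earlier
    }
```

Rows without a pseudonym (anonymous submissions) are simply skipped, so both kinds of answers can live in the same CSV.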

If you have any feedback on the questions or the methods to collect the survey please chime in below.

  • Is there any question we would like to add (e.g., any question from Stack Overflow's survey)?

  • Shall we add:

    • "Are you a foreigner residing in Colombia? From which country?" (¿Es extranjero residente en Colombia? ¿De qué país?)

What is your ANNUAL salary? * (¿Cuánto es su salario ANUAL?)
Your previous answer will be used as the currency base; please include primas (statutory bonuses) and other bonuses (monthly salary × 12)

Shall we change this question to:

What is your ANNUAL salary? *
Your previous answer will be used as the currency base; please include primas and other bonuses (monthly salary × 12, where monthly salary = after taxes and benefits)


+1 on the "extranjero" question

As for the modification to ask about taxes/benefits, what is the intent? Wouldn't it skew the results relative to our last survey?


Hmm, the current question does not specify this (?). I reckon it might give us mixed data from Colombians living abroad and from those working in the country, who are usually aware of their monthly income after taxes.


Any volunteers willing to think through / build the GitHub/hashing integration?

If you get something out for 2018, that would be great! I will use your data for a couple of analyses. Thanks, guys!

Can we have a different currency for the previous job?


But is it an integration with Google Docs, or what?

For now, where can we find the results of the survey?

@jdnichollsc you can see the data at https://github.com/colombia-dev/data/tree/master/salaries/2016

As for the integration, I have no idea yet how it would work; it's an initial proposal.

Can we allow multiple answers for some questions? The reason: many people have more than one job. Or can the same person answer twice?

Why is "¿Cuántos hijos tiene?" (How many children do you have?) an important/relevant question here?
Can we include "Company name"? I think it's an important question for determining how each company handles salaries, and whether negotiation is really a relevant factor.

  • hijos (children): it gives us an indication of experienced engineers who have a family and have been able to stay on the technical track.

  • We shy away from company name because some people don't want to be identified; we could make it optional. The challenge there is that we could be encouraging fake responses from these companies to skew their results, and it's a new field that we'd have to normalize, because text inputs allow for anything, e.g., I work for Github, GH, GitHub, Github Inc., etc.
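If the field is added, the normalization could be a hand-maintained alias table plus some light cleanup. Here is a rough sketch. The alias table and the list of legal suffixes are made-up examples and would have to be built from the actual answers:

```python
import re
from typing import Optional

# Hypothetical alias table, built by hand from the raw answers after the survey closes.
COMPANY_ALIASES = {
    "github": "GitHub",
    "gh": "GitHub",
}

# Common legal suffixes removed before the lookup (illustrative, not exhaustive).
LEGAL_SUFFIX = re.compile(r"\b(inc|llc|s\.?a\.?s|ltda|s\.?a)\.?$")


def normalize_company(raw: str) -> Optional[str]:
    """Map a free-text company answer to a canonical name; None for blank answers.

    Unknown companies fall back to their lower-cased, suffix-free form so that
    simple case and suffix variants still collapse into one value.
    """
    key = LEGAL_SUFFIX.sub("", raw.strip().lower()).strip(" .,")
    if not key:
        return None
    return COMPANY_ALIASES.get(key, key)
```

For instance, `normalize_company("Github Inc.")` and `normalize_company("GH")` both come out as `"GitHub"`. This does nothing about fake responses, though; it only addresses the free-text problem.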

Hello guys, as I told you, you are doing a very good job gathering this data. It's very useful for analysis and for checking trends in our careers in Colombia. However, allow me to give you some quick feedback:

  • Headers: What if you split this into two different files, or at least publish the file header in a different way? The thing is that, with your current structure, these headers are very heavy to handle on the analytical side. Imagine that as a column name :)

  • Data types: For some questions where the expected output is a number, there is a mix of strings, dates, and numbers; it would be amazing to apply a cleanup to the format.
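On the analysis side, the numeric cleanup could look something like the sketch below. It assumes that dots and commas in salary answers are thousands separators, which is common in Colombian formatting. Anything ambiguous, such as dates or decimals, comes back as `None` for manual review rather than being guessed:

```python
import re
from typing import Optional

# Accepts e.g. "3500000", "3.500.000", "3,500,000", "$ 3.500.000".
# Assumption: dots/commas in salary answers are thousands separators.
_AMOUNT = re.compile(r"^\$?\s*(\d{1,3}(?:[.,]\d{3})+|\d+)$")


def parse_amount(raw) -> Optional[int]:
    """Coerce a raw survey cell to an integer amount; None means 'review by hand'."""
    if isinstance(raw, bool):  # bool is a subclass of int, so rule it out first
        return None
    if isinstance(raw, int):
        return raw
    if isinstance(raw, float):
        return int(raw) if raw.is_integer() else None
    match = _AMOUNT.match(str(raw).strip())
    if not match:
        return None  # dates, decimals, free text, etc.
    return int(re.sub(r"[.,]", "", match.group(1)))
```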

Thanks so much, and I hope my comments help.

Thanks @AndresUrregoAngel. Our aim is not to post clean data. We hope to do the work of collecting and posting the raw data, and then analysts like yourself can help sanitize it if necessary, as you can see in past pull requests.

Regarding the headers, the only real way to do this is to keep a key map of questions <> headers, and it's something you could do right before analysis. If others feel this would be helpful, we can just number the columns, and then it's on the analysts to map those back to the original questions, but we think that may be harder.
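Here is a minimal sketch of that key map. It assumes the raw CSV uses the full question text as its header row. The two entries in `QUESTION_KEYS` and their short names are hypothetical:

```python
import csv
import io

# Hypothetical key map (question text -> short column key); it could be published
# next to the CSV so every analyst uses the same names.
QUESTION_KEYS = {
    "¿Cuanto es su salario ANUAL?": "annual_salary",
    "¿Cuantos hijos tiene?": "children",
}


def rename_headers(csv_text: str, key_map: dict) -> list:
    """Parse the raw CSV and replace each known question header with its short key.

    Headers missing from the map are kept unchanged.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {key_map.get(question.strip(), question): value for question, value in row.items()}
        for row in reader
    ]
```

Keeping the map as data rather than renaming the published file means the raw CSV stays untouched, which fits the goal of posting raw data.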

Regarding this survey, here are my thoughts:

Anonymous

  • goes directly to Google Forms
  • fills in the form
  • done

User opts to login

  1. user goes to our app
  2. user authenticates with GitHub
  3. server hashes username
  4. 🌟 server generates a JWT containing the hashed username
  5. user gets forwarded to the Google Form, with a field pre-filled with the JWT
  6. user fills in the form
  7. at survey closure, we validate the JWTs
    • invalid ones are assumed to be anonymous submissions
    • for valid ones, we extract the hashed username and add it to the CSV

Regarding step 🌟4, we treat usernames like passwords.
Any suggestions for this step are highly welcome!!
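The sketch below is one way to take "treat usernames like passwords" literally: a slow key-derivation function (PBKDF2-HMAC-SHA256 from `hashlib`). Password storage normally uses a random per-user salt. That would give the same user a different hash every year and break the tracking goal, so this sketch assumes a fixed secret salt (a "pepper") instead. The pepper value and iteration count are placeholders:

```python
import hashlib

# Hypothetical server-side secret ("pepper"); the same value must be reused every
# year, or hashes will no longer match between surveys.
PEPPER = b"replace-with-a-long-random-secret"
ITERATIONS = 100_000  # assumption: tune to what the server can afford


def hash_username(username: str) -> str:
    """Slow, keyed, deterministic hash of a GitHub username."""
    # GitHub usernames are case-insensitive, so normalize case first.
    normalized = username.strip().lower().encode("utf-8")
    digest = hashlib.pbkdf2_hmac("sha256", normalized, PEPPER, ITERATIONS)
    return digest.hex()
```

The trade-off is that everything rests on the pepper staying secret. GitHub usernames are public, so anyone who obtained the pepper could hash a list of usernames and match them against the CSV; the iteration count only slows that down.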

Just validated the idea above.
👍 gonna squash the code a bit and then share it.

Things we need:

  • hashing function: any good ideas here are welcome.
  • landing page (choose to login, or go anonymous)
  • the actual survey, with an extra field, which we are going to auto-populate.
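For the auto-populated field, Google Forms can generate pre-filled links (the form's "Get pre-filled link" option), where each field is passed as an `entry.<id>` query parameter. Here is a sketch of step 5 under that assumption. The form ID and entry ID below are placeholders that would come from our actual form:

```python
from urllib.parse import urlencode

# Placeholders: the real form ID and the entry ID of the extra token field come
# from the form's "Get pre-filled link" option.
FORM_URL = "https://docs.google.com/forms/d/e/FORM_ID/viewform"
TOKEN_ENTRY = "entry.123456789"


def prefilled_survey_url(token: str) -> str:
    """Step 5: send the user to the Google Form with the token field pre-filled."""
    query = urlencode({"usp": "pp_url", TOKEN_ENTRY: token})
    return f"{FORM_URL}?{query}"
```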

Given the current year and the current survey (2020), this issue should be closed.