Landing.jobs Hackathon

The issue

At Landing.jobs we're trying to match great jobs with the best candidates. However, we still lack a lot of information about how our users behave. For example, we currently don't know why some people apply and others don't.

You have data from the past year for a selection of our users. Can you create a model to predict which ones would apply on '2019-03-13'? You will be evaluated using the F1 score.

Answer File

The submitted file should be a .csv with a single column:

  • person_id -> The IDs of the people that applied on '2019-03-13'

The score is the F1 score obtained by comparing the submitted IDs with what actually happened.
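As a minimal sketch of producing such a file, the snippet below writes a one-column CSV using only the standard library. The function name `write_submission`, the example IDs, and the choice to include a `person_id` header row and to de-duplicate IDs are assumptions for illustration, not requirements stated by the challenge.

```python
import csv
import io

def write_submission(person_ids, out):
    """Write a one-column CSV (header: person_id) to the file-like object `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["person_id"])  # assumption: a header row is expected
    # De-duplicate and sort so the file is stable between runs.
    for pid in sorted(set(person_ids)):
        writer.writerow([pid])

# Hypothetical predicted IDs, written to an in-memory buffer for demonstration.
buffer = io.StringIO()
write_submission([42, 7, 42], buffer)
submission_text = buffer.getvalue()
```

To write to disk instead, the same function can be passed a file opened with `open("submission.csv", "w", newline="")`.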

In the data folder, there is a file called "solution.csv" containing the actual solution to this challenge: the real list of people that applied on '2019-03-13'. You may use it (carefully) to check how you are doing and to report your final F1 score.
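For reference, here is a small sketch of how an F1 score over two lists of IDs can be computed. It treats both lists as sets, so duplicate IDs count once; the function name `f1_score_ids` and the example IDs are hypothetical, and the official scoring script may differ in details.

```python
def f1_score_ids(predicted_ids, actual_ids):
    """F1 score between a predicted set of IDs and the actual set of IDs."""
    predicted, actual = set(predicted_ids), set(actual_ids)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        # Covers empty inputs and avoids dividing by zero.
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that were right
    recall = true_positives / len(actual)        # share of real applicants that were found
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 2 of 4 predictions are correct, 2 of 3 applicants are found.
example_score = f1_score_ids([1, 2, 3, 4], [2, 3, 5])
```

In this example precision is 0.5 and recall is 2/3, which gives an F1 score of 4/7.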

Data Dictionary

You will have 6 tables available. Their contents are the following:

Applications

  • person_id: ID of the candidate
  • id: ID of the application
  • job_ad_id: ID of the associated Job
  • submitted_at: When the application was submitted
  • created_at: When the application was created (applications can also be drafts)
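Since applications can be drafts, it may help to build the set of people who actually submitted something on a given day. The sketch below assumes that drafts have an empty or missing submitted_at and that timestamps start with a `YYYY-MM-DD` date; both are assumptions to verify against the real data. The function name `applicants_on` and the sample rows are hypothetical.

```python
def applicants_on(applications, day):
    """Return the set of person_ids with an application submitted on `day` (YYYY-MM-DD)."""
    applicants = set()
    for row in applications:
        # Assumption: an empty or missing submitted_at means the application is a draft.
        submitted_at = row.get("submitted_at") or ""
        if submitted_at.startswith(day):
            applicants.add(row["person_id"])
    return applicants

# Hypothetical rows shaped like the Applications table.
sample_applications = [
    {"person_id": 1, "id": 10, "job_ad_id": 100,
     "submitted_at": "2019-03-12 09:15:00", "created_at": "2019-03-12 09:00:00"},
    {"person_id": 2, "id": 11, "job_ad_id": 100,
     "submitted_at": "", "created_at": "2019-03-12 10:00:00"},  # draft
    {"person_id": 1, "id": 12, "job_ad_id": 101,
     "submitted_at": "2019-03-12 18:30:00", "created_at": "2019-03-12 18:00:00"},
    {"person_id": 3, "id": 13, "job_ad_id": 101,
     "submitted_at": "2019-03-11 08:00:00", "created_at": "2019-03-11 07:50:00"},
]
applied = applicants_on(sample_applications, "2019-03-12")
```

Running this over past days gives historical labels that a model for '2019-03-13' could be trained and validated on.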

Jobs

  • id: ID of the job
  • company_id: ID of the company
  • experience_level: Experience bucket of the job
  • last_published_at: Last time the job was published
  • closed_at: When the job was closed

Experience can be inside the following buckets:

  • 1 - Junior - Less than 2 years of experience
  • 2 - Intermediate - 2 to 6 years of experience
  • 3 - Senior - More than 6 years of experience
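If you want to compare a number of years of experience against these buckets, a mapping like the following could be used. It is only a sketch: the function name `experience_bucket` is hypothetical, and it assumes the value being mapped is a number of years.

```python
def experience_bucket(years):
    """Map years of experience to the job experience bucket (1, 2 or 3)."""
    if years < 2:
        return 1  # Junior: less than 2 years
    if years <= 6:
        return 2  # Intermediate: 2 to 6 years
    return 3      # Senior: more than 6 years

buckets = [experience_bucket(y) for y in (0, 2, 6, 7)]
```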

People

  • user_id: ID of the user
  • id: ID of the candidate
  • country_code: Country code
  • experience_level: Experience level from 0 to 10+
  • person_created_at: When the user was created
  • availability: Category of availability
  • remote: Category of remote

Availability categories

  • "I'm not really looking, just curious" => 0,
  • "I'm actively looking for a job" => 1,
  • "I'm currently employed, but open to a new challenge" => 2

Remote Categories

  • 'Yes' => 1,
  • 'Remote positions only' => 2,
  • 'No' => 0
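When exploring the People table, it can be convenient to translate these codes back into their labels. The lookup tables below simply restate the two mappings above; the names `AVAILABILITY_LABELS`, `REMOTE_LABELS`, and `describe_person`, and the "unknown" fallback for unexpected codes, are hypothetical choices.

```python
AVAILABILITY_LABELS = {
    0: "I'm not really looking, just curious",
    1: "I'm actively looking for a job",
    2: "I'm currently employed, but open to a new challenge",
}

REMOTE_LABELS = {
    0: "No",
    1: "Yes",
    2: "Remote positions only",
}

def describe_person(person):
    """Return readable availability and remote labels for a People row."""
    return {
        # Codes outside the documented mappings (or missing values) become "unknown".
        "availability": AVAILABILITY_LABELS.get(person.get("availability"), "unknown"),
        "remote": REMOTE_LABELS.get(person.get("remote"), "unknown"),
    }

# Hypothetical People row.
labels = describe_person({"id": 5, "availability": 1, "remote": 2})
```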

Job Skills

  • job_id: ID of the job
  • canonical_tag_id: ID of the skill
  • tag_name: Name of the skill

People Skills

  • person_id: ID of the person
  • canonical_tag_id: ID of the skill
  • tag_name: Name of the skill

Views

  • id: ID of the view
  • time: When the visit happened
  • page: Page that was visited
  • user_id: ID of the user that has visited
  • visit_id: ID of the visit

Some information:

  • View -> Single page click
  • Visit -> Collection of views
  • Users -> Both people (candidates) and employees
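The relation between views and visits can be illustrated by grouping view rows by visit_id. This is a sketch over hypothetical rows shaped like the Views table; the function name `pages_per_visit` is an assumption, and rows are kept in the order they are given.

```python
from collections import defaultdict

def pages_per_visit(views):
    """Group the pages of each view under the visit they belong to."""
    visits = defaultdict(list)
    for view in views:
        visits[view["visit_id"]].append(view["page"])
    return dict(visits)

# Hypothetical view rows: visit 900 has two page clicks, visit 901 has one.
sample_views = [
    {"id": 1, "time": "2019-03-01 10:00:00", "page": "", "user_id": 7, "visit_id": 900},
    {"id": 2, "time": "2019-03-01 10:01:00", "page": "at/3/55", "user_id": 7, "visit_id": 900},
    {"id": 3, "time": "2019-03-02 12:00:00", "page": "at/3/55", "user_id": 8, "visit_id": 901},
]
grouped = pages_per_visit(sample_views)
```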

The page field may not always make sense, in part because it has been anonymized. Here are some common page strings:

  • at/:company_id/:job_id -> Visiting a job page
  • `` (empty string) -> Homepage
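One way to turn these page strings into features is a small classifier like the sketch below. It assumes the job page pattern is exactly `at/<company_id>/<job_id>` with no extra path segments and that the homepage appears as an empty string; the names `JOB_PAGE` and `classify_page` are hypothetical, and the anonymized data may contain variations this does not cover.

```python
import re

# Assumption: job pages look exactly like "at/<company_id>/<job_id>".
JOB_PAGE = re.compile(r"^at/(?P<company_id>[^/]+)/(?P<job_id>[^/]+)$")

def classify_page(page):
    """Return (kind, company_id, job_id) for a page string from the Views table."""
    if page == "":
        return ("homepage", None, None)
    match = JOB_PAGE.match(page)
    if match:
        # IDs are returned as strings because anonymized values may not be numeric.
        return ("job", match["company_id"], match["job_id"])
    return ("other", None, None)

kinds = [classify_page(p) for p in ("", "at/12/345", "something/else")]
```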

Some important questions

  • Are all users from the views people?
  • How much data do you actually need?
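The first question can be checked directly by comparing the user_id values in Views with those in People. The sketch below uses hypothetical rows and a hypothetical function name, `split_view_users`; it assumes views without a user_id (if any exist) should be ignored.

```python
def split_view_users(views, people):
    """Split the user_ids seen in Views into candidates and everyone else."""
    candidate_user_ids = {person["user_id"] for person in people}
    # Assumption: a missing user_id is stored as None and is skipped.
    view_user_ids = {view["user_id"] for view in views if view["user_id"] is not None}
    candidates = view_user_ids & candidate_user_ids
    others = view_user_ids - candidate_user_ids
    return candidates, others

# Hypothetical rows: user 8 appears in Views but has no People row.
example_people = [{"user_id": 7, "id": 1}, {"user_id": 9, "id": 2}]
example_views = [
    {"id": 1, "user_id": 7, "visit_id": 900, "page": ""},
    {"id": 2, "user_id": 8, "visit_id": 901, "page": "at/3/55"},
    {"id": 3, "user_id": None, "visit_id": 902, "page": ""},
]
candidates, others = split_view_users(example_views, example_people)
```

A non-empty `others` set would suggest that some views come from users who are not candidates, for example employees.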