
Public dataset on the performance of candidates in the 2020 GCE Advanced Level (AL) exam in Sri Lanka

⚙️ GCE AL 2020 Exam Results Dataset - public


DOI - 10.34740/kaggle/ds/2302701

This is by far my biggest and favorite project. Take a look at the project details; you might find them interesting.

This dataset contains information on the performance of students in the GCE Advanced Level (AL) exam in Sri Lanka in 2020. It was collected by Sasika Amarasinghe and is available on Kaggle.


I removed some columns from the original dataset for ethical reasons. However, here is a sample of what the full data returns when a search query is entered.

[Video: sample search query]

When a school candidate's name is provided, the system retrieves comprehensive details, including their birthdate, which the exam result sheet itself does not disclose. (Applicable to candidates from the 2020 AL batch 😄)


Dataset Characteristics

  • The dataset consists of over 300,000 records of student performance in the GCE AL exam in Sri Lanka.
  • The data includes information on student identification, school, district, medium of instruction, stream, and their scores in different subjects.
  • The data also includes the overall Z-score of each student, which is a standard score that indicates the number of standard deviations by which the student's exam results are above or below the mean.
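As a quick illustration of the standard-score idea described above, here is a minimal sketch. The function name `z_scores` and the marks are made up for illustration, and this is only the textbook formula (value minus mean, divided by standard deviation); it is not necessarily the exact procedure used to produce the Z-Score column in the dataset.

```python
from statistics import mean, pstdev


def z_scores(marks):
    """Return the standard score of each mark relative to the whole group.

    Hypothetical helper: shows the generic formula only, not the official
    calculation behind the dataset's Z-Score column.
    """
    mu = mean(marks)
    sigma = pstdev(marks)  # population standard deviation of the group
    if sigma == 0:
        raise ValueError("all marks are identical; z-score is undefined")
    return [(m - mu) / sigma for m in marks]


# Made-up marks for three candidates in one subject.
scores = z_scores([40, 50, 60])
```

A positive value means the mark is above the group mean, a negative value means it is below, and the magnitude is the distance in standard deviations.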

Features

  • Index: A unique identifier for each student
  • School ID: Identification number of the school
  • District: District where the school is located
  • Stream: Science, Arts, or Commerce stream of the student
  • Medium: Sinhala or English medium of instruction
  • Subjects: The scores of the student in each of the subjects - Mathematics, Science, English, Buddhism, and History
  • Z-Score: The overall Z-score of the student
  • Bday: Birthday of applicant

Use Cases

  • This dataset can be used to compare student performance across subjects, streams, media of instruction, and districts.
  • The data can also be used to study the relationship between student performance and demographic factors such as medium of instruction and district.
  • This dataset can be used to identify the factors that contribute to the performance of students in the GCE AL exam and to make recommendations for improving student performance in the future.
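A comparison like the ones listed above could start from a simple group average. The sketch below assumes each record is a dictionary keyed by the feature names from the Features list ("District", "Stream", "Z-Score"); the actual column names in the published file may differ, and the rows shown are invented for illustration, not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_z_by(rows, key):
    """Average the Z-Score of the records that share the same value of `key`.

    Hypothetical helper; `rows` is any iterable of dict-like records,
    for example the output of csv.DictReader.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row["Z-Score"]))
    return {group: mean(values) for group, values in groups.items()}


# Invented records, only to show the shape of the input.
rows = [
    {"District": "Colombo", "Stream": "Science", "Z-Score": "1.0"},
    {"District": "Colombo", "Stream": "Arts", "Z-Score": "0.5"},
    {"District": "Kandy", "Stream": "Science", "Z-Score": "-0.5"},
]

by_district = mean_z_by(rows, "District")
by_stream = mean_z_by(rows, "Stream")
```

The same function works for any categorical column, so passing "Medium" instead of "District" would compare media of instruction.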

Usability

  • 9.41 / 10

Sources

Collection Methodology

  • The data was scraped using a Python script written by the author, using the index number as the primary key.
  • Subsequently, the national identity card numbers were decoded to extract applicants' birthdays and genders.
  • Due to privacy concerns, "Full name," "National Identity Card number," and "Index number" were removed, while the derived birthdays and genders were kept in the dataset.
  • AWS EC2 instances were used to collect the data concurrently, reducing both collection time and data usage.
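The decoding step mentioned above can be sketched as follows. This is not the author's script; it assumes the commonly described layout of Sri Lankan NIC numbers: a two-digit (old format) or four-digit (new format) birth year, followed by a three-digit day-of-year code that is increased by 500 for women and that counts February as 29 days in every year. The function name `decode_nic` and the sample numbers are made up.

```python
from datetime import date, timedelta


def decode_nic(nic: str) -> tuple[date, str]:
    """Return (birthday, gender) decoded from an NIC number.

    Hypothetical helper based on the commonly described NIC layout,
    which is an assumption here rather than something the source states.
    """
    nic = nic.strip().upper()
    if len(nic) == 10 and nic[:9].isdigit() and nic[9] in "VX":
        # Old format: YY DDD ... plus a trailing letter, year taken as 19YY.
        year = 1900 + int(nic[:2])
        day_code = int(nic[2:5])
    elif len(nic) == 12 and nic.isdigit():
        # New format: YYYY DDD ...
        year = int(nic[:4])
        day_code = int(nic[4:7])
    else:
        raise ValueError(f"unrecognized NIC format: {nic!r}")

    # Day codes above 500 mark female holders.
    gender = "F" if day_code > 500 else "M"
    day_of_year = day_code - 500 if day_code > 500 else day_code
    if not 1 <= day_of_year <= 366:
        raise ValueError(f"day-of-year out of range: {day_of_year}")

    # Map the day number to month/day using a leap year as the reference,
    # because the code is assumed to count February 29 in every year.
    ref = date(2000, 1, 1) + timedelta(days=day_of_year - 1)
    return date(year, ref.month, ref.day), gender


# Made-up numbers, not real identity card numbers.
new_style = decode_nic("200203101234")
old_style = decode_nic("995610000V")
```

Because the published dataset keeps only the decoded birthday and gender, the NIC number itself never needs to be stored once this step has run.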

I was awarded a bronze 🥉 medal for this dataset, receiving 38 upvotes in the Kaggle Community, along with very positive feedback from the community members.

What People Are Saying About the Dataset 🌟

This can be actually used to look after the academic likelihoods and whereabouts of Sri Lankan students' academics! Great job! -- VISHESH THAKUR - Datasets Expert

This data could be used for EDA, visualization and even model development! Good work and great dataset! -- RAVI RAMAKRISHNAN - Notebooks Grandmaster