/AI_Challenge_PitchAI

Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

AI Challenge for Pitch.AI Project

Overview

Welcome to the Pitch.AI AI Challenge! This project evaluates your ability to work with real-world datasets, focusing on data preprocessing, feature extraction, and classification algorithms.

The dataset you'll be working with is the UCI Spambase dataset, a labeled collection of features extracted from email messages, used to train algorithms that distinguish legitimate messages from spam.

Instructions

  1. Fork this repository to your GitHub account.
  2. Complete the missing parts of main_code.py by implementing the functions for data preprocessing, feature extraction, model training, and evaluation.
  3. Ensure your code passes the tests in test_code.py by running the test script.
  4. Create a pull request (PR) back to this repository once you're done.

Important Guidelines

  • Your GitHub email must match the email you used in your application. If it differs, we recommend creating a new GitHub profile with your application email, preferably your UWaterloo email.
  • Set your email to public on your GitHub profile.
  • Do not apply any labels to your PR. We will add a label to mark your PR as reviewed. If you apply a label yourself, your PR will be skipped.

Task

You are provided with a partially implemented Python script main_code.py. Your task is to complete the functions that handle data preprocessing, feature extraction, model training, and evaluation. The goal is to develop a spam detection model using classification algorithms.
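As a rough illustration of the four stages named above (preprocessing, feature extraction, training, evaluation), here is a minimal sketch using scikit-learn. It does not reproduce the function names or signatures in main_code.py, which are not shown here. The synthetic data, the choice of 57 features (the attribute count of the original UCI Spambase release, which this repository's CSV may or may not match), the 80/20 split, and the logistic regression model are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): numeric features, binary label.
X, y = make_classification(
    n_samples=500, n_features=57, n_informative=10, random_state=0
)

# Hold out 20% of the rows for evaluation, keeping the class balance similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Preprocessing (feature scaling) and the classifier are chained in one pipeline,
# so the scaler is fitted on the training rows only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluation on the held-out rows.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"accuracy={accuracy:.3f} f1={f1:.3f}")
```

Chaining the scaler and the classifier in a pipeline is one way to avoid leaking test-set statistics into preprocessing; any other scikit-learn classifier could be substituted for LogisticRegression.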

Requirements

  • Implement the missing functions in main_code.py.
  • Use scikit-learn or similar libraries to build your models.
  • Run test_code.py to ensure your code works as expected.

Dataset

The dataset is provided in data/spambase.csv. It contains features extracted from email messages, with each message labeled as spam (1) or not spam (0).
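One possible way to read a file like this into features and labels is sketched below, using only the standard library. The helper name load_spambase is hypothetical, and the sketch assumes that the label is the last column and that all other columns are numeric. Whether data/spambase.csv has a header row is not stated here, so that is left as a parameter. The in-memory sample rows are made up for illustration.

```python
import csv
import io


def load_spambase(source, has_header=False):
    """Read CSV rows from a file object and return (features, labels).

    Assumption: the label is the LAST column; every other column is numeric.
    """
    reader = csv.reader(source)
    if has_header:
        next(reader, None)  # skip the header row
    features, labels = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        *values, label = row
        features.append([float(v) for v in values])
        labels.append(int(float(label)))
    return features, labels


# Tiny in-memory stand-in for data/spambase.csv (made-up values).
sample = io.StringIO("0.0,0.64,3.75,1\n0.21,0.0,1.0,0\n")
X, y = load_spambase(sample)
print(X, y)
```

For the real file, the same helper could be called as `with open("data/spambase.csv", newline="") as f: X, y = load_spambase(f)`, passing has_header=True if the first row turns out to contain column names.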

Submission Process

  1. Fork this repository to your own GitHub account.
  2. Complete main_code.py with your implementation.
  3. Run test_code.py to confirm that your implementation works correctly.
  4. Push your changes to your forked repository.
  5. Create a pull request back to this repository.

Make sure you follow the guidelines to avoid any issues with your submission. We look forward to reviewing your work!

Good luck!