
This project demonstrates an end-to-end solution for a machine learning project: a complete pipeline.


LUNG_CANCER-DETECTION


End-to-End TFX Pipeline for Lung Image Analysis

NOTE

This project demonstrates how an end-to-end solution can be built using TFX and MLOps techniques. The focus here is not solely on fine-tuning the model for optimal performance across different metrics, but rather on showcasing the application of MLOps principles and the creation of a fully operational pipeline.

I have trained and tuned the model on the same dataset used throughout the pipeline. If you're interested in exploring a well-tuned version of the model with significantly better performance across various metrics, see: https://www.kaggle.com/code/sawanrawat/lungsdetection-fulldata

Project Overview

This project demonstrates an end-to-end TFX (TensorFlow Extended) pipeline designed for processing and analyzing lung images. The pipeline integrates several components of TFX to facilitate the ingestion, transformation, training, evaluation, and serving of a machine learning model specifically tuned for lung image classification.


Table of Contents

  1. Introduction
  2. Pipeline Architecture
  3. Setup and Installation
  4. Data Preparation
  5. Pipeline Components
  6. Running the Pipeline
  7. Model Serving
  8. Results and Evaluation
  9. Contributing
  10. License

Introduction

This project aims to build a robust TFX pipeline for lung image classification, leveraging TensorFlow and associated tools to streamline the ML workflow from data ingestion to model serving. The pipeline handles end-to-end processing, including data validation, transformation, model training, and deployment.

Pipeline Architecture

The TFX pipeline includes the following components:

  • ExampleGen: Ingests raw image data.

  • ExampleValidator: Performs data validation and quality checks.

  • SchemaGen: Generates a schema for data validation.

  • Transform: Transforms and preprocesses the data.

  • Trainer: Trains a TensorFlow model on the processed data.

  • TensorFlow Model Analysis (TFMA): Library used by the Evaluator to analyze model performance across metrics and data slices.

  • Evaluator: Evaluates the model performance.

  • Resolver: Resolves the latest blessed model artifact to use as a baseline for evaluation.

  • Pusher: Pushes the model to the serving infrastructure.

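The exact pipeline definition lives in this repository's pipeline scripts; the sketch below only illustrates how the components listed above are typically wired together with the TFX 1.x API. The pipeline name, paths, module files, metrics, and step counts are placeholders, not the repository's actual values.

    import os
    import tensorflow_model_analysis as tfma
    from tfx import v1 as tfx

    PIPELINE_NAME = "lung_cancer_detection"   # placeholder
    PIPELINE_ROOT = os.path.join("pipelines", PIPELINE_NAME)
    DATA_ROOT = "path/to/tfrecords"           # TFRecords produced by imageToTFrecord.py
    SERVING_DIR = "serving_model_dir"         # directory later mounted into TensorFlow Serving

    # Ingest the TFRecord files, then derive statistics, a schema, and validation anomalies.
    example_gen = tfx.components.ImportExampleGen(input_base=DATA_ROOT)
    statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])
    schema_gen = tfx.components.SchemaGen(statistics=statistics_gen.outputs["statistics"])
    example_validator = tfx.components.ExampleValidator(
        statistics=statistics_gen.outputs["statistics"],
        schema=schema_gen.outputs["schema"])

    # Preprocess images and labels; the preprocessing_fn lives in a user module file.
    transform = tfx.components.Transform(
        examples=example_gen.outputs["examples"],
        schema=schema_gen.outputs["schema"],
        module_file="transform_module.py")    # placeholder module file

    # Train the model defined in model.py (run_fn entry point); step counts are arbitrary.
    trainer = tfx.components.Trainer(
        module_file="model.py",
        examples=transform.outputs["transformed_examples"],
        transform_graph=transform.outputs["transform_graph"],
        schema=schema_gen.outputs["schema"],
        train_args=tfx.proto.TrainArgs(num_steps=1000),
        eval_args=tfx.proto.EvalArgs(num_steps=100))

    # Resolve the latest blessed model to act as the evaluation baseline.
    model_resolver = tfx.dsl.Resolver(
        strategy_class=tfx.dsl.experimental.LatestBlessedModelStrategy,
        model=tfx.dsl.Channel(type=tfx.types.standard_artifacts.Model),
        model_blessing=tfx.dsl.Channel(type=tfx.types.standard_artifacts.ModelBlessing),
    ).with_id("latest_blessed_model_resolver")

    # Evaluate with TensorFlow Model Analysis; the metric choices here are illustrative.
    evaluator = tfx.components.Evaluator(
        examples=example_gen.outputs["examples"],
        model=trainer.outputs["model"],
        baseline_model=model_resolver.outputs["model"],
        eval_config=tfma.EvalConfig(
            model_specs=[tfma.ModelSpec(label_key="label", signature_name="serving_default")],
            slicing_specs=[tfma.SlicingSpec()],
            metrics_specs=[tfma.MetricsSpec(metrics=[
                tfma.MetricConfig(class_name="BinaryAccuracy"),
                tfma.MetricConfig(class_name="AUC")])]))

    # Push the blessed model to the serving directory.
    pusher = tfx.components.Pusher(
        model=trainer.outputs["model"],
        model_blessing=evaluator.outputs["blessing"],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(base_directory=SERVING_DIR)))

    pipeline = tfx.dsl.Pipeline(
        pipeline_name=PIPELINE_NAME,
        pipeline_root=PIPELINE_ROOT,
        components=[example_gen, statistics_gen, schema_gen, example_validator,
                    transform, trainer, model_resolver, evaluator, pusher],
        metadata_connection_config=tfx.orchestration.metadata.sqlite_metadata_connection_config(
            os.path.join(PIPELINE_ROOT, "metadata.db")))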

Setup and Installation

Prerequisites

  • Python 3.10+
  • TensorFlow 2.13.1
  • TFX 1.14.0
  • Docker (for containerized deployment)
  • Jupyter Notebook with the TFX InteractiveContext (for interactive pipeline experimentation)

Installation Steps

  1. Clone the Repository

    git clone https://github.com/sawanjr/LUNG_CANCER-DETECTION-TFX.git
  2. Create and Activate a Virtual Environment

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install Dependencies

    pip install -r requirements.txt
  4. Set Up Docker (if applicable)

    Follow Docker installation instructions for your operating system: https://docs.docker.com/get-docker/
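  5. Verify the Installation (optional)

    A quick, illustrative check that the expected versions are importable:

    import tensorflow as tf
    import tfx

    # The pipeline targets TensorFlow 2.13.1 and TFX 1.14.0 (see Prerequisites).
    print("TensorFlow:", tf.__version__)
    print("TFX:", tfx.__version__)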

Data Preparation

The dataset consists of CT-scanned lung images collected from a hospital and is used for training and evaluation. The data should be structured in the following format:

  • Raw Images in PNG format: Stored in a directory structure by class labels.
  • Labels: Associated with each image for supervised learning.

Converting Images to TFRecord

Use the imageToTFrecord.py script to convert images and labels into TFRecord format:

python scripts/imageToTFrecord.py --input_dir path/to/images --output_dir path/to/tfrecords
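
Conceptually, the conversion walks the class-labeled image directories and writes each PNG together with its integer label into a tf.train.Example. The sketch below is illustrative only (the function name, directory layout, and class names such as "Benign"/"Malignant" are assumptions, not the script itself); the feature keys match those used in the serving example later in this README.

    import os
    import tensorflow as tf

    # Illustrative PNG -> TFRecord conversion; not the actual imageToTFrecord.py.
    def convert_to_tfrecord(input_dir, output_path, class_names=("Benign", "Malignant")):
        with tf.io.TFRecordWriter(output_path) as writer:
            for label, class_name in enumerate(class_names):
                class_dir = os.path.join(input_dir, class_name)
                for file_name in os.listdir(class_dir):
                    image_bytes = tf.io.read_file(os.path.join(class_dir, file_name)).numpy()
                    example = tf.train.Example(features=tf.train.Features(feature={
                        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
                        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
                    }))
                    writer.write(example.SerializeToString())

    convert_to_tfrecord("path/to/images", "path/to/tfrecords/train.tfrecord")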

Pipeline Components

ExampleGen

Ingests the TFRecord files into the pipeline.

ExampleValidator

Validates the data to ensure quality and consistency.


SchemaGen

Generates the schema for data validation based on the dataset.


StatisticsGen

Computes descriptive statistics over the ingested examples; these statistics feed SchemaGen and ExampleValidator.

Transform

Transforms the raw images and labels into the format required for training.

Trainer

Trains a TensorFlow model using the processed data. The model architecture is defined in model.py.
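
The architecture below is only an illustrative stand-in (the input size, depth, and binary sigmoid head are assumptions); refer to model.py for the network actually trained by the pipeline.

    import tensorflow as tf

    # Illustrative CNN for binary lung-image classification -- not the model in model.py.
    def build_model(image_size=(224, 224)):
        inputs = tf.keras.Input(shape=image_size + (3,), name="image")
        x = tf.keras.layers.Rescaling(1.0 / 255)(inputs)
        x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
        x = tf.keras.layers.MaxPooling2D()(x)
        x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)
        x = tf.keras.layers.MaxPooling2D()(x)
        x = tf.keras.layers.GlobalAveragePooling2D()(x)
        outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
        model = tf.keras.Model(inputs, outputs)
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        return model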

Evaluator

Evaluates the model performance on validation data and generates evaluation metrics.
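
After the Evaluator runs, its output artifact can be inspected with TFMA, for example in a notebook. The artifact path below is illustrative; substitute the Evaluator's actual output directory under your pipeline root.

    import tensorflow_model_analysis as tfma

    # Replace with the Evaluator's actual output directory and run id.
    eval_result = tfma.load_eval_result("pipelines/lung_cancer_detection/Evaluator/evaluation/<run_id>")
    tfma.view.render_slicing_metrics(eval_result)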

Pusher

Pushes the model to a serving infrastructure for inference.

Resolver

Resolves the latest blessed model artifact so the Evaluator can compare new candidates against it.

Running the Pipeline

Initialize and Run the Pipeline

python .\pipelines\apache_beam\pipeline_beam.py
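
Since the script lives under pipelines/apache_beam/, it presumably executes the pipeline with the Beam orchestrator. A minimal sketch of what such a script does (here `pipeline` stands for a tfx.dsl.Pipeline object such as the one assembled in the Pipeline Architecture sketch above):

    from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner

    # `pipeline` is a tfx.dsl.Pipeline object built elsewhere in the script.
    BeamDagRunner().run(pipeline)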

Model Serving

To serve the trained model using TensorFlow Serving:

  1. Build and Run TensorFlow Serving Docker Container

    docker run -p 8500:8500 \
    -p 8501:8501 \
    --mount type=bind,source=".\serving_model_dir\1724395112",target=/models/my_model/1 \
    -e MODEL_NAME=my_model -t tensorflow/serving
  2. Send Prediction Requests

    Use the REST API to send prediction requests:

     import tensorflow as tf
     import requests
     import json
     import base64
     
     def serialize_example(image_path):
         # Read and serialize the image
         image = tf.io.read_file(image_path)
         feature = {
             'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image.numpy()])),
             'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[0]))  # Dummy label
         }
         example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
         return example_proto.SerializeToString()
     
     # Serialize the image into a TFRecord format
     tfrecord_data = serialize_example("Test/Malignant/047_CT_56-seg_15.png")
     # Base64 encode the serialized TFRecord
     tfrecord_data_base64 = base64.b64encode(tfrecord_data).decode('utf-8')
     
     # Create the payload for the REST API request
     data = {
         "signature_name": "serving_default",  # Ensure this matches your SavedModel's signature
         "instances": [{"examples": {"b64": tfrecord_data_base64}}]  # Send base64-encoded TFRecord data
     }
     

    Send the request to TensorFlow Serving running in Docker

     url = "http://localhost:8501/v1/models/my_model:predict"  # Adjust the model name and endpoint as needed
     headers = {"content-type": "application/json"}
     # Send the request
     response = requests.post(url, data=json.dumps(data), headers=headers)
     
     # Print the response from TensorFlow Serving
     print(response.json())
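
    Continuing from the request above, the JSON response carries a `predictions` field. How to interpret it depends on the model's output head; assuming a single sigmoid score per example (an assumption, not something fixed by this README), a class decision can be derived like this:

     # Assumes one sigmoid score per example; the threshold and class names are assumptions.
     predictions = response.json()["predictions"]
     score = predictions[0][0] if isinstance(predictions[0], list) else predictions[0]
     print("Malignant" if score >= 0.5 else "Benign", f"(score={score:.3f})")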
    

Results and Evaluation

The model’s performance is evaluated based on metrics such as accuracy, precision, recall, and F1-score. Evaluation results can be found in evaluation_results.md.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your changes.

License

This project is licensed under the Apache License 2.0.