This project demonstrates how an end-to-end solution can be built using TFX and MLOps techniques. The focus here is not solely on fine-tuning the model for optimal performance across different metrics, but rather on showcasing the application of MLOps principles and the creation of a fully operational pipeline.
I have trained and tuned the model on the same dataset used throughout the pipeline. If you're interested in exploring a well-tuned version of the model with significantly better performance across various metrics, you can check the link here: [https://www.kaggle.com/code/sawanrawat/lungsdetection-fulldata].
This project demonstrates an end-to-end TFX (TensorFlow Extended) pipeline designed for processing and analyzing lung images. The pipeline integrates several components of TFX to facilitate the ingestion, transformation, training, evaluation, and serving of a machine learning model specifically tuned for lung image classification.
- Introduction
- Pipeline Architecture
- Setup and Installation
- Data Preparation
- Pipeline Components
- Running the Pipeline
- Model Serving
- Results and Evaluation
- Contributing
- License
This project aims to build a robust TFX pipeline for lung image classification, leveraging TensorFlow and associated tools to streamline the ML workflow from data ingestion to model serving. The pipeline handles end-to-end processing, including data validation, transformation, model training, and deployment.
The TFX pipeline includes the following components:
- ExampleGen: Ingests raw image data.
- ExampleValidator: Performs data validation and quality checks.
- SchemaGen: Generates a schema for data validation.
- Transform: Transforms and preprocesses the data.
- Trainer: Trains a TensorFlow model on the processed data.
- TensorFlow Model Analysis: Produces a detailed analysis of model behavior.
- Evaluator: Evaluates the model's performance.
- Resolver: Resolves model and data artifacts.
- Pusher: Pushes the model to the serving infrastructure.
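Concretely, the data flow above could be wired as TFX components. The snippet below is a configuration sketch, not the project's exact setup: the paths and module file names are assumptions for illustration.

```python
from tfx import v1 as tfx

# Ingest TFRecords produced by the conversion script (path is illustrative).
example_gen = tfx.components.ImportExampleGen(input_base="path/to/tfrecords")

# Compute statistics, infer a schema, and validate incoming data against it.
statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])
schema_gen = tfx.components.SchemaGen(statistics=statistics_gen.outputs["statistics"])
example_validator = tfx.components.ExampleValidator(
    statistics=statistics_gen.outputs["statistics"],
    schema=schema_gen.outputs["schema"],
)

# Preprocess features, then train the model defined in model.py.
transform = tfx.components.Transform(
    examples=example_gen.outputs["examples"],
    schema=schema_gen.outputs["schema"],
    module_file="transform.py",  # hypothetical module name
)
trainer = tfx.components.Trainer(
    module_file="model.py",
    examples=transform.outputs["transformed_examples"],
    transform_graph=transform.outputs["transform_graph"],
)
```

Evaluator, Resolver, and Pusher are wired in the same style, each consuming the outputs of upstream components.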
- Python 3.10+
- TensorFlow 2.13.1
- TFX 1.14.0
- Docker (for containerized deployment)
- interactivePipeline (for experimentation)
- Clone the Repository

  ```shell
  git clone https://github.com/sawanjr/LUNG_CANCER-DETECTION-TFX.git
  ```

- Create and Activate a Virtual Environment

  ```shell
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

- Install Dependencies

  ```shell
  pip install -r requirements.txt
  ```

- Set Up Docker (if applicable)

  Follow the Docker installation instructions for your operating system: https://docs.docker.com/get-docker/
The dataset consists of CT-scanned lung images collected from a hospital, used for training and evaluation. The data should be structured in the following format:

- Raw images in PNG format: stored in a directory structure by class label.
- Labels: associated with each image for supervised learning.
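Given that directory-per-class layout, pairing each PNG with an integer label can be sketched with the standard library alone. The function and path names here are illustrative, not part of the project's scripts:

```python
from pathlib import Path

def gather_examples(data_dir):
    """Walk a directory-per-class tree and return (image_path, label) pairs.

    Class names are sorted so label indices are deterministic across runs.
    """
    data_dir = Path(data_dir)
    class_names = sorted(d.name for d in data_dir.iterdir() if d.is_dir())
    label_map = {name: idx for idx, name in enumerate(class_names)}
    pairs = []
    for name in class_names:
        for png in sorted((data_dir / name).glob("*.png")):
            pairs.append((str(png), label_map[name]))
    return pairs, label_map
```

Sorting the class names keeps the label indices stable, which matters when the same mapping must be reused at serving time.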
Use the imageToTFrecord.py script to convert images and labels into TFRecord format:

```shell
python scripts/imageToTFrecord.py --input_dir path/to/images --output_dir path/to/tfrecords
```
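Under the hood, a conversion script like this typically wraps each image's bytes and label in a `tf.train.Example` before writing TFRecords. A minimal sketch of that step (the helper name is illustrative, not taken from the script):

```python
import tensorflow as tf

def image_to_example(image_bytes, label):
    """Wrap raw PNG bytes and an integer label in a tf.train.Example proto."""
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
```

Serialized protos produced this way are what ExampleGen later ingests, and the same feature keys must be used by the Transform step.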
- ExampleGen: Ingests the TFRecord files into the pipeline.
- ExampleValidator: Validates the data to ensure quality and consistency.
- SchemaGen: Generates the schema for data validation based on the dataset.
- Transform: Transforms the raw images and labels into the format required for training.
- Trainer: Trains a TensorFlow model using the processed data. The model architecture is defined in model.py.
- Evaluator: Evaluates the model performance on validation data and generates evaluation metrics.
- Pusher: Pushes the model to a serving infrastructure for inference.
- Resolver: Resolves and manages model and data artifacts throughout the pipeline.
Initialize and Run the Pipeline

```shell
python .\pipelines\apache_beam\pipeline_beam.py
```
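An entry point like this typically hands the assembled pipeline to TFX's Beam runner. The sketch below shows the general shape only; the names and paths are assumptions, and the actual values live in pipeline_beam.py:

```python
from tfx.orchestration import pipeline
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner

# Hypothetical pipeline definition; component wiring omitted.
lung_pipeline = pipeline.Pipeline(
    pipeline_name="lung_cancer_detection",
    pipeline_root="pipelines/artifacts",
    components=[...],  # ExampleGen, SchemaGen, Transform, Trainer, Evaluator, Pusher, ...
)

# BeamDagRunner executes the component DAG locally with Apache Beam.
BeamDagRunner().run(lung_pipeline)
```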
To serve the trained model using TensorFlow Serving:
- Build and Run TensorFlow Serving Docker Container

  ```shell
  docker run -p 8500:8500 \
    -p 8501:8501 \
    --mount type=bind,source=".\serving_model_dir\1724395112",target=/models/my_model/1 \
    -e MODEL_NAME=my_model -t tensorflow/serving
  ```
- Send Prediction Requests

  Use the REST API to send prediction requests:

  ```python
  import tensorflow as tf
  import requests
  import json
  import base64

  def serialize_example(image_path):
      # Read and serialize the image
      image = tf.io.read_file(image_path)
      feature = {
          'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image.numpy()])),
          'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[0]))  # Dummy label
      }
      example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
      return example_proto.SerializeToString()

  # Serialize the image into a TFRecord format
  tfrecord_data = serialize_example("Test/Malignant/047_CT_56-seg_15.png")

  # Base64 encode the serialized TFRecord
  tfrecord_data_base64 = base64.b64encode(tfrecord_data).decode('utf-8')

  # Create the payload for the REST API request
  data = {
      "signature_name": "serving_default",  # Ensure this matches your SavedModel's signature
      "instances": [{"examples": {"b64": tfrecord_data_base64}}]  # Send base64-encoded TFRecord data
  }

  # Send the request to TensorFlow Serving running in Docker
  url = "http://localhost:8501/v1/models/my_model:predict"  # Adjust the model name and endpoint as needed
  headers = {"content-type": "application/json"}
  response = requests.post(url, data=json.dumps(data), headers=headers)

  # Print the response from TensorFlow Serving
  print(response.json())
  ```
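The JSON response holds a list of per-class scores under `"predictions"`. A small helper can turn it into a readable result; the helper and the class order here are assumptions for illustration, and the order must match the label order used during training:

```python
def top_class(prediction_json, class_names=("Benign", "Malignant", "Normal")):
    """Return (class_name, score) for the highest-scoring class in a
    TensorFlow Serving /predict response.

    The default class_names tuple is an assumption; it must match the
    label order the model was trained with.
    """
    scores = prediction_json["predictions"][0]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best], scores[best]
```

For example, calling `top_class(response.json())` maps the raw score vector back to a human-readable label.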
The model’s performance is evaluated based on metrics such as accuracy, precision, recall, and F1-score. Evaluation results can be found in evaluation_results.md.
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
This project is licensed under the MIT License.