Corporatica Full-Stack Software Engineering Task

Overview

This project is a comprehensive software application designed for advanced data analysis and manipulation. It features robust front-end and back-end components, capable of handling various data types including tabular data, RGB images, and textual data. The application includes functionalities for data processing, analysis, and visualization, implemented using Flask for the back-end and React for the front-end. The project is containerized using Docker and deployed with Kubernetes for scalability and resilience.

Segmentation

This task is segmented for data analysts, computer vision engineers, and NLP engineers. It encompasses different aspects of data handling, including tabular data processing, image processing, and natural language processing (NLP).

Overview
Time Allocation
Methodology
Technologies Used
Directory Structure
Installation and Setup
Module Details
Running the Application
Usage
Additional Information
- Contributions
- License

Time Allocation

The project was completed over a span of five days, with the following time allocations:

Research: 15 hours
Development: 45 hours
Testing: 24 hours
Deployment: 6 hours

Methodology

Development Methodology

We followed an Agile development methodology, allowing for iterative development, continuous feedback, and incremental improvements. This approach ensured that we could quickly adapt to any changes and improve the application based on user feedback.

Technology Selection Criteria

The selection of technologies, libraries, and frameworks was based on the following criteria:

Flask: For its simplicity and ease of setting up a web server and handling HTTP requests.
Web Devlopment: For creating a responsive and dynamic front-end interface.
Docker and Kubernetes: For containerization and scalable deployment.
Pandas and SQLAlchemy: For efficient data manipulation and ORM capabilities.
Pillow and OpenCV: For comprehensive image processing tasks.
Transformers, Yake, and TextBlob: For advanced NLP functionalities.

Technologies Used

Back-End: Flask, Python, Pandas, SQLAlchemy
Front-End: React, HTML, CSS, JavaScript
Containerization and Deployment: Docker, Docker Compose, Kubernetes
Data Processing and Visualization: Matplotlib, Pillow, OpenCV
NLP Libraries: Transformers (Hugging Face), Yake, TextBlob

Directory Structure

Corporatica Back-End Developer Task Phase/
├── __pycache__/
├── instance/
│   ├── data.db
│   └── default.db
├── results/
├── static/
│   ├── scripts.js
│   └── styles.css
├── templates/
│   ├── home.html
│   ├── image_processing.html
│  
│   ├── tabular_data.html
│   └── text_analysis.html
├── uploads/
├── venv/
├── app.py
├── app2.py
├── requirements.txt
├── sample_1.jpg
├── sample_2.jpg
├── sample_3.jpeg
└── Updated_Dockerfile
└── updated_docker-compose.yml

Installation and Setup

Prerequisites

Python 3.6 or higher
Virtual environment tool (venv)
Docker
Docker Compose

Steps

Clone the repository:

git clone https://github.com/Geo-y20/Corporatica--Task
cd Corporatica--Task

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On macOS/Linux
.\venv\Scripts\activate  # On Windows

Install dependencies:
```
pip install -r requirements.txt
```
Build and run Docker containers:
```
docker-compose up --build
```
Open your web browser and go to http://localhost:5000/.

Module Details

Tabular Data Processing

Overview

This module handles the uploading, processing, querying, and visualization of tabular data. Users can perform CRUD operations, run queries, and generate visualizations.

Routes and Endpoints

File Upload: /upload (POST)
- Uploads tabular data files (CSV, XLS, XLSX) to the server.
Process Data: /process/<filename> (GET)
- Processes the uploaded data and calculates statistics including mean, median, mode, quartiles, and outliers.
Query Data: /query/<filename> (POST)
- Allows users to run SQL-like queries on the data.
Manage Data: /data/<filename> (GET, POST, PUT, DELETE)
- Provides CRUD operations for the data.
Visualize Data: /visualize/<filename> (GET)
- Generates visualizations like histograms, bar charts, and box plots.

Detailed Code Snippets

File Upload and Preprocessing

@app.route('/upload', methods=['POST'])
def file_upload():
    if 'files' not in request.files:
        return jsonify({'message': 'No files part in the request'}), 400
    files = request.files.getlist('files')
    if not files:
        return jsonify({'message': 'No files uploaded'}), 400

    filenames = []
    for f in files:
        if not allowed_file(f.filename):
            return jsonify({'message': 'File type not allowed'}), 400
        original_filename = secure_filename(f.filename)
        new_filename = create_timestamped_filename(original_filename)
        save_path = os.path.join(app.config['UPLOAD_FOLDER'], new_filename)
        f.save(save_path)
        df = pd.read_csv(save_path) if new_filename.endswith('.csv') else pd.read_excel(save_path)
        
        # Store both original and cleaned dataframes
        data_store[new_filename] = {
            "original": df,
            "cleaned": preprocess_data(df.copy())
        }
        filenames.append(new_filename)
    
    return jsonify({"filenames": filenames, "status": "Files successfully uploaded"}), 200

Processing Data and Calculating Statistics

@app.route('/process/<filename>', methods=['GET'])
def process_data(filename):
    if filename not in data_store:
        return jsonify({'error': 'File not found'}), 404

    try:
        df = data_store[filename]["cleaned"]
        stats_type = request.args.get('stats', 'all')
        stats = calculate_statistics(df)

        stats = convert_nan_to_none(stats)

        return jsonify({'filename': filename, 'statistics': stats}), 200
    except Exception as e:
        return jsonify({'error': 'Error processing file: ' + str(e)}), 500

Querying Data

@app.route('/query/<filename>', methods=['POST'])
def query_data(filename):
    if filename not in data_store:
        logging.debug(f"Filename {filename} not found in data_store")
        return jsonify({'error': 'File not found'}), 404

    params = request.json
    if not params or 'query' not in params:
        logging.debug("No query provided in request")
        return jsonify({'error': 'No query provided'}), 400

    try:
        df = data_store[filename]["original"]
        query = params['query']
        logging.debug(f"Executing query: {query}")

        # Ensure the query is valid
        if not query.strip().lower().startswith("select"):
            return jsonify({'error': 'Only SELECT queries are allowed'}), 400

        result_df = psql.sqldf(query, locals())
        logging.debug(f"Query result: {result_df}")

        result_json = result_df.to_dict(orient='records')
        return jsonify(result_json), 200
    except Exception as e:
        logging.exception("Error executing query")
        return jsonify({'error': 'Error executing query: ' + str(e)}), 500

Managing Data

@app.route('/data/<filename>', methods=['GET', 'POST', 'PUT', 'DELETE'])
def manage_data(filename):
    if filename not in data_store:
        return jsonify({'error': 'File not found'}), 404

    df = data_store[filename]["original"]

    if request.method == 'GET':
        return jsonify(convert_nan_to_none(df.to_dict(orient='records'))), 200

    elif request.method == 'POST':
        new_data = request.json
        try:
            new_df = pd.DataFrame([new_data])
            df = pd.concat([df, new_df], ignore_index=True)
            data_store[filename]["original"] = df
            return jsonify(convert_nan_to_none(df.to_dict(orient='records'))), 200
        except Exception as e:
            return jsonify({'error': 'Error adding data: ' + str(e)}), 500

    elif request.method == 'PUT':
        update_data = request.json
        try:
            for index, row in update_data.items():
                df.loc[int(index

)] = row data_store[filename]["original"] = df return jsonify(convert_nan_to_none(df.to_dict(orient='records'))), 200 except Exception as e: return jsonify({'error': 'Error updating data: ' + str(e)}), 500

    elif request.method == 'DELETE':
        delete_indices = request.json.get('indices')
        try:
            df.drop(index=delete_indices, inplace=True)
            data_store[filename]["original"] = df
            return jsonify(convert_nan_to_none(df.to_dict(orient='records'))), 200
        except Exception as e:
            return jsonify({'error': 'Error deleting data: ' + str(e)}), 500
```

Visualizing Data

@app.route('/visualize/<filename>', methods=['GET'])
def visualize_data(filename):
    if filename not in data_store:
        return jsonify({'error': 'File not found'}), 404

    try:
        df = data_store[filename]["original"]
        plot_type = request.args.get('plot_type', 'histogram')
        column = request.args.get('column')

        if column not in df.columns:
            return jsonify({'error': 'Column not found'}), 404

        plt.figure(figsize=(10, 6))

        if plot_type == 'histogram':
            df[column].hist()
            plt.title(f'Histogram of {column}')
        elif plot_type == 'bar':
            df[column].value_counts().plot(kind='bar')
            plt.title(f'Bar Chart of {column}')
        elif plot_type == 'box':
            df[column].plot(kind='box')
            plt.title(f'Box Plot of {column}')
        else:
            return jsonify({'error': 'Invalid plot type'}), 400

        plt.tight_layout()

        img = io.BytesIO()
        plt.savefig(img, format='png')
        img.seek(0)
        plot_url = base64.b64encode(img.getvalue()).decode()

        return jsonify({'plot_url': f'data:image/png;base64,{plot_url}'}), 200
    except Exception as e:
        logging.exception("Error visualizing data")
        return jsonify({'error': 'Error visualizing data: ' + str(e)}), 500

Image Processing

Overview

This module allows users to upload images, generate color histograms, create segmentation masks, and perform image manipulations like resizing, cropping, and format conversion.

Routes and Endpoints

Image Upload: /upload_image (POST)
- Uploads image files to the server.
Get Histogram: /histogram/<filename> (GET)
- Generates and returns a color histogram for the uploaded image.
Get Segmentation Mask: /segmentation/<filename> (GET)
- Generates and returns a segmentation mask for the uploaded image.
Resize Image: /resize/<filename> (POST)
- Resizes the image to specified dimensions.
Crop Image: /crop/<filename> (POST)
- Crops the image to specified coordinates.
Convert Image Format: /convert/<filename> (POST)
- Converts the image to the specified format.

Text Analysis

Overview

This module provides text summarization, keyword extraction, and sentiment analysis functionalities. It leverages various NLP libraries and models to process the input text.

Routes and Endpoints

Text Summarization: /summarize (POST)
- Summarizes the input text.
Keyword Extraction: /keywords (POST)
- Extracts keywords from the input text.
Sentiment Analysis: /sentiment (POST)
- Analyzes the sentiment of the input text.
Text Search: /search (POST)
- Searches for a specific term within the input text.
Text Categorization: /categorize (POST)
- Categorizes the input text based on the presence of certain keywords.

Detailed Code Snippets

Image Upload and Storage with Batch Processing

@app.route('/upload_image', methods=['POST'])
def upload_image():
    if 'images' not in request.files:
        return jsonify({"error": "No images provided"}), 400

    files = request.files.getlist('images')
    file_info = []

    for file in files:
        if file.filename == '':
            return jsonify({"error": "No selected file"}), 400
        if file:
            filename = secure_filename(file.filename)
            file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
            file.save(file_path)
            with Image.open(file_path) as img:
                width, height = img.size
            file_info.append({
                "filename": filename,
                "width": width,
                "height": height,
                "path": url_for('uploaded_file', filename=filename),
                "mode": img.mode,
                "format": img.format
            })

    return jsonify({"message": "Images successfully uploaded", "files": file_info}), 200

Generating Color Histograms

def generate_histogram(image_path):
    image = cv2.imread(image_path)
    color = ('b', 'g', 'r')
    figure = Figure()
    axis = figure.add_subplot(1, 1, 1)

    for i, col in enumerate(color):
        hist = cv2.calcHist([image], [i], None, [256], [0, 256])
        axis.plot(hist, color=col)

    output_path = os.path.join(app.config['RESULT_FOLDER'], os.path.basename(image_path) + '_histogram.png')
    figure.savefig(output_path)

    return output_path

@app.route('/histogram/<filename>', methods=['GET'])
def get_histogram(filename):
    image_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    histogram_path = generate_histogram(image_path)
    return send_file(histogram_path, mimetype='image/png')

Generating Segmentation Masks

def generate_segmentation_mask(image_path):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    ret, mask = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    
    output_path = os.path.join(app.config['RESULT_FOLDER'], os.path.basename(image_path) + '_segmentation.png')
    cv2.imwrite(output_path, mask)
    
    return output_path

@app.route('/segmentation/<filename>', methods=['GET'])
def get_segmentation_mask(filename):
    image_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    mask_path = generate_segmentation_mask(image_path)
    return send_file(mask_path, mimetype='image/png')

Image Manipulation Tasks

Resizing Image

@app.route('/resize/<filename>', methods=['POST'])
def resize_image(filename):
    width = request.json.get('width')
    height = request.json.get('height')
    image_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    if not os.path.exists(image_path):
        return jsonify({"error": "File not found"}), 404

    try:
        image = Image.open(image_path)
        resized_image = image.resize((width, height))
        img_io = io.BytesIO()
        resized_image.save(img_io, 'PNG')
        img_io.seek(0)
        return send_file(img_io, mimetype='image/png')
    except Exception as e:
        return jsonify({"error": str(e)}), 500

Cropping Image

@app.route('/crop/<filename>', methods=['POST'])
def crop_image(filename):
    image_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    image = Image.open(image_path)
    left = request.json.get('left')
    top = request.json.get('top')
    right = request.json.get('right')
    bottom = request.json.get('bottom')

    if left < 0 or top < 0 or right > image.width or bottom > image.height or left >= right or top >= bottom:
        return jsonify({"error": "Invalid crop coordinates"}), 400

    cropped_image = image.crop((left, top, right, bottom))
    
    output = io.BytesIO()
    cropped_image.save(output, format='PNG')
    output.seek(0)
    
    return send_file(output, mimetype='image/png')

Converting Image Format

@app.route('/convert/<filename>', methods=['POST'])
def convert_image(filename):
    image_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    
    if not os.path.exists(image_path):
        return jsonify({"error": "File not found"}), 404

    try:
        image = Image.open(image_path)
    except Exception as e:
        return jsonify({"error": f"Error opening image: {str(e)}"}), 500

    format = request.json.get('format').lower()
    valid_formats = ["jpeg", "png", "bmp", "gif", "tiff"]
    if format not in valid_formats:
        return jsonify({"error": f"Unsupported format: {format}"}), 400

    try:
        output_path = os.path.join(app.config['RESULT_FOLDER'], f"{os.path.splitext(os.path.basename(image_path))[0]}.{format}")
        image.save(output_path, format=format.upper())
        return send_file(output_path, mimetype=f'image/{format}')
    except Exception as e:
        return jsonify({"error": f"Error converting image: {str(e)}"}), 500

Text Summarization

@app.route('/summarize', methods=['POST'])
def summarize():
    text = request.form['text']
    summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
    return render_template('text_analysis.html', summary=summary[0]['summary_text'],

 input_text_summary=text)

Keyword Extraction

@app.route('/keywords', methods=['POST'])
def extract_keywords():
    text = request.form['text']
    kw_extractor = yake.KeywordExtractor()
    keywords = kw_extractor.extract_keywords(text)
    keywords_list = [kw[0] for kw in keywords]
    return render_template('text_analysis.html', keywords=keywords_list, input_text_keywords=text)

Sentiment Analysis

@app.route('/sentiment', methods=['POST'])
def sentiment_analysis():
    text = request.form['text']
    blob = TextBlob(text)
    sentiment_score = blob.sentiment.polarity
    if sentiment_score > 0.1:
        sentiment = "Positive"
    elif sentiment_score < -0.1:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"
    return render_template('text_analysis.html', sentiment=f"{sentiment} ({sentiment_score})", input_text_sentiment=text)

Running the Application

Activate the virtual environment:

source venv/bin/activate  # On macOS/Linux
.\venv\Scripts\activate  # On Windows

Run the Flask application:
```
python app.py
```
Open your web browser and navigate to http://localhost:5000/.

Usage

Uploading Images

Click on the "Choose files" button and select one or more images to upload.
Click on the "Upload" button to upload the selected images.
View the details of the uploaded images such as filename, width, height, mode, and format.

Generating Color Histograms

Enter the filename of the uploaded image in the input box.
Click on the "Get Histogram" button to generate and display the histogram.

Generating Segmentation Masks

Enter the filename of the uploaded image in the input box.
Click on the "Get Segmentation Mask" button to generate and display the segmentation mask.

Resizing Images

Enter the filename of the uploaded image in the input box.
Enter the desired width and height in the respective input boxes.
Click on the "Resize" button to resize and display the image.

Cropping Images

Enter the filename of the uploaded image in the input box.
Enter the desired crop coordinates (left, top, right, bottom) in the respective input boxes.
Click on the "Crop" button to crop and display the image.

Converting Image Formats

Enter the filename of the uploaded image in the input box.
Select the desired format from the dropdown menu.
Click on the "Convert" button to convert and display the image.

Text Summarization

Enter the text to be summarized in the input box.
Click on the "Summarize" button to generate and display the summary.

Keyword Extraction

Enter the text for keyword extraction in the input box.
Click on the "Extract Keywords" button to generate and display the keywords.

Sentiment Analysis

Enter the text for sentiment analysis in the input box.
Click on the "Analyze Sentiment" button to generate and display the sentiment.

Text Search

Enter the text to be searched in the input box.
Enter the search term in the input box.
Click on the "Search" button to find and display the search result.

Text Categorization

Enter the text to be categorized in the input box.
Click on the "Categorize" button to generate and display the category.

Geo-y20/Corporatica--Task

Corporatica Full-Stack Software Engineering Task

Overview

Segmentation

Table of Contents

Time Allocation

Methodology

Development Methodology

Technology Selection Criteria

Technologies Used

Directory Structure

Installation and Setup

Prerequisites

Steps

Module Details

Tabular Data Processing

Overview

Routes and Endpoints

Detailed Code Snippets

Image Processing

Overview

Routes and Endpoints

Text Analysis

Overview

Routes and Endpoints

Detailed Code Snippets

Image Upload and Storage with Batch Processing

Generating Color Histograms

Generating Segmentation Masks

Image Manipulation Tasks

Text Summarization

Keyword Extraction

Sentiment Analysis

Running the Application

Usage

Uploading Images

Generating Color Histograms

Generating Segmentation Masks

Resizing Images

Cropping Images

Converting Image Formats

Text Summarization

Keyword Extraction

Sentiment Analysis

Text Search

Text Categorization

Sample of my work