An upgraded version of the previous crosstab generator, designed to streamline and automate the crosstab generation process for survey data analysis.
Access the Crosstab Generator here: Crosstab Generator Version 3
Crosstab, short for cross-tabulation, is a two- (or more) dimensional table commonly used in data analysis to uncover deeper insights from survey data. It uses a statistical method called Cross Tabulation Analysis (or Contingency Table Analysis) to quantitatively evaluate the relationships between multiple variables. Typically, a crosstab compares variables such as survey questions and respondents' demographics (e.g., ethnicity, age group, gender) to explore correlations and patterns in the data.
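As a rough illustration of what a weighted crosstab computes, the sketch below uses `pandas.crosstab` on a tiny made-up dataset. The column names (`gender`, `q1`, `weight`) and the data are assumptions for the example only, and this is not necessarily how the generator builds its tables internally.

```python
import pandas as pd

# Hypothetical survey responses; column names and values are illustrative only.
df = pd.DataFrame({
    "gender": ["Male", "Male", "Female", "Female", "Female"],
    "q1": ["Yes", "No", "Yes", "Yes", "No"],
    "weight": [1.0, 3.0, 1.0, 2.0, 1.0],
})

# Sum the weights in each (answer, demographic) cell, then convert every
# demographic column to proportions so the groups can be compared directly.
table = pd.crosstab(
    index=df["q1"],
    columns=df["gender"],
    values=df["weight"],
    aggfunc="sum",
    normalize="columns",
)
print(table.round(2))
```

Each column of `table` sums to 1, so a cell reads as "the weighted share of this demographic group that gave this answer".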
This project significantly reduces the time and manual effort required for crosstab generation. Traditionally, creating crosstabs involved extensive work with Excel pivot tables; this generator automates the process and produces crosstabs in just a few seconds. Once you upload the weighted data file, the crosstab is generated automatically, freeing up time and resources for more impactful tasks in your survey work.
- Automatic Pre-Selection:
  - Weight Column: Automatically detects and selects the weight column.
  - Basic Demographics: Automatically identifies and selects columns for age group, gender, ethnicity, income group, and urbanity if present.
  - Multiple Answer Questions: Detects columns with keywords like `[MULTI]`.
  - Sorting Columns: Identifies columns with keywords like `[LIKERT]` for sorting.
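The pre-selection rules above could be sketched roughly as follows. The function name `preselect_columns`, the substring matching, and the demographic hint words are all assumptions made for illustration; the generator's actual detection logic may differ.

```python
def preselect_columns(columns):
    """Group column names by the role their name suggests (illustrative only)."""
    # Assumed hint words for the basic demographics listed above.
    demographic_hints = ("age", "gender", "ethnicity", "income", "urban")
    selected = {"weight": None, "demographics": [], "multi": [], "likert": []}
    for name in columns:
        lowered = name.lower()
        if "[MULTI]" in name:
            selected["multi"].append(name)        # multiple-answer question
        elif "[LIKERT]" in name:
            selected["likert"].append(name)       # column flagged for sorting
        elif selected["weight"] is None and "weight" in lowered:
            selected["weight"] = name             # first weight-like column wins
        elif any(hint in lowered for hint in demographic_hints):
            selected["demographics"].append(name)
    return selected


columns = ["weight", "agegroup", "gender", "Q1 [LIKERT]", "Q2 [MULTI]", "Q3"]
result = preselect_columns(columns)
print(result)
```

Plain substring matching is deliberately naive here: a question column whose name happens to contain "age" would be misclassified, so a real implementation would likely need stricter rules.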
- Automated Column Sequencing (Malay and English):
  - Gender: Sorts gender data from Male to Female/M to F/Lelaki to Perempuan/L to P.
  - Age Group: Sorts in ascending alphabetical order.
  - Ethnicity Group: Orders categories as Malay, Chinese, Indian, Bumiputera, or Others (Melayu, Cina, India, Bumiputera, Lain-Lain).
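One way to express the sequencing rules above is a fixed preference list per demographic, with unknown labels pushed to the end. This is a minimal sketch: `order_categories` and the two constants are hypothetical names, and the handling of unrecognised labels is an assumption rather than documented behaviour.

```python
# Preferred orderings taken from the rules above (English, then Malay labels).
GENDER_ORDER = ["Male", "Female", "M", "F", "Lelaki", "Perempuan", "L", "P"]
ETHNICITY_ORDER = [
    "Malay", "Chinese", "Indian", "Bumiputera", "Others",
    "Melayu", "Cina", "India", "Lain-Lain",
]


def order_categories(values, preferred):
    """Sort labels by their position in `preferred`; unknown labels go last."""
    rank = {label.lower(): position for position, label in enumerate(preferred)}
    return sorted(values, key=lambda v: (rank.get(v.lower(), len(rank)), v))


gender_columns = order_categories(["Perempuan", "Lelaki"], GENDER_ORDER)
ethnicity_columns = order_categories(
    ["Others", "Indian", "Malay", "Chinese"], ETHNICITY_ORDER
)
# Age groups only need an ascending alphabetical sort.
age_columns = sorted(["25-34", "18-24", "35-44"])
print(gender_columns, ethnicity_columns, age_columns)
```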
- Clustered Bar Chart Generation:
  - Automatically generates clustered bar charts based on the generated crosstab tables.
- Modularized Functions:
  - Organizes crosstab and chart generation functions into distinct modules for better maintainability and readability.
- Custom Sorting Options:
  - Allows users to sort the crosstab table by column name or values, with the default being value sorting.
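The two sorting modes could look something like the sketch below. It assumes a simplified table that maps a label to a single value, treats "value" sorting as descending, and uses a hypothetical `sort_rows` helper; none of these details are confirmed by the project.

```python
def sort_rows(table, by="value"):
    """Return (label, value) pairs sorted by value (default) or by name."""
    if by == "value":
        # Assumption: largest values first.
        return sorted(table.items(), key=lambda item: item[1], reverse=True)
    if by == "name":
        return sorted(table.items(), key=lambda item: item[0])
    raise ValueError(f"Unsupported sort mode: {by!r}")


shares = {"Neutral": 0.20, "Agree": 0.55, "Disagree": 0.25}
by_value = sort_rows(shares)             # default: value sorting
by_name = sort_rows(shares, by="name")
print(by_value)
print(by_name)
```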
- REST API Endpoints:
  - Deployable API endpoints for both crosstab and chart generation, developed using FastAPI.
- .streamlit: Configures Streamlit's default theme to dark mode.
- README.md: Documentation for the project.
- generator.py: Main script deployed on Streamlit, calling front-end functions.
- component.py: Contains front-end components that interface with back-end functions in Streamlit.
- photos: Contains the INVOKE Analytics and INVOKE logos used in `generator.py`.
- requirements.txt: Lists libraries and their versions required for the project.
- tests: Includes scripts for unit and endpoint testing, along with test files.
- app: Contains all back-end functions, endpoints, schemas, and requirements for containerization.
- app/crosstab_module: Main functions for creating crosstabs.
- app/utils_module: Helper functions for the crosstab generator, including data processing functions.
- app/chart_module: Functions for creating clustered bar charts.
- app/component_module: Components for both front-end (Streamlit) and back-end (crosstabs and charts).
- app/schema.py: Pydantic schemas for API endpoints using FastAPI.
- app/endpoint.py: Defines crosstab and chart API endpoints with FastAPI.
- app/requirements_d.txt: Libraries and versions required for the Docker image.
.
├── LICENSE
├── README.md
├── app
│   ├── __init__.py
│   ├── chart_module
│   │   └── chart.py
│   ├── component_module
│   │   └── viz.py
│   ├── crosstab_module
│   │   └── crosstab.py
│   ├── endpoint.py
│   ├── requirements_d.txt
│   ├── schema.py
│   └── utils_module
│       ├── processor.py
│       └── utils.py
├── component.py
├── generator.py
├── photos
│   ├── invoke_icon.jpg
│   └── invoke_logo.png
├── requirements.txt
└── tests
    ├── backend_test.py
    ├── endpoint_test.py
    ├── test_chartgen.xlsx
    └── test_crosstabs.csv
- Ensure that demographic column names in your Excel file do not exceed 10 characters to avoid Excel formatting errors.
- The `Unweighted` option in the Select column weight section has some unresolved bugs. For unweighted datasets, create an extra column named `weight` with all values set to 1, and select this column as the weight.
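Both notes above can be checked before uploading a file. The helper below is a minimal sketch that works on rows represented as dictionaries; `prepare_unweighted` and its parameters are hypothetical names, not part of the generator.

```python
def prepare_unweighted(rows, demographic_columns, max_len=10):
    """Validate demographic column names and add a constant weight column."""
    # Names longer than 10 characters are reported as causing Excel formatting errors.
    too_long = [name for name in demographic_columns if len(name) > max_len]
    if too_long:
        raise ValueError(
            f"Demographic column names longer than {max_len} characters: {too_long}"
        )
    # Workaround for the Unweighted option: every respondent gets weight 1.
    return [{**row, "weight": 1} for row in rows]


rows = [
    {"gender": "Male", "agegroup": "18-24", "q1": "Yes"},
    {"gender": "Female", "agegroup": "25-34", "q1": "No"},
]
prepared = prepare_unweighted(rows, ["gender", "agegroup"])
print(prepared[0])
```

The original rows are left unmodified; the returned copies carry the extra `weight` column that can then be selected as the weight.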
- Add Chart Options:
  - Introduce additional chart types such as pie charts.
- Expand Demographic Sorting:
  - Include additional demographics like state in the automated sorter.
Follow these steps to set up and contribute to the project.
- GitHub Account
- Git Bash
- Visual Studio Code
- Streamlit Cloud (for deployment)
- GitHub Account: Ensure you have a GitHub account. If not, create one on GitHub.
- Git Bash: Git Bash allows you to execute Git commands (e.g., clone, commit). Download it from the official Git site.
- Visual Studio Code: Recommended IDE for this project. It integrates well with Git and has an extension for Git Bash. Download it from the official Visual Studio Code site.
- Streamlit Cloud: We use Streamlit Cloud for deployment due to its ease of use and cost-effectiveness. Sign up on Streamlit Cloud to manage your Streamlit apps.
- Fork the Repository: Ensure you fork the repository from INVOKE-Solutions.
- Create a Local Folder: Set up a folder on your local machine.
- Open with Visual Studio Code: Open the newly created folder in Visual Studio Code.
- Clone the Repository: Use Git Bash to clone your forked repository into the folder:
    git clone <your-forked-repo-url>
- Create a New Branch: Navigate into the cloned repository folder, which contains the main Python file (`generator.py`). Note that `cd` takes a directory, not a file:
    cd <repo-folder>
  Create a new branch for your work:
    git checkout -b <branch-name>
- Make Changes: Edit the code as needed in your new branch.
- Commit Changes: After editing, stage and commit your changes:
    git add generator.py
    git commit -m 'Description of your changes'
- Update Local Main: Switch to the main branch and merge your changes:
    git checkout main
    git merge <branch-name>
- Push to Remote: Push your changes from your local main to your GitHub repository:
    git push
- Amer Wafiy - Original Author
- Zabir Azreen - Crosstabs V2
- Sim Lin Zheng - Tablo
- Safwan Shamsir - Crosstab V3