An upgraded version of the previous crosstab generator
You can access the crosstab generator through the link below
🠊 https://invoke-analytics-crosstab-generator-v3.streamlit.app/
Crosstab is a two- (or more) dimensional table that is usually used in data analysis to uncover more granular insights from the data we gathered from the survey. It makes use of a statistical analysis known as Cross Tabulation analysis (or also known as Contingency Table analysis) to quantitatively evaluate the correlation between multiple variables. A typical cross-tabulation table compares two hypothetical variables, which are usually the survey question and the respondents' demographic (it can be either ethnicity, age group, gender, etc).
This project aims to expedite our crosstab generation process from long minutes of manual labour work using Excel pivot table to just a mere couple of seconds. With this crosstab generator, one just need to upload the weighted data file, wait for a couple of seconds and boom - the crosstab is set for you! By automating the crosstab generation process, we hope to divert the time and energy that are previously used for crosstab to other purposes, so that we can improve our overall survey work.
- .streamlit
Set the default theme to dark - README.md
Project documentation - generator.py
Main script to be deployed on streamlit that call the front-end functions. - component.py
Contains front-end composite function components that call back-end functions for streamlit. - photos
Contains INVOKE Analytics logo and INVOKE logo to be imported into generator.py - requirements.txt
List of the libraries and their respective versions required for the project to be read in streamlit. - tests
Contains scripts for unit testing, endpoint testing, and test files. - app
Contains all of the back-end functions, endpoints, schema, and requirements.txt for containerization. - app/crosstab_module
Contains all of the main functions to create the crosstabs. - app/utils_module
Contains all of the helper functions for the crosstabs generator. Also, contains the processor functions that call functions from crosstab.py and utils.py - app/chart_module
Contains all of the functions to create the clustered bar charts. - app/component_module
Contains all of the components; front-end component for streamlit and back-end component for crosstabs and charts. - app/schema.py
Pydantic schema for endpoint using FastAPI. - app/schema.py
Pydantic schema for endpoint using FastAPI. - app/endpoint.py
Contains crosstabs endpoints and chart endpoints, developed using FastAPI. - app/requirements_d.txt
List of the libraries and their respective versions required for the project to be installed in docker image.
.
├── LICENSE
├── README.md
├── app
│ ├── __init__.py
│ ├── chart_module
│ │ └── chart.py
│ ├── component_module
│ │ └── viz.py
│ ├── crosstab_module
│ │ └── crosstab.py
│ ├── endpoint.py
│ ├── requirements_d.txt
│ ├── schema.py
│ └── utils_module
│ ├── processor.py
│ └── utils.py
├── component.py
├── generator.py
├── photos
│ ├── invoke_icon.jpg
│ └── invoke_logo.png
├── requirements.txt
└── tests
├── backend_test.py
├── endpoint_test.py
├── test_chartgen.xlsx
└── test_crosstabs.csv
Some features that are available in the version 3 of crosstab generator:
1. Pre-selection on:
- Weight column. Automatically select the weight column if detected
- 5 basic demographic columns. Automatically select age group, gender, ethnicity group, income group and urbanity columns if detected
- Multiple answer questions. Automatically detect columns with keyword=[MULTI]
- Column to sort by name. Automatically detect columns with keyword=[LIKERT]
2. Automatic Column Sequence in Malay and English:
- Gender - sort from Male to Female/M to F/Lelaki to Perempuan/L to P
- Age group - sort from A-Z (ascending order)
- Ethnicity group - sort in the following sequence (Malay, Chinese, Indian, Bumiputera or Others) or (Melayu, Cina, India, Bumiputera or Lain-Lain)
3. Automatic Generate Clustered Bar Chart based on the Crosstab Tables:
- Automation of the clustered bar chart creation in reference to the generated crosstab table.
4. Modularize the function:
- Tidy up the functions for the crosstab and chart generator by putting in different modules.
5. Sort the order of crosstabs table:
- User can choose which column to sort by name [default sorting option is sorting by value].
6. REST API endpoint development:
- Developer can now deploy the endpoints of crosstabs generator and charts generator.
- Before uploading your Excel file, make sure that the demographic column name characters is less than 10 chars. Exceeding chars will lead to error due to Excel formatting.
- On Select column weight section choice,
Unweighted
choice has some unsettled bugs. Hence, if your dataset is Unweighted, create one extra column name weight with all value 1. Then choose this column name for this part.
Feedbacks for future development ideas:
1. Add chart options for user to choose:
- Add new charts features such as pie chart.
2. Add new demography in the demography sorter function for automation:
- State demography.
If you are working on this project, simply follow the guide below.
- Github Account
- Git Bash
- Visual Studio Code
- Streamlit Cloud (Only for deployment stage)
-
Github Account
If you are reading this document, that means you already have a Github account. Congratss!! 🥳🎉 But if you simple want to create a new account, click here for more information about it. -
Git Bash
If you have never heard of it before, Git Bash is a source control management system for Windows that allows users to type Git commands (such as git clone and git commit) which we will use a lot in this project. To install it, download the Git Bash setup from the official website: https://git-scm.com/ -
Visual Studio Code
This is the suggested IDE for this project. The reason for this is because Visual Studio Code works seamlessly with Git since there is a Git Bash extension that you can easily install in it. You can go to this page to download Visual Studio Code that matches your operating system. -
Streamlit Cloud (Only for deployment stage)
There are a lot of public cloud platforms out there that you can use to deploy your Streamlit app. However, in this project, we use Streamlit Cloud since it is free and easy to manage. You need to create a Streamlit Cloud account in order to deploy a new Streamlit app as well as to monitor other Streamlit apps in our existing Invoke Analytics repositories. To create an account, simply sign up once you click this link.
Once you have met all of the prerequisites and completed the installations, you can now start working on the project.
-
Firstly, fork this repository. Make sure the owner of the repository is INVOKE-Solutions.
-
Create a new folder on your local computer.
-
Open Visual Studio Code, click Open Folder and choose the folder that you just created.
-
Open Git Bash terminal.
-
Clone your forked repository into the folder by applying the git command below.
git clone your-forked-repo-url
-
Now, a copy of all the files should appear on your folder. The next step is to create a separate version of the repository that is usually called branch. This will be the place where you will be working on your code. To do this, go to the python file (generator.py) by using
cd generator.py
-
To create a new branch, simply type
git checkout -b branch-name
-
Congrats!! Now you are in your newly isolated branch. You can freely edit your code over here.
-
After you have finished editing the code, it is now time to push it into your forked repository. You can do that firstly by performing the two lines below
git add generator.py git commit -m 'your-message'
-
The next step would be to update your code in the local main. You can go to your local main by using
git checkout main
-
After that, you can update the local main by merging the main with the branch. You can use the code below.
git merge branch-name
-
Boom!! 🎊 Now, your code has been updated into the local main. You just have one step left in your bucket. To finish off, you need to push the code from your local computer to your forked Github repository (remote). Just write
git push