SG-1: Dashboard for Principal Component Analysis
Closed this issue · 3 comments
In this issue, we will create a dashboard for the Principal Component Analysis (PCA) app, since it is our most commonly used feature.
Theory:
Principal Component Analysis can help uncover the hyperparameters that machine learning needs to produce distinct features that are intuitive to us as scientists. Our tool aims to help scientists discover the best hyperparameters for their chemical dataset.
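To make the PCA step concrete, here is a minimal NumPy sketch of PCA on a small binary fingerprint matrix, computed via the SVD of the mean-centered data. It assumes (not confirmed from the global-chem source, which may use scikit-learn internally) that a fractional number_of_components such as 0.95 means "keep the fewest components that together explain 95% of the variance", as it does in scikit-learn's PCA. The helper pca_by_variance and the toy fingerprints are hypothetical.

```python
import numpy as np


def pca_by_variance(X, variance_fraction=0.95):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches variance_fraction (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Center each feature (fingerprint bit) on its mean
    X_centered = X - X.mean(axis=0)
    # Rows of `components` are the principal axes
    _, singular_values, components = np.linalg.svd(X_centered, full_matrices=False)
    # Fraction of the total variance carried by each component
    variance = singular_values ** 2
    ratio = variance / variance.sum()
    # First index where the cumulative ratio reaches the threshold, plus one
    k = int(np.searchsorted(np.cumsum(ratio), variance_fraction)) + 1
    k = min(k, len(ratio))
    # Coordinates of each molecule along the kept components
    scores = X_centered @ components[:k].T
    return scores, ratio[:k]


# Toy 5-molecule x 6-bit fingerprint matrix (made-up data)
fingerprints = np.array([
    [1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
])
scores, ratio = pca_by_variance(fingerprints, 0.95)
print(scores.shape, ratio.round(3))
```

In the plot, principal_component_x = 0 and principal_component_y = 1 would then correspond to columns 0 and 1 of scores.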
Software Demo
To install it:
!pip install -q global-chem[cheminformatics] --upgrade
To run it:
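from global_chem import GlobalChem
from global_chem_extensions import GlobalChemExtensions

# Build the GlobalChem network (a single GlobalChem instance is enough)
gc = GlobalChem()
gc.build_global_chem_network(print_output=False, debugger=False)

# Load the cheminformatics extension that provides node_pca_analysis
cheminformatics = GlobalChemExtensions().cheminformatics()

# SMILES strings for every molecule in the chosen node
smiles_list = list(gc.get_node_smiles('emerging_perfluroalkyls').values())

# Fingerprint the molecules, run PCA, cluster them, and render an interactive plot
mol_ids = cheminformatics.node_pca_analysis(
    smiles_list,
    morgan_radius=1,            # radius of the Morgan fingerprint
    bit_representation=512,     # fingerprint length in bits
    number_of_clusters=3,
    number_of_components=0.95,
    random_state=0,
    principal_component_x=0,    # component shown on the x axis
    principal_component_y=1,    # component shown on the y axis
    x_axis_label='PC1',
    y_axis_label='PC2',
    plot_width=500,
    plot_height=500,
    title='',
    save_file=False,
    return_mol_ids=True,
    save_principal_components=True,
)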
If you run into any problems, let me know. Try running it on your own first, either on your local machine in a Jupyter notebook or in Google Colab.
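Since the goal is to find good hyperparameters, one way to explore them is to loop over a small grid of values and call node_pca_analysis once per combination. The sketch below only builds the keyword-argument dictionaries; the parameter names come from the call above, while the helper build_parameter_grid and the candidate values are hypothetical choices.

```python
from itertools import product


def build_parameter_grid(radii, bit_sizes, cluster_counts):
    """Return one kwargs dict per combination of the given values
    (hypothetical helper; keys mirror node_pca_analysis arguments)."""
    grid = []
    for radius, bits, clusters in product(radii, bit_sizes, cluster_counts):
        grid.append({
            'morgan_radius': radius,
            'bit_representation': bits,
            'number_of_clusters': clusters,
            'title': f'radius={radius}, bits={bits}, clusters={clusters}',
        })
    return grid


grid = build_parameter_grid([1, 2], [512, 1024], [3])
for params in grid:
    print(params['title'])
    # With global-chem installed, each combination could be plotted with:
    # cheminformatics.node_pca_analysis(smiles_list, **params)
```

Putting the settings in each plot's title makes it easy to compare the resulting plots side by side.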
The problem
We would like the user to be able to input a list of SMILES:
smiles_list = ['CCC', 'CC', 'CCCCC']
>>> The input goes into the function:
mol_ids = cheminformatics.node_pca_analysis(
    smiles_list,
    morgan_radius=1,
    bit_representation=512,
    number_of_clusters=3,
    number_of_components=0.95,
    random_state=0,
    principal_component_x=0,
    principal_component_y=1,
    x_axis_label='PC1',
    y_axis_label='PC2',
    plot_width=500,
    plot_height=500,
    title='',
    save_file=False,
    return_mol_ids=True,
    save_principal_components=True,
)
<<< An interactive plot comes out
And that's it. That will get us our first milestone.
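On the dashboard side, the SMILES list will most likely arrive as free text from an input box. A minimal sketch of turning that text into the smiles_list the function expects, assuming (hypothetically) that users separate entries with commas, spaces, or new lines; parse_smiles_input is an illustrative name, not part of global-chem:

```python
import re


def parse_smiles_input(text):
    """Split free-form text into a list of SMILES strings (hypothetical helper).

    Commas and whitespace act as separators. Standard SMILES do not use
    commas, and whitespace ends a SMILES string, so valid entries stay whole.
    Surrounding quote characters are stripped and empty tokens are dropped.
    """
    smiles_list = []
    for token in re.split(r'[,\s]+', text):
        token = token.strip('\'"')
        if token:
            smiles_list.append(token)
    return smiles_list


print(parse_smiles_input("CCC, CC\nCCCCC"))  # ['CCC', 'CC', 'CCCCC']
```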
To start on the dashboard infrastructure, I will bounce some ideas off you. My first idea for the infrastructure was this:
https://towardsdatascience.com/creating-a-better-dashboard-with-python-dash-and-plotly-80dfb4269882
To begin working on the code, I suggest cloning the repository and then making your own directory called dashboard.
It can be a standalone directory at the top level of the repo, since it is an important feature.
Let me know any thoughts or initial questions. I'm around.
@LadyBluenotes The index.html file in the repository below should do the job. To download it, open the file, click "View raw", and save it as index.html.
https://github.com/Sulstice/CannabisSativa
I'm wondering whether we ever need to edit that file, or whether we can just have a box inside the dashboard and put all of this HTML in that box.
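On the question of a box: one option that avoids editing index.html is to place the plot's HTML inside an iframe through its srcdoc attribute, so the plot renders in its own box on the page. A small standard-library sketch; the helper name html_in_box and its default sizes (chosen to mirror plot_width and plot_height above) are assumptions:

```python
import html


def html_in_box(plot_html, width=500, height=500):
    """Wrap a complete HTML document in an iframe so it renders in a box
    (hypothetical helper)."""
    # Escape <, >, & and quotes so the document is a valid attribute value;
    # the browser unescapes it again when it renders the srcdoc content
    escaped = html.escape(plot_html, quote=True)
    return (
        f'<iframe srcdoc="{escaped}" width="{width}" height="{height}" '
        f'style="border: 1px solid #ccc;"></iframe>'
    )


snippet = html_in_box('<html><body><p>PCA plot</p></body></html>')
print(snippet)
```

If the dashboard ends up being built with Dash, as in the article linked above, the Iframe component's srcDoc property should play the same role (for example html.Iframe(srcDoc=plot_html)).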
Input
smiles_list = ['C1=CC=CC=C1', 'CCCC', 'CCCCC']
For the next steps:
1.) Trigger the public GitHub Actions workflow via a button on the front-end website.
- Repo: https://github.com/Global-Chem/workers
- Workflow to be triggered: https://github.com/Global-Chem/workers/blob/main/.github/workflows/pca_analysis.yml
2.) Can we pull a file through the GitHub Actions REST API from the server that ran the job, or do we have to figure out how to pass a string of HTML back to the front-end via some kind of endpoint?
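For step 1.), GitHub's REST API can start a workflow run with a POST to /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches, where workflow_id may also be the workflow file name. This only works if pca_analysis.yml declares a workflow_dispatch trigger, and the button's backend would need a token allowed to run Actions on Global-Chem/workers (kept server-side, not in the browser). The sketch below only builds the request with the standard library; the input name smiles is hypothetical and would have to match an input declared in the workflow.

```python
import json
import urllib.request

API = 'https://api.github.com'
OWNER, REPO, WORKFLOW = 'Global-Chem', 'workers', 'pca_analysis.yml'


def build_dispatch_request(token, smiles_list, ref='main'):
    """Build (but do not send) a workflow_dispatch request."""
    url = f'{API}/repos/{OWNER}/{REPO}/actions/workflows/{WORKFLOW}/dispatches'
    # Pass the SMILES list as one comma-separated string;
    # 'smiles' is a hypothetical input name declared in the workflow
    body = {'ref': ref, 'inputs': {'smiles': ','.join(smiles_list)}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode('utf-8'),
        method='POST',
        headers={
            'Accept': 'application/vnd.github+json',
            'Authorization': f'Bearer {token}',
            'X-GitHub-Api-Version': '2022-11-28',
        },
    )


request = build_dispatch_request('YOUR_TOKEN', ['C1=CC=CC=C1', 'CCCC', 'CCCCC'])
# To actually send it: urllib.request.urlopen(request)
# GitHub answers 204 No Content when the run is queued
```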
Documentation
https://docs.github.com/en/rest?apiVersion=2022-11-28
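On question 2.): GitHub-hosted runners are discarded after the job, so the file cannot be pulled from the runner itself. But if the workflow uploads the generated HTML as an artifact (for example with actions/upload-artifact, an assumed change to pca_analysis.yml), the front-end's server can fetch it through the REST API: GET /repos/{owner}/{repo}/actions/runs/{run_id}/artifacts lists a run's artifacts, and GET /repos/{owner}/{repo}/actions/artifacts/{artifact_id}/zip downloads one as a zip archive. The sketch covers the URL construction and pulling the HTML out of the downloaded zip bytes; the helper names and the pca_analysis.html file name are hypothetical.

```python
import io
import zipfile

API = 'https://api.github.com'
OWNER, REPO = 'Global-Chem', 'workers'


def artifact_urls(run_id, artifact_id):
    """REST endpoints to list a run's artifacts and download one as a zip."""
    list_url = f'{API}/repos/{OWNER}/{REPO}/actions/runs/{run_id}/artifacts'
    zip_url = f'{API}/repos/{OWNER}/{REPO}/actions/artifacts/{artifact_id}/zip'
    return list_url, zip_url


def html_from_artifact_zip(zip_bytes):
    """Return the text of the first .html file in an artifact zip
    (hypothetical helper; assumes the workflow uploaded the plot as HTML)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith('.html'):
                return archive.read(name).decode('utf-8')
    raise ValueError('no .html file found in the artifact')


# Self-contained check: build a fake artifact zip in memory
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('pca_analysis.html', '<html><body>PCA</body></html>')
print(html_from_artifact_zip(buffer.getvalue()))
```

The run_id can be looked up by listing the workflow's runs with GET /repos/{owner}/{repo}/actions/workflows/{workflow_id}/runs. The extracted string could then be served to the front-end, which would also cover the "pass a string of HTML back via an endpoint" option.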
Another task is to move the front-end into the GlobalChem organization.