Breast cancer is the most common invasive cancer in women and the second leading cause of cancer death in women after lung cancer. Advances in screening and treatment for breast cancer have improved survival rates dramatically since 1989. According to the American Cancer Society (ACS), there are more than 3.1 million breast cancer survivors in the United States. The chance of any woman dying from breast cancer is around 1 in 38 (2.6%).
Machine learning has long been used in medical diagnosis. In this project, features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
- Read the data
- Perform exploratory analysis on it
- Extract features and scale the extracted features
- Split the data into training and hold-out sets
- Create causal graphs using different techniques
- Examine the model performance based on the graph
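
The data-preparation steps in the list above (read, explore, scale, split) can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: it assumes scikit-learn is installed and uses scikit-learn's bundled copy of the Wisconsin diagnostic breast cancer dataset, whose features are also computed from FNA images. The 80/20 split ratio and the random seed are arbitrary choices made for this example, and the causal-graph steps are not covered here.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Read the data (assumption: scikit-learn's bundled dataset stands in
# for whatever file the project actually loads).
X, y = load_breast_cancer(return_X_y=True)

# Minimal exploratory look: dataset dimensions and class balance.
print("samples, features:", X.shape)
print("class counts:", np.bincount(y))

# Split into training and hold-out sets, keeping class proportions equal.
# test_size and random_state are illustrative values.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale the features. The scaler is fitted on the training set only,
# then applied unchanged to the hold-out set.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training split alone keeps hold-out statistics from leaking into training. This is why the sketch splits before scaling, even though the list names scaling first.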
The Python packages required for the project are listed in the requiremnt.txt file and can be installed by running the setup.py file.
Skills:
- Modeling a given problem as a causal graph
- Statistical Modelling and Inference Extraction
- Building model pipelines and orchestration
Knowledge:
- Knowledge of causal graphs and statistical learning
- Hypothesis Formulation and Testing
- Statistical Analysis
- https://michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/
- https://arxiv.org/pdf/2011.04216.pdf
- http://web.math.ku.dk/~peters/jonas_files/mitTutorialJonas.pdf
- https://link.springer.com/chapter/10.1007/978-981-15-7205-0_10
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.301.1824&rep=rep1&type=pdf