
Tutorial about machine learning for autism researchers

Title: A tutorial on interpretable machine learning algorithms for understanding factors related to childhood autism

Abstract: Machine learning is a research area in computer science concerned with algorithms that learn from large data sets. For example, the National Survey of Children’s Health (NSCH) produces a large data set that can be used with machine learning: can we predict whether a child has autism, based on the other survey responses? How accurately can we predict? Which other survey responses are most useful for prediction? In this tutorial, I will show how machine learning can be used to answer these questions.

Slides: https://github.com/tdhock/2024-01-ml-for-autism/blob/main/HOCKING-slides-2024-02-26-ml-for-autism.pdf

See also the code at https://github.com/vas235/ASG3-machine-learning-prep from Vince, which handles more than two survey years and uses a JSON config file to standardize some variables across years.

26 Mar 2024

figures-same-other/ contains CSV files and figures showing that the size of the training set is not the only thing that matters.
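One way to separate training-set size from other effects is to compare, for each test fold, models trained on rows from the same survey year, from the other year, or from all years. The Python sketch below builds such splits. It assumes every row has a survey year and a CV fold ID. The helper `same_other_splits` and the toy data are hypothetical and do not come from the R scripts in this repository.

```python
def same_other_splits(years, folds):
    """Yield (test_year, test_fold, train_name, train_idx, test_idx) tuples.

    years[i] is the survey year and folds[i] the CV fold ID of row i.
    Hypothetical helper: training rows are those outside the test fold,
    further restricted to the same year, the other years, or all years.
    """
    n = len(years)
    for test_year in sorted(set(years)):
        for test_fold in sorted(set(folds)):
            test_idx = [i for i in range(n)
                        if years[i] == test_year and folds[i] == test_fold]
            # Every row outside the test fold is a candidate for training.
            candidates = [i for i in range(n) if folds[i] != test_fold]
            train_sets = {
                "same": [i for i in candidates if years[i] == test_year],
                "other": [i for i in candidates if years[i] != test_year],
                "all": candidates,
            }
            for train_name, train_idx in train_sets.items():
                yield test_year, test_fold, train_name, train_idx, test_idx


# Toy example: 8 rows, two survey years, two folds (made-up data).
years = [2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020]
folds = [1, 2, 1, 2, 1, 2, 1, 2]
for test_year, test_fold, name, train_idx, test_idx in same_other_splits(years, folds):
    print(test_year, test_fold, name, len(train_idx), len(test_idx))
```

In this sketch, "same" and "other" differ in which year the training data come from, while "all" is simply the larger set. Comparing the three shows whether extra data from another year helps as much as data from the same year.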


26 Feb 2024

HOCKING-slides-2024-02-26-ml-for-autism.tex makes the slides HOCKING-slides-2024-02-26-ml-for-autism.pdf, which include new drawings.

drawing-cv-feature-sets.svg makes drawing-cv-feature-sets.pdf

drawing-cv-same-other-years.svg makes drawing-cv-same-other-years-1.pdf drawing-cv-same-other-years-2.pdf drawing-cv-same-other-years-3.pdf drawing-cv-same-other-years-4.pdf

23 Feb 2024

download-nsch-mlr3batchmark.R launches the jobs. Here is a preliminary analysis of how much time and memory they take:

It looks like ranger (random forest) is by far the slowest and most memory-intensive learner, so I will omit it for now.

The output below shows that the total time for the cross-validation (CV) experiment with 2700 iterations is about 240 hours. Since the experiment ran within a 4-hour time limit, this is a speedup of about 60x.

2700: 3.194722222  1810.023 classif.nearest_neighbors     all.364
> sum(usage.long$hours)
[1] 240.7103
> sum(usage.long$hours)/4
[1] 60.17757
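The speedup above is total compute time divided by elapsed wall-clock time. The Python sketch below does that calculation and also totals hours per learner, which is the kind of summary behind the ranger observation. It assumes the usage data are one (learner, hours) pair per job. `summarize_usage` is a hypothetical name, not a function from download-nsch-mlr3batchmark.R, and the example rows are made up. If the whole batch actually finished in less than 4 hours, the real speedup is larger, so under that assumption 60x is a lower bound.

```python
from collections import defaultdict


def summarize_usage(rows, wall_clock_hours):
    """Summarize per-job compute time from a batch of CV jobs.

    rows: iterable of (learner, hours) pairs, one per job.
    Returns a dict of total hours per learner and the speedup, defined
    here as total compute hours divided by elapsed wall-clock hours.
    """
    per_learner = defaultdict(float)
    for learner, hours in rows:
        per_learner[learner] += hours
    total_hours = sum(per_learner.values())
    return dict(per_learner), total_hours / wall_clock_hours


# Made-up job times for illustration only (not the real usage data).
rows = [("ranger", 6.0), ("ranger", 4.0),
        ("nearest_neighbors", 1.5), ("nearest_neighbors", 0.5)]
totals, speedup = summarize_usage(rows, wall_clock_hours=4)
print(totals)
print(speedup)

# Using the real total from the R output above: 240.7103 / 4 is about 60.18.
print(summarize_usage([("all", 240.7103)], wall_clock_hours=4)[1])
```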

22 Feb 2024

download-nsch-convert-do.R makes download-nsch-convert-do-2019-2020.csv

> out.dt[, table(survey_year, Autism)]
survey_year   Yes    No
       2019   859 28003
       2020  1255 40826

download-nsch-counts.R was split out from download-nsch.R.

18 Dec 2023

https://docs.google.com/spreadsheets/d/19Tm75T4wNN4yITlXuUMNVc22yzHmmzVcMY1GBVGsEnQ/edit#gid=0 is the source file for NSCH_categories.csv

download-nsch.R makes download-nsch-nrow-ncol.csv, download-nsch-column-counts.csv, and NSCH_categories_NA_counts.csv. I then manually added different categories for the columns with the fewest missing values, saving the result as NSCH_categories_NA_counts_TDH.csv.
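
The step of finding the columns with the fewest missing values can be sketched as a per-column count of missing values, sorted in increasing order. The code below is a minimal standard-library Python sketch, not the R code in download-nsch.R. It assumes missing values appear as empty cells or "NA" (the real NSCH encoding may differ), and the column names in the example are made up.

```python
import csv
import io


def na_counts(csv_text, na_values=("", "NA")):
    """Count missing values per column of a CSV string with a header row.

    A cell counts as missing if it is absent or, after stripping
    whitespace, equals one of na_values (an assumed encoding).
    Returns (column, count) pairs sorted from least to most missing.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    counts = {name: 0 for name in columns}
    for row in reader:
        for name in columns:
            value = row.get(name)
            if value is None or value.strip() in na_values:
                counts[name] += 1
    return sorted(counts.items(), key=lambda item: item[1])


# Made-up columns x1, x2, x3 for illustration only.
example = "x1,x2,x3\n5,1,NA\n7,,NA\n9,2,1\n"
for column, n_missing in na_counts(example):
    print(column, n_missing)
```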