This script converts NCBI datasets downloaded from GDSbrowser to CSV files. It performs the following tasks:
- Clean the data by removing all extra information
- Extract genes
- Extract chromosomes
- Merge duplicate genes by averaging their values
- Remove fully NULL columns
- Remove columns with ####at names
- Remove columns with --Control names
- Impute the data to fill in missing values (NAs)
- Normalize samples of each class based on their median
The script extracts the data with and without feature names, the feature names, and the chromosome names, and stores them in four separate CSV files.
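The cleaning tasks listed above can be sketched on a toy matrix as follows. This is a minimal illustration in base R, not the script's actual implementation: the matrix values, the sample and gene names, and the `classes` vector are all made up. The README does not say which imputation method is used, so the sketch assumes column-mean imputation, and it reads "normalize based on their median" as subtracting each feature's per-class median. The `####at` name filter is left out because its exact pattern is not specified.

```r
# Toy expression matrix: rows are samples, columns are features (genes).
# All values and names are invented for illustration only.
expr <- matrix(
  c(1,  3, NA, 10, 5,
    2,  5, NA, NA, 5,
    3, NA, NA, 30, 5,
    4,  9, NA, 50, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("S", 1:4),
                  c("TP53", "TP53", "EMPTY", "BRCA1", "--Control"))
)
classes <- c("case", "case", "control", "control")  # hypothetical class labels

# Merge columns that share a gene name by averaging them per sample.
genes <- unique(colnames(expr))
merged <- sapply(genes, function(g) {
  rowMeans(expr[, colnames(expr) == g, drop = FALSE], na.rm = TRUE)
})

# Remove columns that contain no observed value at all.
merged <- merged[, colSums(!is.na(merged)) > 0, drop = FALSE]

# Remove columns whose name contains "--Control".
merged <- merged[, !grepl("--Control", colnames(merged), fixed = TRUE),
                 drop = FALSE]

# Impute remaining missing values (assumption: column mean).
for (j in seq_len(ncol(merged))) {
  missing <- is.na(merged[, j])
  merged[missing, j] <- mean(merged[, j], na.rm = TRUE)
}

# Normalize within each class (assumption: subtract the per-feature median
# computed over the samples of that class).
normalized <- merged
for (cl in unique(classes)) {
  rows <- classes == cl
  med <- apply(merged[rows, , drop = FALSE], 2, median)
  normalized[rows, ] <- sweep(merged[rows, , drop = FALSE], 2, med, "-")
}

print(normalized)
```

If the script uses a different imputation or normalization rule, only the last two loops would need to change; the merging and filtering steps do not depend on that choice.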
1. Download the DataSet full SOFT file from GDSbrowser and extract it into a new folder.
2. Download this code and copy it into the same folder.
3. Run the code in RStudio or another R environment.
4. Four new CSV files are created in the same folder, containing the data with feature names, the data without feature names, the feature names, and the chromosome names.
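For orientation, the sketch below shows one way the extraction step could look in base R. It is not the script itself. It assumes that the data table in the SOFT file sits between `!dataset_table_begin` and `!dataset_table_end` lines, that sample columns start with `GSM`, that missing values are written as `null`, and that chromosome information is in a `Chromosome location` column. The toy file contents and the four output file names are hypothetical.

```r
# Write a tiny stand-in for a GDS full SOFT file (contents are made up).
soft_path <- tempfile(fileext = ".soft")
writeLines(c(
  "^DATASET = GDS0000",
  "!dataset_title = toy example",
  "!dataset_table_begin",
  "ID_REF\tIDENTIFIER\tGSM1\tGSM2\tChromosome location",
  "p1\tTP53\t1.5\t2.5\t17p13.1",
  "p2\tBRCA1\t3.0\tnull\t17q21.31",
  "!dataset_table_end"
), soft_path)

# Keep only the lines between the table markers and parse them as a
# tab-separated table.
lines <- readLines(soft_path)
begin <- which(lines == "!dataset_table_begin")
end   <- which(lines == "!dataset_table_end")
tab <- read.delim(text = lines[(begin + 1):(end - 1)],
                  na.strings = c("null", "NA", ""),
                  check.names = FALSE, stringsAsFactors = FALSE, quote = "")

# Transpose the sample columns so that rows are samples and columns are features.
sample_cols <- grep("^GSM", names(tab), value = TRUE)
values <- t(as.matrix(tab[, sample_cols, drop = FALSE]))
colnames(values) <- tab$IDENTIFIER

# Write the four CSV files (file names are hypothetical).
out_dir <- tempdir()
out_files <- file.path(out_dir, c("data_with_names.csv",
                                  "data_without_names.csv",
                                  "feature_names.csv",
                                  "chromosome_names.csv"))
write.csv(values, out_files[1])
write.table(values, out_files[2], sep = ",",
            row.names = FALSE, col.names = FALSE)
write.csv(data.frame(feature = tab$IDENTIFIER), out_files[3],
          row.names = FALSE)
write.csv(data.frame(chromosome = tab[["Chromosome location"]]), out_files[4],
          row.names = FALSE)
```

The sketch writes to a temporary directory so that it can be run without touching a real dataset; the script itself writes its output next to the SOFT file.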