maxxim333
Interested in Biomedical Sciences, Genomics, Bioinformatics, Data Science and Machine Learning
Pinned Repositories
Analysis_of_Yeast_mRNA
This was an exercise in retrieving information from the mRNA data of a yeast genome annotation using Bash and building a couple of graphs in R.
API_UniProt
In this work I use an API to retrieve information about a custom list of proteins from the well-known protein database UniProt.
drug-induced_morphology_changes_quantification
I treated iPSCs with FDA-approved drugs, stained them with neurite and nuclear markers, and obtained confocal fluorescence images. The goal now is to quantify the change in neurites under each condition using 3 parameters: the number of trunk branches (neurites branching from the cell body), the number of non-trunk branches, and the mean neurite length.
Genomic_Data_Analysis
My biggest bioinformatics venture yet. This is an integrated bottom-up analysis of genomic data that simulates a real bioinformatics pipeline, starting from raw transcriptome sequencing data. Three programming languages are used: shell, Python 2.7 and R 3.5.1.
Methylome_Data_Wrangling
Data wrangling exercise using the dplyr and tibble packages in R
Neural_Networks
This work trains two different types of neural networks with 5-fold cross-validation, each on three datasets that differ in three of their attributes. It runs 100 iterations with random seeds and, once all iterations are complete, calculates the average of several performance metrics.
PCA_SVM
PCA and face recognition using SVM
random_forest_hyperparameters_tunning
In this work I apply Random Forest to two datasets, find the optimal hyperparameters for the algorithm on each dataset, and compare the resulting performance.
RNAseq_analysis
Exercise in aligning RNA-seq reads against a reference, creating BAM and bigWig files, calculating RPKM, and comparing samples
upm
maxxim333's Repositories
maxxim333/Genomic_Data_Analysis
My biggest bioinformatics venture yet. This is an integrated bottom-up analysis of genomic data that simulates a real bioinformatics pipeline, starting from raw transcriptome sequencing data. Three programming languages are used: shell, Python 2.7 and R 3.5.1.
maxxim333/drug-induced_morphology_changes_quantification
I treated iPSCs with FDA-approved drugs, stained them with neurite and nuclear markers, and obtained confocal fluorescence images. The goal now is to quantify the change in neurites under each condition using 3 parameters: the number of trunk branches (neurites branching from the cell body), the number of non-trunk branches, and the mean neurite length.
maxxim333/upm
maxxim333/Analysis_of_Yeast_mRNA
This was an exercise in retrieving information from the mRNA data of a yeast genome annotation using Bash and building a couple of graphs in R.
maxxim333/API_UniProt
In this work I use an API to retrieve information about a custom list of proteins from the well-known protein database UniProt.
maxxim333/Methylome_Data_Wrangling
Data wrangling exercise using the dplyr and tibble packages in R
maxxim333/Neural_Networks
This work trains two different types of neural networks with 5-fold cross-validation, each on three datasets that differ in three of their attributes. It runs 100 iterations with random seeds and, once all iterations are complete, calculates the average of several performance metrics.
maxxim333/PCA_SVM
PCA and face recognition using SVM
maxxim333/random_forest_hyperparameters_tunning
In this work I apply Random Forest to two datasets, find the optimal hyperparameters for the algorithm on each dataset, and compare the resulting performance.
maxxim333/RNAseq_analysis
Exercise in aligning RNA-seq reads against a reference, creating BAM and bigWig files, calculating RPKM, and comparing samples
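As a rough illustration of the RPKM step mentioned above, here is a minimal sketch using the standard definition (reads per kilobase of transcript per million mapped reads). The function name and arguments are hypothetical and are not taken from the repository, which may compute the value differently.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    Hypothetical helper for illustration only:
    RPKM = reads * 1e9 / (gene length in bp * total mapped reads).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 500 reads on a 2 kb gene in a library of 10 million mapped reads
    print(rpkm(500, 2000, 10_000_000))
```

Normalising by both gene length and library size is what makes the values comparable between genes and between samples.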
maxxim333/Assignment3_ruby
maxxim333/Assignment4
maxxim333/assignment5_QueriesSPARQL
maxxim333/Biomarkers
Using R, biomarkers were defined by selecting the most relevant genes after running PCA on a table of gene expression vs. cell type.
maxxim333/Collinear_Points
Here I am learning Python and PySpark through an exercise in which I write a script that takes a set of points as input and returns the groups of collinear points.
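One possible way to group collinear points is sketched below in plain Python (not PySpark), assuming integer coordinates. Every pair of points is mapped to a normalised line equation `a*x + b*y + c = 0`, and points sharing the same line are collected together. The function names and the `min_size` parameter are hypothetical and are not taken from the repository.

```python
from itertools import combinations
from math import gcd


def line_key(p, q):
    """Return normalised integer coefficients (a, b, c) of the line through p and q."""
    (x1, y1), (x2, y2) = p, q
    a = y2 - y1
    b = x1 - x2
    c = -(a * x1 + b * y1)
    # p != q guarantees a or b is non-zero, so g is never zero
    g = gcd(gcd(abs(a), abs(b)), abs(c))
    a, b, c = a // g, b // g, c // g
    # Fix the sign so the same line always yields the same key
    if a < 0 or (a == 0 and b < 0):
        a, b, c = -a, -b, -c
    return (a, b, c)


def collinear_groups(points, min_size=3):
    """Group distinct integer points that lie on a common line."""
    unique = sorted(set(points))
    lines = {}
    for p, q in combinations(unique, 2):
        lines.setdefault(line_key(p, q), set()).update((p, q))
    return sorted(sorted(group) for group in lines.values() if len(group) >= min_size)


if __name__ == "__main__":
    pts = [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (3, 5)]
    for group in collinear_groups(pts):
        print(group)
```

This pairwise approach is quadratic in the number of points, which is fine for a small exercise; the same line key could also serve as the grouping key in a distributed version.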
maxxim333/Files_reprocessing_and_Multiple_Sequence_Alignment
Here I perform an iterative MSA (with MUSCLE) on FASTA sequences from many files. The shell script reprocesses the files so that the input is valid.
maxxim333/fluorescence_tracking
Plotting and Fourier transform of cell fluorescence tracking vs. time
maxxim333/Neural_Networks_2
This work is similar to the one in the maxxim333/Neural_Networks repository. This time, after cross-validation and creation of the .rbf files, the "sensitivity" parameter is calculated individually for each gene in a custom list of genes. This is done for both datasets, and a scatter plot is then made to compare the performance of each gene in each of the datasets.
maxxim333/orthologs_vs_evolutionary_distance
The goal of this work is to find how the evolutionary distance of a species from Homo sapiens relates to the number of orthologs that appear for that species for any of the 6000 clinically relevant human genes. In this exercise, I work with .nwk files to retrieve evolutionary distances between a set of proteins from different species using the PhyloTree module in Python, launch a shell script to get the orthologs of a subset of about 6000 human genes, create a file that records, for each species, the number of times a protein belonging to that species appears in the ortholog list for Homo sapiens, and finally launch an R script that creates a plot and a linear regression showing the negative correlation between evolutionary distance and number of orthologs.
maxxim333/pandas
This is an exercise I did to prepare for an interview for a position in financial data. The company's main specific requirement was knowledge of the pandas module, so I did as many things in pandas as possible.
maxxim333/Python_Health_Data
Python practice applied to biostatistics
maxxim333/rcommander
Performing Factor Analysis and Correspondence Analysis using R Commander and the FactoMineR package
maxxim333/Ruby_Assignment1
Simulating seed planting and updating databases
maxxim333/ruby_assignment2
Intensive integration using Web APIs - Construction of protein interaction network
maxxim333/scooter_data_analysis
This is an exercise I helped a friend with for his interview. Since all of this code is mine, I feel I'm entitled to post it here. It was for a data analyst job at a scooter-rental company. The input file they provided was a csv with logs of users and timestamps of their activities. The goal was to create "sessions" for each user, where each session is defined by certain characteristics. Then I needed to find the "peak of demand" for scooters, defined as the time period with the biggest overlap of sessions. The images I provide are divided in two and there are two .py files. One takes the csv table provided by the interviewers as input and generates a file with session ids; the other takes this file as input and outputs a sorted array of timestamps and their corresponding number of overlapping sessions. The first element is the answer to the problem: the time of the year with the highest demand for scooters.
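The overlap-counting step described above can be sketched with a simple sweep over start and end events. This is only an illustration under stated assumptions: sessions are already available as hypothetical `(start, end)` timestamp pairs, a session that ends at the same instant another starts is not counted as overlapping, and the function name is invented here. The repository's actual session rules and code are not reproduced.

```python
def overlap_counts(sessions):
    """Return (timestamp, active_sessions) pairs sorted by descending overlap.

    `sessions` is an iterable of (start, end) pairs with start < end.
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a session opens
        events.append((end, -1))    # a session closes
    # At equal timestamps, closes (-1) sort before opens (+1),
    # so back-to-back sessions are not counted as overlapping.
    events.sort()

    counts = []
    active = 0
    for timestamp, delta in events:
        active += delta
        if delta == 1:
            # The overlap can only reach a new maximum when a session opens
            counts.append((timestamp, active))

    # Highest overlap first; earliest timestamp breaks ties
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts


if __name__ == "__main__":
    demo = [(0, 10), (5, 15), (8, 12), (20, 25)]
    print(overlap_counts(demo)[0])
```

As in the description, the first element of the sorted result is the moment of peak demand.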
maxxim333/Spatial-Statistics
Analyzing a cross-section of a stomach mucous membrane (a dataset with spatial characteristics) using base R and the spatstat package
maxxim333/Transcriptome-Analysis-and-PCA
Using R, the transcriptomes of WT and mutant (with and without compensation) cells are analysed and PCA is performed to find the genes that affect the global transcriptome the most.
maxxim333/visited_places
I wanted a resource where I could annotate the places I have visited or lived in around the world, but every site either uses it to spy on you or eventually asks for money, so I programmed one in R. It takes a file with coordinates as an argument and creates a map with points.
maxxim333/work