About me

Me & Biwi in California enjoying Lake Tahoe.

As a high school student of science and mathematics in India, I was drawn to biology as well as computers. The first aspect of biology that caught my interest was genetics. I found it fascinating, and at the time I had no idea that my interest in both biology and computers would lead me to bioinformatics, a term that sounded like jargon back then. It was through the internet that I learned about bioinformatics and its applications, and that made me fairly sure of my future path. I started reading about the Human Genome Project, genes, and genomes, and the field drew me in.

I could not have asked for a better platform at that time than a Bachelor’s degree in Bioinformatics at D Y Patil University, which not only introduced me to the different facets of bioinformatics but also strengthened my mathematics and computer science skills. The first programming language I learned was C, and it really changed my approach to programming. Though I was a newbie at programming at that time, I always had friends with whom I could discuss better programming practices.

It was during my sophomore year that we went for my first curriculum training, at the Institute of Bioinformatics and Applied Biotechnology, Bengaluru, India, on "GENES to DRUGS: In-Silico Drug Discovery". It was a 25-day training in which we got hands-on experience on SGI machines running IRIX, using bioinformatics tools such as GCG, Insight II, Cerius 2, and Catalyst, all part of the Accelrys suite for molecular modelling. Here we came face to face with computational biology, an aspect of bioinformatics that we had thought was limited to RasMol, BLAST, FASTA, homology modeling, searching the Protein Data Bank, Pfam, etc. Later on, we wrote Perl programs, accessed biological databases, worked with various bioinformatics tools, and developed small applications using Perl. We learned MS Access through a course project on an airline reservation system. It was all exciting: I was not only developing my skills in wet-lab techniques such as DNA extraction, preparing cell media, cell plating, and many more, but also sharpening my programming skills.

While doing my senior design project at Bhabha Atomic Research Center (BARC), Mumbai, India, we had the opportunity to work with a senior scientist who taught me the importance of understanding the basics of whatever you learn. At BARC we did a wet-lab project on the characterization of Fusarium moniliforme, which we combined with bioinformatics by developing a C++ program to identify optimal fungal growth based on the data obtained from the experiments. By then, I had decided that I wanted to learn more in this field and that this was just the start. I looked for graduate courses in India but could not find any satisfactory ones, so I decided to fly to the United States of America for my higher studies.

Important attributes of a successful data scientist include the ability to advance human knowledge and understanding by providing meaningful insights from the coarse data at hand. I was fortunate to imbibe these traits in the Department of Bioinformatics and Computational Biology at George Mason University. My Master’s program was a great learning experience. I expanded my interests not only into computational biology but also into the software engineering side of the field. I was involved in a project to build a database of human genes with HUGO Gene Nomenclature using MySQL, with a web interface for searching it written in Perl and CGI. I also did protein-protein structure alignment using Python and surveyed the available short-read sequence assemblers with respect to the data produced by the sequencing technology of that time, which was from Solexa. We worked on an SVM-based (kernel) method to assess the reliability of protein-protein interactions. We learned R for statistics through our Research Methods course. We also worked on a bovine dataset to understand variation within a species, using SNP genotypes to determine linkage disequilibrium, haplotypes, and signatures of selection. We helped design the database for the Bovine HapMap Project and created a web-based program that displays various data types by querying the database. For my master’s project we worked on an evolutionary analysis of dopamine D2 receptors, looking in particular at the intracellular loop via multiple sequence alignment, and creating and comparing phylogenetic trees for differences in evolution using TOPD/FMTS and Rate4Site.

After my Master’s in Bioinformatics and Computational Biology, I identified my research interest as the broad field of genomics. First, working as a contractor for Monsanto Co, I helped develop frameworks to analyze data generated by different sequencing technologies, perform genome assembly, and work towards draft genomes. Then, working as a Computational Biologist at Memorial Sloan Kettering Cancer Center, I analyzed cancer-specific sequencing data to gain meaningful insight into patients’ tumor molecular profiles. As an independent research analyst for the past five years, much of my focus has been on developing, automating, and improving the framework for in-depth analysis of targeted sequencing data produced by Illumina-based machines for clinical as well as research projects, leading to publications (Google Scholar) in various aspects of cancer genomics research. In addition to a background in research, I have sought out leadership and management experience. For the past two years I have grown to appreciate my role as both mentor and collaborator while advising high school students through their summer internship projects.

In my role as Director of Bioinformatics at Northwell Health (NH), working in the Feinstein Institute for Medical Research (FIMR), I helped evaluate the shortcomings of their current infrastructure with respect to large-scale genomics projects. I helped them develop and deploy a small cluster to analyze genomics data until we were able to build a large hybrid system.

As Manager, Computational Biology at MSKCC in the Center for Molecular Oncology, I oversee a team of analysts and software developers who develop, maintain, and operate bioinformatics pipelines. We also perform collaborative research with other labs and clinicians both within MSKCC and in the broader research community. On a daily basis, we analyze blood samples from patients with tissue-based cancers, as well as patients’ circulating tumor DNA, avoiding the need for tumor biopsies. More specifically, I lead the team in designing, developing, and implementing software tools for processing and analyzing high-throughput, next-generation sequencing data, specifically for liquid biopsy applications.
