Software package for array-based molecular blood group typing
- Install the required tools
sudo apt-get install build-essential
Please make sure the static versions of glibc and libstdc++ are installed as well. Which packages you need depends on your OS:
# CentOS
sudo yum install glibc-static libstdc++-static -y
# Ubuntu
sudo apt-get install libc6-dev
# ...
- Build MyTools
MyTools is a collection of useful C++ classes and methods. It must be compiled before the other C++ projects.
cd MyTools/
make all
cd ..
The library can be found under: dist/Release/GNU-Linux/
- Build bloodArray
bloodArray reads FinalReport files and returns a tab-delimited table with the blood group alleles. Output goes to stdout, the log goes to stderr. This is experimental code that grew over time with a lot of copy and paste. If you are interested in this part of the software, take the logic from the phenotype() functions and reimplement it in a language of your choice.
cd bloodArray/
make all
cd ..
The executable can be found under: dist/Release/GNU-Linux/
Run it like this:
bloodArray FinalReport1.txt [FinalReport2.txt ... FinalReportN.txt]
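Because the table goes to stdout and the log goes to stderr, the two streams can be captured separately when the tool is called from a script. The following is a minimal Python sketch and not part of the package. The helper names `run_and_split` and `main` are hypothetical, and the command (executable path plus FinalReport files) is passed in by the caller.

```python
import subprocess
import sys


def run_and_split(command):
    """Run a command and return (table_rows, log_text).

    stdout is split into rows of tab-separated fields; stderr is
    returned unchanged so the caller can store it as a log.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    rows = [line.split("\t") for line in result.stdout.splitlines() if line]
    return rows, result.stderr


def main(command):
    """Print the table to stdout and forward the log to stderr."""
    rows, log = run_and_split(command)
    for row in rows:
        print("\t".join(row))
    sys.stderr.write(log)
```

A call could look like `main(["dist/Release/GNU-Linux/bloodArray", "FinalReport1.txt"])`, with the path adjusted to wherever the executable was built. `check=True` makes the sketch raise an exception if the tool exits with a non-zero status.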
- Build FinalReportToEvoker
FinalReportToEvoker generates evoker file(s) from FinalReport-files.
cd FinalReportToEvoker/
make all
cd ..
The executable can be found under: dist/Release/GNU-Linux/
The Evoker file format usually consists of binary PLINK files plus an extra file with the Allele-AB intensities in binary format. We need these intensities for the TensorFlow classifier. FinalReportToEvoker generates a fam file (sample annotation), a bim file (SNP annotation), and a bnt file (the Allele-AB intensities). It does not generate the bed file (genotypes in binary PLINK format), because the genotypes are not needed.
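The fam and bim files are plain text, so they can be inspected from a script. The sketch below assumes that FinalReportToEvoker writes them in the standard PLINK layout (six whitespace-separated columns each); this has not been checked against the tool's output. The names `FamRecord`, `BimRecord`, and `parse_plink_lines` are hypothetical.

```python
from collections import namedtuple

# Column layout of standard PLINK .fam and .bim files (assumed to apply here).
FamRecord = namedtuple(
    "FamRecord", "family_id sample_id father_id mother_id sex phenotype"
)
BimRecord = namedtuple(
    "BimRecord", "chromosome snp_id genetic_distance position allele_1 allele_2"
)


def parse_plink_lines(lines, record_type):
    """Turn whitespace-separated lines into records of the given type."""
    expected = len(record_type._fields)
    records = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != expected:
            raise ValueError(
                f"line {number}: expected {expected} columns, got {len(fields)}"
            )
        records.append(record_type(*fields))
    return records
```

For example, `parse_plink_lines(open("out.fam"), FamRecord)` would return one record per sample. All fields are kept as strings, since the sketch makes no assumption about how the tool encodes missing values.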
FinalReportToEvoker does not validate its parameters. If something is wrong, it crashes without a meaningful message. Please run it like this:
FinalReportToEvoker consider_those.csv OUTPUTFAMFILE OUTPUTBIMFILE OUTPUTBNTFILE FINALREPORTFILE1 [FINALREPORTFILE2 ... FINALREPORTFILEN]
consider_those.csv is a text file with two comma-separated columns: one with the antigen/allele and one with the required probe_set_ids. The file can be found in the folder DeepBloodArray.
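Since FinalReportToEvoker does not check its input, a small pre-flight check before calling it can save some guesswork. The following Python sketch is not part of the package. The function name `check_arguments` is hypothetical, and the check assumes that every non-empty line of consider_those.csv has exactly two comma-separated fields.

```python
import csv
import os


def check_arguments(csv_path, report_paths):
    """Return a list of problems; an empty list means the call looks sane."""
    problems = []
    if not report_paths:
        problems.append("no FinalReport file given")
    # Every input file must exist before the tool is started.
    for path in [csv_path, *report_paths]:
        if not os.path.isfile(path):
            problems.append(f"file not found: {path}")
    # Assumption: two comma-separated columns per non-empty line.
    if os.path.isfile(csv_path):
        with open(csv_path, newline="") as handle:
            for number, row in enumerate(csv.reader(handle), start=1):
                if row and len(row) != 2:
                    problems.append(
                        f"{csv_path}, line {number}: "
                        f"expected 2 columns, got {len(row)}"
                    )
    return problems
```

If the returned list is empty, the tool can be started with the same arguments; otherwise the messages point to the argument or line that needs attention.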
DeepBloodArray is a Python project that needs an appropriate environment. The two main files are:
- trainAndEvaluate.py trains a new classifier
- classifyFinalReports.py takes a final report as input and returns a JSON file with the blood group alleles
The folder Models contains pre-trained models.
- Install miniconda3
- Set up the environment:
conda create -n MyEnvName
conda activate MyEnvName
conda install -c conda-forge tensorflow scikit-learn pandas -y
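To confirm that the activated environment contains the packages installed above, a quick check can be run from Python. This is a minimal sketch and not part of the package; the helper name `missing_packages` is hypothetical. Note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names of the packages installed with conda above.
REQUIRED = ["tensorflow", "sklearn", "pandas"]
```

Calling `missing_packages(REQUIRED)` inside the activated environment should return an empty list. The check only looks the modules up and does not import them, so it stays fast even for TensorFlow.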
cd MBTarray
bloodArray/dist/Release/GNU-Linux/bloodarray DeepBloodArray/test/FinalReport1.txt
returns:
Sample_ID filename ABO Rh Lutheran Kell Duffy Kidd Diego Yt Scianna Dombrock Colton Landsteiner-Wiener CROM Knops JR LAN Vel Indian MNS Rh
Sample_ID filename ABO RH LU KEL FY JK DI YT SC DO CO LW CROM KN JR LAN VEL Indian MNS RH
Sample_ID filename ABO RHD BCAM KEL ACKR1 SLC14A1 SLC4A1 ACHE ERMAP ART4 AQP1 ICAM4 CD55 CR1 ABCG2 ABCB6 SMIM1 CD44 GYPA,GYPB RHCE
Sample_ID filename 001 004 005 006 008 009 010 011 013 014 015 016 021 022 032 033 034 023 002 004
pseudoID FinalReport1.txt A D. Lu(a-b+),Au(a-b+),Lu8+Lu14- kk,Kp(a-b+),Js(a-b+) Fy(a-b+) Jk(a-b+) Di(a-b+),Wr(a-b+) Yt(a+b-) Sc1+Sc2- Do(a+b+),Hy+,Jo+ Co(a+b-), LW(a+b-) Cr(a+),Tc(a+b-c-) Kn(a+b-),McC(a+b-),Vil- Jr(a+) Lan+ Vel+ #N/A
FinalReportToEvoker/dist/Release/GNU-Linux/finalreporttoevoker DeepBloodArray/consider_those.csv out.fam out.bim out.bnt DeepBloodArray/test/FinalReport1.txt
This should generate the three output files out.fam, out.bim, and out.bnt.
classifyFinalReports.py does the actual job: it uses the two previously tested executables and runs the classifier. Finally, it generates a JSON output.
# if your conda environment is not activated, please activate it with
conda activate MyEnvName
Run the script:
python3 DeepBloodArray/classifyFinalReports.py
This should create the result file data_sampleID.json.
python3 DeepBloodArray/trainAndEvaluate.py --output /your/output/directory
This should run the training of the classifiers for the antigens ['c','C','e','E','M','N','s','S'].
The output directory will contain the trained models (*.mdl) and several plots: an overview plot of the score distribution of the validation samples (20% of all samples), and one scatter plot per SNP showing the allele_AB intensities, color-coded by the group each sample belongs to.
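The 20% validation share corresponds to a simple hold-out split. How trainAndEvaluate.py selects the validation samples is not described here, so the following is only an illustrative Python sketch of such a split. The function name `split_samples` and the fixed seed are assumptions.

```python
import random


def split_samples(sample_ids, validation_fraction=0.2, seed=0):
    """Shuffle the samples reproducibly and hold out a validation share.

    Returns (training_ids, validation_ids).
    """
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]
```

With 100 samples this yields 80 training and 20 validation samples. Using a fixed seed keeps the split identical between runs, which makes the score distributions comparable.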