MBTarray

Software package for array-based molecular blood group typing

Compiling the source code

Install the required tools

sudo apt-get install build-essential

Please make sure the static versions of glibc and libstdc++ are installed as well. The package names depend on the operating system you are running:

# CentOS
sudo yum install glibc-static libstdc++-static -y
# Ubuntu
sudo apt-get install libc6-dev
# ...

Build MyTools
MyTools is a collection of useful C++ classes and methods. It must be compiled before the other C++ projects.

cd MyTools/
make all
cd ..

The library can be found under: dist/Release/GNU-Linux/

Build bloodArray
bloodArray reads FinalReport files and returns a tab-delimited table with the blood group alleles. Output goes to stdout, the log goes to stderr. This is experimental code that grew over time with a lot of copy and paste. If you are interested in this part of the software, you should take the logic from the phenotype() functions and reimplement it in a language of your choice.

cd bloodArray/
make all
cd ..

The executable can be found under: dist/Release/GNU-Linux/
Run it like this:
bloodArray FinalReport1.txt [FinalReport2.txt ... FinalReportN.txt]
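
Because the table goes to stdout and the log to stderr, both streams can be captured separately and the table split into columns. The following is a minimal sketch and not part of the package: the function names `parse_bloodarray_table` and `run_bloodarray` are hypothetical, and the number of header rows (four) is an assumption taken from the sample output in the Test section.

```python
import subprocess

N_HEADER_ROWS = 4  # assumption: taken from the sample output in the Test section


def parse_bloodarray_table(text, n_header_rows=N_HEADER_ROWS):
    """Split bloodArray's tab-delimited stdout into header rows and sample rows."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    return rows[:n_header_rows], rows[n_header_rows:]


def run_bloodarray(executable, final_reports):
    """Run bloodArray; the table is read from stdout, the log from stderr."""
    result = subprocess.run(
        [executable, *final_reports],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    headers, samples = parse_bloodarray_table(result.stdout)
    return headers, samples, result.stderr
```

The rows are kept as plain lists rather than dictionaries because some column names occur more than once in the header rows.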

Build FinalReportToEvoker
FinalReportToEvoker generates evoker file(s) from FinalReport-files.

cd FinalReportToEvoker/
make all
cd ..

The executable can be found under: dist/Release/GNU-Linux/

The Evoker file format usually consists of binary PLINK files plus an extra file with the Allele-AB intensities in binary format. We need these intensities for the TensorFlow classifier. FinalReportToEvoker generates a fam file (sample annotation), a bim file (SNP annotation) and a bnt file (the Allele-AB intensities). It does not generate the bed file (genotypes in binary PLINK format), because the genotypes are not needed.
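
To check which samples and SNPs ended up in the generated files, the fam and bim files can be read as plain text. This is a minimal sketch under the assumption that FinalReportToEvoker writes the standard PLINK layout (six whitespace-separated columns per line, with the sample or variant ID in the second column); the function names are hypothetical.

```python
def read_column(path, index):
    """Return one whitespace-separated column from a text file."""
    values = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                values.append(fields[index])
    return values


def read_fam_sample_ids(path):
    # PLINK .fam columns: family ID, individual ID, father, mother, sex, phenotype
    return read_column(path, 1)


def read_bim_snp_ids(path):
    # PLINK .bim columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2
    return read_column(path, 1)
```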
FinalReportToEvoker does not validate its arguments. If something is wrong, it crashes without a meaningful message. Please run it like this:
FinalReportToEvoker consider_those.csv OUTPUTFAMFILE OUTPUTBIMFILE OUTPUTBNTFILE FINALREPORTFILE1 [FINALREPORTFILE2 ... FINALREPORTFILEN]

consider_those.csv is a text file with two comma-separated columns: one with the antigen/allele and the other with the required probe_set_ids. The file can be found in the folder DeepBloodArray.
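
To inspect which probe sets are used for which antigen, the file can be loaded with Python's csv module. This is a sketch only: the column order is not stated here, so the reader below assumes the antigen/allele is in the first column and the probe_set_id in the second, with no header row. Both column indices are parameters and can be swapped if the file is laid out differently. The function name `read_probe_sets` is hypothetical.

```python
import csv
from collections import defaultdict


def read_probe_sets(path, antigen_col=0, probe_col=1):
    """Map each antigen/allele to the list of probe_set_id fields listed for it."""
    probes = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            probes[row[antigen_col].strip()].append(row[probe_col].strip())
    return dict(probes)
```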

DeepBloodArray

DeepBloodArray is a Python project that needs an appropriate environment. The two main files are:

trainAndEvaluate.py trains a new classifier
classifyFinalReports.py takes a final report as input and returns a json file with the blood group alleles

The folder Models contains pre-trained models.

Initialize the environment

Install miniconda3
Environment setup:

conda create -n MyEnvName
conda activate MyEnvName
conda install -c conda-forge tensorflow scikit-learn pandas -y
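
A quick way to confirm that the activated environment contains the three packages is to look them up from Python without importing them. This is a minimal sketch; the helper name `missing_packages` is hypothetical, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names of the packages installed above (scikit-learn is imported as "sklearn").
REQUIRED = ["tensorflow", "sklearn", "pandas"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```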

Test

Direct typing using the executable bloodarray

cd MBTarray

bloodArray/dist/Release/GNU-Linux/bloodarray DeepBloodArray/test/FinalReport1.txt

returns:

Sample_ID	filename	ABO	Rh	Lutheran	Kell	Duffy	Kidd	Diego	Yt	Scianna	Dombrock	Colton	Landsteiner-Wiener	CROM	Knops	JR	LAN	Vel	IndianMNS	Rh
Sample_ID	filename	ABO	RH	LU	KEL	FY	JK	DI	YT	SC	DO	CO	LW	CROM	KN	JR	LAN	VEL	Indian	MNS	RH
Sample_ID	filename	ABO	RHD	BCAM	KEL	ACKR1	SLC14A1	SLC4A1	ACHE	ERMAP	ART4	AQP1	ICAM4	CD55	CR1	ABCG2	ABCB6	SMIM1	CD44	GYPA,GYPB	RHCE
Sample_ID	filename	001	004	005	006	008	009	010	011	013	014	015	016	021	022	032	033	034	023	002	004
pseudoID	FinalReport1.txt	A	D.	Lu(a-b+),Au(a-b+),Lu8+Lu14-	kk,Kp(a-b+),Js(a-b+)	Fy(a-b+)	Jk(a-b+)	Di(a-b+),Wr(a-b+)	Yt(a+b-)	Sc1+Sc2-	Do(a+b+),Hy+,Jo+	Co(a+b-),	LW(a+b-)	Cr(a+),Tc(a+b-c-)	Kn(a+b-),McC(a+b-),Vil-	Jr(a+)	Lan+	Vel+	#N/A

Generating evoker files from FinalReports

FinalReportToEvoker/dist/Release/GNU-Linux/finalreporttoevoker DeepBloodArray/consider_those.csv out.fam out.bim out.bnt DeepBloodArray/test/FinalReport1.txt

This should generate the three output files out.fam, out.bim and out.bnt.
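
Since FinalReportToEvoker does not check its arguments, it can help to verify the inputs before the call and the three outputs afterwards. The wrapper below is a sketch of that idea, not part of the package: both function names are hypothetical, and treating an empty file as missing is an assumption.

```python
import os
import subprocess


def missing_files(paths):
    """Return the paths that do not exist or are empty."""
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]


def run_finalreport_to_evoker(executable, probes_csv, fam, bim, bnt, final_reports):
    """Validate inputs, run the converter, then confirm the three outputs exist."""
    if not final_reports:
        raise ValueError("at least one FinalReport file is required")
    bad = missing_files([executable, probes_csv, *final_reports])
    if bad:
        raise FileNotFoundError("missing or empty input: " + ", ".join(bad))
    # Argument order as documented above: csv, fam, bim, bnt, then the reports.
    subprocess.run([executable, probes_csv, fam, bim, bnt, *final_reports], check=True)
    bad = missing_files([fam, bim, bnt])
    if bad:
        raise RuntimeError("expected output not written: " + ", ".join(bad))
```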

Molecular blood group typing, main script

This script performs the actual typing. It uses the two executables tested above and runs the classifier. Finally, it writes a JSON output file.

# if your conda environment is not activated, please activate it with
conda activate MyEnvName

Run the script:

python3 DeepBloodArray/classifyFinalReports.py

This should create the result file data_sampleID.json.

Test the training function of the classifier

python3 DeepBloodArray/trainAndEvaluate.py --output /your/output/directory

This should train the classifiers for the antigens ['c','C','e','E','M','N','s','S'].
The output directory will contain the trained models (*.mdl) and several plots: an overview plot of the score distribution of the validation samples (20% of all samples), and one scatter plot per SNP showing the allele_AB intensities, color-coded by the group to which each sample belongs.
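
To confirm that the training run produced models, the output directory can be listed for *.mdl entries. This is a small sketch with a hypothetical helper name; it makes no assumption about how the individual model files are named.

```python
from pathlib import Path


def list_trained_models(output_dir):
    """Return the sorted names of the *.mdl entries in the output directory."""
    return sorted(p.name for p in Path(output_dir).glob("*.mdl"))
```

For example, `list_trained_models("/your/output/directory")` returns an empty list if no model was written.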
