/Drosophila-genetic-mapping

To map Drosophila melanogaster gene region(s) containing resistance to infection by parasitic wasp Leptopilina boulardi.

Primary LanguageR

Drosophila-genetic-mapping

Goal: To map Drosophila melanogaster gene region(s) containing resistance to infection by parasitic wasp Leptopilina boulardi. Previous controlled crosses identified a resistance factor or chromosome 2. First step was broad mapping on chromosome 2. Infected 385 F2 D. melanogaster flies from an backross with a resistant and suceptible parental genotype with L. boulardi and recorded their resistance status as resistant or susceptible (1 or 2 in phenotype column below). Selected 10 insertion-deletion markers interspersed along chromosome 2 for mapping and genotyped the 385 D. melanogaster. Then, interval mapping was done using R/QTL.

Broad mapping

First step is to create a genetic map using marker segreation ratios, to see how linked markers are in current cross.

Set working directory and load library

library(qtl)
library(zoo)
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

Inspect the data

comb1 <- read.csv(file="qtl_chr3rest_with5cm.csv")
head(comb1)
Phenotype Chr2L_3cM Chr2L_5cM Chr2L_7cM Chr2L_10.3cM Chr2L_17cM Chr2L_27cM Chr2L_34cM Chr2L_54cM Chr2R_67cM Chr2R_104cM
1 NA 1 1 1 1 1 1 1 1 1 1
2 1 A A H A A A A A H A
3 2 H H H H H H H H H A
4 1 A A A A A A A A A A
5 2 A A A A A A A A A
6 2 H H H H H H H A A H

Read genotypes from cross into rqtl

mapthis <- read.cross("csv", file="qtl_chr3rest_with5cm.csv", estimate.map=FALSE)

only keep individuals typed for more than nine markers

mapthis <- subset(mapthis, ind=(ntyped(mapthis)>9)) 

estimates recombination fraction (rf)

mapthis <- est.rf(mapthis)

assign markers into linkage groups

lg <- formLinkageGroups(mapthis, max.rf=0.5, min.lod=6) 

get rf

rf <- pull.rf(mapthis) 

get lod scores

lod <- pull.rf(mapthis, what="lod")

plot rf v lod scores

plot(as.numeric(rf), as.numeric(lod), xlab="Recombination fraction", ylab="LOD score")

image calculate error probabilities

mapthis <- calc.errorlod(mapthis, error.prob=0.01)

estimate map with new error probabilities, to see how genetic map changes with greater genotyping error

newmap <- est.map(mapthis, error.prob=0.01) 
plotMap(newmap,show.marker.names = TRUE)

image

replace old map with new one

mapthis <- replace.map(mapthis, newmap)

run this code to use standard drosophila genetic map.

column names contain recombination map position. use recombination estimates from drosophila genetic map (from Flybase) rather than those calculated here using marker ratios.

markers=colnames(read.csv("qtl_chr3rest_with5cm.csv")) 
markers2=markers[2:length(markers)]
markers2=gsub('Chr2L_','',markers2)
markers2=gsub('Chr2R_','',markers2)
markers2=gsub('cM','',markers2)
markers2=as.numeric(markers2)
flymap=as.list(markers2) 
flymap=newmap
flymap[[1]]<-markers2
names(flymap[[1]])=markers[2:length(markers)]
plotMap(flymap,show.marker.names = TRUE)

image

Flybase map used further

mapthis <- replace.map(mapthis, flymap)

Plot the genotypes. Genotypes AA and AB are indicated by white and black circles. X's in intervals having a crossover.

plotGeno(mapthis, chr=1, ind=c(1:115))

image

calculate genotype probabilities along map

mapthis <- calc.genoprob(mapthis, step=1, error.prob=0.05)

Genome scan with single qtl model using Haley-Knott (HK) regression. Easy and fast, but performs poorly in the case of selective genotyping (Feenstra et al. 2006), which is not the case here.

out.hk <- scanone(mapthis, method="hk")
summary(out.hk)
chr pos lod
c1.loc12 1 15 17.9

here carrying out a permutation test. LOD thresholds (1000 permutations)

operm.hk <- scanone(mapthis, method="hk", n.perm=1000) 
summary(operm.hk, alpha=0.05)
lod
5% 1.46

Test if marker is above log threshold. Yes it is!

summary(out.hk, perms=operm.hk, alpha=0.05, pvalues=TRUE)
chr pos lod pval
c1.loc12 1 15 17.9 0

get confidence intervals

cM_in_qtl=out.hk$pos[out.hk$lod>(max(out.hk$lod)-1.5)]
low=min(cM_in_qtl)
up=max(cM_in_qtl)
low

11

up

25

Composite interval mapping, see if more than one qtl exists

out.cim <- cim(mapthis,window=10)

Combined plot with Haley-Knott single qtl model (Black line) and composite interval mapping (Red line) which blue shaded region indicating area of significance.

plot(out.hk,xlab="Chromosome 2 Map Position (cM)",cex.lab=1.4, cex.axis=1.4)
plot(out.cim,add=T,col="red")
rect(low,-5,up,20, col= rgb(0,0,1.0,alpha=0.15), border = NA)

image

Idetified a single 24cM QTL from broad mapping.

Fine mapping

Here the approach was use the same backcross as before but only genotype resistant flies and identify regions where the marker ratio departed from a 50:50 expectation (homozygous vs heterozygous). The broad mapping results from before was used to focus on the region from 3 to 27 cM on chromosome 2. The first step was to identify flies with a recombination breakpoint within 3 to 27 cM. ~1300 F2 flies were phenotyped and genotyped. But only ~300 had a breakpint between the 3 to 27 cM.

This is the observed risk ratio from non-recombinant flies (adults with capsules) (same marker genotype at 3 and 27cM). There were 738 flies RR genotype and 260 SS flies

observedRR=738/282

Choosing informative markers involved employing a χ2 drop, which was determined by simulating 1,000 datasets using the estimated risk ratio from nonrecombinant flies and the observed recombination fraction. This χ2 drop delineated a region containing the gene in 95% of the simulations.

genosim=function(nrecombs=298, risk=observedRR){
  
  #number genetically resistant recombinants
  nR=rbinom(1, nrecombs, risk/(risk+1))
  
  #recombinants at 3 and 27 cM, so 240 0.1cM intervals
  #1,0 recombinants
  mat=matrix(1,nrow=1000,ncol=240)
  breakpoint=sample(238, 1000,replace=T)+1
  for (i in 1:1000){
    mat[i,breakpoint[i]:240]=0
  }
  
  #0,1 recombinants
  mat2=matrix(0,nrow=1000,ncol=240)
  breakpoint=sample(238, 1000,replace=T)+1
  for (i in 1:1000){
    mat2[i,breakpoint[i]:240]=1
  }
  
  #shuffle the two
  mat3=rbind(mat,mat2)[sample(1:2000),]
  
  #split into resistant and susceptible at position 70 (ie 7cm from start, 10cM) 
  #assign R as genotype 1
  matR=mat3[mat3[,70]==1,]
  matS=mat3[mat3[,70]!=1,]
  
  #sample resistant and susceptible lines following risk ratio
  genotypes=rbind(matR[1:nR,],matS[1:(nrecombs-nR),])
  genotypes
}
genosim2=function(N=251, R=observedRR,ldrop=1.5,chidrop=4.5){
  
  QTL=vector()
  CIsize=vector()
  CIgene=vector()
  lodCIsize=vector()
  lodCIgene=vector()
  chiCIsize=vector()
  chiCIgene=vector()

  for(i in 1:10000){
    genos=genosim(nrecombs=N, risk=R)
    x=which(colSums(genos)==max(colSums(genos)))
    #note this is to randomly choose marker when ties
    QTL[i]=x[sample(length(x))][1]
    
    #get chi2 scores
    chi2=vector()
    for(j in 1:ncol(genos)){
      x=prop.test(sum(genos[,j]),nrow(genos),p=0.5)$statistic
      #next line changes sign depnding on whether in correct direction
      chi2[j]=ifelse((sum(genos[,j])/nrow(genos)>0.5),
                     x,-x)
    }
    
    ci_chi=which(chi2>(max(chi2)-chidrop))
    chiCIsize[i]=(max(ci_chi)-min(ci_chi))/10
    chiCIgene[i]=ifelse(min(ci_chi)<=70&max(ci_chi)>=70,"gene in CI","gene not in CI")
    
  }
  #use this for chi-squared ci
  c(table(chiCIgene)[2]/length(chiCIgene),
    mean(chiCIsize)
  )
}

run simulations

test different chi-squared drops. called lods.

lods=40:60/10
effect_L=matrix(nrow=length(lods),ncol=2)

for(k in 1:length(lods)){
  effect_L[k,]=genosim2(N=251, R=observedRR,chidrop=lods[k])
  print(k)
}
effect_L2=cbind(lods,effect_L[,1:2])
colnames(effect_L2)[2:3]=c("proportion_qtl_containing_gene","mean_qtl_size")
write.csv(effect_L2,file="QTL simulations different chi drops.csv")

effect_L2=read.csv(file="QTL simulations different chi drops.csv")

#pdf(file="QTL simulations different chi drops.pdf",height=4,width=5)
par(mfrow=c(1,1))
plot(100*effect_L2[,3]~effect_L2[,2],
     ylim=c(0,9),
     xlab="chi2 drop",
     ylab="% times gene outside 95% CI interval",
     main='risk ratio=3.21,N=298')
abline(h=5,col="red",lwd=3)

image

This is the drop in chi2 to get confidence intervals on location. Derived empirically from simulation script.

chi_drop=4.6

Read input file

HSSH <- read.csv("HS-SH,15-5.csv")
head(HSSH)
Sample Plate Well Row Col Egg lay date Egg lay period Egg transfer (ET) Infection start date Infection end date Infection period Infection start date - Egg lay date No. vials made on that date Yeast usage Cage Total flies collected on that date 3 cM 27 cM Genotype (3 and 27 cM) Plate Label 7 cM 10.3 cM 12 cM 17 cM Wasp primer melting temperature Wasp DNA amplified 8 cM 11 cM 10.7 cM 11.3 cM 11.6 cM 8.5 cM 9 cM 10 cM
15 1 G2 G 2 25. Jan Overnight 25. Jan 28. Jan 29.01.19 1 day 3 10 Plastic, no yeast 2 15 H S HS 1_G2 H H S S 80.82924652 N NA H NA H H NA NA NA
18 1 B3 B 3 27. Jan Overnight 27. Jan 29. Jan 31.01.19 2 days 2 70 Plastic, no yeast 2 89 H S HS 1_B3 H H S S 81.12721252 N NA H NA H H NA NA NA
34 1 B5 B 5 27. Jan Overnight 27. Jan 29. Jan 31.01.19 2 days 2 70 Plastic, no yeast 2 89 H S HS 1_B5 H H S S 81.87213135 N NA S NA NA NA NA NA NA
50 1 B7 B 7 27. Jan Overnight 27. Jan 29. Jan 31.01.19 2 days 2 70 Plastic, no yeast 2 89 H S HS 1_B7 H H S S 78.44551086 Y NA H NA H H NA NA NA

Create matrix of marker genotypes

genotypes1=HSSH[,grep("cM",names(HSSH))]
names(genotypes1)=gsub(".cM", "", names(genotypes1))
genotypes1 <- genotypes1[-c(3)]
names(genotypes1)=gsub("X", "", names(genotypes1))
ind=order(as.numeric(names(genotypes1)))
genotypes1=data.matrix(genotypes1[,ind])

Only use samples with evidence of parasitic wasp infection

genotypes1 <- genotypes1[HSSH$Wasp.DNA.amplified=="Y",]

Impute missing genotypes when flanking values same

is.wholenumber <-
  function(x, tol = .Machine$double.eps^0.5)  abs(x - round(x)) < tol

genotypes2=t(apply(genotypes1,1,na.approx))
ind=rowSums(!t(apply(genotypes2,1,is.wholenumber)))==0
genotypes=genotypes2[ind,]
table(genotypes[,7] == genotypes[,9])
FALSE TRUE
2 109

only 2 informative recombinants between 10.3 and 11

Test if genotypes differ from 50:50 ratio

genotypes[genotypes==2]=0

chi2=vector()
for(i in 1:ncol(genotypes)){
  x=prop.test(sum(genotypes[,i]),nrow(genotypes),p=0.5)$statistic
  #next line changes sign depending on whether in correct direction
  chi2[i]=ifelse((sum(genotypes[,i])/nrow(genotypes)>0.5),
                 x,-x)
}

get location CIs

locations=as.numeric(colnames(genotypes1))
outside=which(chi2<max(chi2-4.6))
peak=which(chi2==max(chi2))
low=locations[outside[max(which(outside<peak))]]
up=locations[outside[min(which(outside>peak))]]
low

10

up

11.6

Plot of chi2 (test of whether marker ratio departs from 50:50 ratio) along chromosome. Single 1.6cM peak detected.

Beware this is not interval mapping so pay little attention to line between markers

image