MARK A. HANSON LAB
  • Home
  • Research
  • Join the team
  • Publications
  • Resources
  • Contact
  • Blog
  • Gallery

Fruit fly genetics and publishing ethics

Mining the DGRP for naturally occurring mutations

3/27/2021

 
Drosophila geneticists benefit from decades of research and development, providing tools that can tackle literally any gene in the genome. At the click of your mouse, you can readily order flies that will express dsRNA to knock down your gene of interest for <$20 plus the cost of shipping. Most genes have putative mutant alleles disrupting gene expression or structure. Now, in the age of CRISPR, it's even easier to fill in the gaps where existing toolkits are less robust.

But CRISPR involves capital: time, money, and energy. And after that, there is still a chance that your "good" mutation isn't as good as you thought it was. Take it from someone who has experienced this firsthand on more than one occasion, and I haven't even worked with that many flies generated using CRISPR and double gRNA! Briefly: we generated and tracked a Cecropin mutation using mutant-specific primers, but somehow failed to detect a wild-type Cecropin locus that was present in our Cecropin-mutant stocks because of a bizarre recombination event. More recently, I found that a similar double gRNA approach that also included an HDR vector (with two homology arms to guide it to the locus) did not insert as one might expect... I detected this after performing maybe the 3rd or 4th routine-check PCR, when I eventually realized: "hey... that band is ~100 bp larger than it should be... isn't it?" Instead of replacing the locus, the HDR vector failed to do its job and inserted in the middle of the promoter. SNPs and a 10 nt indel prevented detection by existing primers for the wild-type gene. In the end, our mutant was still effectively a hypomorph (which is good, because it had a phenotype we spent a good while on), but not a full knockout (Fig 1).
Figure 1: the BaraB[LC1] mutation (Hanson and Lemaitre, 2021) did not insert as expected. This mutation disrupts the promoter but the full gene remains present and mostly unchanged. A 10nt insertion at the C-terminus overlapped the reverse primer binding site that was initially used to check for the wild-type locus.

My point here is that CRISPR isn't a magic tool that always works. Beyond these bizarre instances, off-target effects are a serious concern and could affect 4% of CRISPR mutant stocks (Shu Kondo, EDRC 2019); or more, it's tough to say! All told, CRISPR is a powerful tool in the toolkit, but it requires gRNA generation, injection, mutant isolation, and, in my experience, something that is especially important: robust mutant validation. This process takes a couple of months at best, and more likely longer. But what if you didn't have to make new mutations? What if there was already an existing mutation you could access, just not one listed in known databases?


Enter the DGRP
The Drosophila Genetic Reference Panel (DGRP) is a fantastic commercially available resource full of genetic variation. The very basis of the DGRP is to screen the 205 DGRP stocks for alleles that predict variation in response to a treatment through a genome-wide association study. The success of this approach is proof-of-concept that sufficient genetic variation exists in the DGRP such that alleles in individual genes can predict the outcome of a treatment. Perhaps the most striking example I know of comes from the antimicrobial peptide gene Diptericin A (DptA), where a serine/arginine polymorphism at residue 69 (S69R) predicts defence against Providencia rettgeri infection (Unckless et al., 2016). Moreover, two stocks (DGRP-822 and -832) encode a premature stop codon in DptA that was associated with extremely poor immune defence in that study. I recently isogenized those alleles from DGRP stocks into a controlled genetic background (unpublished), confirming their roles in defence (Fig 2).
Figure 2: flies carrying isogenized DptA loci from the DGRP, encoding either the S69R polymorphism (DGRP-38 as source) or the premature stop (DGRP-822 as source), suffer highly increased susceptibility to infection compared to iso w1118 wild-type. iso Dpt[SK1], iso ΔAMPs14, and iso Rel[E20] flies are known to be highly susceptible to P. rettgeri (Hanson et al., 2019a). Unpublished (Mar 27, 2021)!

Mutations causing frameshifts, loss of start codons, and premature stops are easy to spot. They're annotated as such! This makes it extremely easy to screen for mutations that dramatically affect a gene's coding sequence, making them putative loss-of-function alleles. For the AMP genes I study, traditional mutagenesis has been difficult owing to their small size. But as non-essential genes (that are very useful in specific circumstances!) they are ripe targets for standing genetic variation (Unckless and Lazzaro, 2016; Hanson et al., 2019b).

Here I am proposing to treat the DGRP as a resource of loss-of-function alleles. This blog post will serve as an occasionally updated list of genes for which no transgenic mutant exists, but a DGRP line encodes a putative loss-of-function allele.

Gene - Allele - Annotation - Some line(s) that encode this allele
  • DptA - 2R_14753502_SNP - STOP_GAINED - (DGRP-822, DGRP-832)
  • DptA - 2R_14753589_SNP - NON_SYNONYMOUS_CODING - (DGRP-28, DGRP-38, etc...)
    • Allele causing the serine/arginine polymorphism associated with poor immune defence
  • DptB - 2R_14755267_SNP - STOP_GAINED - (DGRP-223)
  • AttA - 2R_10635584_SNP - Asn->Asp change in DptA S69R homologous site - (DGRP-41, DGRP-136, etc...)
    • Note there are a number of non-synonymous changes in AttA common to lines carrying this allele
    • Asp (D) is basal in DptA of Diptera (Hanson et al., 2019b)
    • Lines DGRP-41 and DGRP-136 also have non-synonymous alleles in AttB (e.g. here)
  • AttB - 2R_10637422_DEL - FRAME_SHIFT - (DGRP-776)
    • Frame shift upstream of DptA S69R homologous site
  • AttC - 2R_9281370_INS - FRAME_SHIFT - (DGRP-85, DGRP-320, etc...)
    • In this case, it's actually the reference genome that encodes the mutation, which causes incorrect frame translation of AttC
  • AttD - 3R_13451467_DEL - FRAME_SHIFT - (DGRP-318)
  • IM18 - 2R_19488690_SNP - START_LOST - (DGRP-370)
  • IM18 - 2R_19488505_DEL - FRAME_SHIFT - (DGRP-28, DGRP-41, etc...)
  • edin - 3L_17487847_DEL - UPSTREAM - (DGRP-59, DGRP-88, etc...)
    • An intriguing deletion of 5 nucleotides that overlaps one of three putative NF-kB transcription factor binding sites in the edin promoter region
  • edin - 3L_17488380_DEL - DOWNSTREAM - (DGRP-28, DGRP-85, etc...)
    • An intriguing deletion of 62 nucleotides that overlaps the edin 3' UTR (Dmel_R6), possible hypomorph
  • CG42649 - 2R_13487008_DEL - FRAME_SHIFT - (DGRP-882)
    • Frameshift causing nonsense peptide that no longer encodes PTP-like proline-rich domain (P-rich domains common in AMPs)
  • CG43920 - 2R_13486518_DEL and 2R_13486499_DEL - DOWNSTREAM - (DGRP-235)
    • DGRP-235 encodes two large deletions relative to the reference genome in the region directly downstream of CG43920
  • Drs-like1 - 3L_3335775_DEL and 3L_3335682_DEL - FRAME_SHIFT - (DGRP-882 and DGRP-907, etc...)
    • Two indels present that cause frameshifts in Drs-like1
  • Drs-like2 - 3L_3314625_DEL - UTR_3_PRIME - (DGRP-100)
    • 20 nucleotide deletion in 3' UTR, possible hypomorph
  • Drs-like5 - 3L_3316835_SNP - START_LOST - (DGRP-531)
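
The allele names in this list follow an arm_position_type pattern (for example 2R_14753502_SNP), so they are easy to split programmatically. Here is a minimal Python sketch of that; the `Variant` class and `parse_variant_id` function are hypothetical helper names introduced for illustration, not part of any DGRP tool.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    arm: str       # chromosome arm, e.g. "2R"
    position: int  # coordinate on that arm
    kind: str      # variant type as written in the ID: "SNP", "DEL" or "INS"


def parse_variant_id(variant_id: str) -> Variant:
    """Split an ID such as '2R_14753502_SNP' into its three parts."""
    arm, position, kind = variant_id.strip().split("_")
    return Variant(arm, int(position), kind)


# The two DptA alleles from the list above.
dpta_stop = parse_variant_id("2R_14753502_SNP")
dpta_s69r = parse_variant_id("2R_14753589_SNP")

# Distance in nucleotides between the two listed positions.
distance = abs(dpta_s69r.position - dpta_stop.position)
print(dpta_stop, dpta_s69r, distance)
```

Having the arm and position as separate fields makes it simple to sort alleles along a chromosome arm or to look for other variants near a gene of interest.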

I hope the extensive list provided here from just my personal observations is convincing enough to check the DGRP. Perhaps a bioinformatician greater than I could quickly extract a list of all genes with clear candidates for loss-of-function (i.e. start lost, stop gained, frame shift, insertion/deletion). I'm happy to collaborate if this is you!
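
As a starting point for that kind of extraction, here is a rough Python sketch. It assumes the annotations have already been reshaped into a simple tab-separated table with `variant_id`, `gene`, and `annotation` columns; that layout, the function name `candidate_lof_alleles`, and the choice of which annotation labels count as loss-of-function are all assumptions for illustration, and the real annotation files would need their own parsing step first.

```python
import csv
import io
from collections import defaultdict

# Annotation labels treated here as putative loss-of-function (an assumption;
# extend or trim this set to taste).
LOF_ANNOTATIONS = {"STOP_GAINED", "START_LOST", "FRAME_SHIFT"}


def candidate_lof_alleles(handle):
    """Group variant IDs by gene, keeping only putative loss-of-function calls.

    `handle` is any open text file with a header row containing the
    hypothetical columns: variant_id, gene, annotation.
    """
    candidates = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["annotation"] in LOF_ANNOTATIONS:
            candidates[row["gene"]].append(row["variant_id"])
    return dict(candidates)


# A few rows taken from the list in this post, in the assumed layout.
example = io.StringIO(
    "variant_id\tgene\tannotation\n"
    "2R_14753502_SNP\tDptA\tSTOP_GAINED\n"
    "2R_14753589_SNP\tDptA\tNON_SYNONYMOUS_CODING\n"
    "2R_14755267_SNP\tDptB\tSTOP_GAINED\n"
    "3L_3314625_DEL\tDrs-like2\tUTR_3_PRIME\n"
    "3L_3316835_SNP\tDrs-like5\tSTART_LOST\n"
)
result = candidate_lof_alleles(example)
for gene, alleles in result.items():
    print(gene, alleles)
```

Promoter and UTR deletions like the edin and Drs-like2 examples would not be caught by a filter like this, so it is only a first pass for the clearest candidates.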

Troubleshooting

As recommended by Ferreira et al. (2014), isogenization is beneficial to confirm that a mutation's effect is real and not reliant on a genetic interaction or LD effect. For mutations involving a deletion or insertion, mutant-specific primer design is easy. Even for SNPs, one can still use 3' mismatch primer design for a rapid screening process. Alternatively, you might see multiple SNPs in tandem, which could offer a specific primer site. Otherwise, Sanger sequencing isn't so bad...
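
To make the 3' mismatch idea concrete, here is a small Python sketch that builds a pair of forward primers sharing the same body but ending on either the reference or the alternate base, so each primer mismatches the other allele at its 3' terminus. The function name `allele_specific_primers` and the toy sequence are invented for illustration, and the sketch does not check melting temperature, secondary structure, or specificity, which a real design would need.

```python
def allele_specific_primers(upstream: str, ref: str, alt: str, length: int = 20):
    """Return (ref_primer, alt_primer), each ending on the SNP base at its 3' end.

    `upstream` is the sequence immediately 5' of the SNP on the primer strand.
    """
    if length < 2:
        raise ValueError("length must be at least 2")
    if len(ref) != 1 or len(alt) != 1 or ref.upper() == alt.upper():
        raise ValueError("ref and alt must be two different single bases")
    if len(upstream) < length - 1:
        raise ValueError("upstream flank is too short for this primer length")
    # Both primers share the last (length - 1) bases before the SNP.
    shared = upstream[-(length - 1):].upper()
    return shared + ref.upper(), shared + alt.upper()


# Toy sequence for illustration only, NOT a real fly locus.
toy_upstream = "ACGTTGCAGGCTAACCGTTAGCATCG"
ref_primer, alt_primer = allele_specific_primers(toy_upstream, ref="A", alt="C")
print(ref_primer)
print(alt_primer)
```

Each primer would then be paired with a common reverse primer, and the two reactions compared side by side.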

But I do have one hack for isogenizing difficult-to-PCR SNPs: you can instead scan upstream and downstream for a large indel near your gene of interest, and see if one of your mutation lines also has this indel. That way, primers can be designed for that indel instead, which is in linkage disequilibrium with your mutation of interest.

For instance:
  • In the case of DptA[822], I found a complex indel in the neighbouring gene jheh2, about 10.4 kb upstream of DptA, that replaces a stretch of ~6 bp with a totally different sequence. The pragmatic protocol I would recommend in this instance is to sequence the mutant locus first to validate the line, and at the same time test primers for the indel predicted to be in linkage disequilibrium with it. Afterwards, isogenize using those primers, and perhaps halfway through, and certainly at the end of the isogenization scheme, send your samples for sequencing and look for a signal of the SNP.
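
The scan for a linked indel described above could be sketched in Python as follows. It assumes you already have a mapping from variant IDs (in the arm_position_type format) to the set of lines carrying each variant; the function name `nearby_marker_indels`, the 20 kb default window, and the toy variant IDs and line names are all invented for illustration. Sharing a line is only a first filter: the candidate still needs to be checked by sequencing, as described above.

```python
def nearby_marker_indels(target_id, target_lines, genotypes, window=20_000):
    """List indels near the target variant that are carried by the same line(s).

    genotypes: dict mapping variant ID -> set of line names carrying it.
    Returns (variant_id, offset_from_target, shared_lines), nearest first.
    """
    arm, pos, _ = target_id.split("_")
    pos = int(pos)
    hits = []
    for variant_id, lines in genotypes.items():
        v_arm, v_pos, v_kind = variant_id.split("_")
        # Only insertions/deletions on the same chromosome arm are useful markers.
        if v_arm != arm or v_kind not in {"DEL", "INS"}:
            continue
        offset = int(v_pos) - pos
        shared = set(lines) & set(target_lines)
        if abs(offset) <= window and shared:
            hits.append((variant_id, offset, sorted(shared)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Toy data with made-up IDs and line names, for illustration only.
toy_genotypes = {
    "X_5000_SNP": {"line_A"},             # the hard-to-PCR SNP of interest
    "X_5600_DEL": {"line_A", "line_B"},   # nearby indel shared with line_A
    "X_4000_INS": {"line_B"},             # nearby, but not in line_A
    "X_90000_DEL": {"line_A"},            # shared, but too far away
    "Y_5100_DEL": {"line_A"},             # shared, but on another arm
}
markers = nearby_marker_indels("X_5000_SNP", {"line_A"}, toy_genotypes, window=2_000)
print(markers)
```

An indel that is also absent from the background you are crossing into makes the most useful marker, since the primers can then distinguish the two chromosomes at every generation.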

Onwards!

Seems good, no? The DGRP is a resource. And at least in my eyes, it's got a lot more potential than it's being given credit for!

References:
  1. Unckless et al. 2016. https://pubmed.ncbi.nlm.nih.gov/26776733/
  2. Hanson et al. 2019a. https://pubmed.ncbi.nlm.nih.gov/30803481/
  3. Unckless and Lazzaro. 2016. https://pubmed.ncbi.nlm.nih.gov/27160594/
  4. Hanson et al. 2019b. https://pubmed.ncbi.nlm.nih.gov/31781114/
  5. Ferreira et al. 2014. https://pubmed.ncbi.nlm.nih.gov/25473839/


    Author

    Mark
