GWDI-bank – Introducing the genome wide Dictyostelium insertion resource for functional genomics
In an age of –omic technologies it is essential to be able to quickly acquire and study targets. We have developed a pipeline for the large-scale generation of gene knockout mutants that combines restriction enzyme-mediated integration (REMI) mutagenesis with NGS technology, an approach we call REMI-seq. Using this method, we have created a genome-wide collection of Dictyostelium mutants as a new resource for the international research community. The resource comprises both individually banked mutants and large pools of mutants. The position of REMI mutations will be searchable via dictyBase, and the Dicty Stock Centre will distribute the resource.
The resource, referred to as the Genome Wide Dictyostelium Insertion bank, or GWDI-bank for short, comprises approx. 23,000 individually banked mutants with known insertion sites. These insertions cover approx. 14,000 different genomic loci, of which 69% are intragenic. Approx. 5,500 different genes have at least one insertion, and multiple alleles are available for the majority of these genes. A further approx. 1,000 genes have an insertion within 500 bp upstream of their start codon.
Validation of the resource has established that the REMI-seq pipeline is robust; inverse PCR demonstrated that mutants contain an insertion at the expected loci and a screen for developmental phenotypes revealed the expected phenotypes when the insertion occurred in a previously characterised gene.
This new resource will produce a step change in Dictyostelium genetics. The principal benefits will be the online availability of independent and multi-allelic mutants for many Dictyostelium genes, the capacity to conduct complex phenotyping of protein families, and the ease with which whole genome phenotypic screens can be conducted.
Parallel phenotyping using the GWDI-library
Genetically tractable, simple eukaryotic cells like Dictyostelium are immensely valuable tools for discovery genetics and biomedical research. One key element is their potential for high content genetic screening.
However, despite the plethora of existing bio-resources, including the complete genome sequence and transcriptional profiles, full exploitation of these data is hampered by our inability to efficiently link genotype, transcriptome or proteome level information to phenotype. To date, it has only been possible to generate pools of mutants by REMI and to isolate mutants with a desired phenotype. This approach has three limitations: (a) the complexity of the initial pool of mutants is unknown; (b) the causative mutation in each mutant of interest has to be identified one by one, a laborious process that means only a handful of mutants can be studied; and (c) only positive selections can be carried out, which enrich for mutants that increase in frequency (e.g. drug resistant), whilst hypersensitive mutants that decrease in frequency are lost.
To remove these key bottlenecks, we have developed a novel technique (REMI-seq), which combines REMI mutagenesis with NGS technology to create a genome-wide set of ‘barcoded’ single gene mutants with defined insertion sites.
After insertion at DpnII or NlaIII sites, 20 bp fragments can be extracted using a type IIS endonuclease, MmeI (which cuts 18/20 bp downstream of its recognition sequence), and an I-SceI meganuclease site (there are no other I-SceI sites in the Dictyostelium genome). Addition of Illumina sequencing adaptors after digestion allows these fragments to be efficiently isolated from contaminating gDNA. This novel REMI-seq methodology allows us to identify insertion sites from both single mutants and pools of mutants en masse.
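As an illustration of the idea of a site-anchored 20 bp tag, the following minimal sketch scans a sequence for DpnII (GATC) and NlaIII (CATG) recognition sites and reports the 20 bp immediately downstream of each. It is not the REMI-seq pipeline itself: the function name `candidate_tags`, the choice to take the tag directly after the site, and the downstream-only orientation are assumptions made for illustration, since the exact tag position depends on the insert design, which is not described here.

```python
import re

# Recognition sequences of the two enzymes named in the text.
SITES = {"DpnII": "GATC", "NlaIII": "CATG"}
TAG_LENGTH = 20  # tag length stated in the text


def candidate_tags(sequence, tag_length=TAG_LENGTH):
    """Return (enzyme, site_start, tag) for every site with a full-length
    downstream flank. Hypothetical helper: the tag is assumed to start
    immediately after the recognition site."""
    sequence = sequence.upper()
    tags = []
    for enzyme, site in SITES.items():
        # A lookahead pattern finds every occurrence, including overlapping ones.
        for match in re.finditer(f"(?={site})", sequence):
            start = match.start()
            flank_start = start + len(site)
            tag = sequence[flank_start:flank_start + tag_length]
            if len(tag) == tag_length:  # skip sites too close to the sequence end
                tags.append((enzyme, start, tag))
    return sorted(tags, key=lambda item: item[1])


if __name__ == "__main__":
    toy = "AAAAAGATC" + "ACGT" * 5 + "CATGTT"  # made-up sequence, not real data
    for enzyme, position, tag in candidate_tags(toy):
        print(enzyme, position, tag)
```

Both recognition sequences are palindromic, so scanning one strand is enough to locate every site; a fuller treatment would also report the flank on the opposite side of each site.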
Most importantly, this allows researchers to screen for mutants that exhibit changes in fitness when challenged (e.g. in developmental signalling or drug sensitivity). This is because within a mixed population of multiple mutants, the number of reads of each unique sequence tag (or “barcode”) provides a quantitative measure of the relative abundance of that mutant. When these populations are subjected to selection conditions, mutants that increase or decrease in frequency can be identified by changes in barcode read counts. We have carried out proof of principle experiments to determine the dynamic range of this method, which show linearity over a 10,000-fold range.
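The read-count logic described above can be sketched as follows. This is a minimal illustration and not the analysis pipeline provided with the resource: the function names, the pseudocount of 0.5, and the toy counts are all assumptions. Each barcode's read count is converted to a fraction of the pool, and the log2 ratio of the fractions after and before selection indicates whether a mutant was enriched (positive) or depleted (negative).

```python
import math


def relative_abundance(counts):
    """Convert raw barcode read counts into fractions of the total pool."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def log2_fold_changes(before, after, pseudocount=0.5):
    """Log2 change in relative abundance of each barcode after selection.

    The pseudocount (an assumed value) keeps barcodes that drop to zero
    reads from producing a division by zero or log of zero.
    """
    tags = set(before) | set(after)
    total_before = sum(before.values()) + pseudocount * len(tags)
    total_after = sum(after.values()) + pseudocount * len(tags)
    changes = {}
    for tag in tags:
        frac_before = (before.get(tag, 0) + pseudocount) / total_before
        frac_after = (after.get(tag, 0) + pseudocount) / total_after
        changes[tag] = math.log2(frac_after / frac_before)
    return changes


if __name__ == "__main__":
    # Toy counts for three hypothetical mutants, not real data.
    before = {"mutA": 100, "mutB": 100, "mutC": 100}
    after = {"mutA": 400, "mutB": 100, "mutC": 10}
    for tag, change in sorted(log2_fold_changes(before, after).items()):
        print(tag, round(change, 2))
```

Because the values are relative, a mutant whose read count is unchanged (mutB in the toy data) still shows a negative change when another mutant expands in the pool; real analyses would also need replicates and a statistical model to separate selection from sampling noise.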
Furthermore, trial selections of a pool of 30,000 mutants grown in HL5 or on different bacterial species illustrate the biological value of the data. We will also highlight the utility of the resource for identifying genes that are essential for other aspects of the life cycle, such as differentiation, and for biomedical screening for drug sensitivity. These data sets will provide not only proof of principle studies, but also highly useful data for the research community. Finally, because the pools of mutants will be made available to the research community, we have developed a step-by-step analysis pipeline for the data and highlight best practice for use of this resource.