*Image Credit: Samtools
Genome sequencing gives us an unprecedented ability to read the variation in an individual's genome. It is especially useful for finding novel and rare variation that is uncorrelated with the common variation that can be cheaply assayed using genotyping chip technology.
Unfortunately, genome sequencing is still much more expensive than genotyping chips (more than 5x the cost). However, when one only cares about the population frequency of a variant, and not which individuals carry it, the sequencing can be pooled, which costs very little.
Unfortunately, with pooled sequencing, individual-level data cannot be matched to genotypes, so common covariates such as gender, age, and ancestry cannot be included in regression models. This increases the noise in the data, resulting in less powerful and possibly confounded studies.
However, for the same cost, more individuals can be included in pooled sequencing. At what point does the increase in sample size result in a more powerful study than one using genotyping technology? More importantly, can the pooling be designed in such a way as to preserve some of the individual-level information without creating batch effects?
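To make the sample-size question concrete, here is a minimal sketch and not a proposed method for this project. It approximates the power of a two-sided z-test on the allele frequency difference between cases and controls, and models the extra noise from pooling as a single multiplier on the variance. The function name `approx_power`, the `variance_inflation` parameter, the allele frequencies, and the significance threshold are all illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, p_case, p_control,
                 alpha=5e-8, variance_inflation=1.0):
    """Approximate power of a two-sided z-test on an allele frequency difference.

    Assumptions (illustrative only):
      - equal numbers of cases and controls, two alleles per individual
      - pooling noise is modeled as one multiplier on the sampling variance
        (variance_inflation=1.0 means no extra noise)
      - the negligible contribution of the opposite tail is ignored
    """
    alleles = 2 * n_per_group
    # Binomial sampling variance of the difference in allele frequencies.
    variance = (p_case * (1 - p_case) + p_control * (1 - p_control)) / alleles
    std_err = sqrt(variance * variance_inflation)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_case - p_control) / std_err
    return NormalDist().cdf(z_effect - z_crit)


if __name__ == "__main__":
    # Hypothetical scenario: a rare variant at 2% in cases and 1% in controls.
    for n in (2_000, 5_000, 10_000, 20_000):
        chip = approx_power(n, 0.02, 0.01)
        pooled = approx_power(n, 0.02, 0.01, variance_inflation=2.0)
        print(f"n={n:>6}  individual-level={chip:.3f}  pooled (2x noise)={pooled:.3f}")
```

Under this simplified model, power depends only on the ratio of sample size to variance inflation, so a pooled design breaks even once its sample size exceeds the individual-level design's by the inflation factor. Real pooled data would also need to account for unequal DNA contributions, sequencing error, and the missing covariates.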
Recommended Prior Knowledge:
- Strong statistical background with knowledge of Bayesian statistics
- Proficiency in a programming language (e.g. R, Python, MATLAB)
- Basic biology background (e.g. LS 3, 4, 7B)
Before meeting with me:
Please read the introduction and basic method description of the following:
- some videos
If interested, contact Rob Brown (firstname.lastname@example.org)