ASAFE (Ancestry Specific Allele Frequency Estimation)

Estimating the frequency of an allele for a bi-allelic marker in a 3-way admixed population


What is ASAFE?

The ASAFE EM algorithm provides maximum likelihood estimates of ancestry-specific allele frequencies at a bi-allelic marker, given the local ancestries and genotypes at that marker. It accounts for uncertainty in phase (i.e. order): which of the two local ancestries at the marker carries which allele of the genotype.
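The following is a minimal Python sketch of the kind of EM described above, intended only to illustrate the idea. It is not the package's R code. It assumes that ancestries are coded 0, 1, 2, that each diploid individual has an unordered local-ancestry pair and a genotype coded as the number of copies of one allele (0, 1, or 2), and that the function name em_ancestry_freqs is hypothetical. Under these assumptions the only phase-ambiguous case is a heterozygous genotype on two different ancestries. The E-step splits that allele between the two ancestries in proportion to the current frequency estimates, and the M-step divides each ancestry's expected allele count by the number of alleles it carries.

```python
"""EM sketch for ancestry-specific allele frequencies at one bi-allelic marker.

Assumptions (hypothetical, not taken from the ASAFE package): ancestries are
coded 0..n_ancestries-1, each individual has an unordered local-ancestry pair
(a1, a2), and the genotype g counts copies of one allele (0, 1, or 2).
"""
from typing import List, Sequence, Tuple


def em_ancestry_freqs(
    ancestries: Sequence[Tuple[int, int]],
    genotypes: Sequence[int],
    n_ancestries: int = 3,
    tol: float = 1e-8,
    max_iter: int = 1000,
) -> List[float]:
    """Return estimated allele frequencies, one per ancestry (hypothetical API)."""
    p = [0.5] * n_ancestries  # starting frequency for every ancestry
    for _ in range(max_iter):
        allele_counts = [0.0] * n_ancestries  # expected copies of the counted allele
        total_counts = [0.0] * n_ancestries   # number of alleles on each ancestry
        for (a1, a2), g in zip(ancestries, genotypes):
            total_counts[a1] += 1
            total_counts[a2] += 1
            if a1 == a2 or g != 1:
                # Phase does not matter: both alleles share one ancestry,
                # or the genotype is homozygous (0 or 2 copies).
                allele_counts[a1] += g / 2
                allele_counts[a2] += g / 2
            else:
                # E-step: heterozygote on two different ancestries. Posterior
                # probability that the counted allele sits on ancestry a1.
                w1 = p[a1] * (1 - p[a2])
                w2 = (1 - p[a1]) * p[a2]
                denom = w1 + w2
                post = 0.5 if denom == 0 else w1 / denom
                allele_counts[a1] += post
                allele_counts[a2] += 1 - post
        # M-step: expected allele count divided by alleles carried; an
        # ancestry with no data keeps its current value.
        new_p = [
            allele_counts[k] / total_counts[k] if total_counts[k] > 0 else p[k]
            for k in range(n_ancestries)
        ]
        converged = max(abs(x - y) for x, y in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


if __name__ == "__main__":
    # Two (1, 1) homozygotes for the counted allele, two (2, 2) individuals
    # without it, and two phase-ambiguous (1, 2) heterozygotes.
    anc = [(1, 1), (1, 1), (2, 2), (2, 2), (1, 2), (1, 2)]
    geno = [2, 2, 0, 0, 1, 1]
    print(em_ancestry_freqs(anc, geno))
```

In this example the unambiguous individuals pull ancestry 1 toward frequency 1 and ancestry 2 toward 0, so the E-step increasingly assigns the heterozygous allele to ancestry 1. Ancestry 0 has no data and stays at its starting value of 0.5.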

ASAFE was motivated by genome-wide association studies of Hispanics performed by the University of Washington Genetic Analysis Center as part of the Hispanic Community Health Study, but it applies to any 3-way admixed diploid population, not only human ones.

15 min talk (with commentary), 15 min talk (no commentary): I presented these at the XXVIIIth International Biometric Conference/WNAR combined conference in 2016. They include some introduction to genetics.

4 min talk (with commentary), 4 min talk (no commentary), Poster: I presented these at JSM 2016.

How does ASAFE fit into a genetic analysis workflow? Show me the picture.

The top-level folder of the ASAFE R package contains a file, ASAFE_Visual.pdf, that illustrates how ASAFE fits into a genetic analysis workflow:

Figure: A workflow using ASAFE

Sharon Browning has written code that performs the steps in the diagram that phase genotypes and run RFMIX. It is the part of the following script above the line that reads "Apply masking": http://faculty.washington.edu/sguy/local_ancestry_pipeline/rfmix_mds_pipeline. I've extracted the relevant part of the script here:

BEAGLE then RFMIX

You would then write your own script to convert the text-file output from RFMIX into the text-file input that ASAFE expects.

How do I use the ASAFE R Package?

To download the package, click either of the two blue "Download" links at the upper right of this webpage, or go to https://github.com/BiostatQian/ASAFE and click the green "Clone or download" link. In the main ASAFE package folder, see the vignette inst/doc/ASAFE.pdf for instructions on how to use the package.

A smaller version of this package (which has the same R code and unit tests, but not the information needed to reproduce the paper) is on Bioconductor: https://bioconductor.org/packages/ASAFE/.

Changes and Feedback

If your question is not answered on this page or in the vignette, feel free to contact me with package-related questions.

[Changes made June 4, 2016]

(1) Removed function em() from the vignette, because a user would most likely only be interested in function algorithm_1snp(), which calls em().

(2) Changed variable names in estep.R, mstep.R, and em.R to match the supplement.

Paper Reference

Zhang QS, Browning BL, and Browning SR (2016) “Ancestry Specific Allele Frequency Estimation.” Bioinformatics.

There is a typo in the supplement: its tables are labeled "Table 1", "Table 2", "Supplementary Table 1", and "Supplementary Table 2" instead of Tables 1, 2, 3, and 4. I've updated the comments in the code to make the distinction among the tables clear.

Contributors

Qian Sophia Zhang (qszhang@uw.edu)