PCAmatchR: a flexible R package for optimal case-control matching using weighted principal components.
Brown DW, Myers TA, Machiela MJ
2020 Sep 14
SUMMARY: A concern when conducting genome-wide association studies (GWAS) is the potential for population stratification, i.e. ancestry-based genetic differences between cases and controls, which, if not properly accounted for, could lead to biased association results. We developed PCAmatchR as an open-source R package for performing optimal case-control matching using principal component analysis (PCA) to aid in selecting controls that are well matched by ancestry to cases. PCAmatchR takes user-supplied PCA outputs and selects matching controls for cases using a weighted Mahalanobis distance metric, which weights each principal component by the percentage of genetic variation it explains. Results from the 1000 Genomes Project data demonstrate both the functionality and performance of PCAmatchR for selecting matching controls for case populations as well as for reducing inflation of association test statistics. PCAmatchR improves genomic similarity between matched cases and controls, which minimizes the effects of population stratification in GWAS.

AVAILABILITY: PCAmatchR is freely available for download on GitHub (https://github.com/machiela-lab/PCAmatchR) or through CRAN (https://cran.r-project.org/web/packages/PCAmatchR/index.html).

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
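
As a rough illustration of the matching idea described in the summary, the sketch below pairs each case with a control by minimizing a weighted, variance-scaled distance over principal component scores. It is not the PCAmatchR implementation or its API. The function names `weighted_distance` and `match_controls`, the toy data, and the exact form of the weighting are all assumptions made for this example. It also assumes that the principal components are uncorrelated, so the Mahalanobis distance reduces to a variance-scaled Euclidean distance, and it replaces a real optimal-matching routine with an exhaustive search that is only practical for very small inputs.

```python
import itertools
import math
import statistics


def weighted_distance(a, b, variances, weights):
    """Weighted, variance-scaled distance between two PC score vectors.

    Assumes uncorrelated PCs (diagonal covariance), so the Mahalanobis
    distance reduces to a sum of squared differences divided by variance.
    """
    return math.sqrt(sum(
        w * (x - y) ** 2 / v
        for x, y, v, w in zip(a, b, variances, weights)
    ))


def match_controls(cases, controls, explained):
    """Return the index of the control matched to each case (1:1).

    `explained` holds the percentage of variation explained by each PC;
    it is normalized into weights that sum to 1. The matching minimizes
    the total distance by trying every assignment (hypothetical stand-in
    for an optimal-matching algorithm; feasible only for tiny inputs).
    """
    total = sum(explained)
    weights = [e / total for e in explained]

    # Per-PC sample variance, estimated from cases and controls pooled.
    pooled = cases + controls
    variances = [
        statistics.variance([row[k] for row in pooled])
        for k in range(len(explained))
    ]

    # Distance from every case to every control.
    dist = [
        [weighted_distance(c, d, variances, weights) for d in controls]
        for c in cases
    ]

    # Each permutation assigns distinct controls to the cases in order.
    best = min(
        itertools.permutations(range(len(controls)), len(cases)),
        key=lambda perm: sum(dist[i][j] for i, j in enumerate(perm)),
    )
    return list(best)


# Toy data (invented): scores on two PCs for 2 cases and 4 controls.
cases = [[0.10, 0.02], [-0.20, 0.05]]
controls = [[-0.19, 0.40], [0.11, 0.01], [0.50, 0.50], [-0.21, 0.06]]
explained = [60.0, 10.0]  # percent of variation explained by PC1, PC2

matches = match_controls(cases, controls, explained)
print(matches)  # [1, 3]
```

In this toy example the first case is paired with the control at index 1 and the second case with the control at index 3, the controls whose scores lie closest on both components. Because PC1 carries the larger weight, a difference on PC1 increases the distance more than a difference of the same standardized size on PC2.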