Necessary and sufficient conditions for identifiability in the admixture model
Abstract
We consider data on $M$ SNPs from $N$ individuals who are an admixture of $K$ unknown ancient populations. Let $\Pi_{si}$ be the frequency of the reference allele of individual $i$ at SNP $s$, so that the number of reference alleles at SNP $s$ for a diploid individual is binomially distributed with parameters $2$ and $\Pi_{si}$. We suppose $\Pi_{si} = \sum_{k=1}^{K} F_{sk} Q_{ki}$, where $F_{sk}$ is the allele frequency of SNP $s$ in population $k$ and $Q_{ki}$ is the proportion of population $k$ in the ancestry of individual $i$. I am interested in the identifiability of $F$ and $Q$, up to a relabelling of the ancient populations: if $\Pi = F^1 Q^1 = F^2 Q^2$, under what conditions are $F^1 = F^2$ and $Q^1 = Q^2$? I show that the anchor condition (Cabreros and Storey, 2019) on one matrix, together with an independence condition on the other matrix, is sufficient for identifiability. I argue that the proof of the necessary condition in Cabreros and Storey (2019) is incorrect, and I provide a correct proof, which in addition does not require knowledge of the number of ancestral populations. I also provide abstract necessary and sufficient conditions for identifiability, and I show that one cannot deviate substantially from the anchor condition without losing identifiability. Finally, I give necessary and sufficient conditions for identifiability in the non-admixed case.
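The model in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's method: the function names (`matmul`, `simulate`, `relabel`), the dimensions, and the uniform random draws for F and Q are all assumptions chosen for illustration. It builds the individual allele frequencies as the product of F and Q, draws diploid genotypes as Binomial(2, frequency), and checks that relabelling the populations leaves the product unchanged, which is why identifiability can only hold up to relabelling.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def simulate(M=5, N=4, K=2, seed=0):
    """Hypothetical helper: draw F (M x K), Q (K x N), Pi = F Q and genotypes G."""
    rng = random.Random(seed)
    # F[s][k]: allele frequency of SNP s in population k, each in [0, 1)
    F = [[rng.random() for _ in range(K)] for _ in range(M)]
    # Q[k][i]: ancestry proportion of population k in individual i;
    # each individual's proportions are normalised to sum to 1
    columns = []
    for _ in range(N):
        weights = [rng.random() for _ in range(K)]
        total = sum(weights)
        columns.append([w / total for w in weights])
    Q = [list(row) for row in zip(*columns)]
    # Pi[s][i] = sum_k F[s][k] * Q[k][i]
    Pi = matmul(F, Q)
    # Diploid genotype: Binomial(2, Pi[s][i]) as the sum of two Bernoulli draws
    G = [[sum(rng.random() < p for _ in range(2)) for p in row] for row in Pi]
    return F, Q, Pi, G

def relabel(F, Q, perm):
    """Permute the population labels: columns of F and rows of Q."""
    F_new = [[row[k] for k in perm] for row in F]
    Q_new = [Q[k] for k in perm]
    return F_new, Q_new

F, Q, Pi, G = simulate()
F2, Q2 = relabel(F, Q, [1, 0])
Pi2 = matmul(F2, Q2)  # same matrix as Pi, up to floating-point rounding
```

Here `(F2, Q2)` differs from `(F, Q)` yet yields the same frequency matrix, so the data cannot distinguish the two labelings. The question studied in the paper is when this is the only ambiguity.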