PIC =
= 1 – SIGMA((frequency of allele i)^2, for i from 1 to n) – SIGMA-SIGMA(2 * ((frequency of allele i)^2) * ((frequency of allele j)^2), for i from 1 to <n-1> and j from <i+1> to n) =
= 1 – (((frequency of allele 1)^2) + ((frequency of allele 2)^2) + … + ((frequency of allele n)^2)) – ((2 * ((frequency of allele 1)^2) * ((frequency of allele 2)^2)) + (2 * ((frequency of allele 1)^2) * ((frequency of allele 3)^2)) + … + (2 * ((frequency of allele 1)^2) * ((frequency of allele n)^2)) + (2 * ((frequency of allele 2)^2) * ((frequency of allele 3)^2)) + (2 * ((frequency of allele 2)^2) * ((frequency of allele 4)^2)) + … + (2 * ((frequency of allele 2)^2) * ((frequency of allele n)^2)) + … + (2 * ((frequency of allele <n-1>)^2) * ((frequency of allele n)^2)))
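As a minimal sketch, the formula above can be evaluated directly from a list of allele frequencies. The function name pic and the example frequencies are assumptions made for illustration only; they do not come from Sham (1998).

```python
from itertools import combinations


def pic(freqs):
    """PIC for one locus, given allele frequencies that sum to 1.

    Hypothetical helper name; implements
    1 - SIGMA(p_i^2) - SIGMA-SIGMA(2 * p_i^2 * p_j^2) over all pairs i < j.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Sum of squared allele frequencies (expected homozygosity).
    sum_of_squares = sum(p ** 2 for p in freqs)
    # One term per unordered pair of distinct alleles (i < j).
    pair_term = sum(2 * (p ** 2) * (q ** 2) for p, q in combinations(freqs, 2))
    return 1.0 - sum_of_squares - pair_term


# Made-up example: three alleles with frequencies 0.5, 0.3 and 0.2.
print(pic([0.5, 0.3, 0.2]))  # approximately 0.5478
```

For the three made-up frequencies, the sum of squares is 0.38 and the pair term is 0.0722, so PIC is about 0.5478.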
Rationale for the last term of the equation, from Sham (1998, page 61):
If a parental mating is AiAj x AiAj (where i and j are different), then there is a probability of 0.5 that the parental origins of the alleles transmitted to an offspring can be traced and a probability of 0.5 that they cannot. The probability of this parental mating type in a random mating population is (4 * ((frequency of allele i)^2) * ((frequency of allele j)^2)), that is, the square of the heterozygote frequency (2 * (frequency of allele i) * (frequency of allele j)). Multiplication by 0.5 therefore gives the term (2 * ((frequency of allele i)^2) * ((frequency of allele j)^2)), which is subtracted once for each pair of different alleles.
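The arithmetic in this rationale can be checked with a short worked example. The two frequencies below (0.5 and 0.25) are assumed values chosen only so that the numbers come out exactly; the variable names are likewise illustrative.

```python
# Assumed frequencies of two different alleles, Ai and Aj, at one locus.
p_i = 0.5
p_j = 0.25

# Frequency of the heterozygote AiAj under random mating.
heterozygote_freq = 2 * p_i * p_j

# Probability that both parents are AiAj: 4 * p_i^2 * p_j^2.
mating_prob = heterozygote_freq ** 2

# Half of such matings are uninformative about parental origin:
# 2 * p_i^2 * p_j^2, the term subtracted in PIC for this pair.
untraceable = 0.5 * mating_prob

print(mating_prob, untraceable)  # prints: 0.0625 0.03125
```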
For very large numbers of alleles at a locus, the heterozygosity index and PIC become very close.
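A small numerical sketch can illustrate this convergence. It assumes that the heterozygosity index is 1 - SIGMA((frequency of allele i)^2), and it uses equally frequent alleles purely as a convenient test case; the function names are hypothetical.

```python
def heterozygosity(freqs):
    """Assumed definition: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p ** 2 for p in freqs)


def pic(freqs):
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    # The pair sum of 2 * p_i^2 * p_j^2 over i < j equals (sum p^2)^2 - sum p^4,
    # which avoids looping over every pair of alleles.
    return 1.0 - s2 - (s2 ** 2 - s4)


def gap(n):
    """Heterozygosity minus PIC for n equally frequent alleles."""
    freqs = [1.0 / n] * n
    return heterozygosity(freqs) - pic(freqs)


for n in (2, 10, 100, 1000):
    print(n, gap(n))
```

Under these assumptions the gap is 0.125 for 2 alleles, about 0.009 for 10 alleles, and below one millionth for 1000 alleles.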
If there are n alleles of equal frequency (each with frequency 1/n), then
PIC =
= ((n – 1)^2) * (n + 1) / (n^3) =
= 1 – (1/n) – ((n-1) / (n^3))
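As a sanity check, the two closed forms can be compared with the general formula using exact rational arithmetic. This is only a sketch; the function names are made up for illustration, and the check covers n from 2 to 7 rather than proving the identity.

```python
from fractions import Fraction
from itertools import combinations


def pic_exact(freqs):
    """General PIC formula, evaluated exactly on Fraction inputs."""
    sum_of_squares = sum(p ** 2 for p in freqs)
    pair_term = sum(2 * p ** 2 * q ** 2 for p, q in combinations(freqs, 2))
    return 1 - sum_of_squares - pair_term


def closed_form_a(n):
    # ((n - 1)^2) * (n + 1) / (n^3)
    return Fraction((n - 1) ** 2 * (n + 1), n ** 3)


def closed_form_b(n):
    # 1 - (1/n) - ((n - 1) / (n^3))
    return 1 - Fraction(1, n) - Fraction(n - 1, n ** 3)


for n in range(2, 8):
    freqs = [Fraction(1, n)] * n
    assert pic_exact(freqs) == closed_form_a(n) == closed_form_b(n)

print(closed_form_a(2), closed_form_a(4))  # prints: 3/8 45/64
```

The second form shows the structure directly: 1 - (1/n) is the heterozygosity for n equally frequent alleles, and ((n - 1) / (n^3)) is the pair term, which comes from n * (n - 1) / 2 pairs each contributing 2 / (n^4).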