We view a genotype as a vector of sites, each site having a value from the domain {0, 1, 2}, and a haplotype as a vector of sites, each site having a value from the domain {0, 1}. Following Gusfield, a site of a genotype is ambiguous if its value is 2, and resolved otherwise. Two haplotypes h1 and h2 form (or explain) a genotype g if for every site j the following hold:

if g[j]=2 then h1[j]≠h2[j], i.e. one of them is 0 and the other is 1;

if g[j]=1 then h1[j]=1 and h2[j]=1; and

if g[j]=0 then h1[j]=0 and h2[j]=0.

For instance, the genotype 20110 can be explained by the haplotypes 10110 and 00110:

10110

00110

-------

20110
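
The definition above can be checked mechanically. The following is a minimal Python sketch, assuming genotypes and haplotypes are represented as strings over the digits 0, 1, 2; the function name explains is a hypothetical helper, not something taken from the original text.

```python
def explains(h1: str, h2: str, g: str) -> bool:
    """Return True if haplotypes h1 and h2 form (explain) genotype g.

    Assumption: inputs are strings of digits, haplotypes over {0, 1}
    and genotypes over {0, 1, 2}.
    """
    if not (len(h1) == len(h2) == len(g)):
        return False
    for a, b, site in zip(h1, h2, g):
        if site == "2":
            # Ambiguous site: the two haplotypes must differ here.
            if a == b:
                return False
        elif a != site or b != site:
            # Resolved site: both haplotypes must carry the genotype's value.
            return False
    return True


# The example from the text: 10110 and 00110 explain 20110.
print(explains("10110", "00110", "20110"))
```

Swapping the two haplotypes does not change the result, since the condition at an ambiguous site only requires that the values differ.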

 

Consider a set H of k haplotypes. For the problem above, H is a solution to HIPP-DEC if the following constraints are satisfied:

 

C1       Every genotype g in G is mapped to two haplotypes in H.

C2       For every genotype g in G, for every ambiguous site j of g, the values of the j'th sites of these haplotypes are different.

C3       For every genotype g in G, for every resolved site j of g, the j'th sites of both of these haplotypes have the value g[j].
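
As an illustration of constraints C1 to C3, the sketch below checks whether a candidate set H covers every genotype in G by searching for a suitable pair of haplotypes for each genotype. It is a brute-force check under stated assumptions, not the encoding used to solve HIPP-DEC: the names explains and find_mapping are hypothetical, sequences are strings of digits, and a genotype is allowed to map to the same haplotype twice (which is what a genotype with no ambiguous sites needs).

```python
from itertools import combinations_with_replacement


def explains(h1, h2, g):
    """True if h1 and h2 differ at every ambiguous site of g (C2)
    and both equal g[j] at every resolved site j (C3)."""
    if not (len(h1) == len(h2) == len(g)):
        return False
    return all(
        (a != b) if site == "2" else (a == site and b == site)
        for a, b, site in zip(h1, h2, g)
    )


def find_mapping(G, H):
    """Map each genotype in G to a pair of haplotypes from H (C1).

    Returns a dict genotype -> (h1, h2), or None if some genotype
    cannot be explained by any pair drawn from H.
    Assumption: a pair may use the same haplotype twice.
    """
    mapping = {}
    for g in G:
        pair = next(
            (
                (h1, h2)
                for h1, h2 in combinations_with_replacement(H, 2)
                if explains(h1, h2, g)
            ),
            None,
        )
        if pair is None:
            return None
        mapping[g] = pair
    return mapping


G = ["20110", "00110", "22110"]
H = ["10110", "00110", "01110"]  # k = 3 candidate haplotypes
print(find_mapping(G, H))
```

With the two-haplotype set ["10110", "00110"], the same call returns None, because no pair from that set differs at both ambiguous sites of 22110.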