The concept of two haplotypes explaining a genotype is defined in various ways. Here is Gusfield's definition.
We have studied HIPP by means of its decision version:
HIPP-DEC is NP-complete.
Assume that H is a set that contains 2*n haplotypes, h1, ..., h2n, and that every genotype gi in G is mapped to two haplotypes, h2i and h2i-1, in H. Then H is a solution to HIPP-DEC if the following hold:
C1 For every genotype g in G, for every site j of g
with value 2, the values of the jth sites of the corresponding two
haplotypes are different.
C2 For every genotype g in G, for every site j of g
with value 1 or 0, the values of the jth sites of the corresponding
two haplotypes are identical to g[j].
C3 There are at most k unique haplotypes in H.
We represent HIPP-DEC as a program, where each condition C1--C3 is represented by a set of rules. Here is a straightforward ASP formulation of HIPP-DEC, with respect to Gusfield's definition: hipp-dec-alternative-gus-asp.lp.
We have also formulated HIPP-DEC, with respect to other mathematical models of the problem, and in other declarative programming paradigms (Constraint Programming and Integer Linear Programming). All formulations are available here.
For a genotype g of the form (g11, g12)...(gm1, gm2), let us denote by g1 the vector g11g21...gm1 and by g2 the vector g12g22...gm2. We say that two alleles i and j are compatible if they are identical or if one of them is ?. Two haplotypes h1 and h2 are compatible with a genotype g, all with m sites, if, for every site j, h1[j] is compatible with one of the two alleles, g1[j] or g2[j], and h2[j] is compatible with the other. For instance, the haplotypes 011 and 111 are compatible with (1, 0)(1, 1)(1, 1) and (1, ?)(1, ?)(1, 1) but not with (1, ?)(1, 0)(1, 1). Note that, we can discard the sites with missing information while computing a solution for HIPAG.
We consider the following decision version of HIPAG:
HIPAG-DEC Given a set G of n genotypes each with m biallelic sites, and a positive integer k, decide whether there is a set H of at most k unique haplotypes such that each genotype in G is compatible with two haplotypes in H.
Assume that H is a set that contains 2*n haplotypes, h1, ..., h2n, and that every genotype gi in G is mapped to two haplotypes, h2i and h2i-1, in H. Then H is a solution to HIPAG-DEC if the following hold:
H1 For every genotype g in G, for every site j of
g, the values of the j'th sites of
the corresponding haplotypes h1
and h2 are compatible with
the values of g1[j] and g2[j].
H2 There are at most k
unique haplotypes in H.
The ASP representation of HIPAG-DEC is similar to the ASP representation of HIPP-DEC: We generate values for the sites of haplotypes and compute the maximum number of unique haplotypes with the same set of rules. Haplotypes are represented as in HIPP representation. Genotypes are represented in terms of atoms of the form present(G,J,I) ("Chromosome I of Genotype G has value 1 at Site J") describing present genes; -present(G,J,I) describes absence of genes. For instance, Genotype 3 of the form (1, ?)(1, 0)(1, 1) is represented as follows:
present(3,1,1). present(3,2,1). present(3,3,1). -present(3,2,2). present(3,3,2).
Here is the ASP program that describes HIPAG-DEC: hipag-dec-asp.lp