Haplotype Inference

Haplotype inference is described in the paper "Comparing ASP, CP, ILP on Haplotype Inference by Pure Parsimony", as follows:

A genotype is the specific genetic makeup of an individual. It is in the form of a DNA, consisting of a number of sites. Each genotype has two copies, one from the mother and one from the father. These two copies are called haplotypes. Haplotype Inference problem is, for a given set of genotypes, to infer haplotypes that explain them.

Haplotype Inference by Pure Parsimony (HIPP) is a slight modification of the Haplotype Inference problem, which asks for a minimal set of haplotypes that explain the given genotypes.

We consider two problems: an optimization problem (HIPP) and a decision problem that corresponds to it (HIPP-DEC).

HIPP Given a set G of n genotypes each with m sites, find a minimal set H of haplotypes such that each genotype in G is explained by two haplotypes in H.

HIPP-DEC Given a set G of n genotypes each with m sites, and a positive integer k, decide whether there is a set H of k haplotypes such that each genotype in G is explained by two haplotypes in H.

HIPP-DEC is NP-hard.

Both problems depend on the concept of two haplotypes “explaining” a genotype. This concept is defined in two ways, the standard definition by D. Gusfield and an alternative one by D. Brown & I. Harrower; essentially they are equivalent. We will mainly consider Gusfield’s definition to formulate the problem, and then compare it to Brown & Harrower’s.

Problem Formulations

In the following we present formulations of HIPP-DEC for each of the programming paradigms: Constraint Programming, Integer Linear Programming, and Answer Set Programming. First, we formalized the HIPP-DEC in a straightforward way. These formulations of HIPP-DEC are minimally modified to obtain formulations for HIPP, as explained in the paper above. In order to test elaboration tolerance and observe performance changes, we can add two kinds of symmetry breaking constraints to these formulations. Formulations are in the language of OPL (in the case of CP and ILP), and in the language of lparse (in the case of ASP).

Formulation

CP

ILP

ASP

HIPP-DEC

hipp-dec-gus-cp.mod

hipp-dec-gus-ilp.mod

hipp-dec-gus-asp.lp

HIPP-DEC with S1

hipp-dec-s1-gus-cp.mod

hipp-dec-s1-gus-ilp.mod

hipp-dec-s1-gus-asp.lp

HIPP-DEC with S2

hipp-dec-s2-gus-cp.mod

hipp-dec-s2-gus-ilp.mod

hipp-dec-s2-gus-asp.lp

HIPP (from HIPP-DEC)

hipp-opt-gus-cp.mod

hipp-opt-gus-ilp.mod

hipp-opt-gus-asp.lp

HIPP with S1

hipp-opt-s1-gus-cp.mod

hipp-opt-s1-gus-ilp.mod

hipp-opt-s1-gus-asp.lp

HIPP with S2

hipp-opt-s2-gus-cp.mod

hipp-opt-s2-gus-ilp.mod

hipp-opt-s2-gus-asp.lp

Alternative HIPP

hipp-opt-alternative-gus-cp.mod

hipp-opt-alternative-gus-ilp.mod

hipp-opt-alternative-gus-asp.lp

Alternative HIPP-DEC

hipp-dec-alternative-gus-cp.mod

hipp-dec-alternative-gus-ilp.mod

hipp-dec-alternative-gus-asp.lp

All formulations above are available in one file: hipp-models-gus.zip

HIPP can also be formulated according to Dan Brown and Ian Harrower’s definition. These formulations are available in this file: hipp-models-bro.zip.

HIPP-DEC

With these files, we can solve problem instances of HIPP-DEC using Clasp for ASP, and using ILOG-OPL for CP and ILP. For instance, a solution to problem P2 can be computed by the following commands:

 lparse -c n=8 -c m=9 –c k=8 -d none --true-negation hipp-dec-s1-gus-asp.lp P2.happ | clasp

 Here are the input files (*.happ files) describing problems P0-P3 in lparse (same for both definitions): hipp-dec-input-gus-lparse.zip

 In OPL, we run programs (*.mod files) with data files (*.dat files) and different options for CP and ILP. Here are data files describing problems P0-P3 in OPL, relative to Gusfield’s definition: hipp-dec-input-gus-opl.zip, and relative to Brown’s definition: hipp-dec-input-bro-opl.zip.

The options -conflict and -relax are included only for CP formulations. For instance, a solution to problem P2 can be computed by the following command in OPL, for CP:

oplrun –v –conflict -relax -D n=8 -D m=9 –D k=8 hipp-dec-gus-cp.mod P2.gus.dat

and for ILP:

oplrun –v -D n=8 -D m=9 –D k=8 hipp-dec-gus-ilp.mod P2.gus.dat

If OPL is running through an IDE, such as ILOG, you can change the options by including a *.ops file into the project. For more details, you may visit the ILOG web site.

HIPP

In order to solve the original problem HIPP, we can run these commands with HIPP-DEC formulations for different values of k (search for minimal k that terminates with decision “YES”), or we can use the HIPP formulations (obtained by modifying HIPP-DEC formulations) to find the minimal value of k. When running HIPP formulations, we discard the parameter k and add the parameter 0 in the case of ASP; for example, a solution to problem P2 can be computed as follows:

lparse -c n=8 -c m=9 –d none -–true-negation hipp-opt-s1-gus-asp.lp P2.happ | clasp 0

In the case of OPL, we discard the lines in data files that assign values to parameter k, and run as the following:

oplrun -v -D n=8 -D m=9 hipp-opt-gus-ilp.mod P2-opt.gus.dat

Here are the modified data files: hipp-opt-input-gus-opl.zip and hipp-opt-input-bro-opl.zip.

Experimental Results

With the formulations above, we tried to solve four problem instances of HIPP,  using the answer set solver clasp (Version 1.0.4) and ILOG opl Development Studio (Version 5.5) with CP-optimizer (Version 1.1), on a machine with Intel Centrino 1.8GHz CPU and 1 GB of RAM running on Windows XP.

The experimental results can be found here: hipp-results.pdf