PACE 2026 - (rooted) Maximum-Agreement Forest

Maximum-Agreement Forest

The challenge problem for this year is (rooted) Maximum-Agreement Forest:

Input: A list of phylogenetic trees on the same leaf-set $X$
Output: An agreement forest for all input trees with a minimum number of trees

What is a phylogenetic tree?

A phylogenetic tree $T$ is a tree whose edges are directed away from the unique root. Its sinks/leaves (vertices with out-degree zero) are bijectively labelled with a set $X$ (for simplicity, it is often assumed that $X$ is the set of leaves of $T$).

What is an agreement-forest?

An agreement forest for a set of trees $T_1$, $T_2$, $\ldots$ on the same set $X$ of leaves (leaf-labels) is a forest $F$ of phylogenetic trees whose leaf-set is $X$ such that $F$ can be obtained from each $T_i$ by first removing directed edges and then exhaustively contracting vertices that are not in $X$ but have in- and out-degree at most one (“cleanup”).

Example

It is worth to note that a maximum-agreement forest for two phylogenetic trees $T_1$ and $T_2$ contains exactly one more tree than the so-called “rooted subtree prune and regraft” (SPR) distance between $T_1$ and $T_2$. Formally, the distance is defined as the smallest number of “SPR-moves” necessary to turn $T_1$ into $T_2$ (or vice versa – the distance is a metric), where an SPR-move consists of removing any edge $uv$, subdividing any edge $xy$ of the weak component of $u$ with a new vertex $z$, contracting $u$ and adding the edge $zv$. Results for computing the rooted SPR-distance are often applicable for computing MAFs for two trees.

Complexity

Deciding whether two rooted binary trees on $n$ leaves have an agreement forest with $k$ trees is known to be NP-complete. However, this problem can be solved in $O^\star(2.35^k)$ time and there is a kernel with $28k$ leaves. It also admits a polynomial-time 2-approximation algorithm. If the input consists of $t>2$ trees, then the problem can be solved in $O(3^knt)$ time or $O^\star(2.42^kn^4t^3)$ time. Structural parameters have not been explored for either case.

Verifier and test set

A verifier with a small test set will be made available.

Public instances

Public instances will be made available.

Tracks

Exact Track

The task is to compute an optimal agreement forest for the given phylogenetic trees. For each instance, the solver has to output a solution within a time limit of $x$ minutes and a memory limit of $y$ GB ($x$ and $y$ to be determined).

Instances in this track will come with precomputed parameter values and proofs (i.e. decompositions of certain width). We will take requests for parameters to be included in this list.

Submissions should be based on provably optimal algorithms. Submissions that output an incorrect solution or a solution that is known to be non-optimal will be disqualified. Besides dedicated algorithms, we also encourage submissions based on other paradigms such as SAT, MaxSAT, or ILPs. We allow the use of non-commercial SAT solvers and ILP solvers provided they are not subject to licenses that restrict the free distribution of your solvers.

Heuristic Track

In this track, the solver shall compute a good solution quickly. The solver will be run on each instance for a short time with limited memory. After this time, the solver receives the Unix signal SIGTERM. When receiving this signal, the process has to output a solution to the standard output and terminate. If the program does not halt in a reasonable time after receiving the signal, it will be stopped via SIGKILL. In this case, the instance is counted as “time limit exceeded”.

For this track, solutions do not have to be optimal. However, solvers that produce an incorrect solution will still be disqualified.

Lower-Bound Track

The goal of this track is to encourage research on good lower bounds. The idea is that good lower bound allow solvers to determine early that a solution is “good enough” to be within some radius of the optimum. Only solutions within a fixed radius will be considered valid and valid solutions will be scored by their running time.

Final Evaluation

The final score will be computed over a set of $x$ private instances that are similar to the public instances.

Input and Output

Input Format

Each instance is described by a single text file that consists of different line types:

Empty lines carry no strict meaning and can be ignored.
Lines starting with the symbol # and can always be ignored, but may contain useful additional data. The meaning of these lines is determined by the second character:
- #p {t} {n} indicates that the file contains $t$ trees with $n$ leaves each. This line appears in the header as the first non-comment line.
- # is a comment line.
- We may add further line types in a backward-compatible way later, in the sense that it is never wrong to ignore an unrecognized # line.
All other non-empty lines contain a tree description in the Newick format (see below) and represent an input tree. There are exactly $t$ such lines.

Files do not contain unnecessary whitespace anywhere (e.g., Newick expression contain no whitespace, lines do not start or end with a whitespace, …).

Relevant subset of the Newick format

We use a subset of the standard Newick format for phylogenetic trees: Each tree is represented by a valid parenthesis-expression in a dedicated line. The whole expression is terminated with a semicolon. It is defined recursively:

Each leaf is represented by its label, a positive integer from $1, 2, …, n$ where $n$ is the number of leaves indicated in the #p line.
Each internal node has exactly two children and no own label. It is denoted as (A,B) where A and B are the expressions of the two children.

The two trees in the introduction example above can be represented as follows:

#p 2 6
(((5,6),(3,4)),(1,2));
(((((4,2),1),5),3),6);

But it may also contain comments:

# This is a demo file consisting of two trees with 6 leaves each:
#p 2 6
# It may have comments here, ...
(((5,6),(3,4)),(1,2));
# ... or here ...

(((((4,2),1),5),3),6);
# ... or even here ...

Output Format

The output format is the same as the input format, but consisting of $k$ phylogenetic trees whose leaf labels are a partition of the leaf labels of each input tree. Empty lines and lines starting with # are ignored.

The solution for the example above could be the following:

# internal debugging message 1
(6,5);
4;
# internal debugging message 2
3;
(1,2);

Literature

M. Bordewich, C. Semple (2005). On the Computational Complexity of the Rooted Subtree Prune and Regraft Distance. Ann. Comb. 8. DOI
C. Semple, M. Steel (2003). Phylogenetics. Oxford University Press. Publisher
D. H. Huson, R. Rupp, C. Scornavacca (2011). Phylogenetic Networks. Cambridge University Press. DOI
C. Whidden, N. Zeh, (2009). A Unifying View on Approximation and FPT of Agreement Forests. Algorithms in Bioinformatics. WABI 2009. LNCS, vol 5724. DOI
C. Whidden, R. G. Beiko, N. Zeh (2013). Fixed-Parameter Algorithms for Maximum Agreement Forests. SIAM J. Comput. 42(4). DOI
Z. Chen, Y. Harada, Y. Nakamura, L. Wang (2020). Faster Exact Computation of rSPR Distance via Better Approximation. IEEE/ACM Trans. Comput. Biol. Bioinform. 17(3). DOI.
C. Widden (2013). Efficient Computation and Application of Maximum Agreement Forests. PhD Thesis. Dalhousie Univeristy. PDF
F. Shi, J. Wang, J. Chen, Q. Feng, J. Guo (2014). Algorithms for parameterized maximum agreement forest problem on multiple trees. Computing and Combinatorics 554. DOI
F. Shi, J. Chen, Q. Feng, J. Wang (2018). A parameterized algorithm for the Maximum Agreement Forest problem on multiple rooted multifurcating trees. Journal of Computer and System Sciences 97. DOI
N. Olver, F. Schalekamp, S. van der Ster, L. Stougie, A. van Zuylen (2023). A duality based 2-approximation algorithm for maximum agreement forest. Math. Program. 198. DOI
L. Bulteau, M. Weller (2019). Parameterized Algorithms in Bioinformatics: An Overview. Algorithms 12(12). DOI