Informatik, TU Wien

Dynamic Programming for Lazy Bastards

Dynamic Programming (DP) Algorithms are very common in bioinformatics.

Abstract

Dynamic Programming (DP) Algorithms are very common in bioinformatics. Key computational problems in the field, such as sequence comparison, RNA structure prediction, and many aspects of phylogenetics admit exact DP solutions. Although conceptually simple, algorithms applicable to practical scenaria are complicated by incorporating much biological detail, and thus are tedious to implement and test. Algebraic dynamic programming simplifies algorithm development through a strict separating the generation of the search space, an evaluation algebra, and a choice function. Since state space traversal is represented by grammars in ADP, the framework is large restricted to strings and ordered trees, and practical implementations are usually focussed on context-free grammars. I will report on several recent advances. (1) Products of grammars allow the efficient construction of multi-tape grammars from very simple low-dimensional building blocks. In particular, complex alignment algorithms can designed very easily in this manner. (2) The ideas of ADP readily generalize to arbitrary Multi-Context-Free Grammars. This extension covers in particular the extremely complicated algorithms solving RNA folding problems for so-called pseudoknotted structures. (3) Re-interpreting the notion of parsing (of strings) as partitioning of essentially arbitrary data objects makes it possible to use the ideas of ADP on general combinatorial data structures. In this general framework it becomes clear that backward (in a HMM setting) outside (in a CFG setting) recursions are uniquely determined by the easier forward/inside version of the DP algorithms, i.e., by the generalized grammar defining the rules of search space generation. The outside algorithm thus can be mechanically computed from the inside "grammar" and does not need to be developed separately. As an application we show how this can be used to compute probability distributions over Hamiltonian paths. The key advantage of this approach is that it becomes very simple and efficient in practise to develop prototypical algorithms and in particular to modify the detailed specification of the underlying models with very little effort. The presentation will report joint work with Christian Hoener zu Siederdissen, Sonja J. Prohaska, Maik Riechert, and Johannes Waldmann.

Note

This talk is organized by the Institute of Information Systems. With kind support of the Vienna Center for Logic and Algorithms (VCLA) and the Wolfgang Pauli Institut (WPI).