We argue that, given current levels of system complexity and the necessity of dealing with time, classical synthesis and analysis methods must be complemented by empirical, data-driven approaches. These approaches require monitoring, online measurement, online analysis, diagnosis, failure prediction and decision making to support recovery and nonstop computing and communication. To illustrate such approaches, two case studies are presented. In the first case study, we address the problem of proactive fault management by demonstrating how runtime monitoring, variable selection and model re-evaluation lead to effective failure prediction.
The second case study illustrates how a generator for realistic topologies of ad hoc networks has been developed through observation and measurement. A number of topology generation algorithms for the simulation of wireless multi-hop networks have been proposed but, as shown in the literature, most existing node placement models create topologies that differ considerably from those of real networks. To address this issue, we have developed a novel node placement algorithm, NPART, which creates topologies that resemble real networks and helps in resilience analysis.
Finally, we conclude that models derived from monitoring and measurement will continue to gain in significance and impact, and we list the major challenges for empirical research on dependability.
Miroslaw Malek is a professor and holder of the Chair in Computer Architecture and Communication at the Department of Computer Science at Humboldt University in Berlin. His research interests focus on dependable, embedded and distributed systems, including failure prediction, dependable architectures and service availability. He has participated in two pioneering parallel computer projects, contributed to the theory and practice of parallel network design, developed the comparison-based method for system diagnosis, co-developed comprehensive WSI and network testing techniques, and proposed the consensus-based framework for responsive (fault-tolerant, real-time) computer systems design as well as failure prediction methods. He has made numerous other contributions, reflected in over 200 publications including six books and in over 25 supervised Ph.D. dissertations (nine of his students are professors).
He has organized, chaired and been a program committee member of numerous IEEE and ACM international conferences and workshops. Among others, he was Program and General Chairman of the Real-Time Systems Symposium, General Chairman of the 24th Fault-Tolerant Computing Symposium, Program Co-chairman of the 22nd Symposium on Reliable Distributed Computing, and Program Chairman and General Chairman of the International Service Availability Symposium. He has served and continues to serve on the editorial boards of various journals, among them the Journal of Interconnection Networks and the Real-Time Systems journal.
He is a consultant to government and companies on technical and strategic issues in information technology. Malek received his Ph.D. in Computer Science from the Technical University of Wroclaw in Poland and spent 17 years as a professor at the University of Texas at Austin. He has also been, among other appointments, a visiting professor at Stanford, Universita di Roma “La Sapienza”, Keio University, New York University and the Chinese University of Hong Kong, and a guest researcher at Bell Laboratories and the IBM T.J. Watson Research Center.