Informatics, TU Wien

Reliability Engineering

Google TechTalk @ TU Wien, a joint event of the Faculty of Informatics and Google

Abstract

If "google.com" accidentally sent users to your personal web server instead of Google's servers, your server would break down under the traffic within a few seconds. A minute later, your internet service provider would go offline. If they were unlucky, even their upstream network provider would fold under the load.

10⁹ users impose a daily traffic load on popular Google services that is best described as a distributed denial-of-service attack. Coping with that torrent of user requests is where the story of Reliability Engineering begins. It is a story of keeping the site up under every traffic peak while upgrading the software stack, upgrading the network fabric, and replacing broken hardware, all without a user noticing, 24/7.

This talk is about distributed software architectures, their failure modes, and why load balancers themselves need load balancers. It concludes with a brief discussion of how to scale not the software but the organization, namely Google's hiring practices for engineers and interns.

Biography


Clemens Fruhwirth is a senior site reliability engineer at Google Zürich, a TU Wien master's graduate, and a free software contributor. He is the inventor of LUKS, the most popular hard disk encryption standard for GNU/Linux. At Google, he and his colleagues run systems that invoice $10⁵ per minute. He likes sub-millisecond latency, and his favorite storage unit is the petabyte.

Note  

Following the talk, your hot questions can find their answers over a cold buffet.