EUPEX consortium member FORTH has had a paper accepted at the 18th Workshop on Virtualization in High-Performance Cloud Computing (VHPC’23), held in conjunction with ISC23.
As emerging systems grow rapidly more dynamic, our testing infrastructures must evolve so that we can better understand how distributed systems handle failures. Existing chaos engineering tools often lack a comprehensive understanding of the system’s runtime and typically inject faults at random. Random testing is helpful for uncovering “shallow” bugs, but exercising deep failure paths in distributed systems requires precise, controlled fault injection under specific runtime conditions. This paper introduces Frisbee, an automated chaos testing platform for distributed applications on Kubernetes. Frisbee uses both static and dynamic runtime instrumentation to manage the dependency stack and perform testing actions. It does so by combining runtime events collected from multiple sources with a scenario modeling language. This approach allows Frisbee to inject realistic software faults in a controlled manner while the target system is running. Moreover, because our method is driven by runtime events, fault injection remains deterministic regardless of the specific system or workload involved. We demonstrate the practicality and relevance of Frisbee across various applications, including cloud-native databases and federated learning deployments.