Chaos Testing

We offer Docker and Kubernetes boilerplates designed to test the resilience of NodeSet and Blockchain, which you can customize and integrate into your pipeline.

Goals

We recommend structuring your tests as a linear suite that applies various chaos experiments and verifies the outcomes using a load testing suite. Focus on critical user metrics, such as:

  • The ratio of successful responses to failed responses
  • The nth percentile of response latency

Next, evaluate observability:

  • Ensure proper alerts are triggered during failures (manual or automated)
  • Verify the service recovers within the expected timeframe (manual or automated)

In summary, the primary focus is on meeting user expectations and maintaining SLAs, while the secondary focus is on observability and making operations smoother.

Docker

For Docker, we utilize Pumba to conduct chaos experiments, including:

  • Container reboots
  • Network simulations (such as delays, packet loss, and corruption, using the tc tool)
  • Stress testing for CPU and memory usage

Additionally, we offer a resources API that allows you to test whether your software can operate effectively in low-resource environments.

You can also use the fake package to create HTTP chaos experiments.

Given the complexity of Kubernetes, we recommend starting with Docker first. Identifying faulty behavior in your services early—such as cascading latency—can prevent more severe issues when scaling up. Addressing these problems at a smaller scale can save significant time and effort later.

Check the NodeSet + Blockchain template here.

Kubernetes

We utilize a subset of ChaosMesh experiments that can be safely executed on an isolated node group.

Check the NodeSet + Blockchain template here.

Blockchain

We also offer a set of blockchain-specific experiments, which typically involve API calls to blockchain simulators to execute certain actions. These include:

  • Adjusting gas prices
  • Introducing chain reorganizations (setting a new head)
  • Utilizing developer APIs (e.g., Anvil)
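
These experiments boil down to JSON-RPC calls against the node or simulator. The sketch below shows the general shape and is not taken from the examples; it assumes Geth's debug_setHead method for moving the head and Anvil's anvil_setNextBlockBaseFeePerGas method for gas, and the helper names are hypothetical. Check that your node exposes these methods before relying on them.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// rpcRequest is a minimal JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  []any  `json:"params"`
}

// newRPCPayload encodes a JSON-RPC request for the given method and params.
func newRPCPayload(method string, params ...any) ([]byte, error) {
	if params == nil {
		params = []any{}
	}
	return json.Marshal(rpcRequest{JSONRPC: "2.0", ID: 1, Method: method, Params: params})
}

// callRPC posts the request to url and returns an error for transport
// failures, non-200 responses, or a JSON-RPC error object in the reply.
func callRPC(url, method string, params ...any) error {
	body, err := newRPCPayload(method, params...)
	if err != nil {
		return err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	var out struct {
		Error *struct {
			Message string `json:"message"`
		} `json:"error"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return err
	}
	if out.Error != nil {
		return fmt.Errorf("rpc error: %s", out.Error.Message)
	}
	return nil
}

func main() {
	// Only print the payloads here; use callRPC(url, ...) against a real node.
	reorg, err := newRPCPayload("debug_setHead", fmt.Sprintf("0x%x", 100)) // rewind to block 100
	if err != nil {
		panic(err)
	}
	gas, err := newRPCPayload("anvil_setNextBlockBaseFeePerGas", "0x3b9aca00") // 1 gwei
	if err != nil {
		panic(err)
	}
	fmt.Println(string(reorg))
	fmt.Println(string(gas))
}
```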

Check the gas and reorg examples; the same examples work for K8s.

Debugging and Developing Chaos Suites

To debug Docker applications, you can use CTFv2 deployments and Docker logs.

To debug K8s, please use our simulator.

The simplest way to start is to spin up the simulator and run K8s tests:

    cd infra/chaosmesh-playground
    devbox run up
    k9s

Then go to the default namespace, check which example pods are available, and run the tests:

    cd framework/examples/myproject/chaos
    CTF_CONFIGS=chaos_k8s.toml go test -v -run TestK8sChaos

The reorg and gas tests will fail here, and that's fine; you should run them on a real Geth deployment.

Grafana annotations require the GRAFANA_URL and GRAFANA_TOKEN environment variables to be set:

    export GRAFANA_URL=...
    export GRAFANA_TOKEN=...

If you want the CRDs to remain after the tests finish, set the remove_k8s_chaos flag to false in chaos_k8s.toml:

    remove_k8s_chaos = false

Then search for :podchaos and :networkchaos in k9s to inspect the CRDs.