One or more drives in one of our virtual-machine hosting servers, lexx, have failed.
We replicate storage between physical hosts using DRBD, so most VMs continued running on their secondary storage. These were relatively easy to live-migrate onto their secondary node.
A small number, however, locked up with I/O errors, and some suffered numerous segfaults and kernel panics. We are manually restarting the crashed VMs on another physical host. Unfortunately this is delayed while we make sure that the DRBD replica is up to date; in some cases it requires a full DRBD resync.