Crash of VPS host shnik.g.faelix.net
Virtual Servers

Event Started
2025-01-02 12:39
Report Published
2025-01-02 17:45
Last Updated
2025-01-05 13:31
Event Finished
2025-01-02 13:11

One of our VPS hosts in Geneva has locked up.

Timeline (most recent first)
  • 2025-01-05
    13:30:00

    Customer servers have been running on shnik for almost a day now, with no recurrence of the hardware warning. We are closing this incident.

  • 2025-01-04
    16:01:00

    Following the BMC and BIOS updates, a full shutdown and a power-cycle, shnik is showing no hardware warnings.

  • 2025-01-04
    15:37:00

    The BMC update seems to have been successful. We're going to apply a BIOS update as well, for completeness' sake.

  • 2025-01-04
    15:23:00

    We are going to apply an update to the BMC (baseboard management controller) to see if this sheds further light or clears the warning.

  • 2025-01-04
    15:19:00

    So far we've turned up nothing obvious: all airflow baffles and channels inside shnik are correctly placed, and no components are showing warning lights.

  • 2025-01-04
    14:58:00

    All customer VPSs have been moved off shnik. We are now shutting it down for investigation.

  • 2025-01-04
    14:43:00

    Our engineer is on-site at the Geneva DC and is beginning work. We are going to migrate the VPSs off shnik before we investigate the hardware "warning".

  • 2025-01-02
    13:55:00

    The VPS host's hardware is showing a "warning", but none of its sensors are outside their thresholds. No log entries on the system management console show any reason for the "warning" either. We are going to send an engineer to the datacentre on 4th January to investigate further.

  • 2025-01-02
    13:14:00

    All customer VPSs are now running.

  • 2025-01-02
    13:11:00

    Customer VPSs are now beginning to start.

  • 2025-01-02
    13:04:00

    It's taken us longer than we'd hoped to power-cycle the server because, at the time it crashed, it was also hosting the VPS for our documentation, which includes the passwords for the hardware's lights-out management.

  • 2025-01-02
    12:44:00

    The VPS host shnik.g.faelix.net has become unresponsive. We're going to issue a reboot via the server's "lights out management".

  • 2025-01-02
    12:39:00

    The first alerts have come in reporting that some VPSs are unresponsive. We are investigating.