YorkHost - PVE03 Offline – Incident details

All systems operational

PVE03 Offline

Resolved
Major outage
Started 2 days ago
Lasted about 1 hour

Affected

Virtualization Infrastructure

Major outage from 1:46 PM to 1:48 PM, Operational from 1:48 PM to 1:55 PM, Degraded performance from 1:55 PM to 2:10 PM, Major outage from 2:10 PM to 2:40 PM

PVE 03

Major outage from 1:46 PM to 1:48 PM, Operational from 1:48 PM to 1:55 PM, Degraded performance from 1:55 PM to 2:10 PM, Major outage from 2:10 PM to 2:40 PM
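The timeline above can be tallied to check the "about 1 hour" duration. The following is a minimal sketch that uses only the times listed for the component; the names TIMELINE and minutes_by_status are illustrative and not part of any status page tooling.

```python
from datetime import datetime

# Component timeline as listed under "Affected" (the date is not given, only times).
TIMELINE = [
    ("Major outage", "1:46 PM", "1:48 PM"),
    ("Operational", "1:48 PM", "1:55 PM"),
    ("Degraded performance", "1:55 PM", "2:10 PM"),
    ("Major outage", "2:10 PM", "2:40 PM"),
]


def minutes_by_status(timeline):
    """Sum the minutes spent in each status across all listed intervals."""
    totals = {}
    for status, start, end in timeline:
        t0 = datetime.strptime(start, "%I:%M %p")
        t1 = datetime.strptime(end, "%I:%M %p")
        minutes = int((t1 - t0).total_seconds() // 60)
        totals[status] = totals.get(status, 0) + minutes
    return totals


totals = minutes_by_status(TIMELINE)
print(totals)                # minutes per status
print(sum(totals.values()))  # minutes from first to last timestamp
```

By this tally the incident spans 54 minutes from 1:46 PM to 2:40 PM, of which 32 minutes were a major outage and 15 minutes were degraded performance.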

Updates
  • Resolved

    The root cause was identified as a link flap protection trigger (linkFlapErrDisabled) on interface Ethernet104/1/47 of edge1-par3-b9.
    The port has been manually re-enabled after confirming stability, and network access to PVE-03 has been restored.

    The node is now fully operational, and services are back online.
    Monitoring will continue to ensure stability following the recovery.

  • Update

    The system has been repaired in rescue mode, and we are now attempting to restart and access the internal Proxmox environment.
    Initial recovery steps completed successfully, and network reinitialization is underway.

    Monitoring will continue throughout the reboot process.
    Further updates will follow once the node status is confirmed stable.

  • Update

    Our team is currently working in rescue mode using a Debian image to repair the affected Proxmox environment.
    The issue originated from a memory allocation failure, which also disrupted the network configuration.

    Restoration efforts are in progress to bring the node back online safely.
    We will share another update once the system is stabilized and network access is restored.

  • Update
    We are continuing to work on a fix for this incident.
  • Update

    Node PVE-03 crashed due to a memory allocation error.
    As a result, network connectivity to the node has not yet recovered.

    Our technical team is actively investigating and working to restore normal connectivity as quickly as possible.
    Further updates will be provided as soon as progress is made.

    We appreciate your patience and understanding.

  • Identified
    We are continuing to work on a fix for this incident.
  • Resolved
    This incident has been resolved.
  • Monitoring
    We implemented a fix and are currently monitoring the result.
  • Identified
    We are continuing to work on a fix for this incident.
  • Investigating
    We are currently investigating this incident.
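
The final Resolved update attributes the outage to link flap protection (linkFlapErrDisabled) shutting down a switch port until it was manually re-enabled. The sketch below illustrates the general idea of such a guard: count link transitions in a sliding time window and disable the port once a threshold is reached. It is not the switch's actual implementation; the class name LinkFlapGuard and the thresholds (5 flaps in 10 seconds) are assumptions, since the report does not state the configured values.

```python
from collections import deque


class LinkFlapGuard:
    """Err-disable a port when it flaps too often within a sliding time window."""

    def __init__(self, max_flaps=5, window_seconds=10.0):
        # Hypothetical thresholds; the real values on the switch are not given in the report.
        self.max_flaps = max_flaps
        self.window_seconds = window_seconds
        self.flap_times = deque()
        self.err_disabled = False

    def record_flap(self, timestamp):
        """Record one link transition; return True if the port is now err-disabled."""
        if self.err_disabled:
            return True  # stays disabled until someone re-enables it
        self.flap_times.append(timestamp)
        # Discard flaps that are older than the window.
        while self.flap_times and timestamp - self.flap_times[0] > self.window_seconds:
            self.flap_times.popleft()
        if len(self.flap_times) >= self.max_flaps:
            self.err_disabled = True
        return self.err_disabled

    def reenable(self):
        """Manual re-enable: clear the flap history and bring the port back."""
        self.flap_times.clear()
        self.err_disabled = False
```

With the assumed defaults, five flaps within ten seconds disable the port, and it stays disabled until reenable() is called, which mirrors the manual re-enable described in the Resolved update.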