Today one of my colleagues received a call about a server that had run out of memory. They sent a soft reboot, and the reboot task hung. This happens because the hypervisor (compute node) sends the reboot request to the nova agent running on the guest virtual machine. If the guest has run out of memory, the agent may never receive that command, or, if it does, the soft (software) reboot can still fail because there is not enough memory to fork a new process.
This could have been avoided by issuing a hard reboot straight away, but in this case we first needed to cancel the stuck task and then send a hard reboot. Here is what I did:
List all pending tasks on xen-server
# xe task-list
uuid ( RO)                : a9f84f3d-0b96-8da2-a1d1-f5b774cd9173
          name-label ( RO): VM.clean_reboot
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.275
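On a busy host the task list can be long, so it may help to filter it rather than read it by eye. The following is a minimal sketch, not part of the original procedure: `pending_task_uuids` is a hypothetical helper name, and it assumes the output has one field per line in the `field ( RO): value` layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `xe task-list` style output on stdin and
# print the UUID of every task whose status field is "pending".
# Assumes each record starts with a "uuid ( RO)" line and carries a
# "status ( RO):" line, as in the listing above.
pending_task_uuids() {
    awk '
        # Remember the UUID of the record currently being read.
        /^uuid \( RO\)/   { uuid = $NF }
        # When the status line says "pending", emit the remembered UUID.
        /status \( RO\):/ { if ($NF == "pending") print uuid }
    '
}
```

It would be used as `xe task-list | pending_task_uuids`, and each UUID it prints is a candidate for the cancel step.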
Cancel a pending task on xen-server
xe task-cancel uuid=a9f84f3d-0b96-8da2-a1d1-f5b774cd9173
This sets the active_state back to normal and gets rid of the ‘pending soft reboot’, but we need to restart the server too.
Use supernova to stop and then start the server
supernova lon stop serveruuidhere
supernova lon start serveruuidhere
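If this comes up often, the stop and start pair could be wrapped in a small function. This is only a sketch under assumptions: `restart_server` and the `DRY_RUN` variable are hypothetical names introduced here, and the function runs the same two supernova commands as above in the same order.

```shell
#!/bin/sh
# Hypothetical helper: stop and then start a server through supernova.
# Set DRY_RUN=1 to print the commands instead of running them.
restart_server() {
    env_name=$1   # supernova environment name, e.g. "lon"
    server=$2     # UUID of the server to restart
    for action in stop start; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "supernova $env_name $action $server"
        else
            # Give up if either command fails.
            supernova "$env_name" "$action" "$server" || return 1
        fi
    done
}
```

Running it with `DRY_RUN=1` first shows exactly what would be sent. Note that the stop request may return before the server has fully shut down, so a real script would probably need to wait between the two calls; the sketch leaves that out.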
And with that, the customer is back online and running, yay!