Restarting a Xen Server that is out of memory through the Hypervisor

So, you have a Xen server, but the virtual machine is not responding. What do you do? You log in to the hypervisor and fix it, all right!

Please note that, for sanitization, random strings have been used instead of real-life UUIDs.

Step 1: Connect to the Hypervisor

ssh root@somehypervisoriporhostname

Step 2: Check Running Tasks (Task List)

[root@10-1-1-1 ~]# xe task-list
uuid ( RO)                : ff9ca1a3-fc29-a245-1f28-2adc646114a2
          name-label ( RO): Async.VM.clean_reboot
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.371


uuid ( RO)                : aff56852-6db4-1ab3-b2b1-33e48c797dbb
          name-label ( RO): Connection to VM console
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000


[root@10-1-1-1 ~]# xe task-list params=all
uuid ( RO)                  : ff9ca1a3-fc29-a245-1f28-2adc646114a2
            name-label ( RO): Async.VM.clean_reboot
      name-description ( RO):
            subtask_of ( RO): 
              subtasks ( RO):
           resident-on ( RO): 43b6096b-09cd-4890-b51b-56e50de573ff
                status ( RO): pending
              progress ( RO): 0.372
                  type ( RO): 
                result ( RO):
               created ( RO): 20151014T15:01:17Z
              finished ( RO): 19700101T00:00:00Z
            error_info ( RO):
    allowed_operations ( RO): Cancel


uuid ( RO)                  : aff56852-6db4-1ab3-b2b1-33e48c797dbb
            name-label ( RO): Connection to VM console
      name-description ( RO):
            subtask_of ( RO): 
              subtasks ( RO):
           resident-on ( RO): 43b6096b-09cd-4890-b51b-56e50de573ff
                status ( RO): pending
              progress ( RO): 0.000
                  type ( RO): 
                result ( RO):
               created ( RO): 20151014T15:57:48Z
              finished ( RO): 19700101T00:00:00Z
            error_info ( RO):
    allowed_operations ( RO):
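
On a busy hypervisor the task list gets long, so it can help to boil it down to just the pending tasks. Below is a minimal sketch that parses the plain `xe task-list` output shown above. The function name `pending_tasks` is my own invention, and it assumes the output layout matches the listing above.

```shell
# pending_tasks (hypothetical helper): read `xe task-list` output on stdin
# and print "<uuid> <name-label>" for every task whose status is pending.
pending_tasks() {
    awk -F': ' '
        { sub(/[[:space:]]+$/, "") }              # drop trailing whitespace
        /^uuid/                   { uuid = $2 }   # start of a new task record
        /^[[:space:]]*name-label/ { label = $2 }
        /^[[:space:]]*status/ && $2 == "pending" { print uuid, label }
    '
}

# Usage on the hypervisor (not run here):
#   xe task-list | pending_tasks
```

It relies on the uuid and name-label lines appearing before the status line of each record, which is the order in the output above.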

I could see that there were two tasks pending for this slice, so I checked its power state:

[root@10-1-1-1 ~]# xe vm-list name-label=slice10011111
uuid ( RO)           : 4a9a5dfb-3c4a-b2bb-be7b-db3be6297fff
     name-label ( RW): slice10011111
    power-state ( RO): running

This told me that the slice was still reported as running, so I am going to cancel the clean reboot task that is pending for it:


[root@10-1-1-1 ~]# xe task-cancel uuid=ff9ca1a3-fc29-a245-1f28-2adc646114a2
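
Note that in the `params=all` output only the clean_reboot task listed Cancel under allowed_operations; the console task had nothing there. As a rough sketch, assuming that output format, the helper below (`cancellable_tasks` is a name I made up) picks out the UUIDs of tasks that advertise Cancel, so you know which ones are worth passing to `xe task-cancel`.

```shell
# cancellable_tasks (hypothetical helper): read `xe task-list params=all`
# output on stdin and print the uuid of each task whose allowed_operations
# field lists Cancel.
cancellable_tasks() {
    awk -F': ' '
        { sub(/[[:space:]]+$/, "") }    # drop trailing whitespace
        /^uuid/ { uuid = $2 }           # remember the current task uuid
        /^[[:space:]]*allowed_operations/ && $2 ~ /Cancel/ { print uuid }
    '
}

# Usage on the hypervisor (not run here):
#   xe task-list params=all | cancellable_tasks
# then cancel the one you actually want with: xe task-cancel uuid=<uuid>
```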

Shut down the server (halt it)


[root@10-1-1-1 ~]# xe vm-shutdown --force uuid=4a9a5dfb-3c4a-b2bb-be7b-db3be6297fff
[root@10-1-1-1 ~]# xe vm-list name-label=slice10011111
uuid ( RO)           : 4a9a5dfb-3c4a-b2bb-be7b-db3be6297fff
     name-label ( RW): slice10011111
    power-state ( RO): halted
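
If you are scripting this rather than typing it, you probably want to wait until the VM really is halted before starting it again. Here is a small sketch of a polling loop; the function name `wait_for_state` and its retry and delay defaults are my own assumptions, and it reads the state with `xe vm-param-get` instead of eyeballing `xe vm-list`.

```shell
# wait_for_state VM_UUID STATE [TRIES] [DELAY]   (hypothetical helper)
# Poll the VM's power-state until it equals STATE.
# Returns 0 on a match, 1 if TRIES polls pass without one.
wait_for_state() {
    local vm_uuid="$1" want="$2" tries="${3:-30}" delay="${4:-2}" state
    while [ "$tries" -gt 0 ]; do
        state=$(xe vm-param-get uuid="$vm_uuid" param-name=power-state)
        [ "$state" = "$want" ] && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Usage on the hypervisor (not run here):
#   wait_for_state 4a9a5dfb-3c4a-b2bb-be7b-db3be6297fff halted && echo "safe to start"
```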

Start the Virtual Machine

[root@10-1-1-1 ~]# xe vm-start uuid=4a9a5dfb-3c4a-b2bb-be7b-db3be6297fff

At the end, I wanted to check whether the instance was still swapping heavily, as it had been when it was running out of memory. That was the reason I had to restart the server in the first place.

# Per-VBD read/write figures, sorted by total; names starting with XenServer are skipped.
(echo "Slice IO_Read IO_Write Total"
 (for uuid in $(xe vbd-list params=uuid | awk '$5{print $5}'); do
    xe vbd-param-list uuid=$uuid |
      grep -P "^\s*(io_|vm-name-label|vdi-name-label|vdi-uuid|device)" |
      awk '
        {
          if ($1=="vdi-uuid") { hasswap="no"; vdi_uuid=$4 }
        }
        {
          if ($1=="vm-name-label") name=$4
          if ($1=="vdi-name-label") {
            if ($4 ~ /swap/) { hasswap="yes"; name=name"-swap" }
            if ($5 ~ /ephemeral/) name=name"-eph"
          }
          if ($1=="device") {
            if ($4=="hda" || $4=="xvda") name=name"-root"
            if ($4=="xvdc" && hasswap=="no") {
              # Fall back to the VDI description to spot swap disks not labelled as such
              vdicmd="xe vdi-list uuid="vdi_uuid" params=name-description --minimal | grep swap >> /dev/null"
              swpname=system(vdicmd)
              if (swpname==0) name=name"-swap"
            }
          }
          if ($1=="io_read_kbs") ioread=$4
          if ($1=="io_write_kbs") iowrite=$4
        }
        END {
          if (substr(name,0,9)!="XenServer") print name" "ioread" "iowrite" "ioread+iowrite
        }'
  done) | sort -k4n) | column -t
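
The part of that command worth reusing is the reporting pattern: the header is echoed outside the sorted part, so it stays on top while the rows underneath are sorted by their total. Here is a stripped-down sketch of just that pattern, assuming rows of "name read write" on stdin; `sorted_report` is a made-up name and the sample rows are invented.

```shell
# sorted_report (hypothetical helper): read "name read write" rows on stdin,
# append a total column, sort ascending by that total, and print the rows
# under a fixed header that is kept out of the sort.
sorted_report() {
    echo "Slice IO_Read IO_Write Total"
    awk '{ print $1, $2, $3, $2 + $3 }' | sort -k4n
}

# Usage (invented sample rows):
#   printf '%s\n' 'slice1-root 5 10' 'slice1-swap 1 2' | sorted_report | column -t
```

Piping the result through `column -t`, as the original command does, lines the columns up for reading.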

Job done!
