Restarting a Xen Virtual Machine that is Out of Memory through the Hypervisor

So, you have a Xen server, but the virtual machine is not responding. What do you do? You log in to the hypervisor and fix it, that's what!

Please note that, for sanitisation purposes, random strings have been used in place of the real UUIDs.

Step 1: Connect to the Hypervisor

ssh root@somehypervisoriporhostname

Step 2: Check Running Tasks (Task List)

[root@10-1-1-1 ~]# xe task-list
uuid ( RO)                : ff9ca1a3-fc29-a245-1f28-2adc646114a2
          name-label ( RO): Async.VM.clean_reboot
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.371


uuid ( RO)                : aff56852-6db4-1ab3-b2b1-33e48c797dbb
          name-label ( RO): Connection to VM console
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000


[root@10-1-1-1 ~]# xe task-list params=all
uuid ( RO)                  : ff9ca1a3-fc29-a245-1f28-2adc646114a2
            name-label ( RO): Async.VM.clean_reboot
      name-description ( RO):
            subtask_of ( RO): 
              subtasks ( RO):
           resident-on ( RO): 43b6096b-09cd-4890-b51b-56e50de573ff
                status ( RO): pending
              progress ( RO): 0.372
                  type ( RO): 
                result ( RO):
               created ( RO): 20151014T15:01:17Z
              finished ( RO): 19700101T00:00:00Z
            error_info ( RO):
    allowed_operations ( RO): Cancel


uuid ( RO)                  : aff56852-6db4-1ab3-b2b1-33e48c797dbb
            name-label ( RO): Connection to VM console
      name-description ( RO):
            subtask_of ( RO): 
              subtasks ( RO):
           resident-on ( RO): 43b6096b-09cd-4890-b51b-56e50de573ff
                status ( RO): pending
              progress ( RO): 0.000
                  type ( RO): 
                result ( RO):
               created ( RO): 20151014T15:57:48Z
              finished ( RO): 19700101T00:00:00Z
            error_info ( RO):
    allowed_operations ( RO):
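
On a busy hypervisor the task list can get long, so here is a minimal sketch of a helper that keeps only the pending tasks. It assumes the "key ( RO): value" layout shown in the output above; the function name pending_tasks is my own and not part of xe.

```shell
# pending_tasks: read "xe task-list" output on stdin and print
# "<uuid> <name-label>" for every task whose status is pending.
# Assumes the "key ( RO): value" layout shown in the transcript.
pending_tasks() {
  awk '
    {
      # Split each line into key and value at the first colon
      pos = index($0, ":")
      if (pos == 0) next
      key = substr($0, 1, pos - 1)
      val = substr($0, pos + 1)
      # Trim whitespace and drop the "( RO)" access marker from the key
      sub(/^[ \t]+/, "", key); sub(/[ \t]*\(.*$/, "", key)
      sub(/^[ \t]+/, "", val); sub(/[ \t]+$/, "", val)
      if (key == "uuid") uuid = val
      else if (key == "name-label") label = val
      else if (key == "status" && val == "pending") print uuid, label
    }
  '
}

# Usage on the hypervisor (not run here):
#   xe task-list | pending_tasks
```

The helper only reads text, so it is safe to try against a saved copy of the output first.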

I could see that there were two tasks pending, including a clean reboot sitting at about 37%. I then checked the state of the slice:

[root@10-1-1-1 ~]# xe vm-list name-label=slice10011111
uuid ( RO)           : 4a9a5dfb-3c4a-b2bb-be7b-db3be6297fff
     name-label ( RW): slice10011111
    power-state ( RO): running

This told me that the hypervisor still considered the slice to be running, so I cancelled the clean reboot task that was pending for it:


[root@10-1-1-1 ~]# xe task-cancel uuid=ff9ca1a3-fc29-a245-1f28-2adc646114a2
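
In the params=all output earlier, only the clean_reboot task lists Cancel under allowed_operations; the console task lists nothing. As a rough sketch, the check can be automated so that you only try to cancel tasks that advertise it. The function name cancellable_tasks is hypothetical, and the parsing assumes the params=all layout shown above.

```shell
# cancellable_tasks: read "xe task-list params=all" output on stdin and
# print the uuid of each task whose allowed_operations include Cancel.
cancellable_tasks() {
  awk '
    # The uuid is the last field on the "uuid ( RO) : ..." line
    $1 == "uuid" { uuid = $NF }
    # Only look for Cancel on the allowed_operations line itself
    $1 == "allowed_operations" && $0 ~ /:.*Cancel/ { print uuid }
  '
}

# Usage on the hypervisor (not run here):
#   xe task-list params=all | cancellable_tasks
```

I would still cancel each UUID by hand with xe task-cancel rather than in a loop, so that nothing unexpected gets killed.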

Shut Down the Server (Halt It)


[root@10-1-1-1 ~]# xe vm-shutdown --force uuid=4a9a5dfb-3c4a-b2bb-be7b-db3be6297fff
[root@10-1-1-1 ~]# xe vm-list name-label=slice10011111
uuid ( RO)           : 4a9a5dfb-3c4a-b2bb-be7b-db3be6297fff
     name-label ( RW): slice10011111
    power-state ( RO): halted
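
Re-running vm-list by hand to see whether the slice has halted works, but it can also be sketched as a small polling helper. This is an illustration under my own assumptions: wait_for_state and POLL_INTERVAL are names I made up, and the helper simply runs whatever command you pass it and compares the output to the state you want.

```shell
# wait_for_state WANT TRIES CMD...: run CMD repeatedly until it prints WANT.
# Returns 0 once the state matches, 1 if TRIES attempts are used up.
# POLL_INTERVAL (seconds between attempts) is a hypothetical knob, default 2.
wait_for_state() {
  want=$1
  tries=$2
  shift 2
  while [ "$tries" -gt 0 ]; do
    # A failing command is treated as "state unknown" rather than fatal
    state=$("$@" 2>/dev/null) || state=""
    if [ "$state" = "$want" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "${POLL_INTERVAL:-2}"
  done
  return 1
}

# Usage on the hypervisor (not run here):
#   wait_for_state halted 30 \
#     xe vm-param-get uuid=4a9a5dfb-3c4a-b2bb-be7b-db3be6297fff param-name=power-state
```

Passing the query command as arguments keeps the loop independent of xe, which makes it easy to try out with a stub before pointing it at a real VM.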

Start the Virtual Machine

[root@10-1-1-1 ~]# xe vm-start uuid=4a9a5dfb-3c4a-b2bb-be7b-db3be6297fff

At the end I wanted to check whether the instance was still swapping heavily, as it had been when it was running out of memory. That was the reason I had to restart the server in the first place.

(
  echo "Slice IO_Read IO_Write Total"
  (
    # One pass per virtual block device (VBD) known to this host
    for uuid in $(xe vbd-list params=uuid | awk '$5{print $5}'); do
      xe vbd-param-list uuid=$uuid |
        grep -P "^\s*(io_|vm-name-label|vdi-name-label|vdi-uuid|device)" |
        awk '
          # Remember the VDI and assume no swap until a label says otherwise
          $1=="vdi-uuid" { hasswap="no"; vdi_uuid=$4 }
          {
            if ($1=="vm-name-label") name=$4
            if ($1=="vdi-name-label") {
              if ($4 ~ /swap/) { hasswap="yes"; name=name"-swap" }
              if ($5 ~ /ephemeral/) name=name"-eph"
            }
            if ($1=="device") {
              if ($4=="hda" || $4=="xvda") name=name"-root"
              # xvdc without a swap label: check the VDI description instead
              if ($4=="xvdc" && hasswap=="no") {
                vdicmd="xe vdi-list uuid="vdi_uuid" params=name-description --minimal | grep swap >> /dev/null"
                swpname=system(vdicmd)
                if (swpname==0) name=name"-swap"
              }
            }
            if ($1=="io_read_kbs") ioread=$4
            if ($1=="io_write_kbs") iowrite=$4
          }
          # Skip VBDs whose VM name starts with "XenServer"
          END { if (substr(name,1,9)!="XenServer") print name" "ioread" "iowrite" "ioread+iowrite }
        '
    done
  ) | sort -k4n
) | column -t

Job done!

Playing with Xenstore

So, I have been playing around with the xenstore-ls and xenstore-read commands on my virtual machine in the cloud. Basically, xenstore-ls and xenstore-read are used to retrieve variables, such as the network settings, which are passed to a virtual machine when it is being built. Also, if the network configuration breaks, xenstore-write can be used to prompt the nova agent to re-read the read-only vm-data network variables and reset the configuration.
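
To make the xenstore-write idea a little more concrete, here is a hedged sketch that only prints the command rather than running it. The data/host/<request id> path and the resetnetwork request body are my understanding of how the nova agent is asked to reset networking, so treat both as assumptions and check them against your provider's documentation. The function name reset_network_cmd is hypothetical.

```shell
# reset_network_cmd REQUEST_ID: print (do not run) a xenstore-write command
# that would ask the nova agent to reset the network configuration.
# ASSUMPTION: the agent watches data/host/<id> for JSON requests and
# understands a request named "resetnetwork".
reset_network_cmd() {
  request_id=$1
  printf "xenstore-write data/host/%s '%s'\n" \
    "$request_id" '{"name":"resetnetwork","value":""}'
}

# Usage inside the VM (prints the command for review):
#   reset_network_cmd "$(uuidgen)"
```

Printing the command first means it can be reviewed, and the request ID noted, before anything is written to xenstore.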

I am still familiarizing myself with this so apologies if there are any mistakes. This article will be updated as I learn more about it.

Commands available on Rackspace Virtual Machines

xenstore         xenstore-chmod   xenstore-exists  xenstore-list    xenstore-ls      xenstore-read    xenstore-rm      xenstore-watch   xenstore-write

There are several things we can do

1. Show the data associated with the VM, including the instance's meminfo_free and meminfo_total, the OS major version (8), the os_name, the distro, and the kernel uname.

 root@dingdong:~# xenstore-ls data
host = ""
meminfo_total = "1018872"
meminfo_free = "560180"
os_name = "Debian GNU/Linux 8.1 (jessie)"
os_majorver = "8"
os_minorver = "1"
os_uname = "3.16.0-4-amd64"
os_distro = "debian"
updated = "Fri Aug 28 17:30:00 BST 2015"
guest = ""
 9cab4aed-0d29-4c7e-be2f-e15f1ed33231 = "{"message": "1.39.1", "returncode": "0"}"
 6f38e0e1-6606-4245-8c8f-560c0204b419 = "{"message": "109108621899310233456141728258155", "returncode": "D0"}"
 c96d2b6e-31fb-489f-882c-790da25dbe1a = "{"message": "", "returncode": "0"}"
 be73ed2e-5c5a-4183-861b-2b6faaf8b09b = "{"message": "", "returncode": "0"}"
 ad730c7c-3c5e-4ba7-bfcc-8400a2675566 = "{"message": "75660051671748071924891088737764", "returncode": "D0"}"
 d3536286-4447-4f5b-84cc-8b6b1f61989c = "{"message": "", "returncode": "0"}"
 PresentationForAdamOfHowXenstoreWork = "{"message": "", "returncode": "0"}"

As you can see from the last entry above, my colleague was helping to put together a presentation for me on how this actually works!
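
Since meminfo_total and meminfo_free are both in that listing, a quick sketch can turn them into a used-memory percentage. I am assuming the two values share the same unit (they look like kB figures in the style of /proc/meminfo), and mem_used_percent is a name I made up.

```shell
# mem_used_percent: read "xenstore-ls data" output on stdin and print the
# percentage of memory in use, truncated to a whole number.
# ASSUMPTION: meminfo_total and meminfo_free are in the same unit.
mem_used_percent() {
  awk -F'"' '
    # Lines look like: meminfo_total = "1018872", so the value is field 2
    $1 ~ /^[ \t]*meminfo_total[ \t]*=/ { total = $2 }
    $1 ~ /^[ \t]*meminfo_free[ \t]*=/  { free = $2 }
    END {
      if (total > 0) printf "%d\n", (total - free) * 100 / total
      else exit 1
    }
  '
}

# Usage inside the VM (not run here):
#   xenstore-ls data | mem_used_percent
```

The figure is only as fresh as the "updated" timestamp in the same listing, so it is a snapshot rather than a live reading.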

Listing all of the current vm-data

xenstore-ls vm-data

networking = ""
 BC764E08E370 = "{"label": "private", "broadcast": "10.179.255.255", "ips": [{"ip": "10.179.197.101", "netmask": "255.255.192.0", "enabled": "1", "gateway": null}], "mac": \..."
 BC764E086A56 = "{"ip6s": [{"ip": "2a00:1a48:7806:115:be76:4eff:fe08:6a56", "netmask": 64, "enabled": "1", "gateway": "fe80::def"}], "label": "public", "broadcast": "162.13\..."
meta = "{"rxtx_cap": 120.0}"
hostname = "dingdong"
auto-disk-config = "False"
provider_data = ""
 ip_whitelist = ""
  54 = "10.182.5.215"
  53 = "134.213.147.236"
  52 = "10.182.5.234"
  51 = "134.213.148.114"
  50 = "10.179.0.222"
  49 = "10.179.75.22"
  48 = "162.13.1.53"
  47 = "95.138.174.55"
  46 = "162.13.5.15"
  45 = "10.177.132.233"
  44 = "31.222.169.12"
  43 = "10.179.0.234"
  42 = "10.177.199.231"
  41 = "10.179.0.159"
  40 = "10.176.3.232"
  39 = "10.176.3.236"
  38 = "10.176.3.239"
  37 = "10.176.3.235"
  36 = "10.177.5.90"
  35 = "10.177.5.89"
  34 = "10.177.5.88"
  33 = "10.177.1.73"
  32 = "10.176.3.158"
  31 = "10.177.0.105"
  30 = "162.13.5.96"
  29 = "5.79.25.90"
  28 = "162.209.3.51"
  27 = "162.13.22.243"
  26 = "162.13.22.242"
  25 = "166.78.7.98"
  24 = "166.78.17.140"
  23 = "166.78.24.91"
  22 = "31.222.184.215"
  21 = "31.222.184.38"
  20 = "46.38.166.180"
  19 = "46.38.160.93"
  18 = "31.222.157.156"
  17 = "31.222.177.183"
  16 = "31.222.177.167"
  15 = "31.222.164.168"
  14 = "31.222.180.84"
  13 = "31.222.161.245"
  12 = "173.203.157.20"
  11 = "119.9.12.98"
  10 = "119.9.12.91"
  9 = "162.13.1.53"
  8 = "95.138.174.55"
  7 = "162.209.4.155"
  6 = "166.78.107.18"
  5 = "50.56.249.239"
  4 = "166.78.7.146"
  3 = "89.234.21.64/28"
  2 = "67.192.155.96/27"
  1 = "173.203.5.160/27"
  0 = "173.203.32.136/29"
 roles = "["object-store:default", "compute:default", "identity:user-admin"]"
 region = "lon"
 provider = "Rackspace"
user-metadata = ""
 build_config = ""monitoring_defaults,monitoring_agent_only,auto_updates""
 rax_service_level_automation = ""Complete""
allowvssprovider = "false"


It is also possible to retrieve specific information, such as the networking configuration:

xenstore-ls vm-data/networking
BC764E08E370 = "{"label": "private", "broadcast": "10.179.255.255", "ips": [{"ip": "10.179.197.101", "netmask": "255.255.192.0", "enabled": "1", "gateway": null}], "mac": "\..."
BC764E086A56 = "{"ip6s": [{"ip": "2a00:1a48:7806:115:be76:4eff:fe08:6a56", "netmask": 64, "enabled": "1", "gateway": "fe80::def"}], "label": "public", "broadcast": "162.13.\..."

The format here is kind of nasty, but there is a tool called jq that we can use to pretty-print the JSON.

Filtering Network Data by Interface MAC Address


apt-get update
apt-get install jq

xenstore-read vm-data/networking/BC764E086A56 | jq .

{
  "ip6s": [
    {
      "ip": "2a00:1a48:7806:115:be76:4eff:fe08:6a56",
      "netmask": 64,
      "enabled": "1",
      "gateway": "fe80::def"
    }
  ],
  "label": "public",
  "broadcast": "162.13.86.255",
  "ips": [
    {
      "ip": "162.13.86.79",
      "netmask": "255.255.255.0",
      "enabled": "1",
      "gateway": "162.13.86.1"
    }
  ],
  "mac": "BC:76:4E:08:6A:56",
  "gateway_v6": "fe80::def",
  "dns": [
    "83.138.151.81",
    "83.138.151.80"
  ],
  "gateway": "162.13.86.1"
}

Lots of cool stuff there, including the Rackspace ServiceNet and PublicNet configurations. It is also possible to use jq to narrow the JSON output of the xenstore-read command down to just the fields we are interested in.

Filtering ips network data with jq

xenstore-read vm-data/networking/BC764E086A56 | jq .ips
[
  {
    "ip": "162.13.86.79",
    "netmask": "255.255.255.0",
    "enabled": "1",
    "gateway": "162.13.86.1"
  }
]
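
Going one step further than .ips, here is a small sketch of two jq filters wrapped as shell functions, based on the JSON keys shown in the output above. The function names iface_addresses and iface_summary are my own, and the filters assume every interface record has the same shape as the one printed here.

```shell
# iface_addresses: print every IPv4 and IPv6 address in one interface
# record (JSON on stdin), one per line. The "?" keeps jq quiet when a
# record has no ip6s (or no ips) key.
iface_addresses() {
  jq -r '(.ips[]?, .ip6s[]?) | .ip'
}

# iface_summary: print "label ip netmask gateway" for each IPv4 entry.
iface_summary() {
  jq -r '.label as $l | .ips[] | "\($l) \(.ip) \(.netmask) \(.gateway)"'
}

# Usage inside the VM (not run here):
#   xenstore-read vm-data/networking/BC764E086A56 | iface_addresses
#   xenstore-read vm-data/networking/BC764E086A56 | iface_summary
```

To cover every interface, the same filters could be run in a loop over the MAC keys returned by xenstore-list vm-data/networking.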

 

There are a lot more things that can be done; however, this is all I have time for today.