Testing Rackspace Cloud-server Service-net Connectivity and creating an alarm

So, the last few weeks my colleagues and myself have been noticing that there has been a couple of issues with the cloud-servers servicenet interface. Unfortunately for some customers utilizing dbaas instances this means that their cloud-server will be unable to communicate, often, with their database backend.

The solution is a custom monitoring script that my colleague Marcin has kindly put together for another customer of his own.

The python script that goes on the server:

Create file:

vi /usr/lib/rackspace-monitoring-agent/plugins/servicenet.sh

Paste into file:

#!/bin/bash
#
ping="/usr/bin/ping -W 1 -c 1 -I eth1 -q"

if [ -z $1 ];then
   echo -e "status CRITICAL\nmetric ping_check uint32 1"
   exit 1
else
   $ping $1 &>/dev/null
   if [ "$?" -eq 0 ]; then
      echo -e "status OK\nmetric ping_check uint32 0"
      exit 0
   else
      echo -e "status CRITICAL\nmetric ping_check uint32 1"
      exit 1
   fi
fi

Create an alarm that utilizes the below metric

if (metric["ping_check"] == 1) {
    return new AlarmStatus(CRITICAL, 'what?');
}
if (metric["ping_check"] == 0) {
    return new AlarmStatus(OK, 'eee?');
}

Of course for this to work the primary requirement is a Rackspace Cloud-server and an installation of Rackspace Cloud Monitoring installed on the server already.

Thanks again Marcin, for this golden nugget.

Uploading and Mounting ISO for Windows Cloud-server

So a customer came with the question today on the process of uploading an ISO to a cloud-server. It’s a relatively easy process. RDP allows you to use resource sharing, that effectively mounts on the server you RDP to, your remote filesystem of your choice. This is the manner in which you can upload the file easily.

Once you have the ISO on the server you simply need to mount the disk with a software to present it as an ordinary CD/DVD disk. I personally use Elaborate Bytes’ Virtual Clone Drive.

Copying Resources between client and server using RDP resource sharing:
https://support.microsoft.com/en-us/kb/313292

Mounting CD/DVD ISO/CUE/BIN files in Windows
https://www.elby.ch/en/products/vcd.html

Yeah, I know it’s simple to us, but other people might find this useful. Exact same question came up today 😀

Using Sar to tell a story

So, a customer is experiencing slowness/sluggishness in their app. You know there is not issue with the hypervisor from instinct, but instinct isn’t enough. Using tools like xentop, sar, bwm-ng are critical parts of live and historical troubleshooting.

Sar can tell you a story, if you can ask the storyteller the write questions, or even better, pick up the book and read it properly. You’ll understand what the plot, scenario, situation and exactly how to proceed with troubleshooting by paying attention to these data and knowing which things to check under certain circumstances.

This article doesn’t go in depth to that, but it gives you a good reference of a variety of tests, the most important being, cpu usage, io usage, network usage, and load averages.

CPU Usage of all processors

# Grab details live
sar -u 1 3

# Use historical binary sar file
# sa10 means '10th day' of current month.
sar -u -f /var/log/sa/sa10 

CPU Usage of a particular Processor

sar -P ALL 1 1

‘-P 1’ means check only the 2nd Core. (Core numbers start from 0).

sar -P 1 1 5

The above command displays real time CPU usage for core number 1, every 1 second for 5 times.

Observing Changes in Memory over time

sar -r 1 3

The above command provides memory stats every 1 second for a total of 3 times.

Observing Swap usage over time

sar -S 1 5

The above command reports swap statistics every 1 seconds, a total 3 times.

Overall I/O activity

sar -b 1 3 

The above command checks every 1 seconds, 3 times.

Individual Block Device I/O Activities

This is a useful check for LUN , block devices and other specific mounts

sar -d 1 1 
sar -p d

DEV – indicates block device, i.e. sda, sda1, sdb1 etc.

Total Number processors created a second / Context switches

sar -w 1 3

Run Queue and Load Average

sar -q 1 3 

This reports the run queue size and load average of last 1 minute, 5 minutes, and 15 minutes. “1 3” reports for every 1 seconds a total of 3 times.

Report Network Statistics

sar -n KEYWORD

KEYWORDS Available;

DEV – Displays network devices vital statistics for eth0, eth1, etc.,
EDEV – Display network device failure statistics
NFS – Displays NFS client activities
NFSD – Displays NFS server activities
SOCK – Displays sockets in use for IPv4
IP – Displays IPv4 network traffic
EIP – Displays IPv4 network errors
ICMP – Displays ICMPv4 network traffic
EICMP – Displays ICMPv4 network errors
TCP – Displays TCPv4 network traffic
ETCP – Displays TCPv4 network errors
UDP – Displays UDPv4 network traffic
SOCK6, IP6, EIP6, ICMP6, UDP6 are for IPv6
ALL – This displays all of the above information. The output will be very long.

sar -n DEV 1 1

Specify Start Time

sar -q -f /var/log/sa/sa11 -s 11:00:00
sar -q -f /var/log/sa/sa11 -s 11:00:00 | head -n 10

Grabbing network activity from server without network utility

So, is it possible to look at a network interfaces activity without bwm-ng, iptraf, or other tools? Yes.

while true do
RX1=`cat /sys/class/net/${INTERFACE}/statistics/rx_bytes`
TX1=`cat /sys/class/net/${INTERFACE}/statistics/tx_bytes`
DOWN=$(($RX1-$RX2))
UP=$(($TX1-$TX2))
DOWN_Bits=$(($DOWN * 8 ))
UP_Bits=$(($UP * 8 ))
DOWNmbps=$(( $DOWN_Bits >> 20 ))
UPmbps=$(($UP_Bits >> 20 ))
echo -e "RX:${DOWN}\tTX:${UP} B/s | RX:${DOWNmbps}\tTX:${UPmbps} Mb/s"
RX2=$RX1; TX2=$TX1
sleep 1; done

I found this little gem yesterday, but couldn’t understand why they had not used clear. I guess they wanted to log activity or something… still this was a really nice find. I can’t remember where I found it yesterday but googling part of it should lead you to the original source 😀

Using TCP to ping test your Cloud server connectivity

So, you have probably heard that there are a variety of reasons why you shouldn’t use ICMP to test your service is operating normally. Mainly because of the way that ICMP is handled by routers. If you really want a representative view of the way that TCP packets, such as HTTP and HTTPS are performing in terms of packet loss (that is to say packets which do not arrive at their destination) , then hping is your friend.

You might be pinging a cloud-server that is not replying. You might think it’s down. But what if the firewall is simply dropping ICMP echo requests coming in on that port? Indeed.

Enter hping.

# hping -S -p 80 google.com
HPING google.com (eth0 74.125.136.102): S set, 40 headers + 0 data bytes
len=46 ip=74.125.136.102 ttl=46 id=23970 sport=80 flags=SA seq=0 win=42780 rtt=13.8 ms
len=46 ip=74.125.136.102 ttl=47 id=37443 sport=80 flags=SA seq=1 win=42780 rtt=12.6 ms
len=46 ip=74.125.136.102 ttl=47 id=43654 sport=80 flags=SA seq=2 win=42780 rtt=12.0 ms
len=46 ip=74.125.136.102 ttl=47 id=37877 sport=80 flags=SA seq=3 win=42780 rtt=11.4 ms
len=46 ip=74.125.136.102 ttl=47 id=62433 sport=80 flags=SA seq=4 win=42780 rtt=13.3 ms
^C
--- google.com hping statistic ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 11.4/12.6/13.8 ms

In this case I tested with google.com. I’m actually surprised that more people don’t use hping, because, hping is awesome. It also makes quite a decent port scanner, were it not for the fact that the machine I tried to test that feature with buffer overflowed 😉 It’s a nice way to test a firewalled box, but more than that, it’s a more reliable test in my opinion.

Using Nova and Supernova to manage Firewall IP access lists, automation & more

So, a customer today reached out to us asking if Rackspace provided the entire infrastructure IP address ranges in use on cloud. The answer is, no. However, that doesn’t mean that making your firewall rules, or autoscale automation need to be painful.

In fact, Rackspace Cloud utilizes Openstack which fully supports API calls which will easily be able to provide this detail in just a few simple short steps. To do this you require nova to be installed, this is really relatively easy to install, and instructions for installing it can be found here;

https://support.rackspace.com/how-to/installing-python-novaclient-on-linux-and-mac-os/

Once you have installed nova, it’s simply a case of making sure you set these 4 lines correctly in your .bash_profile

OS_USERNAME=mycloudusernamegoeshere
OS_TENANT_NAME=yourrackspaceaccountnumbergoeshereusuallysomethinglike1010101010
OS_AUTH_SYSTEM=rackspace
OS_PASSWORD=apikeygoeshere
OS_AUTH_URL=https://identity.api.rackspacecloud.com/v2.0/
OS_REGION_NAME=LON
OS_NO_CACHE=1
export OS_USERNAME OS_TENANT_NAME OS_AUTH_SYSTEM OS_PASSWORD OS_AUTH_URL OS_REGION_NAME OS_NO_CACHE

OS_USERNAME is your mycloud login username (normally the primary user).
OS_TENANT_NAME is your Customer ID, it’s the number that appears in the URL of your control panel link, see below picture for illustration

Screen Shot 2016-08-10 at 2.45.05 PM

OS_PASSWORD is a bit misleading, this is actually where your apikey goes , but I think it’s possible to authenticate using your control panel password too, don’t do that for security reasons.

OS_REGION_NAME is pretty self explanatory, this is simply the region that you would like to list cloud-server IP’s in or rather, the region that you wish to perform NOVA API calls.

Making the API call using the nova API wrapper

# supernova lon list --tenant 100010101 --fields accessIPv4,name
[SUPERNOVA] Running nova against lon...
+--------------------------------------+-----------------+-----------+
| ID                                   | accessIPv4      | Name      |
+--------------------------------------+-----------------+-----------+
| 7e5a7f99-60ae-4c28-b2b8              | 1.1.1.1  |  xapp      |
| 94747603-812d-4594-850b              | 1.1.1.1  |   rabbit2   |
| d5b318aa-0fa2-4269-ae00              | 1.1.1.1  |   elastic5  |
| 6c1d8d33-ae5e-44be-b9f0              | 1.1.1.1  | | elastic6  |
| 9f79a7dc-fd19-4f8f-9c26              |1.1.1.1   | | elastic3  |
| 05b1c52b-6ced-4db0-8af2              | 11.1.1.1 | | elastic1  |
| c8302366-f2f9-4c36-8f7a              | 1.1.1.1  | | app5      |
| b159cd07-8e68-49bc-83ee              | 1.1.1.1  | | app6      |
| f1f31eef-97c6-4c68-b01a              | 1.1.1.1  | | ruby1     |
| 64b7f0fd-8f2f-4d5f-8f89              | 1.1.1.1  | | build3    |
| e320c051-b5cf-473a-9f96              | 1.1.1.1  |   mysql2    |
| 4fddd022-59a8-4502-bf6e              | 1.1.1.1  | | mysql1    |
| c9ad6951-f5f9-4351-b31d              | 1.1.1.1  | | worker2   |
+--------------------------------------+-----------------+-----------+

This is pretty useful for managing autoscale permissions if you need to make sure your corporate network can be connected to from your cloud-servers when new cloud-servers with new IP are built out. considerations like this are really important when putting together a solution. The nice thing is the tools are really quite simple and flexible. If I wanted I could have pulled out detail for servicenet instead. I hope this helps make some folks lives a bit easier and works to demystify API to others that haven’t had the opportunity to use it.

You are probably wondering though, what field names can I use? a nova show will reveal this against one of your server UUID’s

# supernova lon show someuuidgoeshere
+-------------------------------------+------------------------------------------------------------------+
| Property                            | Value                                                            |
+-------------------------------------+------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                           |
| OS-EXT-SRV-ATTR:host                | censored                                                   |
| OS-EXT-SRV-ATTR:hypervisor_hostname | censored                                                 |
| OS-EXT-SRV-ATTR:instance_name       | instance-734834278-sdfdsfds-                   |
| OS-EXT-STS:power_state              | 1                                                                |
| OS-EXT-STS:task_state               | -                                                                |
| OS-EXT-STS:vm_state                 | active                                                           |
| censorednet network                 | censored                                                     |
| accessIPv4                          | censored                                                 |
| accessIPv6                          | censored                      |
| created                             | 2015-12-11T14:12:08Z                                             |
| flavor                              | 15 GB I/O v1 (io1-15)                                            |
| hostId                              | 860...         |
| id                                  | 9f79a7dc-fd19-4f8f-9c26-72a335ed2be8                             |
| image                               | Debian 8 (Jessie) (PVHVM) (cf16c435-7bed-4dc3-b76e-57b09987866d) |
| metadata                            | {"build_config": "", "rax_service_level_automation": "Complete"} |
| name                                | elastic3                                                         |
| private network                     |                                                 |
| progress                            | 100                                                              |
| public network                      |          |
| status                              | ACTIVE                                                           |
| tenant_id                           |                                                    |
| updated                             | 2016-02-27T09:30:20Z                                             |
| user_id                             |                             |
+-------------------------------------+------------------------------------------------------------------+

I censored some of the fields.. but you can see all of the column names, so if you wanted to see metadata and progress only, with the server uuid and server name.



nova list --fields name, metadata, progress

This could be pretty handy for detecting when a process has finished building, or detecting once automation has completed. The possibilities with API are quite endless. API is certainly the future, and, there is no reason why, in the future, people won't be building and deploying websites thru API only, and some sophisticated UI wrapper like NOVA.

Admittedly, this is very far away, but that should be what the future technology will be made of, stuff like LAMBDA, serverless architecture, will be the future.

Adding some excludes for Lsyncd

A customer was having some issues with their syncing, as was shown by their inotify

Error: Terminating since out of inotify watches.
Consider increasing /proc/sys/fs/inotify/max_user_watches

Fix was quite simple, to remove other folders from sync that aren’t necessary.

Adding this line to the /etc/lsyncd.conf

excludeFrom="/etc/lsyncd-excludes.txt",

And creating the ‘excludes’ file for LsyncD, i.e. what folders you want to ignore, in this case we wanted to ignore old httpdocs.OLD backup.

# cat /etc/lsyncd-excludes.txt
somewebsite.com/httpdocs.OLD/

A shockingly simple fix.

Please note that the path in lsyncd-excludes.txt is determined by the path in lsyncd. (do not give full path, give relative path inside the parent). It was a simple fix.

Resizing a SD or USB card partition for Retropie Raspberry Pi Arcade on Linux

So, I had a friend who had recently bought his Raspberry Pi 3 and wanted to run retropie on it like I have been with my arcade cabinet.

13682010_1735545403352472_192137128_o

The problem was the sandisk 64GB disk he had bought had some few less sectors on the disk, which meant my image was just a few bytes too big. What a bummer!

13599828_10153775400798481_3462834024739541253_n

So I used this great tool by sirlagz to fix that.

#!/bin/bash
# Automatic Image file resizer
# Written by SirLagz
strImgFile=$1


if [[ ! $(whoami) =~ "root" ]]; then
echo ""
echo "**********************************"
echo "*** This should be run as root ***"
echo "**********************************"
echo ""
exit
fi

if [[ -z $1 ]]; then
echo "Usage: ./autosizer.sh "
exit
fi

if [[ ! -e $1 || ! $(file $1) =~ "x86" ]]; then
echo "Error : Not an image file, or file doesn't exist"
exit
fi

partinfo=`parted -m $1 unit B print`
partnumber=`echo "$partinfo" | grep ext4 | awk -F: ' { print $1 } '`
partstart=`echo "$partinfo" | grep ext4 | awk -F: ' { print substr($2,0,length($2)-1) } '`
loopback=`losetup -f --show -o $partstart $1`
e2fsck -f $loopback
minsize=`resize2fs -P $loopback | awk -F': ' ' { print $2 } '`
minsize=`echo $minsize+1000 | bc`
resize2fs -p $loopback $minsize
sleep 1
losetup -d $loopback
partnewsize=`echo "$minsize * 4096" | bc`
newpartend=`echo "$partstart + $partnewsize" | bc`
part1=`parted $1 rm 2`
part2=`parted $1 unit B mkpart primary $partstart $newpartend`
endresult=`parted -m $1 unit B print free | tail -1 | awk -F: ' { print substr($2,0,length($2)-1) } '`
truncate -s $endresult $1

It was a nice solution to my friends problem… the only problem now is the working image for my pi is not working with audio for him, and for some reason when he comes out of hte game and goes back to emulation station he loses the joystick input controller. That is kind of bizarre.

Does anyone know what could cause those secondary issues? I’m a bit stumped on this one.

Preventing /etc/resolv.conf reset on startup/boot

Today, a customer approached us after a Host Server Down complaining that, although the server is up again their website and application were down & not working. Even though the server was online and functioning correctly.

The customer discovered that the source of the issue was that there /etc/resolv.conf is blank, this means that they will not be able to resolve DNS A/PTR/CNAME record hostnames into a resolved IP. This is called hostname to IP resolution. Its means that if /etc/resolv.conf is blank and the customer uses hostnames in their calls, such a failure will break the connectivity due to failure to resolve to IP to communicate on the TCP stack.

There is actually a very simple way to prevent the /etc/resolv.conf file from being changed. But first, it’s important to understand why /etc/resolv.conf is being reset.

On All Rackspace cloud-servers there is a process called nova-agent, and when the server starts up, the /etc/resolv.conf file will be reset along with the networking configuration. This happens each time your server is restarted and is used to set new networking details, specifically if you take an image and build server on a new ip address or if your server is live-migrated to a new host, it makes sure on the next reboot it comes up with correct networking detail transparently. However this can cause some issues, such as in this case with the /etc/resolv.conf file. Fortunately there are some novel ways of preventing your /etc/resolv.conf being modified after you have added the correct nameservers you desire to it.

You can use the chattr immutable file setting to stop processes from modifying it after you have made the changes to your resolv.conf that are desired;

Set Immutable File

chattr +i /etc/resolv.conf 

Un-set Immutable File

chattr -i /etc/resolv.conf 

This /etc/resolv.conf issue is a common problem, however using immutable file flag and chattr should prevent it from being changed ever again.