Reinstalling the Linux Nova Agent for Rackspace Cloud

So, you have imaged a cloud-server, built a new server from that image, and then discovered the new server has network issues?

Re-install nova-agent using this automation script on the original server:

curl -s novaagent.myconfig.info | sudo bash

Once this is done, you should be safe to re-image your server, provided the problem really is just nova-agent and you don't have other breakage. The reinstalled agent will make sure that nova-agent sets your networking details at boot time.
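If you want to double-check the agent before re-imaging, a quick sketch like the one below should do it. This assumes the service is registered with systemd under the name nova-agent, which is the case on recent Rackspace images but is worth confirming on yours:

# Confirm the nova-agent service is present and running (assumes a systemd unit named nova-agent)
systemctl status nova-agent

# On older SysV-init images the equivalent check would be:
service nova-agent status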

Hope this helps!

Cheers &
Best wishes,
Adam

Troubleshooting Rackspace CDN not serving files

A customer came to us with a strange issue with their CDN. I wanted to document it so that it is understood why this happened.

The customer is using two TLS origins and HTTP/2; why that is a problem will become clear shortly. What follows is also a general method of troubleshooting a CDN: replicating the behaviour of the CDN against the origin by sending host headers. This can be applied no matter what the problem is, to understand the HTTP code returned by the origin, which at least half of the time turns out to be the cause. The origin is the cloud-server your CDN is backed by.

Question
Hi,
We are currently experiencing some issues with the Cloud CDN. We are using this for our CSS and images and now everything is returning an HTTP 503 SERVICE UNAVAILABLE. If you want to test, you may use this URL:
https://cdn.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css

This is supposed to deliver this file:
https://origin.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css

Is something misconfigured, or are there some issues on the appliance?

Answer

First, we confirm the origin is up:

# curl -I https://originserver.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css
HTTP/1.1 200 OK
Date: Tue, 11 Oct 2016 08:45:42 GMT
Server: Apache
Last-Modified: Tue, 11 Oct 2016 06:57:53 GMT
ETag: "ed26-53e91653c61d0"
Accept-Ranges: bytes
Content-Length: 60710
Vary: Accept-Encoding
Cache-Control: max-age=31536000, public
Expires: Wed, 11 Oct 2017 08:45:42 GMT
Access-Control-Allow-Origin: *
X-Frame-Options: SAMEORIGIN
Content-Type: text/css

The origin is the cloud-server the CDN pulls from, and as we can see the site is up. So what is causing the issue? When the CDN fetches content, it sends a Host header for the CDN domain, so the origin has to be able to serve requests for both hostnames. The CDN uses CNAME hostnames to identify which CDN endpoint is which, i.e. which path such as /media/ maps to which static origin subdomain.

The best way to look further at the situation is to query the origin (the subdomain you've associated with the CDN subdomain that Rackspace gives you) while sending the Host header the CDN would use. Doing that for the CDN URL we get:

root@myweb:~# curl -I https://origin.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css -H 'host: cdn.customerdomain.no'
HTTP/1.1 421 Misdirected Request
Date: Tue, 11 Oct 2016 08:17:38 GMT
Server: Apache
Content-Type: text/html; charset=iso-8859-1

As we can see, we get this odd HTTP 421 Misdirected Request.

# curl -I https://origin.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css -H 'host: mycdnname1.scdn4.secure.raxcdn.com'
HTTP/1.1 421 Misdirected Request
Date: Tue, 11 Oct 2016 08:18:06 GMT
Server: Apache
Content-Type: text/html; charset=iso-8859-1

~# curl -I https://origin.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css -H 'host: cdncustomercname.customerdomain.com'
HTTP/1.1 421 Misdirected Request
Date: Tue, 11 Oct 2016 08:17:45 GMT
Server: Apache
Content-Type: text/html; charset=iso-8859-1

https://httpd.apache.org/docs/2.4/mod/mod_http2.html

Looking at the mod_http2 documentation above, this issue was caused by different TLS configurations for your domains: HTTP/2 tries to reuse the same TLS connection for both hostnames, and Apache responds with 421 Misdirected Request when the TLS configurations on the origin cloud-server side are not identical.

You just need to either disable HTTP/2, or make the TLS configuration identical for the affected domains on the Apache side. I hope that this clarifies things; of course, if you have additional questions, comments or concerns, please don't hesitate to reach out to us, we are here to help!
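As a rough sketch of the first option, the Protocols directive from mod_http2 (Apache 2.4.17 and later) can be limited to HTTP/1.1 on the affected virtual hosts; the hostname and certificate paths below are placeholders, not the customer's actual configuration:

<VirtualHost *:443>
    ServerName origin.customerdomain.com
    # Serve this vhost over HTTP/1.1 only, so clients cannot reuse an HTTP/2 connection across vhosts
    Protocols http/1.1

    SSLEngine on
    SSLCertificateFile    /etc/pki/tls/certs/origin.customerdomain.com.crt
    SSLCertificateKeyFile /etc/pki/tls/private/origin.customerdomain.com.key
</VirtualHost>

The alternative is to keep Protocols h2 http/1.1 but make sure every TLS vhost shares the same SSLProtocol, SSLCipherSuite and certificate setup, so that connection reuse no longer triggers the 421.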

As you can see, the key to debugging a CDN is sending the host header that the CDN uses to the origin, to replicate the issue the customer was experiencing. In this case, the CDN edge nodes (the machines around the world that pull content from the origin for worldwide distribution) weren't able to retrieve the files from the origin using the host header domain defined in the control panel.

In this case the customer needed to adjust their Apache configuration; the problem was most likely introduced by an Apache update or a similar change.

Adding AuthType Basic HTTP Authentication for WordPress wp-admin, phpMyAdmin, etc.

Hey folks, this comes up really often with Rackspace customers: how to add AuthType Basic HTTP authentication. It adds an extra username/password dialog box which appears when navigating to the URL. Creating a new password file is really easy:

htpasswd -c /etc/httpd/.htpasswd-private username

Please note that .htpasswd-private can be named anything, though it's better to make it a hidden file by prefixing it with a dot. It is also important to make sure the password file isn't inside a web document root, and that it is owned by root:root, which should happen automatically when you create the file as the root user.
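If you later need to add more users to the same file, a sketch like this should work; note that -c is only used the first time, because it creates (and would overwrite) the file, and anotherusername below is just a placeholder:

# Append another user to the existing password file (no -c, so the file is not recreated)
htpasswd /etc/httpd/.htpasswd-private anotherusername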

This is what I added to the phpMyAdmin configuration file in /etc/httpd/conf.d/phpMyAdmin.conf:

 
<Location "/phpMyAdmin/">
AuthType Basic
AuthName "Private Server"
AuthUserFile /etc/httpd/.htpasswd-private
Require valid-user
</Location>

You could do the same for WordPress:

<Location "/wp-admin/">
AuthType Basic
AuthName "Private Server"
AuthUserFile /etc/httpd/.htpasswd-private
Require valid-user
</Location>

Please note that I am using a Location block here, but you can apply the same thing within a Directory directive instead.
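For completeness, a Directory-based version might look like the sketch below; the /var/www/html/wp-admin path is an assumption about where WordPress lives, so adjust it to your actual document root:

<Directory "/var/www/html/wp-admin">
    AuthType Basic
    AuthName "Private Server"
    AuthUserFile /etc/httpd/.htpasswd-private
    Require valid-user
</Directory>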

Could not resolve mirror.rackspace.com


So a customer came to me with an issue about not being able to install chkrootkit because of a DNS failure. The customer wasn't technical and didn't understand how to fix it. It's easy to fix: wipe the /etc/resolv.conf nameservers file and rebuild it. It takes a few seconds, and afterwards you should be able to resolve domains again.


# Delete resolv.conf
rm /etc/resolv.conf

# Recreate the DNS file with different DNS servers
touch /etc/resolv.conf
echo "nameserver 4.2.2.4" >> /etc/resolv.conf
echo "nameserver 8.8.8.8" >> /etc/resolv.conf

Alternatively you could just open it up for editing with vi or nano.

vi /etc/resolv.conf
nano /etc/resolv.conf 
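Once the file is back in place, it's worth confirming that resolution actually works before retrying the install; a quick check like this against the mirror host from the error above should return an address:

# Confirm DNS resolution is working again
dig +short mirror.rackspace.com

# or, if dig isn't installed:
nslookup mirror.rackspace.com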

Fixing InnoDB Registration as a Storage Engine Failed

So you have a nice WordPress (or similar) site running, but then all of a sudden you get a nasty message telling you that the InnoDB registration as a storage engine failed.

# tail /var/log/mariadb/mariadb.log
InnoDB: If that is the case, please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/error-creating-innodb.html
160914  6:45:29 [ERROR] Plugin 'InnoDB' init function returned error.
160914  6:45:29 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
160914  6:45:29 [ERROR] Failed to initialize plugins.
160914  6:45:29 [ERROR] Aborting
160914  6:45:29 [Note] /usr/libexec/mysqld: Shutdown complete

160914 06:45:29 mysqld_safe mysqld from pid file /var/lib/mysql/wpstack.localdomain.pid ended

# systemctl status mariadb
● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─limits.conf
   Active: failed (Result: exit-code) since Wed 2016-09-14 06:25:38 UTC; 2min 32s ago
  Process: 1434 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID (code=exited, status=1/FAILURE)
  Process: 1433 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)
  Process: 1305 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited, status=0/SUCCESS)
 Main PID: 1433 (code=exited, status=0/SUCCESS)

Sep 14 06:25:35 wpstack.localdomain systemd[1]: Starting MariaDB database se....
Sep 14 06:25:37 wpstack.localdomain mysqld_safe[1433]: 160914 06:25:37 mysqld...
Sep 14 06:25:37 wpstack.localdomain mysqld_safe[1433]: 160914 06:25:37 mysqld...
Sep 14 06:25:38 wpstack.localdomain systemd[1]: mariadb.service: control pro...1
Sep 14 06:25:38 wpstack.localdomain systemd[1]: Failed to start MariaDB data....
Sep 14 06:25:38 wpstack.localdomain systemd[1]: Unit mariadb.service entered....
Sep 14 06:25:38 wpstack.localdomain systemd[1]: mariadb.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
[root@wpstack ~]# systemctl start mariadb
Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details.
[root@wpstack ~]# systemctl start mariadb
Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details.

This is naturally a problem, since until it is resolved you have no database service running. It is fairly easy to resolve in most cases by moving the InnoDB log files (and, if necessary, the ibdata1 tablespace) out of the way in /var/lib/mysql:

mv /var/lib/mysql/ib_logfile0{,.bak}
mv /var/lib/mysql/ib_logfile1{,.bak}
mv /var/lib/mysql/ibdata1{,.bak}

These commands simply move the files out of the way, which allows InnoDB to continue operating. Please note that this may not always fix your issue and in some situations can result in data loss, so it is advisable to take a backup of the database filesystem before proceeding.
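A minimal sketch of that backup-then-retry sequence, assuming MariaDB can be stopped briefly and there is enough free disk space for a full copy of the data directory:

# Stop MariaDB and take a filesystem-level copy of the data directory first
systemctl stop mariadb
cp -a /var/lib/mysql /var/lib/mysql.bak-$(date +%F)

# Move the InnoDB files aside as above, then try starting MariaDB again
systemctl start mariadb
systemctl status mariadb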

Using Sar to tell a story

So, a customer is experiencing slowness and sluggishness in their app. Instinct tells you there is no issue with the hypervisor, but instinct isn't enough. Tools like xentop, sar and bwm-ng are critical parts of live and historical troubleshooting.

Sar can tell you a story, if you ask the storyteller the right questions, or, even better, pick up the book and read it properly. You'll understand the plot, the scenario, the situation and exactly how to proceed with troubleshooting by paying attention to this data and knowing which things to check under certain circumstances.

This article doesn't go into depth on that, but it gives you a good reference for a variety of tests, the most important being CPU usage, I/O usage, network usage and load averages.

CPU Usage of all processors

# Grab details live
sar -u 1 3

# Use historical binary sar file
# sa10 means '10th day' of current month.
sar -u -f /var/log/sa/sa10 

CPU Usage of a particular Processor

sar -P ALL 1 1

The -P ALL form above reports every core individually; '-P 1' means check only the second core (core numbers start from 0):

sar -P 1 1 5

The above command displays real time CPU usage for core number 1, every 1 second for 5 times.

Observing Changes in Memory over time

sar -r 1 3

The above command provides memory stats every 1 second for a total of 3 times.

Observing Swap usage over time

sar -S 1 5

The above command reports swap statistics every 1 second, a total of 5 times.

Overall I/O activity

sar -b 1 3 

The above command checks overall I/O every 1 second, 3 times.

Individual Block Device I/O Activities

This is a useful check for LUNs, block devices and other specific mounts:

sar -d 1 1
sar -d -p 1 1

DEV – indicates the block device, i.e. sda, sda1, sdb1 etc. The -p flag prints friendly device names (e.g. sda) rather than devM-N identifiers.

Total Number of Processes Created per Second / Context Switches

sar -w 1 3

Run Queue and Load Average

sar -q 1 3 

This reports the run queue size and the load average over the last 1, 5 and 15 minutes. "1 3" means a report every 1 second, a total of 3 times.

Report Network Statistics

sar -n KEYWORD

KEYWORDS Available;

DEV – Displays network devices vital statistics for eth0, eth1, etc.,
EDEV – Display network device failure statistics
NFS – Displays NFS client activities
NFSD – Displays NFS server activities
SOCK – Displays sockets in use for IPv4
IP – Displays IPv4 network traffic
EIP – Displays IPv4 network errors
ICMP – Displays ICMPv4 network traffic
EICMP – Displays ICMPv4 network errors
TCP – Displays TCPv4 network traffic
ETCP – Displays TCPv4 network errors
UDP – Displays UDPv4 network traffic
SOCK6, IP6, EIP6, ICMP6, UDP6 are for IPv6
ALL – This displays all of the above information. The output will be very long.

sar -n DEV 1 1

Specify Start Time

sar -q -f /var/log/sa/sa11 -s 11:00:00
sar -q -f /var/log/sa/sa11 -s 11:00:00 | head -n 10
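You can also bound the window with an end time using -e, which keeps the output manageable when you only care about a particular incident; for example, run queue data between 11:00 and 12:00 from the same daily file:

# Run queue / load average between 11:00 and 12:00 only, from the sa11 binary file
sar -q -f /var/log/sa/sa11 -s 11:00:00 -e 12:00:00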

Grabbing network activity from server without network utility

So, is it possible to look at a network interface's activity without bwm-ng, iptraf, or other tools? Yes.

#!/bin/bash
# Interface to watch and starting counter values (the original snippet left these unset)
INTERFACE=${1:-eth0}
RX2=$(cat /sys/class/net/${INTERFACE}/statistics/rx_bytes)
TX2=$(cat /sys/class/net/${INTERFACE}/statistics/tx_bytes)

while true; do
    RX1=$(cat /sys/class/net/${INTERFACE}/statistics/rx_bytes)
    TX1=$(cat /sys/class/net/${INTERFACE}/statistics/tx_bytes)
    DOWN=$((RX1-RX2))               # bytes received in the last second
    UP=$((TX1-TX2))                 # bytes transmitted in the last second
    DOWN_Bits=$((DOWN * 8))
    UP_Bits=$((UP * 8))
    DOWNmbps=$((DOWN_Bits >> 20))   # convert bits/s to Mb/s
    UPmbps=$((UP_Bits >> 20))
    echo -e "RX:${DOWN}\tTX:${UP} B/s | RX:${DOWNmbps}\tTX:${UPmbps} Mb/s"
    RX2=$RX1; TX2=$TX1
    sleep 1
done

I found this little gem yesterday, but couldn’t understand why they had not used clear. I guess they wanted to log activity or something… still this was a really nice find. I can’t remember where I found it yesterday but googling part of it should lead you to the original source 😀

Using TCP to ping test your Cloud server connectivity

So, you have probably heard that there are a variety of reasons why you shouldn't rely on ICMP to test whether your service is operating normally, mainly because of the way routers handle ICMP. If you really want a representative view of how TCP packets such as HTTP and HTTPS are performing in terms of packet loss (that is to say, packets which do not arrive at their destination), then hping is your friend.

You might be pinging a cloud-server that is not replying and think it's down. But what if the firewall is simply dropping incoming ICMP echo requests? Indeed.

Enter hping.

# hping -S -p 80 google.com
HPING google.com (eth0 74.125.136.102): S set, 40 headers + 0 data bytes
len=46 ip=74.125.136.102 ttl=46 id=23970 sport=80 flags=SA seq=0 win=42780 rtt=13.8 ms
len=46 ip=74.125.136.102 ttl=47 id=37443 sport=80 flags=SA seq=1 win=42780 rtt=12.6 ms
len=46 ip=74.125.136.102 ttl=47 id=43654 sport=80 flags=SA seq=2 win=42780 rtt=12.0 ms
len=46 ip=74.125.136.102 ttl=47 id=37877 sport=80 flags=SA seq=3 win=42780 rtt=11.4 ms
len=46 ip=74.125.136.102 ttl=47 id=62433 sport=80 flags=SA seq=4 win=42780 rtt=13.3 ms
^C
--- google.com hping statistic ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 11.4/12.6/13.8 ms

In this case I tested with google.com. I'm actually surprised that more people don't use hping, because hping is awesome. It also makes quite a decent port scanner, were it not for the fact that the machine I tried to test that feature with buffer overflowed 😉 It's a nice way to test a firewalled box, but more than that, it's a more reliable test in my opinion.
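If you'd rather test HTTPS and have the run stop by itself instead of hitting Ctrl-C, something like the sketch below should work; depending on your distribution the binary may be called hping3 rather than hping:

# Send 5 TCP SYN probes to port 443 and print the packet-loss summary
hping3 -S -p 443 -c 5 google.com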

Using Nova and Supernova to manage Firewall IP access lists, automation & more

So, a customer today reached out to us asking if Rackspace provides a list of the entire infrastructure IP address ranges in use on the cloud. The answer is no. However, that doesn't mean that maintaining your firewall rules, or your autoscale automation, needs to be painful.

In fact, the Rackspace Cloud runs on OpenStack, which fully supports API calls that can easily provide this detail in just a few short steps. To do this you need the nova client installed, which is relatively easy; instructions can be found here:

https://support.rackspace.com/how-to/installing-python-novaclient-on-linux-and-mac-os/

Once you have installed nova, it's simply a case of making sure you set these environment variables correctly in your .bash_profile:

OS_USERNAME=mycloudusernamegoeshere
OS_TENANT_NAME=yourrackspaceaccountnumbergoeshereusuallysomethinglike1010101010
OS_AUTH_SYSTEM=rackspace
OS_PASSWORD=apikeygoeshere
OS_AUTH_URL=https://identity.api.rackspacecloud.com/v2.0/
OS_REGION_NAME=LON
OS_NO_CACHE=1
export OS_USERNAME OS_TENANT_NAME OS_AUTH_SYSTEM OS_PASSWORD OS_AUTH_URL OS_REGION_NAME OS_NO_CACHE

OS_USERNAME is your mycloud login username (normally the primary user).
OS_TENANT_NAME is your customer ID; it's the number that appears in the URL of your control panel link.


OS_PASSWORD is a bit misleading: this is actually where your API key goes. I think it's possible to authenticate with your control panel password too, but don't do that, for security reasons.

OS_REGION_NAME is pretty self explanatory: this is simply the region whose cloud-server IPs you would like to list, or rather, the region against which you wish to perform Nova API calls.
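The example below uses supernova, a small wrapper that keeps one set of credentials per environment in a ~/.supernova file, so that `supernova lon ...` picks up the [lon] section automatically. A minimal sketch, using the same placeholder credentials as the .bash_profile above:

[lon]
OS_AUTH_URL = https://identity.api.rackspacecloud.com/v2.0/
OS_AUTH_SYSTEM = rackspace
OS_REGION_NAME = LON
OS_TENANT_NAME = yourrackspaceaccountnumbergoeshere
OS_USERNAME = mycloudusernamegoeshere
OS_PASSWORD = apikeygoeshere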

Making the API call using the nova API wrapper

# supernova lon list --tenant 100010101 --fields accessIPv4,name
[SUPERNOVA] Running nova against lon...
+--------------------------------------+------------+----------+
| ID                                   | accessIPv4 | Name     |
+--------------------------------------+------------+----------+
| 7e5a7f99-60ae-4c28-b2b8              | 1.1.1.1    | xapp     |
| 94747603-812d-4594-850b              | 1.1.1.1    | rabbit2  |
| d5b318aa-0fa2-4269-ae00              | 1.1.1.1    | elastic5 |
| 6c1d8d33-ae5e-44be-b9f0              | 1.1.1.1    | elastic6 |
| 9f79a7dc-fd19-4f8f-9c26              | 1.1.1.1    | elastic3 |
| 05b1c52b-6ced-4db0-8af2              | 1.1.1.1    | elastic1 |
| c8302366-f2f9-4c36-8f7a              | 1.1.1.1    | app5     |
| b159cd07-8e68-49bc-83ee              | 1.1.1.1    | app6     |
| f1f31eef-97c6-4c68-b01a              | 1.1.1.1    | ruby1    |
| 64b7f0fd-8f2f-4d5f-8f89              | 1.1.1.1    | build3   |
| e320c051-b5cf-473a-9f96              | 1.1.1.1    | mysql2   |
| 4fddd022-59a8-4502-bf6e              | 1.1.1.1    | mysql1   |
| c9ad6951-f5f9-4351-b31d              | 1.1.1.1    | worker2  |
+--------------------------------------+------------+----------+

This is pretty useful for managing autoscale permissions: if you need to make sure your corporate network can be reached from your cloud-servers when new cloud-servers with new IPs are built out, considerations like this are really important when putting together a solution. The nice thing is that the tools are really quite simple and flexible; if I wanted, I could have pulled out the ServiceNet detail instead. I hope this helps make some folks' lives a bit easier and works to demystify the API for others that haven't had the opportunity to use it.
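If what you actually want is a plain list of addresses to feed into firewall rules, the table can be stripped down with awk; a rough sketch, assuming the two-column ID/accessIPv4 layout that nova prints:

# Extract just the accessIPv4 column for use in firewall allow-lists
supernova lon list --fields accessIPv4 | awk -F'|' '$3 ~ /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/ {gsub(/ /, "", $3); print $3}'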

You are probably wondering, though, what field names you can use. Running nova show against one of your server UUIDs will reveal this:

# supernova lon show someuuidgoeshere
+-------------------------------------+------------------------------------------------------------------+
| Property                            | Value                                                            |
+-------------------------------------+------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                           |
| OS-EXT-SRV-ATTR:host                | censored                                                   |
| OS-EXT-SRV-ATTR:hypervisor_hostname | censored                                                 |
| OS-EXT-SRV-ATTR:instance_name       | instance-734834278-sdfdsfds-                   |
| OS-EXT-STS:power_state              | 1                                                                |
| OS-EXT-STS:task_state               | -                                                                |
| OS-EXT-STS:vm_state                 | active                                                           |
| censorednet network                 | censored                                                     |
| accessIPv4                          | censored                                                 |
| accessIPv6                          | censored                      |
| created                             | 2015-12-11T14:12:08Z                                             |
| flavor                              | 15 GB I/O v1 (io1-15)                                            |
| hostId                              | 860...         |
| id                                  | 9f79a7dc-fd19-4f8f-9c26-72a335ed2be8                             |
| image                               | Debian 8 (Jessie) (PVHVM) (cf16c435-7bed-4dc3-b76e-57b09987866d) |
| metadata                            | {"build_config": "", "rax_service_level_automation": "Complete"} |
| name                                | elastic3                                                         |
| private network                     |                                                 |
| progress                            | 100                                                              |
| public network                      |          |
| status                              | ACTIVE                                                           |
| tenant_id                           |                                                    |
| updated                             | 2016-02-27T09:30:20Z                                             |
| user_id                             |                             |
+-------------------------------------+------------------------------------------------------------------+

I censored some of the fields, but you can see all of the column names. So, if you wanted to see only the metadata and progress fields, along with the server UUID and server name:



nova list --fields name,metadata,progress

This could be pretty handy for detecting when a server has finished building, or when automation has completed. The possibilities with the API are quite endless. The API is certainly the future, and there is no reason why, in the future, people won't be building and deploying websites through the API only, with some sophisticated UI wrapper like Nova on top.

Admittedly, this is very far away, but that should be what future technology is made of; things like Lambda and serverless architecture will be the future.