Retrieving Process List from Rackspace Cloud Server Monitoring

It is possible for you to use the Rackspace API to retrieve the Running Process List of a Cloud-server on your Rackspace account which has the rackspace cloud-server monitoring agent installed.

TOKEN=mytokengoeshere
server_uuid_here=serveruuidgoeshere
customeridhere=customeridtenantnumberhere

 curl -s -H "Content-Type: application/json" -H "X-Auth-Token: $token" https://monitoring.api.rackspacecloud.com/v1.0/$customeridhere/agents/$server_uuid_here/host_info/processes

Not that difficult to do, really !

Enabling MySQL Slow Query Logs

In the case your seeing very long pageload times, and have checked your application. It always pays to check the way in which the database performs when interacting with the application, especially if they are on either same or seperate server, as these significantly affect the way that your application will run.


mysql> SET GLOBAL slow_query_log = 'ON' ;
Query OK, 0 rows affected (0.01 sec)

mysql> SET GLOBAL slow_query_log_file = '/slow_query_logs/slow_query_logs.txt';
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL long_query_time = 5;
Query OK, 0 rows affected (0.00 sec)

PHP5 newrelic agent not collecting data

So today I had a newrelic customer who was having issues after installing the newrelic php plugin. He couldn’t understand why it wasn’t collecting data. For it to collecting data you need to make sure newrelic-daemon process is running by using ps auxfwww | grep newrelic-daemon.

We check the process of the daemon is running

[root@rtd-production-1 ~]# ps -ef | grep newrelic-daemon
root     26007 18914  0 09:59 pts/0    00:00:00 grep newrelic-daemon

We check the status of the daemon process

[root@rtd-production-1 ~]# service newrelic-daemon status
newrelic-daemon is stopped...

Copy basic NewRelic configuration template to correct location

[root@rtd-production-1 ~]# cp /etc/newrelic/newrelic.cfg.template /etc/newrelic/newrelic.cfg

Start the daemon

[root@rtd-production-1 ~]# service newrelic-daemon start
Starting newrelic-daemon:                                  [  OK  ]

Count number of IP’s over a given time period in Apache Log

So, a customer had an outage, and wasn’t sure what caused it. It looked like some IP’s were hammering the site, so I wrote this quite one liner just to sort the IP’s numerically, so that uniq -c can count the duplicate requests, this way we can count exactly how many times a given IP makes a request in any given minute or hour:

Any given minute

# grep '24/Feb/2017:10:03' /var/www/html/website.com/access.log | awk '{print $1}' | sort -k2nr | uniq -c

Any given hour

# grep '24/Feb/2017:10:' /var/www/html/website.com/access.log | awk '{print $1}' | sort -k2nr | uniq -c

Any Given day

# grep '24/Feb/2017:' /var/www/html/website.com/access.log | awk '{print $1}' | sort -k2nr | uniq -c

Any Given Month

# grep '/Feb/2017:' /var/www/html/website.com/access.log | awk '{print $1}' | sort -k2nr | uniq -c

Any Given Year

# grep '/2017:' /var/www/html/website.com/access.log | awk '{print $1}' | sort -k2nr | uniq -c

Any given year might cause dupes though, and I’m sure there is a better way of doing that which is more specific

Checking Webserver Logs and generating hit counts

So, I’ve been meaning to do this for a while. I’m sure your all familiar with this crappy oneliner where you’ll check the hits using wc -l against a webserver log for a given hour and use for i in seq or similar to get the last 10 minutes data, or the last 59 minutes. But getting hours and days is a bit harder. You need some additional nesting, and, it isn’t difficult for you to do at all!

for i in 0{1..9}; do echo "16/Jan/2017:06:$i"; grep "16/Jan/2017:06:$i" /var/log/httpd/soemsite.com-access_log | wc -l ; done

I improved this one, quite significantly by adding an additional for j, for the hour, and adding an additional 0 in front of {1..9}, this properly is matching the Apache2 log format and allows me to increment through all the hours of the day. to present ;-D All that is missing is some error checking when the last date in the file is, im thinking a tail and grep for the timecode from the log should be sufficient.

Here is the proud oneliner I made for this!

for j in 0{1..9}; do for i in 0{1..9} {10..59}; do echo "16/Jan/2017:$j:$i"; grep "16/Jan/2017:06:$i" /var/log/httpd/website.com-access_log | wc -l ; done; done

Tracing Down Network and Process Traffic Using Netfilter

Every now and then at Rackspace, as with any hosting provider. We do occasionally have issues where customers have left themselves open to attack. In such cases sometimes customers find their server is sending spam email, and is prone to other malware occurring on the Rackspace Network.

Due to AUP and other obligations, it can become a critical issue for both the uptime, and reputation of your site. In many cases, customers do not necessarily have forensic experience, and will struggle very hard to remove the malware. In some cases, the malware keeps on coming back, or, like in my customers case, you could see lots of extra network traffic still using tcpdump locally on the box.

Enter, netfilter, part of the Linux Kernel, and it is able, if you ask it, to track down where packets are coming from, on a process level. This is really handy if you have an active malware or spam process on your system, since you can find out exactly where it is, before doing more investigation. Such a method, also allows you to trace down any potential false positives, since the packet address is always included, you get a really nice overview.

To give you an idea, I needed to install a kernel with debuginfo, just to do this troubleshooting, however this depends on your distribution.

Updating your Kernel may be necessary to use netfilter debug

$yum history info 18

Transaction performed with:
    Installed     rpm-4.11.3-21.el7.x86_64                               @base
    Installed     yum-3.4.3-150.el7.centos.noarch                        @base
    Installed     yum-plugin-auto-update-debug-info-1.1.31-40.el7.noarch @base
    Installed     yum-plugin-fastestmirror-1.1.31-40.el7.noarch          @base
Packages Altered:
    Updated kernel-debuginfo-4.4.40-202.el7.centos.x86_64               @base-debuginfo
    Update                   4.4.42-202.el7.centos.x86_64               @base-debuginfo
    Updated kernel-debuginfo-common-x86_64-4.4.40-202.el7.centos.x86_64 @base-debuginfo
    Update                                 4.4.42-202.el7.centos.x86_64 @base-debuginfo

You could use a similar process using netfilter.ip.local_in, I suspect.

The Script

#! /usr/bin/env stap

# Print a trace of threads sending IP packets (UDP or TCP) to a given
# destination port and/or address.  Default is unfiltered.

global the_dport = 0    # override with -G the_dport=53
global the_daddr = ""   # override with -G the_daddr=127.0.0.1

probe netfilter.ip.local_out {
    if ((the_dport == 0 || the_dport == dport) &&
        (the_daddr == "" || the_daddr == daddr))
            printf("%s[%d] sent packet to %s:%d\n", execname(), tid(), daddr, dport)
}

Executing the Script

[root@pirax-test-new hacked]# chmod +x dns_probe.sh
[root@pirax-test-new hacked]# ./dns_probe.sh
Missing separate debuginfos, use: debuginfo-install kernel-3.10.0-514.2.2.el7.x86_64
swapper/3[0] sent packet to 78.136.44.6:0
sshd[25421] sent packet to 134.1.1.1:55336
sshd[25421] sent packet to 134.1.1.1:55336
swapper/3[0] sent packet to 78.136.44.6:0

I was a little bit concerned about the above output, it looks like swapper with pid 3, is doing something it wouldn’t normally do. Upon further inspection though, we find it is just the outgoing cloud monitoring call;

# nslookup 78.136.44.6
Server:		83.138.151.81
Address:	83.138.151.81#53

Non-authoritative answer:
6.44.136.78.in-addr.arpa	name = collector-lon-78-136-44-6.monitoring.rackspacecloud.com.

Authoritative answers can be found from:

A Unique Situation for grep (finding the files with content matching a specific pattern Linux)

This article explains how to find all the files that have a specific text or pattern within them, this is the article you’ve been looking for!

So today, I was dealing with a customers server where he had tried to configure BASIC AUTH. I’d found the httpd.conf file for the specific site, but I couldn’t see which file had basic auth setup as wrong. To save me looking through hundreds of configurations (and also to save YOU from looking through hundreds of configuration files) for this specific pattern. Why not use grep to recursively search files for the pattern, and why not use -n to give the filename and line number of files which have text in that match this pattern.

I really enjoyed this oneliner, and been meaning to work to put something like this together, because this kind of issue comes up a lot, and this can save a lot of time!

 grep -rnw '/' -e "PermitRootLogin"

# OUTPUT looks like

/usr/share/vim/vim74/syntax/sshdconfig.vim:157:syn keyword sshdconfigKeyword PermitRootLogin
/usr/share/doc/openssh-5.3p1/README.platform:37:instead the PermitRootLogin setting in sshd_config is used.

The above searches recursively all files in the root filesystem ‘/’ looking for PermitRootLogin.

I wanted to find which .htaccess file was responsible so I ran;

# grep -rnw '/' -e "/path/to/.htpasswd'

# OUTPUT looks like
/var/www/vhosts/somesite.com/.htaccess:14:AuthUserFile /path/to/.htpasswd

Comparing Files on the internet or CDN with MD5 to determine if they present same content

So, a customer today was having some issues with their CDN. They said that their SSL CDN was presenting a different image, than the HTTP CDN. So, I thought the best way to begin any troubleshooting process would firstly be to try and recreate those issues. To do that, I need a way to compare the files programmatically, enter md5sum a handly little shell application usually installed by default on most Linux OS.

[user@cbast3 ~]$ curl https://3485asd3jjc839c9d3-08e84cacaacfcebda9281e3a9724b749.ssl.cf3.rackcdn.com/companies/5825cb13f2e6c9632807d103/header.jpeg -o file ; cat file | md5sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  382k  100  382k    0     0  1726k      0 --:--:-- --:--:-- --:--:-- 1732k
e917a67bbe34d4eb2d4fe5a87ce90de0  -
[user@cbast3 ~]$ curl http://3485asd3jjc839c9d3-08e84cacaacfcebda9281e3a9724b749.r45.cf3.rackcdn.com/companies/5825cb13f2e6c9632807d103/header.jpeg -o file2 ; cat file2 | md5sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  382k  100  382k    0     0  2071k      0 --:--:-- --:--:-- --:--:-- 2081k
e917a67bbe34d4eb2d4fe5a87ce90de0  -

As we can see from the output of both, the md5sum (the hashing) of the two files is the same, this means there is a statistically very very very high chance the content is exactly the same, especially when passing several hundred characters or more. The hashing algorithm is combination based, so the more characters, the less likely same combination is of coming around twice!

In this case I was able to disprove the customers claim’s. Not because I wanted to, but because I wanted to solve their issue. These results show me, the issue must be, if it is with the CDN, with a local edgenode local to the customer having the issue. Since I am unable to recreate it from my location, it is therefore not unreasonable to assume that it is a client side issue, or a failure on our CDN edgenode side, local to the customer. That’s how I troubleshooted this, and quite happy with this one! Took about 2 minutes to do, and a few minutes to come up with. A quick and useful check indeed, which reduces the number of possibilities considerably in tracing down the issue!

Cheers &
Best wishes,
Adam

Please note the real CDN location has been altered for privacy reasons

Adding nodes and Updating nodes behind a Cloud Load Balancer

I have succeeded in putting together a basic script documenting exactly how API works and for adding node(s), listing the nods behind the LB, as well as updating the nodes (such as DRAINING, DISABLED, ENABLED).

Use update node to set one of your nodes to gracefully drain (not accept new connections, wait for present connections to die). Naturally, you will want to put the secondary server in behind the load balancer first, with addnode.sh.

Once new node is added as enabled, set the old node to ‘DRAINING’. This will gracefully switch over the server.

# List Load Balancers

#!/bin/bash

USERNAME='yourmycloudusernamegoeshere'
APIKEY='apikeygoeshere'
LB_ID='157089'
CUSTOMER_ID='10017858'

TOKEN=`curl https://identity.api.rackspacecloud.com/v2.0/tokens -X POST -d '{ "auth":{"RAX-KSKEY:apiKeyCredentials": { "username":"'$USERNAME'", "apiKey": "'$APIKEY'" }} }' -H "Content-type: application/json" |  python -mjson.tool | grep -A5 token | grep id | cut -d '"' -f4`



curl -v -H "X-Auth-Token: $TOKEN" -H "content-type: application/json" -X GET "https://lon.loadbalancers.api.rackspacecloud.com/v1.0/$CUSTOMER_ID/loadbalancers/$LB_ID"

#

# Add Node(s) addnode.sh

#!/bin/bash

USERNAME='yourmycloudusernamegoeshere'
APIKEY='apikeygoeshere'
LB_ID='157089'
CUSTOMER_ID='10017858'

TOKEN=`curl https://identity.api.rackspacecloud.com/v2.0/tokens -X POST -d '{ "auth":{"RAX-KSKEY:apiKeyCredentials": { "username":"'$USERNAME'", "apiKey": "'$APIKEY'" }} }' -H "Content-type: application/json" |  python -mjson.tool | grep -A5 token | grep id | cut -d '"' -f4`

# Add Node
curl -v -H "X-Auth-Token: $TOKEN" -d @addnode.json -H "content-type: application/json" -X POST "https://lon.loadbalancers.api.rackspacecloud.com/v1.0/$CUSTOMER_ID/loadbalancers/$LB_ID/nodes"



## 

For the addnode script you require a file, called addnode.json
that file must contain the snet ip's you wish to add

#
# addnode.json

{"nodes": [
        {
            "address": "10.0.0.1",
            "port": 80,
            "condition": "ENABLED",
            "type":"PRIMARY"
        }
    ]
}

##

##

# updatenode.sh

#!/bin/bash

USERNAME='yourmycloudusernamegoeshere'
APIKEY='apikeygoeshere'
LB_ID='157089'
CUSTOMER_ID='100101010'
NODE_ID=719425

TOKEN=`curl https://identity.api.rackspacecloud.com/v2.0/tokens -X POST -d '{ "auth":{"RAX-KSKEY:apiKeyCredentials": { "username":"'$USERNAME'", "apiKey": "'$APIKEY'" }} }' -H "Content-type: applic

# Update Node

curl -v -H "X-Auth-Token: $TOKEN" -d @updatenode.json -H "content-type: application/json" -X PUT "https://lon.loadbalancers.api.rackspacecloud.com/v1.0/$CUSTOMER_ID/loadbalancers/$LB_ID/nodes/$NODE_ID"

##

##

## updatenode.json

{"node":{
            "condition": "DISABLED",
            "type":"PRIMARY"
        }
}

Naturally, you will be able to change condition to ENABLED, DISABLED, or DRAINING.

I recommend to use DRAINING, since it will gracefully remove the cloud-server, and any existing connections will be waited on, before removing the server from LB.

Testing Rackspace Cloud-server Service-net Connectivity and creating an alarm

So, the last few weeks my colleagues and myself have been noticing that there has been a couple of issues with the cloud-servers servicenet interface. Unfortunately for some customers utilizing dbaas instances this means that their cloud-server will be unable to communicate, often, with their database backend.

The solution is a custom monitoring script that my colleague Marcin has kindly put together for another customer of his own.

The python script that goes on the server:

Create file:

vi /usr/lib/rackspace-monitoring-agent/plugins/servicenet.sh

Paste into file:

#!/bin/bash
#
ping="/usr/bin/ping -W 1 -c 1 -I eth1 -q"

if [ -z $1 ];then
   echo -e "status CRITICAL\nmetric ping_check uint32 1"
   exit 1
else
   $ping $1 &>/dev/null
   if [ "$?" -eq 0 ]; then
      echo -e "status OK\nmetric ping_check uint32 0"
      exit 0
   else
      echo -e "status CRITICAL\nmetric ping_check uint32 1"
      exit 1
   fi
fi

Create an alarm that utilizes the below metric

if (metric["ping_check"] == 1) {
    return new AlarmStatus(CRITICAL, 'what?');
}
if (metric["ping_check"] == 0) {
    return new AlarmStatus(OK, 'eee?');
}

Of course for this to work the primary requirement is a Rackspace Cloud-server and an installation of Rackspace Cloud Monitoring installed on the server already.

Thanks again Marcin, for this golden nugget.