Thanks to my colleague Jan for this one-liner, which lets you check whether your cloud-server has run out of memory recently. It searches the top-level log files in /var/log that were modified in the last 3 days:
find /var/log -maxdepth 1 -type f -mtime -3 -exec zgrep -i -E "oom|killed" {} \;
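One thing to watch with the pattern above is that "oom" also matches words like "room", and "killed" matches any process that logs that word. A minimal sketch of a narrower filter follows; the function name oom_filter is my own, and the three phrases are the ones the kernel OOM killer normally logs, so treat the pattern as an assumption to adjust for your distribution.

```shell
# Print only the lines that look like kernel OOM-killer activity.
# Reads log text on stdin, so it can sit behind zcat, cat or similar.
# oom_filter is a hypothetical helper name, not part of any package.
oom_filter() {
    grep -i -E 'out of memory|oom-killer|killed process'
}

# Example (commented out so that pasting this block has no side effects):
# find /var/log -maxdepth 1 -type f -mtime -3 -exec zcat -f {} + | oom_filter
```

zcat -f passes plain-text files through unchanged, so rotated .gz logs and current logs can go down the same pipe.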
So a customer asked today how to upload an ISO to a cloud-server. It’s a relatively easy process. RDP allows you to use resource sharing, which effectively mounts a drive of your choice from your own machine on the server you RDP to. This is how you can upload the file easily.
Once you have the ISO on the server, you simply need software that mounts it and presents it as an ordinary CD/DVD disk. I personally use Elaborate Bytes’ Virtual Clone Drive.
Copying Resources between client and server using RDP resource sharing:
https://support.microsoft.com/en-us/kb/313292
Mounting CD/DVD ISO/CUE/BIN files in Windows
https://www.elby.ch/en/products/vcd.html
Yeah, I know it’s simple to us, but other people might find this useful. Exact same question came up today :)
So, a customer is experiencing slowness/sluggishness in their app. You know from instinct that there is no issue with the hypervisor, but instinct isn’t enough. Tools like xentop, sar and bwm-ng are critical parts of live and historical troubleshooting.
Sar can tell you a story, if you ask the storyteller the right questions or, even better, pick up the book and read it properly. You’ll understand the plot, the scenario and the situation, and exactly how to proceed with troubleshooting, by paying attention to this data and knowing which things to check under which circumstances.
This article doesn’t go into depth on that, but it gives you a good reference for a variety of tests, the most important being CPU usage, I/O usage, network usage, and load averages.
# Grab details live
sar -u 1 3
# Use historical binary sar file
# sa10 means '10th day' of current month.
sar -u -f /var/log/sa/sa10
sar -P ALL 1 1
‘-P 1’ means check only the 2nd Core. (Core numbers start from 0).
sar -P 1 1 5
The above command displays real time CPU usage for core number 1, every 1 second for 5 times.
sar -r 1 3
The above command provides memory stats every 1 second for a total of 3 times.
sar -S 1 5
The above command reports swap statistics every 1 second, a total of 5 times.
sar -b 1 3
The above command reports I/O and transfer rate statistics every 1 second, a total of 3 times.
This is a useful check for LUNs, block devices and other specific mounts (-p pretty-prints the device names).
sar -d 1 1
sar -p -d 1 1
DEV – indicates block device, i.e. sda, sda1, sdb1 etc.
sar -w 1 3
sar -q 1 3
This reports the run queue size and load average of the last 1 minute, 5 minutes, and 15 minutes. "1 3" reports every 1 second, a total of 3 times.
sar -n KEYWORD
KEYWORDS Available;
DEV – Displays network devices vital statistics for eth0, eth1, etc.
EDEV – Displays network device failure statistics
NFS – Displays NFS client activities
NFSD – Displays NFS server activities
SOCK – Displays sockets in use for IPv4
IP – Displays IPv4 network traffic
EIP – Displays IPv4 network errors
ICMP – Displays ICMPv4 network traffic
EICMP – Displays ICMPv4 network errors
TCP – Displays TCPv4 network traffic
ETCP – Displays TCPv4 network errors
UDP – Displays UDPv4 network traffic
SOCK6, IP6, EIP6, ICMP6, UDP6 are for IPv6
ALL – Displays all of the above information. The output will be very long.
sar -n DEV 1 1
sar -q -f /var/log/sa/sa11 -s 11:00:00
sar -q -f /var/log/sa/sa11 -s 11:00:00 | head -n 10
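Since the historical file name is just "sa" plus the two-digit day of the month, a tiny helper saves some typing when you jump between days. This is only a sketch: sa_file is a made-up name, /var/log/sa is the directory used in this article, and the optional second argument is there because some distributions (Debian and Ubuntu, as far as I know) keep the files in /var/log/sysstat instead.

```shell
# Print the path of the sar data file for a given day of the month.
# sa_file is a hypothetical helper; the default directory is the one
# used in this article and can be overridden with a second argument.
sa_file() {
    local day dir
    day=${1#0}                 # strip one leading zero so 08 is not read as octal
    dir=${2:-/var/log/sa}
    printf '%s/sa%02d\n' "$dir" "$day"
}

# Example (commented out; needs sysstat data on the box):
# sar -q -f "$(sa_file 11)" -s 11:00:00 -e 11:30:00
```

The -e flag is the counterpart of -s and sets the end of the time window.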
So, is it possible to look at a network interface’s activity without bwm-ng, iptraf, or other tools? Yes.
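INTERFACE=${INTERFACE:-eth0}  # interface to watch; eth0 is only an assumed default
# Take a first reading so the first delta is not computed against an empty value
RX2=`cat /sys/class/net/${INTERFACE}/statistics/rx_bytes`
TX2=`cat /sys/class/net/${INTERFACE}/statistics/tx_bytes`
while true; do
RX1=`cat /sys/class/net/${INTERFACE}/statistics/rx_bytes`
TX1=`cat /sys/class/net/${INTERFACE}/statistics/tx_bytes`
# Bytes received/sent since the previous reading (roughly one second ago)
DOWN=$(($RX1-$RX2))
UP=$(($TX1-$TX2))
DOWN_Bits=$(($DOWN * 8 ))
UP_Bits=$(($UP * 8 ))
# Shifting right by 20 divides by 1048576, giving whole megabits
DOWNmbps=$(( $DOWN_Bits >> 20 ))
UPmbps=$(($UP_Bits >> 20 ))
echo -e "RX:${DOWN}\tTX:${UP} B/s | RX:${DOWNmbps}\tTX:${UPmbps} Mb/s"
RX2=$RX1; TX2=$TX1
sleep 1; done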
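The arithmetic in the loop is worth a second look: bytes are multiplied by 8 to get bits, and the shift right by 20 is an integer divide by 1048576. That means anything under about 131 kB/s shows as 0 Mb/s. A small sketch of the same conversion as functions (the function names are mine, not from the original script), with a kilobit variant for quieter links:

```shell
# Convert a per-second byte count into whole megabits (integer, rounds down).
bytes_to_mbit() {
    echo $(( ($1 * 8) >> 20 ))
}

# Same idea in kilobits, useful when traffic is below 1 Mb/s.
bytes_to_kbit() {
    echo $(( ($1 * 8) >> 10 ))
}
```

For example, bytes_to_mbit 1310720 prints 10, while bytes_to_mbit 100000 prints 0 and bytes_to_kbit 100000 prints 781.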
I found this little gem yesterday, but couldn’t understand why they had not used clear. I guess they wanted to log activity or something… still, this was a really nice find. I can’t remember where I found it, but googling part of it should lead you to the original source :)
So, you have probably heard that there are a variety of reasons why you shouldn’t use ICMP to test that your service is operating normally, mainly because of the way that ICMP is handled by routers. If you really want a representative view of how TCP traffic such as HTTP and HTTPS is performing in terms of packet loss (that is to say, packets which do not arrive at their destination), then hping is your friend.
You might be pinging a cloud-server that is not replying. You might think it’s down. But what if the firewall is simply dropping incoming ICMP echo requests? Indeed.
Enter hping.
# hping -S -p 80 google.com
HPING google.com (eth0 74.125.136.102): S set, 40 headers + 0 data bytes
len=46 ip=74.125.136.102 ttl=46 id=23970 sport=80 flags=SA seq=0 win=42780 rtt=13.8 ms
len=46 ip=74.125.136.102 ttl=47 id=37443 sport=80 flags=SA seq=1 win=42780 rtt=12.6 ms
len=46 ip=74.125.136.102 ttl=47 id=43654 sport=80 flags=SA seq=2 win=42780 rtt=12.0 ms
len=46 ip=74.125.136.102 ttl=47 id=37877 sport=80 flags=SA seq=3 win=42780 rtt=11.4 ms
len=46 ip=74.125.136.102 ttl=47 id=62433 sport=80 flags=SA seq=4 win=42780 rtt=13.3 ms
^C
--- google.com hping statistic ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 11.4/12.6/13.8 ms
In this case I tested with google.com. I’m actually surprised that more people don’t use hping, because hping is awesome. It also makes quite a decent port scanner, were it not for the fact that the machine I tried to test that feature with buffer overflowed :) It’s a nice way to test a firewalled box, but more than that, it’s a more reliable test in my opinion.
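If you want to use a test like this in a script rather than by eye, the only number you usually care about is the packet loss percentage from the summary line. Here is a rough sketch that pulls it out of hping output on stdin. The function name hping_loss is made up, and it assumes the summary line looks like the one in the output above.

```shell
# Read hping output on stdin and print the packet loss percentage
# from the "N packets transmitted, ... X% packet loss" summary line.
# hping_loss is a hypothetical helper name.
hping_loss() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Example (commented out; needs root and network access):
# hping -S -p 80 -c 5 google.com 2>&1 | hping_loss
```

The 2>&1 is there in case your build prints the statistics block on stderr, and -c 5 stops after five packets so you don't need Ctrl-C.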
So, a customer reached out to us today asking if Rackspace provides the entire range of infrastructure IP addresses in use on cloud. The answer is no. However, that doesn’t mean that building your firewall rules or autoscale automation needs to be painful.
In fact, Rackspace Cloud uses OpenStack, which fully supports API calls that can provide this detail in a few short steps. To do this you need nova installed. It is really easy to install, and instructions can be found here:
https://support.rackspace.com/how-to/installing-python-novaclient-on-linux-and-mac-os/
Once you have installed nova, it’s simply a case of making sure you set these variables correctly in your .bash_profile
OS_USERNAME=mycloudusernamegoeshere
OS_TENANT_NAME=yourrackspaceaccountnumbergoeshereusuallysomethinglike1010101010
OS_AUTH_SYSTEM=rackspace
OS_PASSWORD=apikeygoeshere
OS_AUTH_URL=https://identity.api.rackspacecloud.com/v2.0/
OS_REGION_NAME=LON
OS_NO_CACHE=1
export OS_USERNAME OS_TENANT_NAME OS_AUTH_SYSTEM OS_PASSWORD OS_AUTH_URL OS_REGION_NAME OS_NO_CACHE
OS_USERNAME is your mycloud login username (normally the primary user).
OS_TENANT_NAME is your Customer ID. It’s the number that appears in the URL of your control panel link.
OS_PASSWORD is a bit misleading: this is actually where your API key goes. I think it’s possible to authenticate using your control panel password too, but don’t do that, for security reasons.
OS_REGION_NAME is pretty self-explanatory. This is simply the region in which you would like to list cloud-server IPs, or rather, the region against which you wish to perform nova API calls.
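Most "nova doesn't work" moments come down to one of these variables being empty, so a quick sanity check before the first API call can save some head scratching. A minimal sketch, assuming the variable names listed above; missing_os_vars is a name I made up for illustration.

```shell
# Print the name of every required OS_* variable that is unset or empty.
# Prints nothing when everything is in place.
# missing_os_vars is a hypothetical helper, not part of novaclient.
missing_os_vars() {
    local v val
    for v in OS_USERNAME OS_TENANT_NAME OS_AUTH_SYSTEM OS_PASSWORD OS_AUTH_URL OS_REGION_NAME; do
        eval "val=\${$v:-}"      # look up the variable whose name is in $v
        [ -n "$val" ] || echo "$v"
    done
}
```

Run missing_os_vars after sourcing your .bash_profile; any name it prints still needs a value.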
# supernova lon list --tenant 100010101 --fields accessIPv4,name
[SUPERNOVA] Running nova against lon...
+-------------------------+------------+----------+
| ID                      | accessIPv4 | Name     |
+-------------------------+------------+----------+
| 7e5a7f99-60ae-4c28-b2b8 | 1.1.1.1    | xapp     |
| 94747603-812d-4594-850b | 1.1.1.1    | rabbit2  |
| d5b318aa-0fa2-4269-ae00 | 1.1.1.1    | elastic5 |
| 6c1d8d33-ae5e-44be-b9f0 | 1.1.1.1    | elastic6 |
| 9f79a7dc-fd19-4f8f-9c26 | 1.1.1.1    | elastic3 |
| 05b1c52b-6ced-4db0-8af2 | 11.1.1.1   | elastic1 |
| c8302366-f2f9-4c36-8f7a | 1.1.1.1    | app5     |
| b159cd07-8e68-49bc-83ee | 1.1.1.1    | app6     |
| f1f31eef-97c6-4c68-b01a | 1.1.1.1    | ruby1    |
| 64b7f0fd-8f2f-4d5f-8f89 | 1.1.1.1    | build3   |
| e320c051-b5cf-473a-9f96 | 1.1.1.1    | mysql2   |
| 4fddd022-59a8-4502-bf6e | 1.1.1.1    | mysql1   |
| c9ad6951-f5f9-4351-b31d | 1.1.1.1    | worker2  |
+-------------------------+------------+----------+
This is pretty useful for managing autoscale permissions if you need to make sure your cloud-servers can connect to your corporate network when new cloud-servers with new IPs are built out. Considerations like this are really important when putting together a solution. The nice thing is that the tools are really quite simple and flexible. If I wanted, I could have pulled out the detail for ServiceNet instead. I hope this helps make some folks’ lives a bit easier and helps to demystify the API for others who haven’t had the opportunity to use it.
You are probably wondering, though, which field names you can use. Running nova show against one of your server UUIDs will reveal them:
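To feed a listing like that into firewall automation you only need the address column. Below is a rough sketch that reads the table on stdin and prints the unique IPv4 addresses. It assumes the layout shown above (ID, accessIPv4, Name, in that order), and nova_ipv4_column is just a name I picked.

```shell
# Read a nova/supernova "list --fields accessIPv4,name" table on stdin
# and print the unique IPv4 addresses from the second column.
# nova_ipv4_column is a hypothetical helper; it assumes the column
# order ID | accessIPv4 | Name.
nova_ipv4_column() {
    awk -F'|' '
        NF >= 4 {
            ip = $3
            gsub(/[ \t]/, "", ip)                 # trim the table padding
            if (ip ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) print ip
        }' | sort -u
}

# Example (commented out; needs working credentials):
# supernova lon list --fields accessIPv4,name | nova_ipv4_column
```

Border rows and the header row are skipped automatically because they never contain a dotted quad in that column.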
# supernova lon show someuuidgoeshere
+-------------------------------------+------------------------------------------------------------------+
| Property | Value |
+-------------------------------------+------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-SRV-ATTR:host | censored |
| OS-EXT-SRV-ATTR:hypervisor_hostname | censored |
| OS-EXT-SRV-ATTR:instance_name | instance-734834278-sdfdsfds- |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | active |
| censorednet network | censored |
| accessIPv4 | censored |
| accessIPv6 | censored |
| created | 2015-12-11T14:12:08Z |
| flavor | 15 GB I/O v1 (io1-15) |
| hostId | 860... |
| id | 9f79a7dc-fd19-4f8f-9c26-72a335ed2be8 |
| image | Debian 8 (Jessie) (PVHVM) (cf16c435-7bed-4dc3-b76e-57b09987866d) |
| metadata | {"build_config": "", "rax_service_level_automation": "Complete"} |
| name | elastic3 |
| private network | |
| progress | 100 |
| public network | |
| status | ACTIVE |
| tenant_id | |
| updated | 2016-02-27T09:30:20Z |
| user_id | |
+-------------------------------------+------------------------------------------------------------------+
I censored some of the fields, but you can see all of the property names. So if you wanted to see only metadata and progress, alongside the server UUID and server name, you could run:
nova list --fields name,metadata,progress
This could be pretty handy for detecting when a server has finished building, or when automation has completed. The possibilities with the API are quite endless. APIs are certainly the future, and there is no reason why, in the future, people won't be building and deploying websites through the API only, with some sophisticated UI wrapper like nova on top.
Admittedly, this is very far away, but that is what future technology will be made of: things like Lambda and serverless architecture.
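As a small taste of that kind of detection, here is a sketch that pulls a single property out of nova show output, which is the building block for a "wait until ACTIVE" loop. The function name nova_property is made up, and it assumes the two-column Property/Value table shown earlier.

```shell
# Read "nova show" output on stdin and print the value of one property.
# nova_property is a hypothetical helper; it assumes the
# | Property | Value | table layout shown above.
nova_property() {
    awk -F'|' -v want="$1" '
        NF >= 3 {
            key = $2; val = $3
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim padding around the name
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim padding around the value
            if (key == want) print val
        }'
}

# Example (commented out; needs working credentials):
# supernova lon show someuuidgoeshere | nova_property status
```

Polling that in a loop with a sleep until it prints ACTIVE would be one simple way to block until a build is done.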
A customer was having some issues with their syncing, as shown by this inotify error:
Error: Terminating since out of inotify watches. Consider increasing /proc/sys/fs/inotify/max_user_watches
The fix was quite simple: remove folders from the sync that aren’t necessary, by adding this line to /etc/lsyncd.conf:
excludeFrom="/etc/lsyncd-excludes.txt",
Then create the ‘excludes’ file for lsyncd, listing the folders you want to ignore. In this case we wanted to ignore an old httpdocs.OLD backup.
# cat /etc/lsyncd-excludes.txt
somewebsite.com/httpdocs.OLD/
Please note that paths in lsyncd-excludes.txt are relative to the source path configured in lsyncd (do not give the full path; give the relative path inside the parent). A shockingly simple fix.
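To see how close you are to the limit, remember that inotify watches are per directory, so a recursive sync needs roughly one watch for every directory under the source. A quick sketch to count them; dir_count is a name I made up, and the path in the example is hypothetical.

```shell
# Count the directories under a path (including the path itself),
# which approximates how many inotify watches a recursive sync needs.
# dir_count is a hypothetical helper name.
dir_count() {
    find "$1" -type d | wc -l | tr -d ' '
}

# Example (commented out; /var/www/vhosts is a hypothetical source path):
# echo "need about $(dir_count /var/www/vhosts), limit is $(cat /proc/sys/fs/inotify/max_user_watches)"
```

If excluding folders isn't an option, the other route is the one the error message suggests: raise fs.inotify.max_user_watches with sysctl.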
So, I had a friend who had recently bought a Raspberry Pi 3 and wanted to run RetroPie on it, like I have been with my arcade cabinet.
The problem was that the SanDisk 64GB card he had bought had slightly fewer sectors than mine, which meant my image was just a few bytes too big. What a bummer!
So I used this great tool by SirLagz to fix that.
#!/bin/bash
# Automatic Image file resizer
# Written by SirLagz
strImgFile=$1

if [[ ! $(whoami) =~ "root" ]]; then
    echo ""
    echo "**********************************"
    echo "*** This should be run as root ***"
    echo "**********************************"
    echo ""
    exit
fi

if [[ -z $1 ]]; then
    echo "Usage: ./autosizer.sh"
    exit
fi

if [[ ! -e $1 || ! $(file $1) =~ "x86" ]]; then
    echo "Error : Not an image file, or file doesn't exist"
    exit
fi

partinfo=`parted -m $1 unit B print`
partnumber=`echo "$partinfo" | grep ext4 | awk -F: ' { print $1 } '`
partstart=`echo "$partinfo" | grep ext4 | awk -F: ' { print substr($2,0,length($2)-1) } '`
loopback=`losetup -f --show -o $partstart $1`
e2fsck -f $loopback
minsize=`resize2fs -P $loopback | awk -F': ' ' { print $2 } '`
minsize=`echo $minsize+1000 | bc`
resize2fs -p $loopback $minsize
sleep 1
losetup -d $loopback
partnewsize=`echo "$minsize * 4096" | bc`
newpartend=`echo "$partstart + $partnewsize" | bc`
part1=`parted $1 rm 2`
part2=`parted $1 unit B mkpart primary $partstart $newpartend`
endresult=`parted -m $1 unit B print free | tail -1 | awk -F: ' { print substr($2,0,length($2)-1) } '`
truncate -s $endresult $1
It was a nice solution to my friend’s problem. The only problem now is that my working Pi image has no audio for him and, for some reason, when he comes out of the game and goes back to EmulationStation he loses the joystick input. That is kind of bizarre.
Does anyone know what could cause those secondary issues? I’m a bit stumped on this one.
Today, a customer approached us after a Host Server Down, complaining that although the server was online again and functioning correctly, their website and application were still down.
The customer discovered that the source of the issue was that their /etc/resolv.conf was blank, which means the server has no nameservers to query and cannot resolve hostnames (A/CNAME/PTR records) into IP addresses. So if /etc/resolv.conf is blank and the customer uses hostnames in their calls, connectivity breaks, because without an IP address there is nothing to open a TCP connection to.
There is actually a very simple way to prevent the /etc/resolv.conf file from being changed. But first, it’s important to understand why /etc/resolv.conf is being reset.
On all Rackspace cloud-servers there is a process called nova-agent, and when the server starts up it resets the /etc/resolv.conf file along with the networking configuration. This happens each time your server is restarted and is used to set new networking details. Specifically, if you take an image and build a server on a new IP address, or if your server is live-migrated to a new host, it makes sure that on the next reboot the server comes up with the correct networking details transparently. However, this can cause some issues, as in this case with /etc/resolv.conf. Fortunately, there is a way to prevent /etc/resolv.conf from being modified after you have added the nameservers you want.
You can use the chattr immutable flag to stop processes from modifying the file once you have made the changes you want to resolv.conf:
chattr +i /etc/resolv.conf
If you need to edit the file again later, remove the flag first:
chattr -i /etc/resolv.conf
This /etc/resolv.conf issue is a common problem, but setting the immutable flag with chattr should prevent the file from being changed again until you remove the flag.
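It's easy to forget months later that a file was locked this way, so it helps to be able to check. A minimal sketch using lsattr, which lists the flags that chattr sets; is_immutable is a hypothetical helper name, and it simply reports "not immutable" if lsattr is missing or the filesystem doesn't support these attributes.

```shell
# Return success if the file carries the immutable (i) attribute.
# is_immutable is a hypothetical helper name. If lsattr is unavailable
# or unsupported on this filesystem, it returns failure.
is_immutable() {
    local flags
    flags=$(lsattr -d "$1" 2>/dev/null | awk '{print $1}')
    case "$flags" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example (commented out):
# is_immutable /etc/resolv.conf && echo "resolv.conf is locked with chattr +i"
```

The check is case-sensitive on purpose, since an uppercase I in lsattr output means something different.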
So, I was testing with curl today, and I know that it’s possible to redirect to /dev/null to suppress the page. But that’s not very handy if you are checking whether an HTML page loads, so I came up with some better body checks to use.
$ time curl https://www.google.com/ > 1; echo "non zero indicates server up and served content of n lines"; cat 1 | wc -l
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 167k 0 167k 0 0 79771 0 --:--:-- 0:00:02 --:--:-- 79756
real 0m2.162s
user 0m0.042s
sys 0m0.126s
non zero indicates server up and served content of n lines
2134
$ time curl https://www.groundworkjobs.com/ > 1; echo "Checking for google analytics html elements string"; cat 1 | grep "www.google-analytics.com/analytics.js"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 167k 0 167k 0 0 76143 0 --:--:-- 0:00:02 --:--:-- 76152
real 0m2.265s
user 0m0.042s
sys 0m0.133s
Checking for google analytics html elements string
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
Such commands might be useful when troubleshooting a cluster, for instance, where one server shows a more up-to-date version (a different number of lines). There’s probably a better way to do this with ls and awk, using the HTML file size, since the number of lines wouldn’t be so accurate.
$ time curl https://www.groundworkjobs.com/ > 1; var=$(ls -al 1 | awk '{print $5}') ; echo "Page size is: $var bytes"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 167k 0 167k 0 0 79467 0 --:--:-- 0:00:02 --:--:-- 79461
real 0m2.170s
user 0m0.048s
sys 0m0.111s
Page size is: 171876 bytes
Pretty simple, but you could take the one-liner even further: populate a variable called $var with the file size using ls and awk, then use an if statement to check that var is not 0, which indicates the page is answering rather than not answering at all.
$ time curl https://www.groundworkjobs.com/ > 1; var=$(ls -al 1 | awk '{print $5}') ; echo "Page size is: $var bytes"; if [ "$var" -gt 0 ] ; then echo "The filesize was greater than 0, which indicates box is up but may be giving an error page"; fi
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 167k 0 167k 0 0 78915 0 --:--:-- 0:00:02 --:--:-- 78950
real 0m2.185s
user 0m0.041s
sys 0m0.132s
Page size is: 171876 bytes
The filesize was greater than 0, which indicates box is up but may be giving an error page
The second exercise is not particularly useful or practical as a means of testing, since if the site were timing out the script would take ages to reply and make the whole test pointless. But as a learning exercise, being able to assemble one-liners on the fly like this is an enjoyable, rewarding and useful investment of time and effort. Understanding such things is fundamental to automating tasks: in this case output filtering, variable creation, and subsequent validation logic. It’s a simple test, but the concept is exactly the same for any advanced automation procedure too.
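Both weaknesses mentioned above (the test hanging on a timeout, and an error page still counting as "up") can be softened with options curl already has: --max-time caps the wait, and -w can print the HTTP status code and downloaded size. What follows is only a sketch under those assumptions; check_page and classify_response are names I made up, and the 10 second default is arbitrary.

```shell
# Turn an HTTP status code and body size into a one-line verdict.
# curl reports 000 as the status when it got no HTTP response at all.
classify_response() {
    local code=${1:-000} size=${2:-0}
    if [ "$code" = "000" ]; then
        echo "down: no HTTP response"
    elif [ "$code" -ge 200 ] && [ "$code" -lt 400 ] && [ "$size" -gt 0 ]; then
        echo "up: HTTP $code, $size bytes"
    else
        echo "answering, but worth a look: HTTP $code, $size bytes"
    fi
}

# Fetch a URL quietly, give up after a time limit (default 10 seconds),
# and classify the result. Usage: check_page URL [seconds]
check_page() {
    local result
    result=$(curl -s -o /dev/null --max-time "${2:-10}" \
        -w '%{http_code} %{size_download}' "$1")
    # Unquoted on purpose: splits "code size" into two arguments
    classify_response ${result:-000 0}
}

# Example (commented out; needs network access):
# check_page https://www.google.com/ 5
```

Keeping the verdict logic separate from the curl call makes it easy to try out with made-up numbers before pointing it at a real site.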