Comparing Files on the internet or CDN with MD5 to determine if they present the same content

So, a customer today was having some issues with their CDN. They said that their SSL CDN was presenting a different image than the HTTP CDN. I thought the best way to begin troubleshooting would be to try to recreate the issue. To do that, I needed a way to compare the files programmatically. Enter md5sum, a handy little command-line utility installed by default on most Linux distributions.

[user@cbast3 ~]$ curl https://3485asd3jjc839c9d3-08e84cacaacfcebda9281e3a9724b749.ssl.cf3.rackcdn.com/companies/5825cb13f2e6c9632807d103/header.jpeg -o file ; cat file | md5sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  382k  100  382k    0     0  1726k      0 --:--:-- --:--:-- --:--:-- 1732k
e917a67bbe34d4eb2d4fe5a87ce90de0  -
[user@cbast3 ~]$ curl http://3485asd3jjc839c9d3-08e84cacaacfcebda9281e3a9724b749.r45.cf3.rackcdn.com/companies/5825cb13f2e6c9632807d103/header.jpeg -o file2 ; cat file2 | md5sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  382k  100  382k    0     0  2071k      0 --:--:-- --:--:-- --:--:-- 2081k
e917a67bbe34d4eb2d4fe5a87ce90de0  -

As we can see from the output of both commands, the md5sum (the hash) of the two files is the same. This means the content is almost certainly identical: MD5 produces a 128-bit digest, so the chance of two different files sharing a digest by accident is vanishingly small, whatever their size. MD5 is no longer considered safe against deliberately crafted collisions, but for checking whether two downloads match it is perfectly adequate.
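
The comparison above can be wrapped into a small reusable function. This is only a minimal sketch, not the exact commands used with the customer: the names md5_of and compare_urls are made up for illustration, and it assumes curl, md5sum, awk and mktemp are installed. It downloads to temporary files instead of piping, so that a failed download is reported rather than silently hashed.

```shell
# Sketch only: md5_of and compare_urls are illustrative names, not part of any tool.

# Print just the MD5 hex digest of a local file (md5sum prints "digest  name").
md5_of() {
    md5sum "$1" | awk '{print $1}'
}

# Download two URLs to temporary files and compare their digests.
# Returns 0 if the digests match, 1 if they differ, 2 if a download failed.
compare_urls() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    # -f makes curl fail on HTTP errors, -sS hides the progress meter but keeps errors
    if curl -fsS -o "$tmp_a" "$1" && curl -fsS -o "$tmp_b" "$2"; then
        if [ "$(md5_of "$tmp_a")" = "$(md5_of "$tmp_b")" ]; then
            echo "MATCH"
            rc=0
        else
            echo "DIFFERENT"
            rc=1
        fi
    else
        echo "download failed" >&2
        rc=2
    fi
    rm -f "$tmp_a" "$tmp_b"
    return "$rc"
}
```

Calling compare_urls with the HTTPS URL and the HTTP URL as its two arguments prints MATCH when the digests are equal. If a stronger hash is preferred, swapping md5sum for sha256sum should work the same way.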

In this case I was able to disprove the customer’s claim. Not because I wanted to, but because I wanted to solve their issue. These results tell me that if the issue is with the CDN, it must be with an edge node local to the customer. Since I am unable to recreate it from my location, it is not unreasonable to assume that it is either a client-side issue or a failure on a CDN edge node near the customer. That’s how I troubleshot this, and I’m quite happy with this one! It took about 2 minutes to do, and a few minutes to come up with. A quick and useful check indeed, which considerably reduces the number of possibilities when tracking down the issue!

Cheers &
Best wishes,
Adam

Please note that the real CDN location has been altered for privacy reasons.

Troubleshooting Rackspace CDN not serving files

A customer came to us with a strange issue with their CDN. I wanted to document it so that it is understood why it happened.

The customer is using two TLS origins and HTTP/2. Why that is a problem will become clear. This is a general method of troubleshooting: replicating the CDN's requests to the origin by sending the same Host headers. It can be applied no matter what the problem, to find out the HTTP status code returned by the origin, which at least half of the time turns out to point to the cause. The origin is the cloud server your CDN is backed by.

Question
Hi,
We are currently experiencing some issues with the Cloud CDN. We are using this for our CSS and images and now everything is getting a HTTP/503 SERVICE UNAVAILABLE. If you want to test, you may test this url:
https://cdn.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css

This is supposed to deliver this file:
https://origin.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css

Is something mis-configured or are there some issues on the appliance?

Answer

First we confirm the origin is UP

# curl -I https://originserver.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css
HTTP/1.1 200 OK
Date: Tue, 11 Oct 2016 08:45:42 GMT
Server: Apache
Last-Modified: Tue, 11 Oct 2016 06:57:53 GMT
ETag: "ed26-53e91653c61d0"
Accept-Ranges: bytes
Content-Length: 60710
Vary: Accept-Encoding
Cache-Control: max-age=31536000, public
Expires: Wed, 11 Oct 2017 08:45:42 GMT
Access-Control-Allow-Origin: *
X-Frame-Options: SAMEORIGIN
Content-Type: text/css

The origin is the cloud server the CDN pulls from. As we can see, the site is up. So what is causing the issue? When the CDN pulls from the origin, it sends a Host header for the CDN domain, so the origin web server has to have a host configured for both domains. The reason is that the CDN uses CNAME hostnames to identify which CDN is which, i.e. which path, like /media/, maps to which static origin subdomain.

The best way to look further at the situation is to query the origin (the subdomain that you’ve associated with the CDN subdomain Rackspace gives you) while sending the Host header for the CDN URL. When we do that, we get:

root@myweb:~# curl -I https://origin.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css -H 'host: cdn.cusomerdomain.no'
HTTP/1.1 421 Misdirected Request
Date: Tue, 11 Oct 2016 08:17:38 GMT
Server: Apache
Content-Type: text/html; charset=iso-8859-1

As we can see, we get this odd HTTP 421 Misdirected Request. The same happens with the other host names:

# curl -I https://origin.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css -H 'host: mycdnname1.scdn4.secure.raxcdn.com'
HTTP/1.1 421 Misdirected Request
Date: Tue, 11 Oct 2016 08:18:06 GMT
Server: Apache
Content-Type: text/html; charset=iso-8859-1

~# curl -I https://origin.customerdomain.com/static/version1476169182/adminhtml/Magento/backend/nb_NO/extjs/resources/css/ext-all.min.css -H 'host: cdncustomercname.cusomerdomain.com'
HTTP/1.1 421 Misdirected Request
Date: Tue, 11 Oct 2016 08:17:45 GMT
Server: Apache
Content-Type: text/html; charset=iso-8859-1
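
The three probes above can be wrapped in a small loop. This is a rough sketch under assumptions: probe_origin and explain_status are illustrative names, the example.com host names in the usage comment are placeholders, and the hints only cover the status codes that came up in this case.

```shell
# Sketch only: probe_origin and explain_status are illustrative names.

# Ask the origin for a URL while presenting a different Host header,
# and print only the HTTP status code (curl prints 000 if no response arrived).
probe_origin() {
    # $1 = origin URL, $2 = Host header value to present
    curl -s -o /dev/null -I -w '%{http_code}' --max-time 10 -H "Host: $2" "$1"
}

# Translate a status code into a short troubleshooting hint.
explain_status() {
    case "$1" in
        2??) echo "origin serves this host name" ;;
        421) echo "misdirected request: origin refuses this host name on this connection" ;;
        5??) echo "origin-side server error" ;;
        000) echo "no HTTP response (connection or TLS failure, or timeout)" ;;
        *)   echo "unexpected status, inspect the full response" ;;
    esac
}

# Usage (placeholder host names, replace with the origin and CDN names in question):
# for h in cdn.example.com origin.example.com; do
#     code=$(probe_origin "https://origin.example.com/some/file.css" "$h")
#     echo "$h -> $code: $(explain_status "$code")"
# done
```

Printing one line per host name makes it easy to see which names the origin accepts and which it rejects.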

https://httpd.apache.org/docs/2.4/mod/mod_http2.html

Looking at the mod_http2 documentation linked above, this issue was caused by the domains having different TLS configurations: with HTTP/2 enabled, Apache will not serve a request for one host name over a TLS connection that was set up for another unless both hosts share the same TLS configuration on the origin cloud server, and it answers with 421 instead.

You just need to disable HTTP/2, or make the TLS configurations the same on the Apache side. I hope that this clarifies things and makes sense to you. Of course, if you have additional questions, comments or concerns, please don't hesitate to reach out to us. We are here to help!
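
Before disabling HTTP/2 it helps to find where it is switched on. The following is a minimal sketch, assuming Apache 2.4, where HTTP/2 is enabled through the Protocols directive (h2 over TLS, h2c in cleartext). The function name find_h2_protocols is made up, and the configuration directory in the usage comment is an assumption that varies by distribution.

```shell
# Sketch only: find_h2_protocols is an illustrative name.

# List every active (uncommented) Protocols directive that mentions h2 or h2c
# in the given configuration directory, with file name and line number.
find_h2_protocols() {
    grep -rniE '^[[:space:]]*Protocols[[:space:]].*h2' "$1"
}

# Usage (path is an assumption, e.g. /etc/httpd on other distributions):
# find_h2_protocols /etc/apache2
```

Changing a matching line to "Protocols http/1.1" and reloading Apache should switch HTTP/2 off for that server or virtual host.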

As you can see, it is important when debugging a CDN to send the origin the same Host header that the CDN uses, in order to replicate the issue the customer was experiencing. Essentially, the CDN edge nodes (the machines around the world that pull content from the origin for distribution) weren't able to retrieve the files from the origin with the host header domain that is defined in the control panel.

In this case the customer needed to adjust their Apache2 configuration. The problem was likely triggered by an Apache2 update or a similar change.

Automating Backups in Public Cloud using Cloud Files

Hey folks, I know it’s been a little while since I put an article together. However, I have been putting together an article for the Rackspace Community explaining how to write bespoke backup systems. It’s a proof of concept/demonstration/tutorial as opposed to a production application. However, people looking to create custom cloud backup scripts may benefit from reading through it.

You can see the article at the below URL:

https://community.rackspace.com/products/f/25/t/7857

Simple way to perform a body check on a website

So, I was testing with curl today, and I know that it’s possible to redirect the output to /dev/null to suppress the page. But that’s not very handy if you are checking whether an HTML page loads, so I came up with some better body checks to use.

A basic body check using wc -l to count the lines of the page

$ time curl https://www.google.com/ > 1; echo "non zero indicates server up and served content of n lines"; cat 1 | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  167k    0  167k    0     0  79771      0 --:--:--  0:00:02 --:--:-- 79756

real	0m2.162s
user	0m0.042s
sys	0m0.126s
non zero indicates server up and served content of n lines
2134

A body check for Google analytics

$ time curl https://www.groundworkjobs.com/ > 1; echo "Checking for google analytics html elements string"; cat 1 | grep "www.google-analytics.com/analytics.js"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  167k    0  167k    0     0  76143      0 --:--:--  0:00:02 --:--:-- 76152

real	0m2.265s
user	0m0.042s
sys	0m0.133s
Checking for google analytics html elements string
				})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

Such commands might be useful when troubleshooting a cluster, for instance, where one server serves a more up-to-date version (a different number of lines). There’s probably a better way to do this with ls and awk using the HTML file size, since the number of lines wouldn’t be so accurate.
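
For the cluster case, line count and byte count can be printed side by side for each node. This is a rough sketch: body_stats and compare_nodes are illustrative names, and the node host names passed to compare_nodes are whatever your cluster members happen to be called.

```shell
# Sketch only: body_stats and compare_nodes are illustrative names.

# Print "<lines> <bytes>" for whatever arrives on standard input.
# wc always reports lines before bytes; awk strips the padding.
body_stats() {
    wc -lc | awk '{print $1, $2}'
}

# Fetch the front page from each node given as an argument and print its stats.
compare_nodes() {
    for node in "$@"; do
        printf '%s: ' "$node"
        curl -s --max-time 10 "http://$node/" | body_stats
    done
}

# Usage (placeholder host names):
# compare_nodes web1.example.com web2.example.com
```

A node whose numbers differ from the others is the one to look at first.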

Check Filesize from request

$ time curl https://www.groundworkjobs.com/ > 1; var=$(ls -al 1 | awk '{print $5}') ; echo "Page size is: $var bytes"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  167k    0  167k    0     0  79467      0 --:--:--  0:00:02 --:--:-- 79461

real	0m2.170s
user	0m0.048s
sys	0m0.111s
Page size is: 171876 bytes

Pretty simple, but you could take the one-liner even further: with the file size already in a variable called $var (populated using ls and awk), use an if statement to check that var is greater than 0, indicating that the server returned some content rather than nothing at all.

Check Filesize and populate a variable with the filesize, then validate variable

$ time curl https://www.groundworkjobs.com/ > 1; var=$(ls -al 1 | awk '{print $5}') ; echo "Page size is: $var bytes"; if [ "$var" -gt 0 ] ; then echo "The filesize was greater than 0, which indicates box is up but may be giving an error page"; fi
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  167k    0  167k    0     0  78915      0 --:--:--  0:00:02 --:--:-- 78950

real	0m2.185s
user	0m0.041s
sys	0m0.132s
Page size is: 171876 bytes
The filesize was greater than 0, which indicates box is up but may be giving an error page

The second exercise is not particularly useful or practical as a means of testing, since if the site were timing out the script would take ages to reply, making the whole test pointless. But as a learning exercise, being able to assemble one-liners on the fly like this is an enjoyable, rewarding and useful investment of time and effort. Understanding such things is fundamental to automating tasks: in this case output filtering, variable creation, and subsequent validation logic. It’s a simple test, but the concept is exactly the same for any advanced automation procedure too.
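
One way around the timeout problem mentioned above is to let curl enforce a time limit and report the status code itself, rather than inferring health from the file size alone. This is a minimal sketch, assuming curl's --max-time and --write-out options; fetch_summary and check_summary are illustrative names and the 10 second default is an arbitrary choice.

```shell
# Sketch only: fetch_summary and check_summary are illustrative names.

# Fetch a URL once with a time limit ($2, default 10 seconds) and print
# "<http_code> <bytes_downloaded>". curl reports code 000 if no response arrived.
fetch_summary() {
    curl -s -o /dev/null --max-time "${2:-10}" -w '%{http_code} %{size_download}' "$1"
}

# Decide whether a status code and byte count look like a healthy page.
# Returns 0 for OK, 1 for an error or empty page, 2 for no response at all.
check_summary() {
    code=$1
    bytes=$2
    if [ "$code" = "000" ]; then
        echo "DOWN: no response (timeout or connection failure)"
        return 2
    elif [ "$code" -ge 400 ]; then
        echo "ERROR: HTTP $code, $bytes bytes (probably an error page)"
        return 1
    elif [ "$bytes" -eq 0 ]; then
        echo "EMPTY: HTTP $code but no body"
        return 1
    fi
    echo "OK: HTTP $code, $bytes bytes"
}

# Usage (the unquoted substitution is deliberate, it splits into two arguments):
# check_summary $(fetch_summary https://www.example.com/)
```

Checking the status code as well as the size separates a working page from an error page, which the size check on its own cannot do.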