{"id":852,"date":"2016-11-07T12:44:23","date_gmt":"2016-11-07T12:44:23","guid":{"rendered":"http:\/\/www.haxed.me.uk\/?p=852"},"modified":"2016-11-07T14:32:46","modified_gmt":"2016-11-07T14:32:46","slug":"creating-proper-method-retrieving-sorting-parsing-rackspace-cdn-access-logs","status":"publish","type":"post","link":"https:\/\/haxed.me.uk\/index.php\/2016\/11\/07\/creating-proper-method-retrieving-sorting-parsing-rackspace-cdn-access-logs\/","title":{"rendered":"Creating a proper Method of Retrieving, Sorting, and Parsing Rackspace CDN Access Logs"},"content":{"rendered":"<p>So, this has been rather a bane of my life as Adam Bull. Basically, a large customer of ours had 50+ CDNs, and literally hundreds of gigabytes of log files. They were all in Rackspace Cloud Files, and the big question was &#8216;how do I know how busy my CDN is?&#8217;.<\/p>\n<p><a href=\"http:\/\/www.haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM.png\" alt=\"screen-shot-2016-11-07-at-12-41-30-pm\" width=\"1505\" height=\"632\" class=\"alignnone size-full wp-image-853\" srcset=\"https:\/\/haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM.png 1505w, https:\/\/haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM-300x126.png 300w, https:\/\/haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM-768x323.png 768w, https:\/\/haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM-1024x430.png 1024w, https:\/\/haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM-500x210.png 500w\" sizes=\"auto, (max-width: 1505px) 100vw, 1505px\" \/><\/a><\/p>\n<p>This is a remarkably good question, because, actually, not many tools are provided here, and the customer 
will, much like on many other CDN services, have to download those logs and then process them. That is not easy either, and I spent a good few weeks (albeit when I had time) trying to figure out the best way to do it. I dabbled with using tree to display the most commonly used logs, I played with piwik, awstats, and many others such as goaccess, all to no avail, and even used a sophisticated AWK script from our good friends in Operations. No luck, nothing, do not pass go, do not collect $200. So, I was forced to write something to achieve this myself, from start to finish. There are three problems:<\/p>\n<p>1) How to easily obtain .CDN_ACCESS_LOGS from Rackspace Cloud Files to a Cloud Server (or remote machine).<br \/>\n2) How to easily process these logs, and in which format.<br \/>\n3) How to easily present these logs, and with which application.<\/p>\n<p><strong>The first challenge was actually retrieving the files.<\/strong><\/p>\n<pre>\r\nswiftly --verbose --eventlet --concurrency=100 get .CDN_ACCESS_LOGS --all-objects -o .\/\r\n<\/pre>\n<p>Naturally, to perform the step above, you will need a working swiftly environment. If you don&#8217;t know what swiftly is, or how to set up a swiftly environment, please see this article I wrote on the subject of deleting all files with swiftly (the howto explains the environment setup first! 
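For reference, a swiftly environment boils down to a small config file holding your Rackspace credentials. A minimal sketch is below; the option names follow the swiftly README, but verify them against your installed version, and the placeholder values are obviously assumptions:

```ini
; ~/.swiftly.conf -- minimal swiftly environment sketch
[swiftly]
auth_user = YOUR_RACKSPACE_USERNAME
auth_key = YOUR_RACKSPACE_API_KEY
auth_url = https://identity.api.rackspacecloud.com/v2.0
region = LON
```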
Just don&#8217;t follow the article to the end; continue from here once you&#8217;ve installed and set up swiftly.)<\/p>\n<p>For more info, see:<br \/>\n<a href=\"https:\/\/community.rackspace.com\/products\/f\/25\/t\/7190\">https:\/\/community.rackspace.com\/products\/f\/25\/t\/7190<\/a><\/p>\n<p><strong>Processing the Rackspace CDN Logs that we&#8217;ve downloaded, and organising them for further log processing<\/strong><br \/>\n<em>This required a lot more effort, and thought.<\/em><\/p>\n<p>The below script sits in the same folder as all of the containers:<\/p>\n<pre>\r\n# ls -al \r\ntotal 196\r\ndrwxrwxr-x 36 root root  4096 Nov  7 12:33 .\r\ndrwxr-xr-x  6 root root  4096 Nov  7 12:06 ..\r\n# used by my script\r\n-rw-rw-r--  1 root root  1128 Nov  7 12:06 alldirs.txt\r\n\r\n# CDN Log File containers as we downloaded them from Rackspace Cloud Files with swiftly (.CDN_ACCESS_LOGS)\r\ndrwxrwxr-x  3 root root  4096 Oct 19 11:22 dev.demo.video.cdn..com\r\ndrwxrwxr-x  3 root root  4096 Oct 19 11:22 europe.assets.lon.tv\r\ndrwxrwxr-x  5 root root  4096 Oct 19 11:22 files.lon.cdn.lon.com\r\ndrwxrwxr-x  3 root root  4096 Oct 19 11:23 files.blah.cdn..com\r\ndrwxrwxr-x  5 root root  4096 Oct 19 11:24 files.demo.cdn..com\r\ndrwxrwxr-x  3 root root  4096 Oct 19 11:25 files.invesco.cdn..com\r\ndrwxrwxr-x  3 root root  4096 Oct 19 11:25 files.test.cdn..com\r\n-rw-r--r--  1 root root   561 Nov  7 12:02 generate-report.sh\r\n-rwxr-xr-x  1 root root  1414 Nov  7 12:15 logparser.sh\r\n\r\n# Used by my script\r\ndrwxr-xr-x  2 root root  4096 Nov  7 12:06 parsed\r\ndrwxr-xr-x  2 root root  4096 Nov  7 12:33 parsed-combined\r\n<\/pre>\n<pre>\r\n#!\/bin\/bash\r\n\r\n# Author : Adam Bull\r\n# Title: Rackspace CDN Log Parser\r\n# Date: November 7th 2016\r\n\r\necho \"Deleting previous jobs\"\r\nrm -rf parsed\r\nrm -rf parsed-combined\r\n\r\nls -ld *\/ | awk '{print $9}' | grep -v parsed > alldirs.txt\r\n\r\n\r\n# Create Location for Combined File Listing for CDN LOGS\r\nmkdir 
parsed\r\n\r\n# Create Location for combined CDN or ACCESS LOGS\r\nmkdir parsed-combined\r\n\r\n# This just builds a list of the CDN Access Logs\r\necho \"Building list of Downloaded .CDN_ACCESS_LOG Files\"\r\nsleep 3\r\nwhile read m; do\r\nfolder=$(echo \"$m\" | sed 's@\/@@g')\r\necho \"$folder\"\r\necho \"$m\" | xargs -i find .\/{} -type f -print > \"parsed\/$folder.log\"\r\ndone < alldirs.txt\r\n\r\n# This part cats the files and uses xargs to produce all the Log output, before cut processing and redirecting to parsed-combined\/$folder\r\necho \"Combining .CDN_ACCESS_LOG Files for bulk processing and converting into NCSA format\"\r\nsleep 3\r\nwhile read m; do\r\nfolder=$(echo \"$m\" | sed 's@\/@@g')\r\ncat \"parsed\/$folder.log\" | xargs -i zcat {} | cut -d' ' -f1-10  > \"parsed-combined\/$folder\"\r\ndone < alldirs.txt\r\n\r\n\r\n# This part processes the Log files with Goaccess, generating HTML reports\r\necho \"Generating Goaccess HTML Logs\"\r\nsleep 3\r\nwhile read m; do\r\nfolder=$(echo \"$m\" | sed 's@\/@@g')\r\ngoaccess -f \"parsed-combined\/$folder\" -a -o \"\/var\/www\/html\/$folder.html\"\r\ndone < alldirs.txt\r\n\r\n<\/pre>\n<p><strong>How to easily present these logs<\/strong><\/p>\n<p>I slightly deceived you with the last step: the above script has already done it. You will, though, need an httpd installed with a documentroot of \/var\/www\/html, as well as goaccess itself (available from the EPEL repository on CentOS\/RHEL), so make sure both are installed:<\/p>\n<pre>\r\nyum install epel-release\r\nyum install httpd goaccess\r\n<\/pre>\n<p>De de de de de de da! 
da da!<\/p>\n<p><a href=\"http:\/\/www.haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM.png\" alt=\"screen-shot-2016-11-07-at-12-41-30-pm\" width=\"1505\" height=\"632\" class=\"alignnone size-full wp-image-853\" srcset=\"https:\/\/haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM.png 1505w, https:\/\/haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM-300x126.png 300w, https:\/\/haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM-768x323.png 768w, https:\/\/haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM-1024x430.png 1024w, https:\/\/haxed.me.uk\/wp-content\/uploads\/2016\/11\/Screen-Shot-2016-11-07-at-12.41.30-PM-500x210.png 500w\" sizes=\"auto, (max-width: 1505px) 100vw, 1505px\" \/><\/a><\/p>\n<p>Some little caveats:<\/p>\n<p><strong>Generating a master index.html file of all the sites<\/strong><\/p>\n<p><code><br \/>\n[root@cdn-log-parser-mother html]# pwd<br \/>\n\/var\/www\/html<br \/>\n[root@cdn-log-parser-mother html]# ls *.html | xargs -i echo \"<a href=.\/{}> {} <\/a> <br \/>\" > index.html<br \/>\n<\/code><\/p>\n<p>I will expand the script to generate this automatically soon, but for now I&#8217;m leaving it like this due to time constraints.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So, this has been rather a bane of my life as Adam Bull. Basically, a large customer of ours had 50+ CDNs, and literally hundreds of gigabytes of log files. 
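As a sketch of what that automation might look like, the master index could be built by a small helper appended to logparser.sh. The `build_index` function below is hypothetical (not part of the script above) and assumes the goaccess reports land in \/var\/www\/html:

```shell
#!/bin/bash
# build_index DIR -- write DIR/index.html linking every other .html report.
# Sketch only; call it as the final step of logparser.sh, e.g.:
#   build_index /var/www/html
build_index() {
  dir=$1
  {
    echo "<html><body><h1>CDN Log Reports</h1>"
    for f in "$dir"/*.html; do
      name=$(basename "$f")
      # skip the index itself (the redirection below creates it before the loop runs)
      [ "$name" = "index.html" ] && continue
      echo "<a href=\"./$name\">$name</a><br/>"
    done
    echo "</body></html>"
  } > "$dir/index.html"
}
```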
They were all in Rackspace Cloud Files, &hellip; <a href=\"https:\/\/haxed.me.uk\/index.php\/2016\/11\/07\/creating-proper-method-retrieving-sorting-parsing-rackspace-cdn-access-logs\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[33,19,34,15,28,73,74,7,75],"tags":[],"class_list":["post-852","post","type-post","status-publish","format-standard","hentry","category-apache","category-bash","category-cdn","category-cloud","category-interweb","category-log-analysis","category-logging","category-management-tools","category-rackspace-cdn-logs"],"_links":{"self":[{"href":"https:\/\/haxed.me.uk\/index.php\/wp-json\/wp\/v2\/posts\/852","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/haxed.me.uk\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haxed.me.uk\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haxed.me.uk\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/haxed.me.uk\/index.php\/wp-json\/wp\/v2\/comments?post=852"}],"version-history":[{"count":8,"href":"https:\/\/haxed.me.uk\/index.php\/wp-json\/wp\/v2\/posts\/852\/revisions"}],"predecessor-version":[{"id":861,"href":"https:\/\/haxed.me.uk\/index.php\/wp-json\/wp\/v2\/posts\/852\/revisions\/861"}],"wp:attachment":[{"href":"https:\/\/haxed.me.uk\/index.php\/wp-json\/wp\/v2\/media?parent=852"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/haxed.me.uk\/index.php\/wp-json\/wp\/v2\/categories?post=852"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/haxed.me.uk\/index.php\/wp-json\/wp\/v2\/tags?post=852"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}