I had a customer this week who had some terrible rewrite woes with their Magento site. They knew that a whole ton of their images were returning 404s, most likely because the rewrite wasn’t pointing at the correct filesystem path where each file resided. This was due to their cache being broken, and their second developer not creating a proper rewrite rule.
As a sysadmin our job is not a development role, we are a support role, and in order to enable the developer to fix the problem, the developer needs to be able to see exactly what it is. Enter the sysadmin’s task. I wrote this really ghetto script, which essentially hunts the nginx error log for requests that failed with ‘No such file’, and then qualifies them by grepping for the jpg file type. This is not a perfect way of doing it; however, it is really effective at identifying the broken links.
Then I have a separate routine that strips each of the file URIs down to the bare filename, locates that file on the filesystem, and prints both the correct path the rewrite should be going to and the incorrect path the rewrite is presently sending the URL to. See the script below:
#!/bin/bash
# Print every 'No such file' failure: the requested URL and the missing filesystem path
cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print "URL Request:",$21,"\nFilesystem destination missing:",$7"\n"}'
zcat /var/log/nginx/*error*.gz | grep 'No such file' | awk '{print "URL Request:",$21,"\nFilesystem destination detected missing:",$7"\n"}'

# Strip each missing filesystem path down to the bare jpg filename
cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print $7}' | sed 's/\"//g' | sed 's/.*\///' | grep jpg > lost.txt
zcat /var/log/nginx/*error*.gz | grep 'No such file' | awk '{print $7}' | sed 's/\"//g' | sed 's/.*\///' | grep jpg >> lost.txt

# Build the full (broken) URL for each failed jpg request
cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print "http://customerswebsite.com"$21}' | sed 's/\"//g' | grep jpg > lostfullurl.txt
zcat /var/log/nginx/*error*.gz | grep 'No such file' | awk '{print "http://customerswebsite.com"$21}' | sed 's/\"//g' | grep jpg >> lostfullurl.txt

# Walk both files in lockstep: f1 is the broken URL, f2 the bare filename
while true
do
        read -r f1 <&3 || break
        read -r f2 <&4 || break
        printf '\n\n'
        printf 'Found a broken link getting a 404 at :\n'
        printf '%s\n' "$f1"
        printf 'Locating the correct link of the file on the filesystem:\n'
        find /var/www/magento | grep "$f2"
done 3<lostfullurl.txt 4<lost.txt
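A note on the awk field numbers: $7 and $21 depend entirely on how nginx formats its error lines, so they are worth sanity-checking against your own log before trusting the output. Against a typical ‘open() … failed (2: No such file or directory)’ entry, $7 lands on the quoted filesystem path and $21 on the request URI. The line below is a made-up example for illustration; real entries vary with nginx version and configuration:

```shell
#!/bin/bash
# A made-up nginx error-log line of the kind the script parses (illustrative only):
line='2014/01/01 12:00:00 [error] 1234#0: *1 open() "/var/www/magento/media/wrong/path/pic.jpg" failed (2: No such file or directory), client: 1.2.3.4, server: example.com, request: "GET /media/catalog/pic.jpg HTTP/1.1", host: "example.com"'

# Whitespace-split, field 7 is the quoted filesystem path and field 21 the request URI:
echo "$line" | awk '{print "fs:",$7,"uri:",$21}'
# prints: fs: "/var/www/magento/media/wrong/path/pic.jpg" uri: /media/catalog/pic.jpg
```

If your log_format differs, count the fields on a real line first and adjust $7/$21 to match.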
I was particularly proud of the last section, which uses a ‘dual loop over two input files’ in a single while statement, allowing me to achieve the pairing described above.
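In case the pattern is new to you, here is a stripped-down sketch of it (the /tmp file names are just for illustration): `done 3<file 4<file` binds each input file to its own file descriptor, and each `read -r … <&N` pulls one line from its descriptor per iteration, so the two files are walked in lockstep until the shorter one runs out:

```shell
#!/bin/bash
# Two throwaway input files (illustrative only)
printf 'alpha\nbeta\n' > /tmp/left.txt
printf 'one\ntwo\n'    > /tmp/right.txt

# Read one line from each file per iteration, in lockstep
while true
do
        read -r l <&3 || break
        read -r r <&4 || break
        printf '%s -> %s\n' "$l" "$r"
done 3</tmp/left.txt 4</tmp/right.txt
# prints:
# alpha -> one
# beta -> two
```

The nice thing about this over `paste` or a single merged file is that each list keeps its own descriptor, so the loop body can treat the two lines however it likes.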
Output is in the form of:
Found a broken link getting a 404 at :
http://customerswebsite.com/media/catalog/product/cache/1/image/800x700/9df78eab33525d08d6e5fb8d27136e95/b/o/image-magick-file-red.jpg
Locating the correct link of the file on the filesystem:
/var/www/magento/media/catalog/product/b/o/image-magick-file-red.jpg
As you can see, the path on the filesystem differs from the path the rewrite is sending the request to, hence the 404s this customer is getting.
This could be a really useful script, and I see no reason why it could not generate the rewrite rules programmatically from the 404 failures it finds; it could actually create the rules necessary to fix the problem. Now, that is not an ideal fix; however, the script gives you an overview, so you can either fix this properly as a developer or patch things up with new rewrite rules as a sysadmin.
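Here is a sketch of what that generation step might look like. It assumes you had also saved find’s output to a file of matching filesystem paths (foundpaths.txt here, which the script above does not yet produce), and that the doc root is /var/www/magento; both names, and the sample pair below, are hypothetical:

```shell
#!/bin/bash
# Illustrative sample pair: one broken URL and its real filesystem path
printf 'http://customerswebsite.com/media/catalog/product/cache/1/image/b/o/pic.jpg\n' > /tmp/lostfullurl.txt
printf '/var/www/magento/media/catalog/product/b/o/pic.jpg\n' > /tmp/foundpaths.txt

docroot=/var/www/magento
host=http://customerswebsite.com

while true
do
        read -r url    <&3 || break
        read -r fspath <&4 || break
        uri=${url#$host}          # URI part of the broken URL
        target=${fspath#$docroot} # real path, relative to the doc root
        printf 'rewrite ^%s$ %s permanent;\n' "$uri" "$target"
done 3</tmp/lostfullurl.txt 4</tmp/foundpaths.txt
# prints: rewrite ^/media/catalog/product/cache/1/image/b/o/pic.jpg$ /media/catalog/product/b/o/pic.jpg permanent;
```

In real use you would want to escape regex metacharacters in the URI (the dots in the filename, for a start) and eyeball every generated rule before loading it into nginx.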
I’m really proud of this one, even though not everyone may see a use for it. There really is one, and this customer is stoked. Think of it like this: how can a developer fix something if he doesn’t have a clear idea of what is broken? Providing that clarity is the sysadmin’s job.
Cheers &
Best wishes,
Adam