Rapid troubleshooting: mail not arriving at its destination

Today I had a customer whose mail was not arriving at its destination. In my customer's case, they knew the mail was arriving at the destination but going into the spam folder. This is likely due to blacklisting. However, some ISPs or clients don't know the mail is reaching the spam folder at the other end, and since the most common causes are a missing SPF (Sender Policy Framework) record, MX record or DKIM record, it is possible to rapidly check the DNS for each using dig, if the sender domain is known.
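
A minimal sketch of those dig checks, where example.com stands in for the sending domain and 'default' for the DKIM selector (selectors vary per provider, so that part is an assumption):

# SPF lives in a TXT record on the bare domain
dig +short TXT example.com | grep -i spf
# MX records for the domain
dig +short MX example.com
# DKIM public key, published under <selector>._domainkey
dig +short TXT default._domainkey.example.com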

To check for IP blacklistings on a mail server, use its IP in one of the many blacklist checkers:

https://mxtoolbox.com/SuperTool.aspx?action=blacklist

In other cases the mail might be bouncing outright, due to missing or broken PTR, MX or DKIM records, and never reaching the inbox at all. You can see that on the sending mail server using a simple grep command:

cat /var/log/maillog | grep -i status=bounced

You probably want to save the output as well:

cat /var/log/maillog | grep -i status=bounced > bouncedemail.txt 

If you've ensured all sending domains are correctly configured via DNS, you might want to know which destination addresses bounced email:

[root@api ~]# cat /var/log/maillog | grep -i status=bounced | awk '{print $7}'
to=<user@example-one.com>,
to=<user@example-two.com>,

You could then use sed and awk to extract just the domains that failed:
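
A rough sketch, assuming the same maillog format as the grep above (field positions can vary between mail servers and log formats):

# Count bounces per destination domain
cat /var/log/maillog | grep -i status=bounced | awk '{print $7}' \
  | sed -e 's/to=<//' -e 's/>,//' | awk -F'@' '{print $2}' \
  | sort | uniq -c | sort -rn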

Then you could use whois against those domains to reach out to the listed email contact, with some automation that explains 'we've checked DKIM, MX and SPF, all are configured correctly, and we believe this is an error on your side', etc.

Such a thing would be a good idea to implement for some large email providers, and I'm sure you could automate the DNS checking as well. You could likely automate the whole thing, just by watching the logs and the DNS records, with some intelligent grep and awk.

Not bad.

Calculating the Average Hits per minute en masse for thousands of sites

So, I had a customer having some major MySQL woes, and I wanted to know whether the MySQL issues were query related, as in due to the frequency of queries alone or the size of the database, versus being caused by the number of visitors coming into Apache, which would drive more frequent MySQL hits and explain the higher CPU usage.

The best way to get a feel for this is to inspect the per-site logs in /var/log/httpd (an ls -al there will show you which sites are logging and how busy the logs are).

First we take a sample of all of the requests coming into Apache, as in all of them. Provided the customer has used proper naming conventions for their log files, this isn't a nightmare; Apache's default setup is designed to make this easy for you, hurrah!

[root@box-DB1 logparser]# time tail -f /var/log/httpd/*access_log > allhitsnow
^C

real	0m44.560s
user	0m0.006s
sys	0m0.031s

The time prefix here tells you how long you ran the tail for.

[root@box-DB1 logparser]# cat allhitsnow | wc -l
1590

The above command shows you the number of lines in the allhitsnow file, which was written with all the new requests coming into sites from all the site log files. Simples! 1590 requests in roughly 45 seconds works out to about 2,140 hits per minute, which is quite a lot.
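
To turn any such sample into a per-minute rate, divide the line count by the elapsed seconds that time reported and scale up; a minimal sketch, with 44.56 taken from the run above:

awk -v lines="$(wc -l < allhitsnow)" -v secs=44.56 \
  'BEGIN { printf "%.0f hits/minute\n", lines / secs * 60 }'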

Lsyncd fails with Error: Terminating since out of inotify watches.

This is a remarkably easy problem to solve.

From the man page, inotify(7):

The inotify API provides a mechanism for monitoring filesystem
events. Inotify can be used to monitor individual files, or to
monitor directories. When a directory is monitored, inotify will
return events for the directory itself, and for files inside the
directory.

The problem is a kernel tunable that caps how many inotify watches each user may register, and with it how much kernel memory watchers such as lsyncd can consume. Let's double check what value it is set to presently.

Check the current inotify max_user_watches value

[root@server php]# cat /proc/sys/fs/inotify/max_user_watches
100000

Check the amount of available memory on the server to ensure it can handle the increase:

[root@server-new php]# free -m
             total       used       free     shared    buffers     cached
Mem:          7358       6577        781          1        376       2427
-/+ buffers/cache:       3773       3584
Swap:            0          0          0

The cached figure can effectively be counted towards free, since Linux reclaims it under memory pressure; the kernel borrows otherwise idle memory for efficiency.

Each inotify watch uses roughly 1 kB of kernel memory on 64-bit systems (about half that on 32-bit), so 200,000 watches costs in the region of 200 MB, comfortably within the free memory shown above.

Increase Inotify watches to a higher value

sysctl fs.inotify.max_user_watches=200000
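
Note that a value set with sysctl like this only lasts until reboot; to make it persistent, append it to /etc/sysctl.conf (or a file under /etc/sysctl.d/) and reload:

echo 'fs.inotify.max_user_watches=200000' >> /etc/sysctl.conf
sysctl -p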

Enabling MySQL Slow Query Logs

If you're seeing very long page load times and have already checked your application, it always pays to check how the database performs when interacting with the application, whether they sit on the same server or on separate ones, as this significantly affects the way your application will run.


mysql> SET GLOBAL slow_query_log = 'ON' ;
Query OK, 0 rows affected (0.01 sec)

mysql> SET GLOBAL slow_query_log_file = '/slow_query_logs/slow_query_logs.txt';
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL long_query_time = 5;
Query OK, 0 rows affected (0.00 sec)
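
Bear in mind that SET GLOBAL changes are lost when mysqld restarts. To make them permanent, the equivalent settings go in the MySQL configuration file; a minimal sketch, assuming the common /etc/my.cnf location (the path varies by distribution):

[mysqld]
slow_query_log = 1
slow_query_log_file = /slow_query_logs/slow_query_logs.txt
long_query_time = 5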

Setting the X-Frame-Options HTTP Header to allow SAME or NON-SAME ORIGINS

It's possible to increase the security of a webserver running a website by ensuring it sends the X-Frame-Options header to the browser, which controls whether the page may be rendered inside a frame or iframe on another origin. Restricting framing to the serving origin stops other sites from wrapping your pages in their own, which is the basis of clickjacking attacks. An admirable option for those who wish to increase their server security.

Naturally, there are some reasons why you might want to relax this, and in the proper context that can still be secure. Always be sure to discuss such considerations with your pentester or PCI compliance officer before proceeding, especially making sure that if you do not use SAMEORIGIN you still use the most secure option for the required task. Always check whether there is a better way to achieve what you're trying to do when making such changes to your server configuration.

Insecure X-Frame-Options value: allows remote, non-matching origins (ALLOWALL is non-standard, and browsers treat it as no restriction at all)

Header always set X-Frame-Options ALLOWALL

The secure value instructs the browser not to render the page inside frames on non-matching origins, which can prevent clickjacking and other attacks. (Note the use of set rather than append: if the header ends up with multiple values, browsers may ignore it entirely.)

Header always set X-Frame-Options SAMEORIGIN
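
A minimal sketch of putting this live on Apache; the conf path is an assumption (any loaded configuration file or vhost works), and mod_headers must be enabled:

# e.g. in /etc/httpd/conf.d/headers.conf
Header always set X-Frame-Options SAMEORIGIN

# validate the config and reload
apachectl configtest && systemctl reload httpd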

Moving a WordPress site – much ado about nothing!

Have you noticed there is all kinds of advice on the internet about the best way to move WordPress websites? There are myriad ways to achieve this. One of the methods I read on wp.com was:

Changing Your Domain Name and URLs

Moving a website and changing your domain name or URLs (i.e. from http://example.com/site to http://example.com, or http://example.com to http://example.net) requires the following steps - in sequence.

    Download your existing site files.
    Export your database - go in to MySQL and export the database.
    Move the backed up files and database into a new folder - somewhere safe - this is your site backup.
    Log in to the site you want to move and go to Settings > General, then change the URLs. (ie from http://example.com/ to http://example.net ) - save the settings and expect to see a 404 page.
    Download your site files again.
    Export the database again.
    Edit wp-config.php with the new server's MySQL database name, user and password.
    Upload the files.
    Import the database on the new server.

I mean, these are truly horrifying steps to take, and I don't see the point at all. This is how I achieved it for one of my customers.

1. Take a dump of the customer's database.
2. Edit the dump, searching for 'siteurl' with vi:

vi mysqldump.sql
/siteurl

And just swap out the values, confirming after editing the file:

[root@box]# cat somemysqldump.sql  | grep siteurl -A 2
(1, 'siteurl', 'https://www.newsiteurl.com', 'yes'),
(2, 'home', 'https://www.newsiteurl.com', 'yes'),
(3, 'blogname', 'My website name', 'yes'),
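
If you'd rather not edit interactively, a sed one-liner does the same swap; a sketch with placeholder URLs (swap in the real old and new values). This is safe for plain options like siteurl and home, but beware blanket search-and-replace across a whole WordPress dump, since serialized PHP data elsewhere in the database embeds string lengths:

sed -i 's|http://www.oldsiteurl.com|https://www.newsiteurl.com|g' mysqldump.sql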

Job done, no stress: https://codex.wordpress.org/Moving_WordPress.

There might be additional bits, but this is certainly enough for them to access the wp-admin panel. If you have problems, add this line to the wp-config.php file:

define('RELOCATE',true);

Just before the line which says

/* That’s all, stop editing! Happy blogging. */

And then just do the import/restore as normal:

mysql -u newmysqluser -p newdatabase_to_import_to < old_database.sql

Simples! I really have no idea why it is made out to be so complicated on other hosting sites or platforms.

Reverting a yum transaction & controlling auto-updates for yum-cron with excludes

Hey, so a customer was running yum-cron, the Red Hat equivalent of Canonical's unattended-upgrades. An automatic update ran for a package, which actually broke their backend.
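
To find the ID of the offending transaction, yum keeps a history; listing it shows the IDs, dates and actions (this is where an ID like 172 below comes from):

# list recent transactions with their IDs
yum history list
# or narrow it down to a suspect package
yum history list php56u-pecl-xdebug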

# yum history info 172
Loaded plugins: replace, rhnplugin
This system is receiving updates from RHN Classic or Red Hat Satellite.
Transaction ID : 172
Begin time     : Sat May  6 05:45:29 2017
Begin rpmdb    : 742:11fcb243cc5701b9d2293d90cb4161e5edc34bb8
End time       :            05:45:31 2017 (2 seconds)
End rpmdb      : 742:0afbbf517315f18985b2d01d4c2e5250caf0afb5
User           : root <root>
Return-Code    : Success
Transaction performed with:
    Installed     rpm-4.11.3-21.el7.x86_64                @rhel-x86_64-server-7
    Installed     yum-3.4.3-150.el7.noarch                @rhel-x86_64-server-7
    Installed     yum-metadata-parser-1.1.4-10.el7.x86_64 @anaconda/7.1
    Installed     yum-rhn-plugin-2.0.1-6.1.el7_3.noarch   @rhel-x86_64-server-7
Packages Altered:
    Updated php56u-pecl-xdebug-2.5.1-1.ius.el7.x86_64 @rackspace-rhel-x86_64-server-7-ius
    Update                     2.5.3-1.ius.el7.x86_64 @rackspace-rhel-x86_64-server-7-ius
history info

The customer asked us to reverse the transaction, so I did; this was quite simple to do:

[root@web user]# yum history undo 172
Loaded plugins: replace, rhnplugin
This system is receiving updates from RHN Classic or Red Hat Satellite.
Undoing transaction 172, from Sat May  6 05:45:29 2017
    Updated php56u-pecl-xdebug-2.5.1-1.ius.el7.x86_64 @rackspace-rhel-x86_64-server-7-ius
    Update                     2.5.3-1.ius.el7.x86_64 @rackspace-rhel-x86_64-server-7-ius
Resolving Dependencies
--> Running transaction check
---> Package php56u-pecl-xdebug.x86_64 0:2.5.1-1.ius.el7 will be a downgrade
---> Package php56u-pecl-xdebug.x86_64 0:2.5.3-1.ius.el7 will be erased
--> Finished Dependency Resolution

Dependencies Resolved

============================================================================================================================================================================================================================================
 Package                                                  Arch                                         Version                                               Repository                                                                Size
============================================================================================================================================================================================================================================
Downgrading:
 php56u-pecl-xdebug                                       x86_64                                       2.5.1-1.ius.el7                                       rackspace-rhel-x86_64-server-7-ius                                       205 k

Transaction Summary
============================================================================================================================================================================================================================================
Downgrade  1 Package

Total download size: 205 k
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
php56u-pecl-xdebug-2.5.1-1.ius.el7.x86_64.rpm                                                                                                                                                                        | 205 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : php56u-pecl-xdebug-2.5.1-1.ius.el7.x86_64                                                                                                                                                                                1/2
  Cleanup    : php56u-pecl-xdebug-2.5.3-1.ius.el7.x86_64                                                                                                                                                                                2/2
  Verifying  : php56u-pecl-xdebug-2.5.1-1.ius.el7.x86_64                                                                                                                                                                                1/2
  Verifying  : php56u-pecl-xdebug-2.5.3-1.ius.el7.x86_64                                                                                                                                                                                2/2

Removed:
  php56u-pecl-xdebug.x86_64 0:2.5.3-1.ius.el7

Installed:
  php56u-pecl-xdebug.x86_64 0:2.5.1-1.ius.el7

Complete!

In CentOS 7, and I believe RHEL 7 as well, if you don't want to disable yum-cron altogether by running yum remove yum-cron, you can exclude specific packages, or use wildcards to exclude whole families of them, like php*, http*, etc.

vi /etc/yum/yum-cron.conf

If you wish to exclude some packages from the auto-update mechanism, add an exclude line at the bottom of the file, in the [base] section:

[base]
exclude = kernel* php* httpd*

etc. I hope that this is of some assistance,

cheers &
Best wishes,
Adam

Magento Rewrite Woes … really woes

I had a customer this week who had some terrible rewrite woes with their Magento site. They knew that a whole ton of their images were getting 404s, most likely because the rewrite wasn't reaching the correct filesystem path where each file resided. This was due to their cache being broken, and their second developer not creating a proper rewrite rule.

As a sysadmin, our job is not a development role; we are a support role, and in order to enable the developer to fix the problem, the developer needs to be able to see exactly what it is. Enter the sysad's task. I wrote this really ghetto script, which essentially hunts the nginx error log for requests that failed with 'No such file', and then qualifies them by grepping for jpg file types. This is not a perfect way of doing it; however, it is really effective at identifying the broken links.

Then I have a separate routine that strips each of the file URIs down to the filename, locates the file on the filesystem, and pairs the path on the filesystem that the rewrite should be going to with the incorrect path the rewrite is presently sending the URL to. See the script below:

#!/bin/bash

# Author: Adam Bull
# Company: Rackspace LTD, Hayes
# Purpose:
#          This customer has a difficulty with nginx rewriting to the incorrect file
# this script goes into the nginx error.log and finds all the images which have the broken rewrite rules
# then after it has identified the broken rewrite rule files, it searches for the correct file on the filesystem
# then correlates it with the necessary rewrite rule that is required
# this could potentially be used for in-place upgrade by developers
# to ensure that website has proper redirects in the case of bugs with the ones which exist.

# This script will effectively find all 404's and give necessary information for forming a rewrite rule, i.e. the request url, from nginx error.log vs the actual filesystem location
# on the hard disk that the request needs to go to, but is not being rewritten to file path correctly already

# that way this data could be used to create rewrite rules programmatically, potentially
# This is a work in progress


# These are used for display output
cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print "URL Request:",$21,"\nFilesystem destination missing:",$7"\n"}'
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print "URL Request:",$21,"\nFilesystem destination detected missing:",$7"\n"}'

# These below are used for variable population for locating actual file paths of missing files needed to determine the proper rewrite path destination (which is missing)
# we qualify this with only *.jpg files

cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print $7}' | sed 's/\"//g' |  sed 's/.*\///' | grep jpg > lost.txt
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print $7}' | sed 's/\"//g' |  sed 's/.*\///' | grep jpg >> lost.txt

# FULL REQUEST URL NEEDED AS WELL
# note: hostname and URI are concatenated (no awk comma) so no stray space lands in the URL
cat /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep 'No such file' | awk '{print "http://mycustomerswebsite.com" $21}' | sed 's/\"//g' | grep jpg > lostfullurl.txt
zcat /var/log/nginx/*error*.gz  | grep 'No such file' | awk '{print "http://mycustomerswebsite.com" $21}' | sed 's/\"//g' | grep jpg >> lostfullurl.txt

# The below section is used for finding the lost files on filesystem and pairing them together in variable pairs
# for programmatic usage for rewrite rules


while true
do
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  printf '\n\n'
  printf 'Found a broken link getting a 404 at : %s\n' "$f1"
  printf 'Locating the correct link of the file on the filesystem:\n'
  find /var/www/magento | grep "$f2"
done 3<lostfullurl.txt 4<lost.txt

I was particularly proud of the last section, which uses a 'dual loop over two input files' in a single while statement, allowing me to pair the entries up as described above.

Output is in the form of:


Found a broken link getting a 404 at :
http://customerswebsite.com/media/catalog/product/cache/1/image/800x700/9df78eab33525d08d6e5fb8d27136e95/b/o/image-magick-file-red.jpg
Locating the correct link of the file on the filesystem:
/var/www/magento/media/catalog/product/b/o/image-magick-file-red.jpg

As you can see, the path on the filesystem differs from the URL the rewrite is sending the request to, hence the 404 this customer is getting.

This could be a really useful script, and I see no reason why it could not generate the rewrite rules programmatically from the 404 failures it finds; it could actually create the rules necessary to fix the problem. Now, that would not be an ideal fix; however, the script gives you an overview, so you can either fix things properly as a developer, or patch up with new rewrite rules as a sysadmin.

I'm really proud of this one, even though not everyone may see a use for it. There really, really is, and this customer is stoked. Think of it like this: how can a developer fix it if he doesn't have a clear idea of what is broken? Providing that clarity is the sysad's job.

Cheers &
Best wishes,
Adam

Recovering Corrupt RPM DB

I had a support specialist today with an open yum task they couldn't kill gracefully; even after a kill -9, this was happening:

[root@mybox home]# yum list | grep -i xml
rpmdb: Thread/process 31902/140347322918656 failed: Thread died in Berkeley DB library
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db3 -  (-30974)
error: cannot open Packages database in /var/lib/rpm
CRITICAL:yum.main:
Error: rpmdb open failed
[root@mybox home]#

The solution is to manually remove the stale Berkeley DB environment files and rebuild the RPM database:


[root@mybox home]# rm -f /var/lib/rpm/__*
[root@mybox home]# rpm --rebuilddb
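
Afterwards it is worth confirming the database is healthy again; re-running the original query is the simplest check:

# should list packages again without db3 errors
rpm -qa | wc -l
yum list | grep -i xml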

Increasing the Limits of PHP-FPM

It's important to know how to increase the limits for the PHP-FPM www pool, or any other named pools you might have set up.

You might see an error like:

tail -f /var/log/php7.1-fpm.log
[24-Apr-2017 11:23:09] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 11 total

or

[24-Apr-2017 10:51:38] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

The solution is quite simple: we just need to edit the PHP-FPM pool configuration on the server (commonly /etc/php-fpm.d/www.conf on RHEL/CentOS, or under /etc/php/7.1/fpm/pool.d/ on Debian/Ubuntu) and increase these values to safe ones supported by the available RAM.

pm.max_children = 15

; The number of child processes created on startup.
; Note: Used only when pm is set to 'dynamic'
; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
pm.start_servers = 2

; The desired minimum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.min_spare_servers = 1

; The desired maximum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.max_spare_servers = 8
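
A rough way to size pm.max_children is to divide the RAM you can spare for PHP by the average resident size of a worker; a minimal sketch (the php-fpm process name is an assumption, on Debian/Ubuntu it may be php-fpm7.1):

# average RSS per php-fpm worker, in MB
ps -C php-fpm -o rss= | awk '{ sum += $1; n++ } END { if (n) printf "%d workers, %.0f MB average\n", n, sum / n / 1024 }'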

Then monitor the site with

tail -f /var/log/php7.1-fpm.log

to ensure no further limits are being hit.

Obviously, if you are using a different version of FPM, your log location might differ.