We had a problem in which a server began running extremely slowly: at first mainly for one Magento site, but then for the entire machine.
- top: There was a high load average, but nothing out of the ordinary was running. No disk waits, no zombies, 6G/8G mem.
- yum install iotop; iotop: there’s almost nothing running here, either
- yum install sysstat; iostat -dkx 60: at last, a sign of a problem: the io utilization is too high. No reads at all, but write is near 100%
- The only thing that ever writes to the disk (from iotop) is mysqld and [kjournald], so ask mysql what it is doing.
- There are no logged slow queries.
- mysql> SHOW STATUS; nothing seems odd.
- mysql> show processlist: That resulted in hundreds of lines of crap. So the problem in general is mysql, but specifically? Every site has a query in there.
- cd /var/lib/db; du -h --max-depth=1 shows that one db was much larger (20x) than the rest.
- That db is for a wordpress site. The site itself doesn’t even do anything. However, the comments page was enabled. There are no visible comments or submit forms, but the wp-comments-post page was still sitting there, and some bot found it.
- Deleting 400M of crap and disabling the submit page fixed the problem.
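The show processlist step above was hard to read by eye. Here is a minimal sketch of one way to boil it down. It assumes the tab-separated output of mysql -B with the usual column order (Id, User, Host, db, Command, Time, State, Info). The function name summarize_processlist is made up for this example, and this is not what we actually ran at the time.

```shell
#!/bin/sh
# Summarise tab-separated SHOW PROCESSLIST output read from stdin.
# Prints "count<TAB>longest_time<TAB>db" per database, busiest first.
# Assumed column order: Id, User, Host, db, Command, Time, State, Info.
summarize_processlist() {
    awk -F '\t' '
        NR == 1 { next }            # skip the header row
        $5 == "Sleep" { next }      # ignore idle connections
        {
            count[$4]++
            if ($6 + 0 > longest[$4] + 0) longest[$4] = $6 + 0
        }
        END {
            for (db in count)
                printf "%d\t%d\t%s\n", count[db], longest[db], db
        }
    ' | sort -t "$(printf '\t')" -k1,1nr -k3,3
}

# Usage (not run here):
#   mysql -B -e 'SHOW FULL PROCESSLIST' | summarize_processlist
```

A database that sits at the top with a large longest-running time is a reasonable place to look first, though as this story shows, the site that looks slow is not necessarily the one causing the writes.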
Weird part: the site that was having the problem was not the site with the comments. If it hadn’t slowed the entire machine down, the spam bot could have spent years filling up the database with invisible comments, and nobody would have noticed until the disk usage got too high. All of our visible comments pages (like this one) use captchas or a login system.
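To catch a runaway database like this before it hurts, the du check could be turned into something a cron job runs. This is a rough sketch, not something we have in production. The function name flag_oversized and the default factor of 10 are assumptions picked for illustration, and it expects one size and path per line, as printed by du -sk.

```shell
#!/bin/sh
# Read "size<TAB>path" lines (as printed by du -sk) on stdin and print
# the entries whose size is more than FACTOR times the median size.
# FACTOR defaults to 10, which is an arbitrary assumption.
flag_oversized() {
    factor=${1:-10}
    sort -n | awk -F '\t' -v factor="$factor" '
        { size[NR] = $1; path[NR] = $2 }
        END {
            if (NR == 0) exit
            # Input is sorted, so the middle entry is the (lower) median.
            median = size[int((NR + 1) / 2)]
            for (i = 1; i <= NR; i++)
                if (size[i] > factor * median)
                    printf "%s\t%s\n", size[i], path[i]
        }
    '
}

# Usage (not run here), one directory per database:
#   du -sk /var/lib/db/*/ | flag_oversized 10
```

Comparing against the median rather than a fixed size means the check does not need retuning as the databases grow at their normal pace.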
I’m more of a Postgres fan myself. I think it handles things like one giant table better than MySQL apparently does. I’m still not sure why WordPress comments, no matter how many, would cripple the entire machine like that. We have other dbs on other servers that write more than once a second. Giant text blobs? Indexes? Also not sure why only 1 of the 3 Magento sites on the same server was having a problem.
Other things we thought of first:
- /var/log/maillog pointed out some system logging thing is bouncing messages. Irritating, and took a while to fix (smtpd_recipient_restrictions does not seem to handle rejecting outbound recipients, but virtual_alias_maps can map an outbound address back to a local address, which can be aliased to /dev/null), but not the problem
- Magento’s cache files
- Magento’s plugins
- Magento in general
- The VPS: Chunkhost is very good about this, but it is possible. (and we are sorry for what our io load might have done to other hosts on the server.)
- This happened previously to the same server, but that time the cause was a dictionary attack on the root password, which placed a substantial load on the server until I blocked password logins in general.
The main problem was a 3 second (sometimes much more, but never less) Time To First Byte. After that, the page loaded in less than a second. I thought that something was timing out, but not logging it. It was successfully making a db connection, so no logged errors, but the db was so deadlocked that it took forever to respond.
The fact that a completely different site was having problems was the main stumbling block.
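For anyone wanting to put a number on the Time To First Byte symptom, curl can report it through its --write-out timing variables. This is a small sketch. The function name ttfb and the example URL are placeholders, not anything from our setup.

```shell
#!/bin/sh
# Print connect time, time to first byte, and total time (in seconds)
# for one request. The response body is discarded.
ttfb() {
    curl -s -o /dev/null \
        -w 'connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n' \
        "$1"
}

# Usage (hypothetical URL, not run here):
#   ttfb https://shop.example.com/
```

A fast connect with a ttfb of several seconds points at the application or the database behind it rather than the network, which matches what we saw.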