It's September, and in universities, that means tons of new people. New staff, new faculty, new students. Lots and lots of new people.
Here at The College of Computer and Information Science at Northeastern University, we've got a banner crop of incoming CS students. So many, in fact, that we bumped up against one of those things that we don't think about a lot. Email licenses.
Every year, we pay for a lot of licenses. We've never monitored the number of seats used versus the number bought, but we buy many thousands of seats. Well, we ran out last week. Oops.
After calling our reseller, who hooked us up with a temporary emergency bump, we made it through the day until we could buy more. I decided that it was time to start monitoring that sort of thing, so I started working on learning the Zimbra back-end.
Before you follow along with anything in this article, you should know - my version of Zimbra is old. Like, antique:
Today's win: Successfully instrumented a version of Zimbra so old that it gets senior discounts at Sizzler. #sysadmin #devops
— Matt Simmons (@standaloneSA) September 4, 2014
Zimbra was very cool about this and issued us some emergency licenses so that we could do what we needed until our new license block purchase went through. Thanks Zimbra!
In light of the whole "running out of licenses" surprise, I decided that the first thing I should start monitoring is license usage. In fact, I instrumented it so well that I can pinpoint the exact moment that we went over the number of emergency licenses we got:
Cool, right?
Well, except for the whole "now we're out of licenses" again thing. Sigh.
I mentioned a while back that I was going to be concentrating on instrumenting my infrastructure this year, and although I got a late start, it's going reasonably well. In that blog entry, I linked to a GitHub repo where I built a Vagrant-based Graphite installation. I used that work as the basis for the work I did when creating a production Graphite installation, using the echocat graphite module.
After getting Graphite up and running, I started gathering metrics in an automated fashion from the rest of the puppetized infrastructure using the pdxcat CollectD puppet module, and I wrote a little about that process in my Kerbal Space Administration blog entry.
But my Zimbra install is old. Really old, and the server it's on isn't puppetized, and I don't even want to think about compiling collectd on the version of Ubuntu this machine runs. So I was going to need something else.
As it turns out, I've been working in Python for a little while, and I'd written a relatively short program that works both as a standalone command for sending a single metric to Carbon and as a library for sending lots of metrics at a time. I'm sure there are a dozen tools that do this already, but it was easy enough to write, so I made my own. You can check it out on GitHub if you're interested.
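If you're curious what that entails, Carbon's plaintext protocol is about as simple as it gets: open a TCP connection (port 2003 by default) and write "metric value timestamp" lines. Here's a minimal sketch of the idea; the host and metric names are placeholders, and my actual library differs in the details:

#!/usr/bin/python
# Minimal sketch of Carbon's plaintext protocol: one "metric value timestamp"
# line per datapoint over a TCP socket. Host and metric below are placeholders.
import socket
import time

def send_metric(name, value, host='graphite.example.com', port=2003):
    sock = socket.create_connection((host, port), timeout=5)
    sock.sendall("%s %s %d\n" % (name, value, int(time.time())))
    sock.close()

if __name__ == '__main__':
    send_metric('test.zimbra.demo', 42)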
So that's the script I'm using, but a script needs data. If you log in to the Zimbra admin interface (which I try not to do, because it requires Firefox in the old version we're using), you can actually see most of the stats you're interested in. It's possible to scrape that page and get the information, but it's much nicer to get to the source data itself. Fortunately, Zimbra makes that (relatively) easy:
In the Zimbra home directory (/opt/zimbra in my case), there is a "zmstat/" subdirectory, and in there you'll find a BUNCH of directories with dates as names, plus some CSV files:
... snip ...
drwxr-x--- 2 zimbra zimbra 4096 2014-09-04 00:00 2014-09-03/
drwxr-x--- 2 zimbra zimbra 4096 2014-09-05 00:00 2014-09-04/
drwxr-x--- 2 zimbra zimbra 4096 2014-09-06 00:00 2014-09-05/
-rw-r----- 1 zimbra zimbra 499471 2014-09-06 20:11 cpu.csv
-rw-r----- 1 zimbra zimbra 63018 2014-09-06 20:11 fd.csv
-rw-r----- 1 zimbra zimbra 726108 2014-09-06 20:12 imap.csv
-rw-r----- 1 zimbra zimbra 142226 2014-09-06 20:11 io.csv
-rw-r----- 1 zimbra zimbra 278966 2014-09-06 20:11 io-x.csv
-rw-r----- 1 zimbra zimbra 406240 2014-09-06 20:12 mailboxd.csv
-rw-r----- 1 zimbra zimbra 72780 2014-09-06 20:12 mtaqueue.csv
-rw-r----- 1 zimbra zimbra 2559697 2014-09-06 20:12 mysql.csv
drwxr-x--- 2 zimbra zimbra 4096 2014-06-15 22:13 pid/
-rw-r----- 1 zimbra zimbra 259389 2014-09-06 20:12 pop3.csv
-rw-r----- 1 zimbra zimbra 893333 2014-09-06 20:12 proc.csv
-rw-r----- 1 zimbra zimbra 291123 2014-09-06 20:12 soap.csv
-rw-r----- 1 zimbra zimbra 64545 2014-09-06 20:12 threads.csv
-rw-r----- 1 zimbra zimbra 691469 2014-09-06 20:11 vm.csv
-rw-r----- 1 zimbra zimbra 105 2014-09-06 19:08 zmstat.out
-rw-r----- 1 zimbra zimbra 151 2014-09-06 06:28 zmstat.out.1.gz
-rw-r----- 1 zimbra zimbra 89 2014-09-04 21:15 zmstat.out.2.gz
-rw-r----- 1 zimbra zimbra 98 2014-09-04 01:41 zmstat.out.3.gz
Each of those CSV files contains the information you want, in one of a couple of formats. Most are really easy.
sudo head mtaqueue.csv
Password:
timestamp, KBytes, requests
09/06/2014 00:00:00, 4215, 17
09/06/2014 00:00:30, 4257, 17
09/06/2014 00:01:00, 4254, 17
09/06/2014 00:01:30, 4210, 16
... snip ...
In this case, there are three columns: the timestamp, the number of kilobytes in the queue, and the number of requests. Most of the CSV files have (many) more columns, but the pattern is the same. The file is updated every minute, so if you have a cron job that grabs the last line, parses it, and sends it to Graphite, your work is basically done:
zimbra$ crontab -l
... snip ...
* * * * * /opt/zimbra/zimbra-stats/zimbraMTAqueue.py
And looking at that file, it's super-easy:
#!/usr/bin/python
import pyGraphite as graphite
import sys
import resource

# Read the whole stats file; we only care about the newest (last) line.
CSV = open('/opt/zimbra/zmstat/mtaqueue.csv', "r")
lineList = CSV.readlines()
CSV.close()

GraphiteString = "MY.GRAPHITE.BASE"

# Columns are: timestamp, KBytes, requests
rawLine = lineList[-1]
listVals = rawLine.split(',')

values = {
    'kbytes': listVals[1].strip(),
    'items': listVals[2].strip(),
}

graphite.connect()
for value in values:
    graphite.sendData(GraphiteString + "." + value + " ", values[value])
graphite.disconnect()
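With a real prefix in GraphiteString, each run sends two datapoints (MY.GRAPHITE.BASE.kbytes and MY.GRAPHITE.BASE.items in this example), which is all Graphite needs to start drawing the queue graphs.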
So there you go. My Python isn't awesome, but it gets the job done. Any imports that aren't used here are left over because some of the other scripts needed them, and by the time I got to this one I was mostly copying and pasting my own code. #LazySysAdmin
The only CSV file that took me a while to figure out was imap.csv. The format of that one is more interesting:
msimmons@zimbra:/opt/zimbra/zmstat$ sudo head imap.csv
timestamp,command,exec_count,exec_ms_avg
09/06/2014 00:00:13,ID,11,0
09/06/2014 00:00:13,FETCH,2,0
09/06/2014 00:00:13,CAPABILITY,19,0
...snip...
So you get the timestamp, the IMAP command, the number of times that command was executed, and how long it took on average, so you can watch latency. The trick is that you only get one command per line, so the previous tactic of grabbing just the final line won't work. Instead, you grab the last line, note its timestamp, and then collect every line that shares that timestamp. Also, I've found that not every IMAP command shows up in every interval, so make sure your xFilesFactor is set appropriately for these metrics (see the sketch below).
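For reference, xFilesFactor lives in Whisper's storage-aggregation.conf, and it controls the fraction of datapoints in an interval that must be non-null before a rollup produces a value. Something like the following keeps sparse per-command metrics from disappearing at lower resolutions (a sketch; the pattern and values are illustrative, not my production settings):

[zimbra_imap]
pattern = \.imap\.
xFilesFactor = 0.1
aggregationMethod = average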
The code is only a little more complicated, but still isn't too bad:
#!/usr/bin/python
import pyGraphite as graphite
import sys
import resource

# Read the whole stats file; the newest records are at the end.
imapCSV = open('/opt/zimbra/zmstat/imap.csv', "r")
lineList = imapCSV.readlines()
imapCSV.close()

GraphiteString = "MY.GRAPHITE.PATH"

class imapCommand:
    def __init__(self, name, count, avgres):
        self.name = name
        self.count = count
        self.avgres = avgres

IMAPcmds = list()

# The last line has the newest timestamp. Walk backward, collecting
# every line that shares that timestamp (one line per IMAP command).
datestamp = lineList[-1].split(',')[0]
record = len(lineList)
while True:
    if lineList[record-1].split(',')[0] == datestamp:
        CMD = lineList[record-1].split(',')[1]
        COUNT = lineList[record-1].split(',')[2]
        AVGRES = lineList[record-1].split(',')[3].strip()
        IMAPcmds.append(imapCommand(CMD, COUNT, AVGRES))
    else:
        break
    record = record - 1

graphite.connect()
for command in IMAPcmds:
    graphite.sendData(GraphiteString + "." + command.name + ".count ", command.count)
    graphite.sendData(GraphiteString + "." + command.name + ".avgres ", command.avgres)
graphite.disconnect()
You can read much more about all of the metrics in the online documents, Monitoring Zimbra.
Now, so far, these have all been runtime metrics, which are helpful but don't actually tell me anything about accounts. To get that, we're going to use some of the built-in Zimbra tools: zmaccts lists every account and then prints a summary at the end, so we can grab the summary and learn the number of accounts in each state, and zmlicense -p gives us the number of licensed accounts we have.
The shell script is pretty easy:
$ cat zimbra-stats/zimbraAccountStatuses.sh
#!/bin/bash
# Creates $GRAPHITESERVER and $GRAPHITEPORT
. /opt/zimbra/zimbra-stats/graphite.sh
OUTPUT="`/opt/zimbra/bin/zmaccts | tail -n 1`"
ACTIVE=`echo $OUTPUT | awk '{print $2}'`
CLOSED=`echo $OUTPUT | awk '{print $3}'`
LOCKED=`echo $OUTPUT | awk '{print $4}'`
MAINT=`echo $OUTPUT | awk '{print $5}'`
TOTAL=`echo $OUTPUT | awk '{print $6}'`
NEVERLOGGEDIN=`/opt/zimbra/bin/zmaccts | grep "never$" | wc -l`
MAX="`/opt/zimbra/bin/zmlicense -p | grep ^AccountsLimit= | cut -d \= -f 2`"
STATPATH="MY.GRAPHITE.PATH"
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.active ${ACTIVE}
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.closed ${CLOSED}
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.locked ${LOCKED}
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.maintenance ${MAINT}
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.total ${TOTAL}
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.neverloggedin ${NEVERLOGGEDIN}
/opt/zimbra/zimbra-stats/pyGraphite.py ${STATPATH}.max ${MAX}
Forgive all of the shortcuts taken in the above. Things aren't quoted when they should be and so on. Use at your own risk. Warranty void in Canada. Etc etc.
Overall, the point is to get some additional transparency into the mail server. Even after we get the server upgraded and onto a modern OS, this kind of information will be a welcome addition.
Oh, and for the record?
$ find ./ -name "*wsp" | wc -l
8783
Over 8,500 metrics coming in. Sweet. Most of that is coming from collectd, but that's another blog entry...