As a parting graduation gift, some friends and I recently decided to create a heat map of the git commits of one of our mutual friends. Since this was a rather fun hack, I decided to share the script I wrote for this purpose (thanks to Jan for providing the initial script that kicked this off):

#!/usr/bin/env python3

import argparse
import datetime
import os
import subprocess

parser = argparse.ArgumentParser(description="Create heatmaps of git commits")
parser.add_argument("--author",  help="Author whose git commits are to be counted", type=str)
parser.add_argument("directory", help="git directory to use", metavar="DIR")

arguments = parser.parse_args()

directory = os.path.join(arguments.directory, ".git")

# One committer timestamp (seconds since the epoch) per commit
command = ["git", "--git-dir=%s" % directory, "log", "--pretty=format:%ct"]

# Only filter by author if one has been specified
if arguments.author:
    command.append("--author=%s" % arguments.author)

commits = subprocess.check_output(command)

# One row per weekday (0 = Monday, ..., 6 = Sunday), one column per hour
counts = [ [0]*24 for _ in range(7) ]

for commit in commits.decode().split():
    # fromtimestamp() converts to the local time zone of the machine
    # running this script, not to the committer's time zone
    d   = datetime.datetime.fromtimestamp(int(commit))
    row = d.weekday()
    col = d.hour

    counts[row][col] += 1

print('set size ratio 7.0/24.0\n'
      'set xrange [-0.5:23.5]\n'
      'set yrange [-0.5: 6.5]\n'
      'set xtics 0,1\n'
      'set ytics 0,1\n'
      'set xtics offset -0.5,0.0\n'
      'set tics scale 0,0.001\n'
      'set mxtics 2\n'
      'set mytics 2\n'
      'set grid front mxtics mytics linetype -1 linecolor rgb \'black\'\n'
      'plot "-" matrix with image notitle')
# Inline data for gnuplot: one line per weekday, terminated by "e"
for row in range(7):
    for col in range(24):
        print("%d " % counts[row][col], end="")
    print("")
print("e")
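
The binning step in the middle of the script can be pulled out into a small function, which makes it easier to test. The following is a hypothetical refactoring (the name bin_commit_times is mine, not part of the original script); unlike the script, it takes an explicit time zone instead of silently using the local one of the machine it runs on.

```python
import datetime
from typing import Iterable, List


def bin_commit_times(timestamps: Iterable[int],
                     tz: datetime.tzinfo = datetime.timezone.utc) -> List[List[int]]:
    """Count Unix timestamps into a 7x24 grid (weekday x hour).

    Rows follow datetime.weekday() (0 = Monday, ..., 6 = Sunday);
    columns are the hour of the day (0 to 23) in the time zone `tz`.
    """
    counts = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        d = datetime.datetime.fromtimestamp(ts, tz=tz)
        counts[d.weekday()][d.hour] += 1
    return counts


# Timestamp 0 is 1970-01-01 00:00 UTC, a Thursday (weekday 3)
grid = bin_commit_times([0, 3600, 3600])
print(grid[3][0], grid[3][1])  # → 1 2
```

Passing the committer's own offset instead of UTC would place each commit at the hour the committer actually saw on their clock.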

Note that the numerical values on the y-axis of the heat map refer to the weekdays, following the convention of Python's datetime.weekday(): 0 is Monday and 6 is Sunday. I have not provided a mapping to their names. The x-axis refers to the hour (0 to 23) in which a commit was made.
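
To put names on the y-axis, the script could additionally print a gnuplot ytics command as part of its preamble, before the plot command. The sketch below uses a hypothetical helper, weekday_ytics, that builds such a command from Python's calendar module. One caveat, as far as I know: gnuplot does not draw minor tics when the major tics are given as an explicit list, so the grid lines produced by set mytics 2 may need adjusting.

```python
import calendar


def weekday_ytics() -> str:
    """Return a gnuplot command labelling rows 0..6 with weekday names.

    calendar.day_abbr is indexed like datetime.weekday(), i.e. 0 = Monday.
    The abbreviations depend on the current locale ("Mon", "Tue", ... in
    the default C locale).
    """
    labels = ", ".join('"%s" %d' % (calendar.day_abbr[i], i) for i in range(7))
    return "set ytics (%s)" % labels


print(weekday_ytics())
```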

The usage is very simple: Point the script towards a directory that contains a git repository, use the optional --author parameter to filter commits, and redirect the output into a file. The script generates a gnuplot script, including the data, for further processing. A basic session might look like this:

$ ./git-heatmap.py ~/Projects/Skynet > skynet.dat
$ gnuplot
gnuplot > set terminal png
gnuplot > set output "skynet.png"
gnuplot > load "skynet.dat"
gnuplot > set output
$ # Post skynet.png on all the media to get spared when the
$ # inevitable robot uprising starts...

For further customizations, I recommend Anna Schneider's ColorBrewer colour palette for gnuplot.

Here's a heat map of the commits for Scifer, our research group's visualization framework:

You can see that people check in stuff at all hours. The largest number of commits still happens during regular business hours, though. Note that the colour scale is logarithmic to ensure that the few commits during irregular hours are not overshadowed.
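
The script as listed emits raw counts. One way to get a logarithmic scale, assumed here and not necessarily how the plot above was made, is to transform the counts in Python before printing them. gnuplot's set logscale cb is the other option, but cells with zero commits have no logarithm. The helper log_scale below is hypothetical.

```python
import math
from typing import List


def log_scale(counts: List[List[int]]) -> List[List[float]]:
    """Map each count c to log10(1 + c).

    Adding 1 keeps empty cells at 0 instead of evaluating log(0),
    while large counts are compressed so that rare hours stay visible.
    """
    return [[math.log10(1 + c) for c in row] for row in counts]


scaled = log_scale([[0, 9, 99]])
print(["%.1f" % v for v in scaled[0]])  # → ['0.0', '1.0', '2.0']
```

If you use this, the data loop has to print floats (for example with "%.3f " instead of "%d "), and the colour bar then shows log10(1 + count) rather than the raw number of commits.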
Posted late Saturday evening, January 24th, 2015

The German Ɯberwach project aims to log accesses by certain governmental institutions, such as the German intelligence service. In an attempt to answer the question "Quis custodiet ipsos custodes?" with "Me, of course", I set out to write a log file analysis tool: I wanted to find out whether any computers of interest access my web server. Starting with Wikipedia's list of sensitive IP addresses, I quickly obtained a nice collection of candidates. The result is Little Brother, a small Python script for checking whether an Apache log file contains IP addresses from a predefined list of networks.

Usage

The data format is straightforward: Each non-empty line shall contain an IPv4 address or an IPv4 network specification and a description. These two fields shall be separated by at least one whitespace character. For example:

156.33.0.0/16       United States Senate
138.162.0.0/16      United States Department of the Navy and United States Marine Corps
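
Reading this format takes only a few lines with Python's ipaddress module. The following is a minimal sketch, not the actual Little Brother code; parse_networks is a hypothetical name, and treating a bare address as a single-host /32 network is my assumption about how such entries should behave.

```python
import ipaddress
from typing import Iterable, List, Tuple


def parse_networks(lines: Iterable[str]) -> List[Tuple[ipaddress.IPv4Network, str]]:
    """Parse 'ADDRESS-OR-NETWORK  DESCRIPTION' lines; blank lines are skipped."""
    networks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split at the first run of whitespace: the first field is the network,
        # everything after it is the (possibly multi-word) description
        spec, description = line.split(maxsplit=1)
        # A bare address such as 192.0.2.1 becomes the network 192.0.2.1/32;
        # strict=False tolerates host bits set in a network specification
        networks.append((ipaddress.IPv4Network(spec, strict=False), description))
    return networks


nets = parse_networks(["156.33.0.0/16       United States Senate", ""])
print(nets[0][0], nets[0][1])  # → 156.33.0.0/16 United States Senate
```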

The script is able to scan an Apache log file, or any log file whose lines start with a valid IPv4 address. The following lines will be parsed correctly, for example:

192.0.2.42 - - [01/Jan/2015:04:04:04 +0200] "GET / HTTP/1.1" 200 3834347 "-" "Foo"
192.0.2.23 - - [01/Jan/2015:05:05:05 +0200] "GET / HTTP/1.1" 200 3834347 "-" "Bar"
192.0.2.5  Random information that is going to be ignored anyway
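
Extracting the address from such a line amounts to validating its first whitespace-separated field. Here is a minimal sketch with a hypothetical helper, leading_ipv4 (not taken from Little Brother), that returns None for lines without a leading IPv4 address, for instance lines from IPv6 clients.

```python
import ipaddress
from typing import Optional


def leading_ipv4(line: str) -> Optional[ipaddress.IPv4Address]:
    """Return the IPv4 address at the start of a log line, or None."""
    fields = line.split(maxsplit=1)
    if not fields:
        return None
    try:
        return ipaddress.IPv4Address(fields[0])
    except ipaddress.AddressValueError:
        # First field is not a valid IPv4 address (e.g. an IPv6 client)
        return None


print(leading_ipv4("192.0.2.5  Random information that is going to be ignored anyway"))  # → 192.0.2.5
```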

A full analysis session works like this:

$ ./lb.py test.log IP_networks.txt
Counted 1 visits from 192.0.2.1 (TEST-NET-1)
Counted 2 visits from 198.51.100.2 (TEST-NET-2)
Counted 3 visits from 203.0.113.3 (TEST-NET-3)
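
Output in this format can be produced by checking each client address against every network and counting hits per address. The sketch below is a self-contained approximation under my own assumptions (count_visits is a hypothetical name, and the first matching network wins); it is not the Little Brother source. A linear scan over the networks is fine for a list of a few hundred entries.

```python
import collections
import ipaddress
from typing import Dict, Iterable, List, Tuple

Network = Tuple[ipaddress.IPv4Network, str]


def count_visits(log_lines: Iterable[str],
                 networks: List[Network]) -> Dict[Tuple[ipaddress.IPv4Address, str], int]:
    """Count log lines per (client address, network description).

    The first field of each line is taken as the client address; lines
    that do not start with a valid IPv4 address are skipped.
    """
    counts = collections.Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue
        try:
            address = ipaddress.IPv4Address(fields[0])
        except ipaddress.AddressValueError:
            continue
        # Linear scan over all networks; the first match wins
        for network, description in networks:
            if address in network:
                counts[(address, description)] += 1
                break
    return counts


networks = [(ipaddress.IPv4Network("192.0.2.0/24"), "TEST-NET-1")]
log = ["192.0.2.1 - - ...", "192.0.2.1 - - ...", "198.51.100.2 - - ..."]
for (address, description), n in sorted(count_visits(log, networks).items()):
    print("Counted %d visits from %s (%s)" % (n, address, description))
```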

Real example

As it turns out, there are indeed some interesting IP addresses in the server logs for my personal website. Here is an excerpt of some real data from the last month:

Counted 17 visits from 131.136.242.1 (Canadian Department of National Defence)
Counted 11 visits from 138.162.0.41 (United States Department of the Navy and United States Marine Corps)
Counted 13 visits from 216.81.81.84 (United States Department of Homeland Security)

Apparently, I have some sort of following in the military and the DHS. I feel strangely honoured and promise that I will remain as moto as possible.

Code

The code is released under the MIT Licence. You may download Little Brother from its git repository.

Seeing that the USMC is indeed visiting my website, I feel that there is only one way to end the post:

Oorah!

Posted Sunday evening, January 25th, 2015