I recently had to convert several structured grids in VTK format into images and coordinate lists for further processing. Since I could not find any existing code for doing this, here is my quick and dirty approach:

#!/usr/bin/env python3

from PIL import Image

import numpy
import os
import re
import sys

reDimensions = re.compile( r'DIMENSIONS\s+(\d+)\s+(\d+)\s+(\d+)' )
reSpacing    = re.compile( r'SPACING\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)' )
rePointData  = re.compile( r'POINT_DATA\s+(\d+)' )
reTableData  = re.compile( r'(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)' )

for filename in sys.argv[1:]:
    with open( filename ) as f:
        width  = 0
        height = 0
        depth  = 0

        data = None # Contains matrix data
        ii   = 0    # Current data index for filling matrix

        # Stores only those coordinates (given as an x and y position)
        # whose data values are nonzero. This makes it possible to match
        # adjacent time steps in a very concise and sparse manner.
        nonZero = list()

        for line in f:
            if reDimensions.match(line):
                result = reDimensions.match(line)
                width  = int( result.group(1) )
                height = int( result.group(2) )
                depth  = int( result.group(3) )

                assert width * height * depth > 0, "Invalid dimensions"

                # NB: the code below only handles a single slice,
                # i.e. depth == 1 is assumed when filling the matrix.
                data = numpy.zeros( (height,width) )
            elif reSpacing.match(line):
                result = reSpacing.match(line)
                sx     = float( result.group(1) )
                sy     = float( result.group(2) )
                sz     = float( result.group(3) )

                # TODO: I am ignoring the spacing, hence I am only
                # able to load uniform rectilinear grids.
                assert sx == sy and sy == sz, "Invalid spacing"
            elif rePointData.match(line):
                result = rePointData.match(line)
                num    = int( result.group(1) )

                assert num == width * height * depth, "Inconsistent dimensions"
            elif reTableData.match(line):
                result = reTableData.match(line) 
                d      = (u,v,w) = (
                                    float( result.group(1) ),
                                    float( result.group(2) ),
                                    float( result.group(3) )
                                   )

                for k in range(0,3):
                    ix     = ii % width
                    iy     = ii // width 
                    ii     = ii + 1

                    if d[k] > 0:
                        nonZero.append( (ix,iy) )

                    data[iy][ix] = d[k]

    # TODO: `data` is now a matrix corresponding to the dimensions of
    # the original grid. Do something with it...


    #
    # Store non-zero coordinates
    #

    outputCoordinates = "/tmp/"\
             + os.path.splitext( os.path.basename(filename) )[0]\
             + ".txt"

    with open(outputCoordinates, "w") as g:
        for (x,y) in nonZero:
            print("%d\t%d" % (x,y), file=g)

    #
    # Store image
    #

    outputImage =   "/tmp/"\
             + os.path.splitext( os.path.basename(filename) )[0]\
             + ".png"

The code above iterates over a list of VTK files (assumed to be in ASCII format) that contain POINT_DATA, i.e. a scalar value for each grid cell. Provided that the grid uses uniform spacing, the code parses it and stores the POINT_DATA values in a numpy matrix. Moreover, the list of non-zero coordinates of the matrix is written to /tmp, along with a pixel-based image of the grid. Both outputs are highly useful for debugging and understanding the input.
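
For reference, the regular expressions above expect a legacy ASCII file of the STRUCTURED_POINTS variety. A minimal, made-up example (the concrete values are of course arbitrary) could look like this:

# vtk DataFile Version 2.0
Example uniform grid
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 3 3 1
ORIGIN 0.0 0.0 0.0
SPACING 1.0 1.0 1.0
POINT_DATA 9
SCALARS values float 1
LOOKUP_TABLE default
0.0 0.0 0.0
0.0 2.5 0.0
0.0 0.0 0.0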

Feel free to do more with this code—I am hereby releasing it into the public domain.

Posted Wednesday afternoon, February 8th, 2017

Most of the time, PhD comics hits too close to home. Recently, they ran a strip showing the temporal evolution of a typical thesis document. As expected, the comic proves that efforts are “doubly redoubled” when the deadline approaches. Naturally, I wondered how my own thesis—still under review—fares in that regard.

I keep my thesis under version control because I am a paranoid and cranky person when it comes to data security and backups. Consequently, I needed a script to reconstruct how much the thesis sources changed over time. Here’s what I came up with:
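
(The full script is available as a gist, see below; the following is only a rough sketch of the idea, assuming that the thesis is kept in a Git repository and that a naive count of whitespace-separated words over all .tex files is good enough.)

#!/usr/bin/env python3

import subprocess

def git(*arguments):
    return subprocess.check_output( ("git",) + arguments ).decode("utf-8", errors="replace")

# All commits of the current repository, oldest first
commits = git("rev-list", "--reverse", "HEAD").split()

for commit in commits:
    # Committer date in ISO 8601 format
    date = git("show", "-s", "--format=%cI", commit).strip()

    # Naive word count: whitespace-separated tokens in every .tex file
    # of this commit. Note that LaTeX markup is counted as well.
    numWords = 0
    for path in git("ls-tree", "-r", "--name-only", commit).splitlines():
        if path.endswith(".tex"):
            numWords += len( git("show", commit + ":" + path).split() )

    print("%s\t%d" % (date, numWords))

Reading file contents via git show has the nice side effect that no revision needs to be checked out.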

If I run this script and store its output in a file, I can use the venerable gnuplot to create a nice graph. This is what it looks like for my dissertation:

Word count in my thesis over time

Note that the script is unable to count “proper” words for now. Hence, changes in the LaTeX markup also increase the total number of words. I guess the slope of my own efforts is not too similar to the one in the comic. At the same time, you can see that most of the content of the thesis was added in a comparatively short amount of time.

Feel free to apply this to your own documents. The code is available as a gist on GitHub. This is my first real gist—thanks to Conrad for the suggestion!

Posted late Thursday evening, February 9th, 2017