As an extension to my previous post on rectangular selections with Qt and OpenSceneGraph, I just updated the example project with a very simple pick handler. A pick handler is an event handler that permits picking, i.e. selecting objects in a scene.

Doing it manually

The basic implementation is straightforward. We first inherit from the base class osgGA::GUIEventHandler and provide our own implementation of the handle() function. In this function, we use a line segment intersector class to determine intersections within the window:

bool PickHandler::handle( const osgGA::GUIEventAdapter& ea, osgGA::GUIActionAdapter& aa )
{
  if( ea.getEventType() != osgGA::GUIEventAdapter::RELEASE ||
      ea.getButton()    != osgGA::GUIEventAdapter::LEFT_MOUSE_BUTTON )
  {
    return false;
  }

  osgViewer::View* viewer = dynamic_cast<osgViewer::View*>( &aa );

  if( viewer )
  {
    osgUtil::LineSegmentIntersector* intersector
        = new osgUtil::LineSegmentIntersector( osgUtil::Intersector::WINDOW, ea.getX(), ea.getY() );

    osgUtil::IntersectionVisitor iv( intersector );

    osg::Camera* camera = viewer->getCamera();
    if( !camera )
      return false;

    camera->accept( iv );

    if( !intersector->containsIntersections() )
      return false;

    auto intersections = intersector->getIntersections();

    std::cout << "Got " << intersections.size() << " intersections:\n";

    for( auto&& intersection : intersections )
      std::cout << "  - Local intersection point = " << intersection.localIntersectionPoint << "\n";
  }

  return true;
}

The localIntersectionPoint variable contains the intersection point in local coordinates. It is also possible to obtain the intersection point in world coordinates by calling the getWorldIntersectPoint() function of the intersection.

A shortcut

Instead of setting up our own line segment intersector, we could also call the osgViewer::View::computeIntersections function of our view. This still requires the dynamic cast in the handle() function, but it saves us from creating the intersector and the visitor manually. The way outlined above is for instructive purposes only.
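For completeness, a minimal sketch of how the handle() function might look with this shortcut (this is not the code from the repository, and the event type and button checks are omitted for brevity):

bool PickHandler::handle( const osgGA::GUIEventAdapter& ea, osgGA::GUIActionAdapter& aa )
{
  osgViewer::View* viewer = dynamic_cast<osgViewer::View*>( &aa );

  if( !viewer )
    return false;

  // The view sets up the line segment intersector for us
  osgUtil::LineSegmentIntersector::Intersections intersections;

  if( !viewer->computeIntersections( ea.getX(), ea.getY(), intersections ) )
    return false;

  for( auto&& intersection : intersections )
    std::cout << "  - World intersection point = " << intersection.getWorldIntersectPoint() << "\n";

  return true;
}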

Code

As always, the code is available in a git repository.

Posted late Sunday afternoon, December 7th, 2014 Tags:

I have always been fascinated by Markov chains for text generation. Despite their simple structure, a sufficiently large body of text will result in strikingly interesting products. Naturally, I had to provide my own implementation for simple text generation.

Algorithm

My algorithm first tokenizes each input file. I consider each punctuation mark to be a token as well because the results sound slightly more natural that way. This treatment requires me to check for punctuation marks when writing the text to STDOUT because punctuation marks should of course not introduce additional whitespace.

Having tokenized the file, I extract prefixes of a certain length (say 2) and map them to the token that follows them. This yields a function from the set of prefixes to the set of potential candidate words. To start the Markov chain generation, I choose a prefix at random and choose a new candidate using a uniform distribution. Since I do not check for duplicates in the list of candidate words, I do not need to take relative frequencies into account—a word that occurs twice after a given prefix is simply twice as likely to be drawn. The old prefix is updated to include the new candidate. This yields a new prefix from which I may choose a new candidate, and so on.

I opted for the lazy implementation and simply repeat this step until a predefined number of iterations has been reached.
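To make the procedure more concrete, here is a small illustrative sketch in Python. This is not the implementation behind the markov program, merely the idea condensed into a few lines:

#!/usr/bin/env python3
#
# Illustrative sketch of the algorithm described above -- not the actual
# implementation behind the "markov" program.

import random
import re
import sys

def tokenize(text):
  # Punctuation marks become tokens of their own
  return re.findall(r"[\w']+|[.,;:!?]", text)

def buildModel(tokens, prefixLength=2):
  # Map each prefix to the list of tokens that follow it. Duplicates are kept
  # on purpose: drawing uniformly from the list then automatically reflects
  # the relative frequencies.
  model = dict()
  for i in range(len(tokens) - prefixLength):
    prefix = tuple(tokens[i:i + prefixLength])
    model.setdefault(prefix, []).append(tokens[i + prefixLength])
  return model

def generate(model, iterations=100):
  prefix = random.choice(list(model.keys()))
  result = list(prefix)
  for _ in range(iterations):
    candidates = model.get(prefix)
    if not candidates:
      break
    result.append(random.choice(candidates))
    prefix = tuple(result[-len(prefix):])
  # Punctuation marks must not introduce additional whitespace
  text = ""
  for token in result:
    text += token if token in ".,;:!?" else (" " + token if text else token)
  return text

model = buildModel(tokenize(open(sys.argv[1]).read()))
print(generate(model))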

Examples

This is an example of a new Christmas song. Some of the lines appear to be taken verbatim from a song because there is too little text for the algorithm to choose from. Still, the result is quite nice:

Mary, and the bitter weather. Sire, the baby awakes, But the fire is so hard to sleep tonight They know that Santa's on his sleigh And ev'ry mother's child is gonna spy To see if reindeer really know how to fly And so, I'm Telling you why: Santa Claus comes tonight. Here were are as follows, and on my back I fell; A gent was riding by, in a pear tree.

Using the King James bible (and hence a larger text corpus), the result sounds rather ominous:

Restrained they the things that are desolate in the eyes of my people, then thou shalt take all the inhabitants of the land of Judah, and fall; and they opened their mouths, that they might know thee, O LORD, I shall eat of the giant. 21:19 And it came to pass. 24:13 And I went to Pilate, and they let her own, and the vail that is written, He that hath not sent to the wilderness of Paran.

Usage and code

$ mkdir build
$ cd build
$ cmake ../
$ make
$ markov --prefix 2 --iterations 10 ../Christmas.txt

Anything that does not belong to either one of the optional --prefix or --iterations parameters is considered a filename. The program currently combines all files into a large database. This is subject to change.

This program is released under the MIT Licence. You may get the source from my git repository.

Posted at teatime on Thursday, December 25th, 2014 Tags:

While cleaning out my project directory, I stumbled over some rather ancient code for generating Lissajous curves on the console. These were probably my first steps with the ncurses library, back in the days when I knew nothing much about anything.

I decided to share the program anyway, and after half an hour of cleaning the source and having fun with CMake, the program still builds on my Archlinux desktop.

Short usage information:

  • Press a/A to decrease/increase the value of the "a" variable
  • Press b/B to decrease/increase the value of the "b" variable
  • Press -/+ to decrease/increase the value of the "delta" variable, resulting in the curve being rotated
  • Press q to quit the program
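For reference: a Lissajous curve is commonly parametrized as x(t) = sin(a·t + δ), y(t) = sin(b·t). The parameters a and b thus control the frequencies along the two axes, while the phase shift δ is what makes the figure appear to rotate.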

The program is released under the MIT Licence. You may get the source from the Lissacurses git repository. This repository also contains more detailed instructions for building the program.

Posted early Friday morning, December 26th, 2014 Tags:

I have written a number of posts about Qt and OpenSceneGraph this year. Starting from how to design a basic widget for OSG, I showed how to extend the widget to enable rectangular selections. In the last post so far, I explained how to do some simple object picking.

In the process of receiving some feedback about these articles, I learned even more about OpenSceneGraph. To ensure that the code and articles remain valid in the future, there are a couple of remaining issues on which I want to expand briefly.

Porting to Microsoft Windows

I do not have much experience in developing for this platform. However, thanks to Andrew Cunningham, I now know that the program compiles & runs under Microsoft Windows if the selection rectangle is not drawn. Apparently, QPainter (which I use for drawing the selection rectangle) uses GDI, which does not mix well with OSG's OpenGL rendering.

As a remedy, I made the selection rectangle (as well as the object picking) configurable. Just set the variables WITH_PICK_HANDLER and WITH_SELECTION_PROCESSING accordingly using ccmake in your build directory. I hope to fix this properly at some point.

Porting to OS X

Again, I have only limited experience with this platform. However, I can at least guarantee that the program compiles and runs reasonably well. I had to install Qt4 using the excellent Homebrew package manager for OS X. OS X also handles graphics context switches differently than Linux, which exposed another bug in the program—I had to use makeCurrent() and doneCurrent() in the drawing routines to ensure that the OSG widget is allowed to use the graphics context.
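In essence, the drawing code is now bracketed by the two context calls. A rough sketch of the idea—class and function names are placeholders, not necessarily those used in the repository:

void OSGWidget::paintEvent( QPaintEvent* /* event */ )
{
  // Make sure the widget's OpenGL context is current before OSG renders
  this->makeCurrent();

  QPainter painter( this );
  painter.setRenderHint( QPainter::Antialiasing );

  this->paintGL(); // lets OSG render the scene into the current context

  // ...drawing of the selection rectangle would go here...

  painter.end();

  // Release the context again so that other widgets may use it
  this->doneCurrent();
}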

There are still some glitches, though: The QMdiArea widgets do not overlap correctly when being moved. I strongly suspect that this is an issue with Qt and not with OSG (since OSG knows nothing about the widget geometry). Furthermore, the widgets will not be rendered correctly all the time. This seems to be caused by OS X switching between two graphics cards.

Unfortunately, I don't have any definite solutions for this. I will update the program, though, if I find the root cause.

Using Qt 5

To end on a more positive note: The program is now using Qt 5 instead of Qt 4. This at least was rather easy to achieve and only involved some changes in the CMakeLists.txt file. Note that I incremented the minimum required CMake version to 2.8.11 in order to ensure that Qt 5 is being integrated correctly.
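For reference, the Qt 5 part of a suitable CMakeLists.txt boils down to something like the following sketch (project, target, and file names are placeholders, not necessarily those from the repository):

cmake_minimum_required( VERSION 2.8.11 )
project( qtosg )

set( CMAKE_AUTOMOC ON )
set( CMAKE_INCLUDE_CURRENT_DIR ON )

find_package( Qt5Widgets REQUIRED )
find_package( Qt5OpenGL  REQUIRED )
find_package( OpenSceneGraph REQUIRED COMPONENTS osgGA osgUtil osgViewer )

include_directories( ${OPENSCENEGRAPH_INCLUDE_DIRS} )

add_executable( qtosg OSGWidget.cpp PickHandler.cpp main.cpp )

# With CMake >= 2.8.11, the imported Qt 5 targets carry their include
# directories and compile definitions, so linking against them suffices.
target_link_libraries( qtosg Qt5::Widgets Qt5::OpenGL ${OPENSCENEGRAPH_LIBRARIES} )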

Code

The code is available in a git repository. Please report any bugs or issues.

Posted at teatime on Sunday, December 28th, 2014 Tags:

I always wanted to look inside my own head. During the last few years, I was given the opportunity to do so. On several occasions, doctors took some MRI images of my head (there is nothing wrong with my head—this was the standard operating procedure in some situations). Some time later, they also took MRA images, meaning that I got to see my grey matter and my blood vessels. Even better, the doctors were only too happy to provide me with the raw data from their examinations. A true treasure for a visualization scientist, as both MRI and MRA result in voxel data that may be visualized by volume rendering techniques. In mathematical terms, I wanted to visualize a 3-dimensional scalar field in a cubical volume. In Hollywood terms, I wanted a program to zoom into my brain and take a look at my blood vessels.

This post describes how I went from a stack of DICOM images to a set of raw volume data sets that can be visualized with ParaView, for example.

The setup

The German healthcare system being what it is, I quickly received a bunch of DVDs with my images. The DVDs contained some sort of Java-based viewer application, developed by CHILI GmbH. Unfortunately, medical laypersons such as myself are probably not the targeted user group for this software—I thus decided that it would be easier to rely on other tools for visualizing my head. A quick investigation showed that the data are ordered in a hierarchy such as DATA/PAT/STUD??/SER??/IMG???. The STUD part refers to the examination that was performed, e.g. an MRI of a certain part of the head, and contains up to (in my case) 33 SER directories, each holding the results of a single series of the MRI/MRA. Each of these subdirectories consists of lots of single images in DICOM format.

DICOM describes a standard for exchanging information in medical imaging. It is rather complicated, though, so I used the ImageJ image processing program to verify that each of the images describes a slice of the MRI/MRA data. To obtain a complete volume from these images, I had to convert them into a single file. Luckily, some brave souls have spent a considerable amount of time (and, I would expect, a sizeable amount of their sanity, as well) to write libraries for parsing DICOM. One such library is pydicom, which permits reading and (to a limited extent) writing DICOM files using Python. It is currently maintained by Darcy Mason. Since pydicom's documentation is rather raw in some places, I also perused Roni's blog "DICOM is easy" to grasp the basics of DICOM. It turned out that each of the DICOM files contains a single entry with raw pixel data. Since all images of a given series should have the same dimensions, I figured that it would be possible to obtain a complete volume by extracting the raw data of each image and concatenating them. I wrote multiple Python scripts for doing these jobs.
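The core of this idea fits into a few lines. The following is merely an illustrative sketch (not one of the actual scripts described below); it uses a current version of pydicom, and the output file name is just an example:

#!/usr/bin/env python3
#
# Illustrative sketch: read a series of DICOM slices with pydicom and
# concatenate their raw pixel data to form a volume.

import sys
import pydicom  # current versions of the library; older releases used "import dicom"

with open("volume.raw", "wb") as out:
  for filename in sorted(sys.argv[1:]):
    ds = pydicom.dcmread(filename)  # older API: dicom.read_file()

    # All slices of a series are assumed to share the same dimensions, so
    # the concatenated pixel data forms a regular volume.
    sys.stderr.write("%s: %d x %d, %d bit\n"
                     % (filename, ds.Columns, ds.Rows, ds.BitsAllocated))

    out.write(ds.PixelData)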

Sample data

If you do not have any DICOM images at your disposal, I would recommend visiting the homepage of Sébastien Barré and downloading the CT-MONO2-16-brain file, for example. I would very much like to include some sample files in the repository of the scripts, but I am not sure how these samples are licensed, so I prefer linking to them.

dicomdump.py

This script extracts the "pixel data" segment of a DICOM file while writing some information to STDERR:

$ ./dicomdump.py CT-MONO2-16-brain > CT-MONO2-16-brain.raw
##############
# Pixel data #
##############

Width:                      512
Height:                     512
Samples:                    1
Bits:                       16
Photometric interpretation: MONOCHROME2
Unsigned:                   0

You should now be able to read CT-MONO2-16-brain.raw using ImageJ, for example.

dicomcat.py

This script concatenates multiple images from a series of DICOM files to form a volume. Assuming you have files with a prefix of IMG in the current directory, you could do the following:

$ ./dicomcat.py IMG* > IMG.raw
[ 20.00%] Processing 'IMG001'...                                  
[ 40.00%] Processing 'IMG002'...
[ 60.00%] Processing 'IMG003'...
[ 80.00%] Processing 'IMG004'...
[100.00%] Processing 'IMG005'...
Finished writing data to '<stdout>'

Using the optional --prefix parameter, you can also create an output file automatically:

$ ./dicomcat.py --prefix IMG Data/2011/STUD02/SER01/IMG00* 
[ 20.00%] Processing 'IMG001'...
[ 40.00%] Processing 'IMG002'...
[ 60.00%] Processing 'IMG003'...
[ 80.00%] Processing 'IMG004'...
[100.00%] Processing 'IMG005'...
Finished writing data to 'IMG_320x320x5_16u'

The file names generated by the script always follow the same pattern. The first three numbers indicate the dimensions of the volume, i.e. width, height, and depth. The last digits indicate the number of bits per voxel and whether the data are signed (indicated by s) or unsigned (indicated by u). The example file is thus a volume of width 320, height 320, and depth 5, stored as 16 bit unsigned integers for further processing.
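Such a file can be read back directly with numpy, for example. A small sketch for the example file above, assuming the slices are simply stored one after the other:

import numpy

# "IMG_320x320x5_16u" -> width 320, height 320, depth 5, 16 bit unsigned
volume = numpy.fromfile("IMG_320x320x5_16u", dtype=numpy.uint16)
volume = volume.reshape((5, 320, 320))  # depth, height, width

print(volume.shape, volume.min(), volume.max())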

recursive_conversion.py

This script converts the aforementioned directory hierarchy. Just point it towards a root directory that contains subdirectories with series of images and it will automatically create the raw volume files. For example, assume that you have the folders SER1, SER2, and SER3 in the folder STUD. To concatenate all images in all of these folders automatically, call the script as follows:

$ ./recursive_conversion.py STUD
[ 20.00%] Processing 'IMG001'...
[ 40.00%] Processing 'IMG002'...
[ 60.00%] Processing 'IMG003'...
[ 80.00%] Processing 'IMG004'...
[100.00%] Processing 'IMG005'...
Finished writing data to 'SER1_320x320x20_16u'
[...]
Finished writing data to 'SER2_320x320x15_16u'
[...]
Finished writing data to 'SER3_512x512x10_16u'

Viewing the volume

The easiest option for viewing the volume is ParaView. Open the file using the "Raw (binary)" file type. In the "Properties" section of the data set, enter the correct extents. A 352x512x88 file, for example, has extents of 0–351, 0–511, and 0–87, respectively. Choose "Volume" in the "Representation" section and play with the colour map until you see the structures you want to see.

This is what it looks like:

Blood vessels in my brain. If I were a neurologist, I could probably name them as well. Hazarding a guess, I would say that the prominent structure is probably the carotid artery.

Code

The code is released under the MIT Licence. You may download the scripts from their git repository.

Posted late Monday afternoon, December 29th, 2014 Tags:

One of my pet projects involves detecting speech in audio signals. I am not interested in actual voice recognition but rather want to detect whether a signal is probably a person speaking or something else. Since this is a pet project and I have no knowledge of digital signal processing, I wanted to discover things on my own. This post contains my very preliminary results.

Processing audio signals with Python

For simplicity, I only tackle uncompressed WAV files for now. SciPy is my friend here:

data = scipy.io.wavfile.read( "Test.wav" )[1]

data now contains the amplitude information of the signal. To make any calculations on the signal more robust, I chunk the signal without overlaps into portions of equal length, which I will refer to as frames. Each frame describes a short stretch (say 10 ms) of audio. This makes it possible to calculate statistics per frame and obtain distributions of said statistics for the complete recording. For proper signal processing, I should probably use some sort of windowing approach.

Time-domain statistics

For this post, I shall only use some features in the time domain of the signal. I am well aware that Fourier analysis exists, but my knowledge about these techniques is too sparse at the moment—I know about high- and low-pass filters and the theory behind algorithms such as FFT, but that's about it.

By browsing some books about audio analysis, I stumbled over some statistics that seemed worthwhile (because they are easy to implement):

  • Short-term energy
  • Zero-crossing rate
  • Entropy of energy

I only have a cursory understanding of them (for now) so I hope that I got the terminology right.

Short-term energy

The short-term energy of a frame is defined as the sum of the squared absolute values of the amplitudes, normalized by the frame length:

def shortTermEnergy(frame):
  return sum( [ abs(x)**2 for x in frame ] ) / len(frame)

Zero-crossing rate

The zero-crossing rate counts how often the amplitude values of the frame change their sign. This value is then normalized by the length of the frame:

def zeroCrossingRate(frame):
  signs             = numpy.sign(frame)
  signs[signs == 0] = -1

  return len(numpy.where(numpy.diff(signs))[0])/len(frame)

Entropy of energy

By further subdividing each frame into a set of sub-frames, we can calculate their respective short-term energies and treat them as probabilities, thus permitting us to calculate their entropy:

def chunks(l, k):
  for i in range(0, len(l), k):
    yield l[i:i+k]

def entropyOfEnergy(frame, numSubFrames):
  lenSubFrame = int(numpy.floor(len(frame) / numSubFrames))
  shortFrames = list(chunks(frame, lenSubFrame))
  energy      = [ shortTermEnergy(s) for s in shortFrames ]
  totalEnergy = sum(energy)
  energy      = [ e / totalEnergy for e in energy ]

  entropy = 0.0
  for e in energy:
    if e != 0:
      entropy = entropy - e * numpy.log2(e)

  return entropy

Combining everything

My brief literature survey suggested that these three statistics can be used in a rather simple manner to detect speech. More precisely, we need to take a look at the following quantities:

  • The coefficient of variation of the short-term energy. Speech tends to have higher values here than non-speech. Some experimentation with a few speech/music files shows that a threshold of 1.0 is OK for discriminating between both classes.
  • The standard deviation of the zero-crossing rate. Again, speech tends to have higher values here. A threshold of 0.05 seems to work.
  • The minimum entropy of energy. This is where speech usually has lower values than music. Also, different genres of music seem to exhibit different distributions here—this seems pretty cool to me. Here, I used a threshold of 2.5 to decide whether an audio file contains speech.

This is in no way backed by anything other than me trying and fiddling with some audio files. I hope this is interesting and useful to someone. I also hope that I will find the time to do more with it! Thus, no special git repository for this one (at least for now), but rather the complete source code. Enjoy:

#!/usr/bin/env python3
#
# Released under the HOT-BEVERAGE-OF-MY-CHOICE LICENSE: Bastian Rieck wrote
# this script. As long you retain this notice, you can do whatever you want
# with it. If we meet some day, and you feel like it, you can buy me a hot
# beverage of my choice in return.

import numpy
import scipy.io.wavfile
import scipy.stats
import sys

def chunks(l, k):
  """
  Yields chunks of size k from a given list.
  """
  for i in range(0, len(l), k):
    yield l[i:i+k]

def shortTermEnergy(frame):
  """
  Calculates the short-term energy of an audio frame. The energy value is
  normalized using the length of the frame to make it independent of said
  quantity.
  """
  return sum( [ abs(x)**2 for x in frame ] ) / len(frame)

def rateSampleByVariation(chunks):
  """
  Rates an audio sample using the coefficient of variation of its short-term
  energy.
  """
  energy = [ shortTermEnergy(chunk) for chunk in chunks ]
  return scipy.stats.variation(energy)

def zeroCrossingRate(frame):
  """
  Calculates the zero-crossing rate of an audio frame.
  """
  signs             = numpy.sign(frame)
  signs[signs == 0] = -1

  return len(numpy.where(numpy.diff(signs))[0])/len(frame)

def rateSampleByCrossingRate(chunks):
  """
  Rates an audio sample using the standard deviation of its zero-crossing rate.
  """
  zcr = [ zeroCrossingRate(chunk) for chunk in chunks ]
  return numpy.std(zcr)

def entropyOfEnergy(frame, numSubFrames):
  """
  Calculates the entropy of energy of an audio frame. For this, the frame is
  partitioned into a number of sub-frames.
  """
  lenSubFrame = int(numpy.floor(len(frame) / numSubFrames))
  shortFrames = list(chunks(frame, lenSubFrame))
  energy      = [ shortTermEnergy(s) for s in shortFrames ]
  totalEnergy = sum(energy)
  energy      = [ e / totalEnergy for e in energy ]

  entropy = 0.0
  for e in energy:
    if e != 0:
      entropy = entropy - e * numpy.log2(e)

  return entropy

def rateSampleByEntropy(chunks):
  """
  Rates an audio sample using its minimum entropy.
  """
  entropy = [ entropyOfEnergy(chunk, 20) for chunk in chunks ]
  return numpy.min(entropy)

#
# main
#

# Frame size in seconds (i.e. 0.01 corresponds to 10 ms). Will use this
# quantity to collate the raw samples accordingly.
frameSizeInSeconds = 0.01

frequency          = 44100 # Sampling frequency of the input data; assumed to be 44.1 kHz
numSamplesPerFrame = int(frequency * frameSizeInSeconds)

data        = scipy.io.wavfile.read( sys.argv[1] )
chunkedData = list(chunks(list(data[1]), numSamplesPerFrame))

variation = rateSampleByVariation(chunkedData)
zcr       = rateSampleByCrossingRate(chunkedData)
entropy   = rateSampleByEntropy(chunkedData)

print("Coefficient of variation  = %f\n"
      "Standard deviation of ZCR = %f\n"
      "Minimum entropy           = %f" % (variation, zcr, entropy) )

if variation >= 1.0:
  print("Coefficient of variation suggests that the sample contains speech")

if zcr >= 0.05:
  print("Standard deviation of ZCR suggests that the sample contains speech")

if entropy < 2.5:
  print("Minimum entropy suggests that the sample contains speech")
Posted late Tuesday evening, December 30th, 2014 Tags: