This is the personal blog of Bastian Rieck. I add posts in irregular intervals about things I consider relevant. Send any feedback, comments etc. to bastian@annwfn.net.

When I joined my new lab at the beginning of this year, I was very happy to see that researchers in the biomedical sciences already do what every scholar needs to do, namely keep a lab book. Not only is it a good way to structure your thoughts, it also gives you an opportunity to go back in time, track your work, and even provide summaries of your work day to your supervisor.

My setup

I actually keep two lab books. One is a “Moleskine Classic Notebook” that I use to scribble down thoughts, write out proof ideas, sketches, and anything else that tickles my fancy. The advantage of this old-school way of keeping notes is that I am independent of any technology, making it possible to quickly jot down something in almost any situation. Doing anything electronically here would just raise the bar for taking down an idea or a sketch. If I need my notes in a searchable form, I can always scan and OCR them—but so far, there has never been a need for that, and I fear that most OCR software would be unable to decipher my handwriting even if I endeavour to write well.

For the second lab book, I use a Markdown document. While there is technically not a formal specification of this format, it is reasonably simple to parse, easy to look at (even if your editor is incapable of parsing it directly) and can be transformed into a variety of other formats. Being a big proponent of LaTeX, the idea of separating content and layout (or rendering) resonates well with me. I keep this second document under version control using git, which is another bonus of text-only formats. Even after a few weeks on the road, I can thus easily synchronize any changes to other computers.
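Since the lab book is a plain Markdown file, it can also be converted into other formats with a single command. A quick sketch—assuming that pandoc (plus a LaTeX engine for PDF output) is installed and that the lab book lives in a file called labbook.md:

$ pandoc labbook.md -o labbook.pdf

This comes in handy for producing a nicely typeset summary of a work week without leaving the text-only workflow.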

The format

Most of my entries are roughly arranged based on their corresponding project. Hence, all first-level headers in my lab book are project names. I also have a special section “Dormant” for projects that are, well, dormant. The second-level headers are dates, specified in YYYY-MM-DD format. This makes it easy for me to figure out what I did on a specific date. Apart from that, I use sub-sub-sections and so on whenever I deem it appropriate. This is how it might look:

Project 1
=========

2018-08-01
----------

Added `foo` widget. Refactored code for `bar` class, but ran into
troubles with function `baz()`.

Project 2
=========

2018-08-01
----------

Read paper on topic 1. Checked out example.com for some additional
results. Loss term in appendix is not sufficiently explained. Need
to follow up on this.

vim integration

If you are a vim user, there are several plugins that make working with Markdown easier. The obvious choice for syntax highlighting is Tim Pope's vim-markdown. The one I found most useful is VOoM. This plugin gives you a second pane for many document types (not only Markdown!) in which you can easily see the document structure, jump to specific sections, and so on. To make VOoM detect the file type automatically, add this to your .vimrc:

let g:voom_ft_modes = {'markdown': 'markdown'}

This will make the plugin aware of any Markdown document. Use the command :Voom to create an outline of your document. This can be toggled with :VoomToggle.
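If you manage your plugins with a plugin manager, installing both is a matter of a few lines. A sketch for vim-plug—assuming the repository paths below are the current ones on GitHub—looks like this:

call plug#begin('~/.vim/plugged')

" Markdown syntax highlighting and outline view
Plug 'tpope/vim-markdown'
Plug 'vim-voom/VOoM'

call plug#end()

Run :PlugInstall afterwards and both plugins are ready to use.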

That’s it—a short and simple way of keeping a lab book. Until next time!

Posted late Wednesday afternoon, August 1st, 2018

CMake arguably makes life a lot easier for developers and users alike. Instead of fiddling around with autotools madness, we ideally now just issue the following sequence of commands to build a piece of software:

$ mkdir build
$ cd build
$ cmake ../
$ make

However, often things are not that easy. I have found that sometimes, anarchy reigns supreme in the world of CMake—different modules, different ways of doing the same thing, and a complete lack of enforced standards are big hurdles in using or integrating software written by other people.

In this article, I want to outline some rules for writing a simple CMake find module for your own software or for unsupported libraries. Moreover, I want to present some snippets for common situations. The code for this blog post is available on GitHub.

Writing a CMake module

Modules are the prime reason for CMake's success. Briefly put, they permit you to find other libraries that you need to link against. A good module can make your life easy. A bad module can lead to weird error messages.

Finding a header-only library

For an easy start, assume that we are looking for a header-only library. The same structure will be used when looking for other libraries, by the way—it will just be a little bit longer.

Suppose we want to look for a header-only library foo that has one main header foo.h within some subdirectory foo. Yes, the creativity in names is strong here, but bear with me. The module below will look for this package in a standardized manner:

INCLUDE( FindPackageHandleStandardArgs )

# Checks an environment variable; note that the first check
# does not require the usual CMake $-sign.
IF( DEFINED ENV{FOO_DIR} )
  SET( FOO_DIR "$ENV{FOO_DIR}" )
ENDIF()

FIND_PATH(
  FOO_INCLUDE_DIR
    foo/foo.h
  HINTS
    ${FOO_DIR}
)

FIND_PACKAGE_HANDLE_STANDARD_ARGS( FOO DEFAULT_MSG
  FOO_INCLUDE_DIR
)

IF( FOO_FOUND )
  SET( FOO_INCLUDE_DIRS ${FOO_INCLUDE_DIR} )

  MARK_AS_ADVANCED(
    FOO_INCLUDE_DIR
    FOO_DIR
  )
ELSE()
  SET( FOO_DIR "" CACHE STRING
    "An optional hint to a directory for finding `foo`"
  )
ENDIF()

Let us take a look at this in more detail. The module first imports a CMake standard module, the FindPackageHandleStandardArgs macro, which permits us to delegate and standardize package finding. Next, we check the environment variables of the client for FOO_DIR. The user can specify such a variable to point to a non-standard include directory for the package, such as $HOME, or any directory that is not typically associated with libraries. A classical use case is the local installation on a machine where you lack root privileges.

If the environment variable is defined, its value is copied into a CMake variable that is also called FOO_DIR. Next, we supply this variable as a hint to the FIND_PATH function of CMake. This function tries to find a specified path or file (foo/foo.h in our case) while looking in a standardized set of directories. See the CMake documentation for more details.

Information about the path is stored in FOO_INCLUDE_DIR. The nice thing is that we do not need to evaluate this variable, because the function FIND_PACKAGE_HANDLE_STANDARD_ARGS handles it: using some short descriptor (FOO) of the package, we can hand all the paths that we need to the function and it will automatically result in the appropriate status or warning message. Moreover, it will set the variable FOO_FOUND if the package was found.

If this is the case, we set FOO_INCLUDE_DIRS to point to the path that we found before. Notice that it is customary to use the plural form here because a package might conceivably have multiple include paths. Using the plural in all cases makes it simpler for clients to employ our module because they can just issue

TARGET_INCLUDE_DIRECTORIES( example PUBLIC ${FOO_INCLUDE_DIRS} )

somewhere in their code.

As a final step, we hide the variables by marking them as advanced, so that CMake users have to explicitly toggle them. This is merely for not cluttering up the output of cmake-gui.

This is the most basic skeleton for finding a header-only library. To actually use this module, you can now just issue

FIND_PACKAGE( foo REQUIRED )
TARGET_INCLUDE_DIRECTORIES( example PUBLIC ${FOO_INCLUDE_DIRS} )

in your code. Provided that CMake knows where to look for modules, this is all you need to do. To extend the module search path, just create a directory cmake/Modules in your main project folder and add the following lines to the main CMakeLists.txt:

LIST( APPEND CMAKE_MODULE_PATH
  ${CMAKE_SOURCE_DIR}/cmake/Modules
)

A caveat: the FIND_PACKAGE call is one of the few parts in CMake where capitalization matters. If you do FIND_PACKAGE( FOO ), the CMake parser will look for a file named FindFOO.cmake. Hence, in this case, since we are doing FIND_PACKAGE( foo ), the module is named Findfoo.cmake. Notice that I am strongly encouraging you to use uppercase spelling in all the variables that you export, as it makes life easier and developers do not have to think about the proper capitalization.

Finding a shared object or a static library

As a slightly more advanced topic, suppose you are looking for one library called bar that comes with an include directory plus a shared object. This requires some additions to the code above:

INCLUDE( FindPackageHandleStandardArgs )

# Checks an environment variable; note that the first check
# does not require the usual CMake $-sign.
IF( DEFINED ENV{BAR_DIR} )
  SET( BAR_DIR "$ENV{BAR_DIR}" )
ENDIF()

FIND_PATH(
  BAR_INCLUDE_DIR
    bar/bar.h
  HINTS
    ${BAR_DIR}
)

FIND_LIBRARY( BAR_LIBRARY
  NAMES bar
  HINTS ${BAR_DIR}
)

FIND_PACKAGE_HANDLE_STANDARD_ARGS( BAR DEFAULT_MSG
  BAR_INCLUDE_DIR
  BAR_LIBRARY
)

IF( BAR_FOUND )
  SET( BAR_INCLUDE_DIRS ${BAR_INCLUDE_DIR} )
  SET( BAR_LIBRARIES ${BAR_LIBRARY} )

  MARK_AS_ADVANCED(
    BAR_LIBRARY
    BAR_INCLUDE_DIR
    BAR_DIR
  )
ELSE()
  SET( BAR_DIR "" CACHE STRING
    "An optional hint to a directory for finding `bar`"
  )
ENDIF()

The most salient change is the use of FIND_LIBRARY to find, you guessed it, the library. The optional NAMES argument can be used to supply more names for a library, which is useful if a library ships with different flavours, such as bar_cxx or bar_hl.

Similar to what I wrote above, I am also exporting the single library as BAR_LIBRARIES in order to simplify usage. In the best case, clients can just use

TARGET_LINK_LIBRARIES( example ${BAR_LIBRARIES} )

and the code will continue to work even if, some years down the road, bar suddenly starts shipping with two libraries. Again, I advocate for having a sane and simple standard rather than having to think hard about how to use the darn module.

Other than that, this works exactly the same as the previous example from above!
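For completeness, this is a minimal sketch of how a client project might consume the module from start to finish; the project layout and the file main.cc are made up for this example:

CMAKE_MINIMUM_REQUIRED( VERSION 3.0 )
PROJECT( example )

# Make our custom find modules visible to CMake
LIST( APPEND CMAKE_MODULE_PATH
  ${CMAKE_SOURCE_DIR}/cmake/Modules
)

FIND_PACKAGE( bar REQUIRED )

ADD_EXECUTABLE( example main.cc )
TARGET_INCLUDE_DIRECTORIES( example PUBLIC ${BAR_INCLUDE_DIRS} )
TARGET_LINK_LIBRARIES( example ${BAR_LIBRARIES} )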

Things that frequently need doing

Having written almost exhaustively about how to find libraries, I want to end this post with several common tasks. For each of them, I have seen various kinds of weird workarounds, so I would like to point out a more official way.

Version checks

Sometimes it is unavoidable to support previous versions of CMake, or to detect whether a specific version of a compiler or library is present. For this purpose, CMake provides special VERSION comparison operators. Do not write your own code for this, but rather do something like the following:

IF( CMAKE_CXX_COMPILER_VERSION VERSION_LESS "5.4.1" )
  MESSAGE( STATUS "This compiler version might cause problems" )
ENDIF()

Similarly, there are VERSION_EQUAL and VERSION_GREATER checks. They are tested and bound to work—even when you are comparing packages and their versions.

Detecting an operating system

Your code can be as agnostic with respect to the operating system as you want, but there might still be that one situation where you need to determine which operating system your code is being compiled on.

This is easy to accomplish:

IF( APPLE )
  MESSAGE( STATUS "Running under MacOS X" )
# Watch out: this check is also TRUE under MacOS X, because it
# falls under the category of Unix-like.
ELSEIF( UNIX )
  MESSAGE( STATUS "Running under Unix or a Unix-like OS" )
# Despite what you might think given this name, the variable is also
# true for 64bit versions of Windows.
ELSEIF( WIN32 )
  MESSAGE( STATUS "Running under Windows (either 32bit or 64bit)" )
ENDIF()

Detecting a compiler

Sometimes, you need to disable or enable certain things depending on the compiler. Suppose you want to figure out the version of the C++ compiler and its type:

IF( CMAKE_CXX_COMPILER_ID MATCHES "GNU" )
  MESSAGE( STATUS "g++ for the win!" )
  MESSAGE( STATUS ${CMAKE_CXX_COMPILER_VERSION} )
ENDIF()

For LLVM/clang, you can use:

IF( CMAKE_CXX_COMPILER_ID MATCHES "Clang" )
  MESSAGE( STATUS "LLVM, yeah!" )
ENDIF()

Please refer to the documentation for more IDs.

Enabling C++11 or C++14

While it is possible (and also necessary for older versions) to enable C++11 by modifying CMAKE_CXX_FLAGS directly, the standard way involves only two lines:

SET( CMAKE_CXX_STANDARD 11 )
SET( CMAKE_CXX_STANDARD_REQUIRED ON )

This is guaranteed to work with all compilers supported by CMake. For C++14, simply set CMAKE_CXX_STANDARD to 14 instead.

Coda

I hope this article convinced you of the power of CMake and of the need for standardizing its usage. You can find the code of the modules, plus some boilerplate CMake code, in the GitHub repository for this post.

Have fun using CMake, until next time!

Update (2018-05-30): Using TARGET_INCLUDE_DIRECTORIES as suggested on HN. Thanks!

Posted late Monday evening, May 28th, 2018

When writing Aleph, my library for persistent homology and computational topology, I decided to add a few Python bindings one idle afternoon. Using the magnificent pybind11 library, this was easier than I anticipated. Much to my chagrin, though, it turns out that using such bindings with the Python interpreter is more complicated if you want to do it the right way.

Of course, having built a single .so file that contains the code of your module, the easy way is to modify the PYTHONPATH variable and just point it to the proper path. But I wanted to do this right, and so, together with Max, I started looking for ways to simplify the build process.

The situation

I am assuming that you have installed pybind11 and have written some small example that you now want to distribute. If you are unsure about this, please refer to my example repository for this blog post.

The module may look like this:

#include <pybind11/pybind11.h>

#include <string>

class Example
{
public:
  Example( double a )
    : _a( a)
  {
  }

  Example& operator+=( const Example& other )
  {
    _a += other._a;
    return *this;
  }

private:
  double _a;
};

PYBIND11_MODULE(example, m)
{
  m.doc() = "Python bindings for an example library";

  namespace py = pybind11;

  py::class_<Example>(m, "Example")
    .def( py::init( []( double a )
            {
              return new Example(a);
            }
          )
    )
    .def( "__iadd__", &Example::operator+= );
}

Pretty standard stuff so far: one class, with one constructor and one addition operator exposed (for no particular reason whatsoever).

Building everything

Building such a module is relatively easy with CMake if you are able to find the pybind11 installation (the example repository has a ready-to-use module for this purpose). Since we want to do this the right way, we need to check whether the Python interpreter exists:

SET( PACKAGE_VERSION "0.1.1" )

FIND_PACKAGE( pybind11 REQUIRED )

FIND_PACKAGE(PythonInterp 3)
FIND_PACKAGE(PythonLibs   3)

Next, we can build the library using CMake. Some special treatment for MacOS X is required (obviously) in order to link the module properly.

IF( PYTHONINTERP_FOUND AND PYTHONLIBS_FOUND AND PYBIND11_FOUND )
  INCLUDE_DIRECTORIES(
    ${PYTHON_INCLUDE_DIRS}
    ${PYBIND11_INCLUDE_DIRS}
  )

  ADD_LIBRARY( example SHARED example.cc )

  # The library must not have any prefix and should be located in
  # a subfolder that includes the package name. The setup will be
  # more complicated otherwise.
  SET_TARGET_PROPERTIES( example
    PROPERTIES
      PREFIX ""
      LIBRARY_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/example"
  )

  # This is required for linking the library under Mac OS X. Moreover,
  # the suffix ensures that the module can be found by the interpreter
  # later on.
  IF( APPLE )
    SET_TARGET_PROPERTIES( example
      PROPERTIES
        LINK_FLAGS "-undefined dynamic_lookup"
        SUFFIX     ".so"
    )
  ENDIF()


  # Place the initialization file in the output directory for the Python
  # bindings. This will simplify the installation.
  CONFIGURE_FILE( example/__init__.py
    ${CMAKE_CURRENT_BINARY_DIR}/example/__init__.py
  )

  # Ditto for the setup file.
  CONFIGURE_FILE( example/setup.py
    ${CMAKE_CURRENT_BINARY_DIR}/example/setup.py
  )
ENDIF()

The salient points of this snippet are:

  • Changing the output directory of the library to a subdirectory named after the package. We will later see that this simplifies the installation.
  • Configuring (and copying) the __init__.py and setup.py files, making them available in the build directory.

__init__.py is rather short:

from .example import *

This will tell the interpreter later on to import all symbols from the example module in the current directory.

The setup.py is slightly more complicated:

from distutils.core import setup

import sys
if sys.version_info < (3,0):
  sys.exit('Sorry, Python < 3.0 is not supported')

setup(
  name        = 'cmake_cpp_pybind11',
  version     = '${PACKAGE_VERSION}', # TODO: might want to use commit ID here
  packages    = [ 'example' ],
  package_dir = {
    '': '${CMAKE_CURRENT_BINARY_DIR}'
  },
  package_data = {
    '': ['example.so']
  }
)

The important thing is the package_data dictionary. It specifies the single .so file that is the result of the CMake build process. This ensures that the file will be installed alongside the __init__.py file.

Testing it

First, we have to build our package:

$ mkdir build
$ cd build
$ cmake ../
$ make
$ cd example
$ ls
example.so  __init__.py  setup.py
$ sudo python setup.py install

Afterwards, the package should be available for loading:

$ python
Python 3.6.5 (default, Apr 12 2018, 22:45:43)
[GCC 7.3.1 20180312] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import example
>>> example.Example(0.0)
<example.example.Example object at 0x7f54e7f77308>
>>> example.Example("Nope")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. example.example.Example(arg0: float)

Invoked with: 'Nope'

Everything seems to work as expected.

Conclusion

This is certainly not the easiest or most modern way to install your own Python module. However, it is the easiest one in case you already have a large project that exports Python bindings. In a truly optimal world, we would use setuptools directly and build wheel packages—but this will have to wait for another article.

In the meantime, the code for this example is available on GitHub.

Happy packaging, until next time!

Posted Tuesday afternoon, May 1st, 2018

Even though C++ is still my favourite programming language, I am always amazed by the sheer number of headers I have to include to get something done. Most of my source files could contain #include <vector> at the beginning, because this is most likely what I am going to include anyway. Other headers are not treated this way. In fact, I am displaying a wanton disregard for many of the other headers. For example, I have never consciously used the execution header in real-world code.

I thus started wondering how other projects fared in that regard. So I cloned several of the larger C++ repositories on GitHub. More precisely, I started with the following repositories:

Counting individual headers

In total, these projects comprise more than 2 million lines of code—a reasonably-sized sample, I would say. To figure out how these projects use headers, I first extracted all STL headers from all files and counted their occurrences. This resulted in the following histogram (the counts are relative):

[Figure: Histogram of STL header occurrences]

Pretty interesting, I would say. This is a nice long-tail distribution for which a few headers are used much more often than the rest. In fact, for these repositories, only four headers make up more than 50% of the usage:

  • vector
  • string
  • memory
  • utility

For vector and string, this is not surprising. Virtually every C++ programmer uses vector for almost anything. The same goes for string. Similarly, memory is not so surprising as it contains the different smart pointer classes—most prominently, shared_ptr. The last one of the list, utility, was slightly unexpected for me. It contains things such as std::make_pair and std::move. At least the latter one is required for any class that does its own memory management.

At the tail of the distribution, the more exotic headers await. The stack header, for example, does not appear to be used in these projects too often, while the future header comes in dead last. I must confess that I have not used it in real-world projects so far because I have not yet had to deal with asynchronous operations. The lack of enthusiasm for the regex header is somewhat sad, but maybe this is to be expected in a language that does not really encourage the use of regular expressions? Also, C++ regular expressions are said to perform worse than their counterparts in other languages. To what extent the unfamiliarity of C++ programmers with regular expressions might contribute to this, I cannot say.
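If you want to reproduce such counts for your own projects, the counting step is quick to sketch in Python. The directory name below is made up and the regular expression is merely a heuristic—it is not the exact logic of the script in the repository:

import collections
import pathlib
import re

# Match `#include <header>` directives; bare lowercase names such as
# `vector` or `unordered_map` are treated as STL headers here.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*<([a-z_]+)>', re.MULTILINE)

counts = collections.Counter()
for path in pathlib.Path('repositories').rglob('*'):
    if path.suffix in {'.cc', '.cpp', '.cxx', '.h', '.hh', '.hpp'}:
        # Count every header at most once per file
        counts.update(set(INCLUDE_RE.findall(path.read_text(errors='ignore'))))

print(counts.most_common(10))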

Counting pairs of headers

Let’s delve into another aspect of the headers. In my code, I noticed that some headers are almost always used together. For example, if there is an algorithm header, there is often also a functional header. Extending this to projects, I thought that it might be interesting to analyse the co-occurrence patterns of STL headers. To this end, I counted how often pairs of headers are included in the same file. This naturally gives rise to a co-occurrence matrix, in which rows and columns indicate headers, and each value indicates how often the two corresponding headers occur in the same file. If headers are sorted by their counts, this results in a beautiful picture:

[Figure: Co-occurrence matrix of STL headers]

This matrix tells us something about the universality of certain headers. The vector header, for example, co-occurs with almost every other header to some extent because vectors are such fundamental data types. The typeinfo header, on the other hand, is so specific that it only co-occurs with typeindex. In fact, the structure of the matrix, i.e. the many dark cells, indicates that many combinations are highly unlikely to occur “in the wild”.

Some of the combinations really tell a story, though. For example, queue is used in conjunction with thread (possibly to implement patterns for multi-threaded environments), but also with stack (possibly to implement different traversal strategies of implicit or explicit graph structures in these projects). I also see another pattern from my own code, namely the pair unordered_map and unordered_set. I tend to require either both of them (the set for iteration and storage, the map for, well, associating more information with individual objects) or none of them.
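The co-occurrence matrix follows the same pattern; here is a brief sketch, with a hand-made list of per-file header sets standing in for the output of the counting step above:

import collections
import itertools

# In practice, this list would contain one set of STL headers per file.
headers_per_file = [
    {'vector', 'string'},
    {'vector', 'memory', 'utility'},
    {'queue', 'thread'},
]

cooccurrence = collections.Counter()
for headers in headers_per_file:
    for a, b in itertools.combinations(sorted(headers), 2):
        cooccurrence[(a, b)] += 1

print(cooccurrence.most_common(5))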

Conclusion

As a next step, it would be interesting to see whether the co-occurrence of certain headers makes it possible to guess the domain of a C++ program, just like certain pairs of words (I guess I should rather speak of bigrams here, to use the NLP term) are more indicative of certain genres of literature. Treating code like literature would certainly make for an interesting art project.

The code for this project is available on GitHub. You only have to supply the repositories for scanning.

Happy coding, until next time!

Update (2018-04-06): Changed the title because I was using the term meta-analysis incorrectly. Thanks, HN!

Posted Tuesday evening, April 3rd, 2018

This article details some of my experiences with Travis CI, aiming to make this service more accessible to those who—like me—do not consider themselves to be “proper” software developers.

Prelude and history

I am by no means a professional software developer, but my job offers me the possibility to churn out some code more often than not. Naturally, I want my code to be used by other people—especially when said code is part of a publication. There is just one catch: code does not exist in a vacuum but is part of some ecosystem. Even when you are developing everything in Python, a language that arguably tries to hide a lot of complexity so that one is able to focus on the task at hand, your code is not an island. Save for extremely trivial programs, you will probably be using libraries, such as the trinity of data science, i.e. import numpy as np, import scipy as sp, and last, but certainly not least, import matplotlib.pyplot as plt.

Your users, however, might have radically different configurations—it is well possible that some have more recent versions of all packages, while others will still be running the first release of Ubuntu. Some of them, and this is really hard for me to admit, may not even be running the best Linux distribution in the world. Some of them may not even be running a Linux operating system!

Given this untidy state of affairs, how can you at least make some credible attempt at pretending to support your code on more than just your own machine? That is where continuous integration comes in! Originally a technique in test-driven development, CI is supposed to make your life a tad easier by ensuring that the changes you make do not stop the project as a whole from working. This is achieved by frequently integrating your changes into the current master branch of your repository and checking that they do not break the build.

Professional software development companies set up their own build servers, thereby ensuring that the product is still built as per spec. How exactly does this pertain to us academics?

Software development in academia

Most software developed in academia is held together by the same ingredients, viz. hope, the tears of Ph.D. students, and sheer faith. Code is supposed to work until the deadline and, preferably, until the paper has been accepted. For many students, git is something one has heard of in that pesky software engineering class. The full power of version control systems is often not appreciated, even by research group leaders. Let me describe a nice scenario: suppose you are working towards an important deadline. Suddenly, one person (Alice) on your team discovers that the main algorithm has a bug. “Woe is me”, you say. But Alice, being very good at what she does, fixes the bug, and after a few tense minutes she announces that the results you report in the paper still hold. You submit the paper early and go home to sleep.

This is the kind of magic that continuous integration can bring to your projects if you care to use it. Read on if you are interested.

The magic

In a nutshell, Travis CI “merely” provides a set of virtual machines (featuring different operating systems even!) on which you can build and run your code. And the best thing is that this works automatically, whenever you update your repository via git push. No more worrying about whether that small change you did might have changed all the calculations—instead, a rather blissful existence.

Of course, this only works if you invest some time in setting up your project. At its core, Travis looks for a file called .travis.yml in your repository. It is this file that allows you to configure the steps that Travis performs for each update. This is what a simple file may look like if your project uses CMake as its build system and has no additional dependencies:

language: cpp

os:
  - linux

compiler:
  - gcc

script:
  - mkdir build
  - cd build
  - cmake ../
  - make

After each git push, Travis will dutifully clone the repository and execute all steps that you provide in the script section. Here, this means creating a separate build directory and checking that the project can be built. This is not much, of course, and does not really help in catching problems with an algorithm, but it is a start. Suppose you want to ensure that your project is also compilable with clang. Just change the corresponding section:

compiler:
  - clang
  - gcc

Or suppose that you want to add Mac OS support:

os:
  - linux
  - osx

It really is that simple. Travis will now create a build matrix, consisting of two operating systems (Linux and MacOS X) and two compilers (clang and gcc).

But I want more magic!

Coming back to the scenario above, how can Travis help in this case? Well, of course you have to provide the tests that Travis needs to run. For example, I like to create a special test build target for CMake and let it execute my unit tests for me. Unit tests range from banal checks of classes to longer programs that compare expected results of algorithms to current ones. You will have to take my word for it, but tests like this have helped me multiple times in the past for various publications. If you are interested in what they may look like, please refer to Aleph, my library for experimenting with topological data analysis. I do not claim that the code is perfect, but the tests subdirectory contains numerous unit tests that may drive home the point I am trying to make here. The best thing is that the configuration file does not get needlessly complicated. Due to the nice features of CMake, it is again sufficient to extend the script section:

script:
  - mkdir build
  - cd build
  - cmake ..
  - make
  - CTEST_OUTPUT_ON_FAILURE=1 make test

The ugly last line is only necessary in order to force the testing harness of CMake to be a little more verbose. You can see the output of Travis for this project here, and you will see that the unit tests are always executed—giving me peace and tranquility to some extent.
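For reference, this is roughly what registering such a unit test looks like on the CMake side; the test name and source file are made up for this sketch:

ENABLE_TESTING()

ADD_EXECUTABLE( test_example test_example.cc )
ADD_TEST(
  NAME    test_example
  COMMAND test_example
)

With this in place, make test (or, equivalently, ctest) runs every registered test after the regular build.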

So?

That is the basic gist of Travis. I think that using such a service can be beneficial for academic projects. Not only does it give you the confidence that your code is doing something right, it can also be used to promote your research—by providing a repository that is tested under different platforms, with different compilers, you make it easier for people to actually use your cool algorithm (and cite you, of course, lest we forget about that).

Dependencies

Now you are probably huffing at this article because your program is more complicated and has some dependencies. Well, Travis has you covered to some extent.

Travis permits you to customize the build environment. For example, Aleph uses Boost and Eigen. Since those exist as packages under Ubuntu, the default Linux distribution used by Travis, I can easily install them in .travis.yml:

language: cpp

os: linux
sudo: false

addons:
  apt:
    packages:
      - libboost-dev
      - libeigen3-dev

For Mac OS X, support for Homebrew is available, but the use is slightly more complex:

before_install:
  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install boost; fi
  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install eigen; fi

Overall, though, this works just fine.

Troubles

I do not want to sound overly enthusiastic here. Adding support for multiple operating systems and configurations can be painful. Heck, I often have to commit multiple times because the build just breaks for some darn reason. It is a fine balance that one has to achieve here: code should be usable, but your time is not infinite.

Travis unfortunately makes my life sometimes harder than it has to be. While I am grateful for the free (free!) service they offer, some things just irk me:

  • The default Ubuntu image is old. Like really, really old, viz. the version of Ubuntu from frigging 2014. As an Arch Linux user, I am stunned. Some of my time is thus spent correcting for some arcane problems with old package versions.

  • The OS X images are strange: the update process of brew just stalls, and the same goes for the builds themselves sometimes. I get many e-mails that tell me that my build errored (which in Travis lingo does not mean that the build failed, but rather that some virtual machine could not be started).

  • The number of available images is very small. Ideally, I would like to be able to check my software on even more variants of Ubuntu, but also on other flavours of Linux. And what about BSD? One of my users may want to install the software while running an old version of OpenBSD on a toaster, so where is the support for that?

  • Do not get me started on supporting older and newer package versions…

Nonetheless, I am grateful for the Travis CI engineers. They are doing a heck of a job just so that I can pretend that my GitHub projects are actually useful to someone out there. Thank you (I really mean it)!

Coda

All in all, Travis CI has been very beneficial for my projects. I sleep a little bit better knowing that I do not have to worry about breaking my algorithms just by changing the interface somewhere else. Even though it has its shortcomings—but nothing is perfect—I urge you to consider using it for your projects and your publications!

By the way: if you are a hardcore Bitbucket user, take a look at Pipelines. They provide similar functionality for those of us that are not living in the land of GitHub.

Happy coding/testing/integrating, until next time!

Posted late Tuesday evening, March 13th, 2018

ETH Zurich offers multiple wireless networks around campus. Our local IT people drilled into me that I should prefer to use the eth-5 SSID whenever possible. Obviously I had to figure out a way to support this profile for netctl. If you count yourself among the few, the happy few, the band of users of this fine piece of software, here is what you have to do:

  • Create a new file /etc/netctl/WiFi_eth5
  • Paste the following content into it, taking care to replace $USER with your ETH username, $PASSWORD with your ETH network password (this is not your regular login password; I am talking about the other one), and $INTERFACE with your WiFi interface, e.g. wlp4s0
  • Activate the new profile with netctl start WiFi_eth5
Description='ETH'
Interface='$INTERFACE'
Connection='wireless'
IP='dhcp'
ESSID='eth-5'
Security='wpa-configsection'
WPAConfigSection=(
  'ssid="eth-5"'
  'key_mgmt=WPA-EAP'
  'eap=PEAP'
  'proto=WPA RSN'
  'identity="$USER"'
  'password="$PASSWORD"'
  'phase2="auth=MSCHAPV2"'
)

Happy surfing, until next time!

Posted late Sunday evening, March 11th, 2018

When I was writing my dissertation, I was planning on creating all images as vector graphics if possible. They scale better and give the document an altogether published look. Some of the figures containing meshes (of some 3D objects for an IEEE TVCG publication about the merits of topological data analysis) turned out to be problematic, though. The state-of-the-art tool for mesh analysis, the aptly named MeshLab, is not capable of exporting vector graphics. Luckily, the Inkscape vector graphics editor, which I already used for other graphics, shipped with a plugin for “rendering” meshes in Wavefront OBJ format.

There were only two snags with the plugin:

  1. It seemed to be unmaintained or, at the very least, somewhat undocumented.
  2. It did not support colours.

I could not easily change the first point, but boy, do I love me some fancy colours, in particular when I am talking about the curvature of meshes. So I added colour support to the plugin and re-christened it with the highly creative name 3D Polyhedron with vertex colours. You can find the code on GitHub. In contrast to most of my projects, it is released under the GPL. This is due to the fact that I was modifying the code of the original plugin.

Usage instructions are available on GitHub, so I would rather end this brief post with an example of the output of the plugin.

[Figure: A torus, colour-coded by its mean curvature]

Until next time!

Posted late Monday evening, March 5th, 2018

After defending my thesis last year, I wanted to show my newfound admiration for all things LaTeX. Since I spent a lot of time getting the formatting just right for my purposes (following typographical advice of numerous sources), I decided to create some LaTeX packages for research-based documents.

latex-mimore

latex-mimore is a minimal and modern template for reports, such as the ones you have to write for a seminar. You can also use the class for longer documents, such as a bachelor's thesis, but for those I would recommend latex-mimosis, which I describe below. If you clone the repository and set up your LaTeX environment correctly, using the class is as easy as writing

\documentclass{mimore}

at the very beginning of your document. Please take a look at the GitHub repository for more details. This is what a document looks like if formatted with latex-mimore.

latex-mimosis

latex-mimosis is the bigger sibling of latex-mimore. It is meant for your Ph.D. dissertation, your master's thesis, or your bachelor's thesis. Again, using it is as easy as adding

\documentclass{mimosis}

to your preamble. Please take a look at the GitHub repository for more details. This is what a document looks like if formatted with latex-mimosis.
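For the impatient, a minimal document might look like the sketch below; this assumes that the class file can be found by your TeX distribution and that the class provides the usual book-like sectioning commands (the chapter title is, of course, made up):

\documentclass{mimosis}

\begin{document}

\chapter{Introduction}

Your brilliant research goes here.

\end{document}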

Design considerations

Both packages have been carefully crafted. They aim to be…

  • clean: there is no LaTeX trickery involved; the page is neatly divided using standard typesetting practices. Adjustments to the defaults are documented and make sense. At least to me.
  • minimal: there are no unnecessary adjustments of anything in there, no spurious decorations. The layout is inspired by Robert Bringhurst and his ideas about typography. You can also detect a hint of Edward Tufte in there, even though I am not such a big fan of the layout in his books; at least not for my own dissertation.
  • modern: the template should be pleasing to the eye, without any of the cruft that is a remnant of typewriter times.

The templates are released under an MIT licence and I would love to hear your feedback about them. If anything is missing or can be improved, please open an issue in the corresponding repository.

Happy TeXing, until next time!

Posted at lunch time on Thursday, February 15th, 2018

If you are a long-term git user like me, you will probably accumulate numerous remotes per repository (maybe one for your private server, one for work, and so on).

Pushing to all of them can get a bit tedious, so I defined a new command gpa, which is of course short for git push all:

git remote | xargs -L1 git push --all

Let us dissect this quickly:

  • The need for git remote is obvious because it permits us to enumerate all remotes
  • We need xargs -L1 so that git push is invoked once per line of input, i.e. once per remote
  • Finally, the git push --all pushes all branches to the specified remote

I have alias gpa='git remote | xargs -L1 git push --all' in my ZSH configuration, so that I can use this command globally.

Happy committing, until next time!

Update: Conrad was kind enough to point out that one can just as well modify .gitconfig accordingly and make this command an alias of the git command:

[alias]
  pushall = !git remote | xargs -L1 git push --all

Thanks!

Posted late Tuesday evening, February 13th, 2018

I have taken up a new position as a postdoctoral researcher in the Machine Learning & Computational Biology Lab of Prof. Dr. Karsten Borgwardt at ETH Zürich. In hindsight—as is always the case—this now feels like the logical move. During the last year, my research started to touch more and more upon concepts and issues in machine learning, and I let myself be drawn further into this rich and exciting field. I will of course try to apply my knowledge in scientific and information visualization in my new job as well, and I hope that there will be many interesting opportunities for papers that span multiple fields.

Moreover, I am really excited to be part of a group that actively uses GitHub in their scientific work. Aleph, my topological data analysis framework, will finally have some interesting company. Until my own contributions start to make an appearance in the repositories of my lab, please take a look at the existing projects of the MLCB Lab on GitHub.

At the same time, it goes without saying that the views expressed on this website are entirely my own and have neither been reviewed nor endorsed by ETH Zurich.

Posted at lunch time on Wednesday, January 10th, 2018