When writing Aleph, my library for persistent homology and computational topology, I decided to add a few Python bindings one idle afternoon. Using the magnificent pybind11 library, this was easier than I anticipated. Much to my chagrin, though, it turns out that using such bindings with the Python interpreter is more complicated if you want to do it the right way.

Of course, having built a single .so file that contains the code of your module, the easy way is to modify the PYTHONPATH variable and just point it to the proper path. But I wanted to do this right, of course, and so, together with Max, I started looking at ways to simplify the build process.

The situation

I am assuming that you have installed pybind11 and written a small example that you now want to distribute. If you are unsure about this, please refer to my example repository for this blog post.

The module may look like this:

#include <pybind11/pybind11.h>

#include <string>

class Example
{
public:
  Example( double a )
    : _a( a)
  {
  }

  Example& operator+=( const Example& other )
  {
    _a += other._a;
    return *this;
  }

private:
  double _a;
};

PYBIND11_MODULE(example, m)
{
  m.doc() = "Python bindings for an example library";

  namespace py = pybind11;

  py::class_<Example>(m, "Example")
    .def( py::init( []( double a )
            {
              return new Example(a);
            }
          )
    )
    .def( "__iadd__", &Example::operator+= );
}

Pretty standard stuff so far: one class, with one constructor and one addition operator exposed (for no particular reason whatsoever).

Building everything

Building such a module is relatively easy with CMake if you are able to find the pybind11 installation (the example repository has a ready-to-use module for this purpose). Since we want to do this the right way, we need to check whether the Python interpreter exists:

SET( PACKAGE_VERSION "0.1.1" )

FIND_PACKAGE( pybind11 REQUIRED )

FIND_PACKAGE(PythonInterp 3)
FIND_PACKAGE(PythonLibs   3)

Next, we can build the library using CMake. Some special treatment for MacOS X is required (obviously) in order to link the module properly.

IF( PYTHONINTERP_FOUND AND PYTHONLIBS_FOUND AND PYBIND11_FOUND )
  INCLUDE_DIRECTORIES(
    ${PYTHON_INCLUDE_DIRS}
    ${PYBIND11_INCLUDE_DIRS}
  )

  ADD_LIBRARY( example SHARED example.cc )

  # The library must not have any prefix and should be located in
  # a subfolder that includes the package name. The setup will be
  # more complicated otherwise.
  SET_TARGET_PROPERTIES( example
    PROPERTIES
      PREFIX ""
      LIBRARY_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/example"
  )

  # This is required for linking the library under Mac OS X. Moreover,
  # the suffix ensures that the module can be found by the interpreter
  # later on.
  IF( APPLE )
    SET_TARGET_PROPERTIES( example
      PROPERTIES
        LINK_FLAGS "-undefined dynamic_lookup"
        SUFFIX     ".so"
    )
  ENDIF()


  # Place the initialization file in the output directory for the Python
  # bindings. This will simplify the installation.
  CONFIGURE_FILE( example/__init__.py
    ${CMAKE_CURRENT_BINARY_DIR}/example/__init__.py
  )

  # Ditto for the setup file.
  CONFIGURE_FILE( example/setup.py
    ${CMAKE_CURRENT_BINARY_DIR}/example/setup.py
  )
ENDIF()

The salient points of this snippet are:

  • Changing the output directory of the library to a subordinate directory. We will later see that this simplifies the installation.
  • Configuring (and copying) the __init__.py and setup.py files to make them available in the build directory.
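
For comparison, pybind11 ships its own CMake helper, pybind11_add_module, which takes care of the prefix, the suffix, and the Mac OS X linker flags. The following is only a minimal sketch, assuming that pybind11 was installed together with its CMake config files; it is not what the example repository does.

```cmake
# Sketch only: assumes an installed pybind11 that provides
# `pybind11Config.cmake`, which in turn defines `pybind11_add_module`.
FIND_PACKAGE( pybind11 CONFIG REQUIRED )

# Creates a module library with the prefix, suffix, and linker flags
# that the Python interpreter expects.
PYBIND11_ADD_MODULE( example example.cc )

# Same output directory as before, so that the module still ends up
# next to `__init__.py` and `setup.py`.
SET_TARGET_PROPERTIES( example
  PROPERTIES
    LIBRARY_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/example"
)
```

Keep in mind that the helper names the file after the extension suffix of the interpreter (something like example.cpython-36m-x86_64-linux-gnu.so), so the package_data entry of the setup.py shown below would probably have to become a pattern such as 'example*.so'.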

__init__.py is rather short:

from .example import *

This will tell the interpreter later on to import all symbols from the example module in the current directory.

The setup.py is slightly more complicated:

from distutils.core import setup

import sys
if sys.version_info < (3,0):
  sys.exit('Sorry, Python < 3.0 is not supported')

setup(
  name        = 'cmake_cpp_pybind11',
  version     = '${PACKAGE_VERSION}', # TODO: might want to use commit ID here
  packages    = [ 'example' ],
  package_dir = {
    '': '${CMAKE_CURRENT_BINARY_DIR}'
  },
  package_data = {
    '': ['example.so']
  }
)

The important thing is the package_data dictionary. It specifies the single .so file that is the result of the CMake build process. This ensures that the file will be installed alongside the __init__.py file.
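
The ${PACKAGE_VERSION} and ${CMAKE_CURRENT_BINARY_DIR} placeholders in setup.py are not Python syntax; CONFIGURE_FILE replaces them while copying the file into the build directory. A small sketch of the same substitution, using STRING( CONFIGURE ) on a hypothetical one-line template so that it can be tried in script mode:

```cmake
# Run with: cmake -P configure_demo.cmake (the file name is arbitrary)
SET( PACKAGE_VERSION "0.1.1" )

# Hypothetical one-line template. The bracket argument keeps the
# placeholder literal until it is configured.
SET( TEMPLATE [[version = '${PACKAGE_VERSION}']] )

# Performs the same kind of substitution that CONFIGURE_FILE applies
# to whole files.
STRING( CONFIGURE "${TEMPLATE}" CONFIGURED )

MESSAGE( STATUS "${CONFIGURED}" )
```

The message should show the version line with 0.1.1 filled in. If a file contains ${...} sequences that must survive the copy, CONFIGURE_FILE accepts the @ONLY option, which restricts the substitution to @VAR@ placeholders.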

Testing it

First, we have to build our package:

$ mkdir build
$ cd build
$ cmake ../
$ make
$ cd example
$ ls
example.so  __init__.py  setup.py
$ sudo python setup.py install

Afterwards, the package should be available for loading:

$ python
Python 3.6.5 (default, Apr 12 2018, 22:45:43)
[GCC 7.3.1 20180312] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import example
>>> example.Example(0.0)
<example.example.Example object at 0x7f54e7f77308>
>>> example.Example("Nope")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. example.example.Example(arg0: float)

Invoked with: 'Nope'

Everything seems to work as expected.

Conclusion

This is certainly not the easiest or most modern way to install your own Python module. However, it is the easiest one in case you already have a large project that exports Python bindings. In a truly optimal world, we would use setuptools directly and build wheel packages—but this will have to wait for another article.

In the meantime, the code for this example is available on GitHub.

Happy packaging, until next time!

Posted Tuesday afternoon, May 1st, 2018

CMake arguably makes life a lot easier for developers and users alike. Instead of fiddling around with autotools madness, we ideally now just issue the following sequence of commands to build a piece of software:

$ mkdir build
$ cd build
$ cmake ../
$ make

However, often things are not that easy. I have found that sometimes, anarchy reigns supreme in the world of CMake—different modules, different ways of doing the same thing, and a complete lack of enforced standards are big hurdles in using or integrating software written by other people.

In this article, I want to outline some rules for writing a simple CMake find module for your own software or for unsupported libraries. Moreover, I want to present some snippets for common situations. The code for this blog post is available on GitHub.

Writing a CMake module

CMake modules are the prime reason for its success. Briefly put, they permit you to find other libraries that you need to link against. A good module can make your life easy. A bad module can lead to weird error messages.

Finding a header-only library

For an easy start, assume that we are looking for a header-only library. The same structure will be used when looking for other libraries, by the way—it will just be a little bit longer.

Suppose we want to look for a header-only library foo that has one main header foo.h within some subdirectory foo. Yes, the creativity in names is strong here, but bear with me. The module below will look for this package in a standardized manner:

INCLUDE( FindPackageHandleStandardArgs )

# Checks an environment variable; note that the first check
# does not require the usual CMake $-sign.
IF( DEFINED ENV{FOO_DIR} )
  SET( FOO_DIR "$ENV{FOO_DIR}" )
ENDIF()

FIND_PATH(
  FOO_INCLUDE_DIR
    foo/foo.h
  HINTS
    ${FOO_DIR}
)

FIND_PACKAGE_HANDLE_STANDARD_ARGS( FOO DEFAULT_MSG
  FOO_INCLUDE_DIR
)

IF( FOO_FOUND )
  SET( FOO_INCLUDE_DIRS ${FOO_INCLUDE_DIR} )

  MARK_AS_ADVANCED(
    FOO_INCLUDE_DIR
    FOO_DIR
  )
ELSE()
  SET( FOO_DIR "" CACHE STRING
    "An optional hint to a directory for finding `foo`"
  )
ENDIF()

Let us take a look at this in more detail. The module first imports a CMake standard module, the FindPackageHandleStandardArgs macro, which permits us to delegate and standardize package finding. Next, we check the environment variables of the client for FOO_DIR. The user can specify such a variable to point to a non-standard include directory for the package, such as $HOME, or any directory that is not typically associated with libraries. A classical use case is the local installation on a machine where you lack root privileges.

In any case, the information from the environment is used and a new variable called FOO_DIR is now set in CMake. Next, we supply it to the FIND_PATH function of CMake. This function tries to find a specified path or file (foo/foo.h in our case) while looking in a standardized set of directories. See the CMake documentation for more details.
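
If FOO_DIR is meant to point at an installation prefix rather than at the include directory itself, FIND_PATH can be told which subdirectories to try as well. A sketch, assuming the conventional include subdirectory:

```cmake
# Sketch: with FOO_DIR set to $HOME/local, this also checks
# $HOME/local/include for foo/foo.h.
FIND_PATH(
  FOO_INCLUDE_DIR
    foo/foo.h
  HINTS
    ${FOO_DIR}
  PATH_SUFFIXES
    include
)
```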

Information about the path is stored in FOO_INCLUDE_DIR. The nice thing is that we do not need to evaluate this variable, because the function FIND_PACKAGE_HANDLE_STANDARD_ARGS handles it: using some short descriptor (FOO) of the package, we can hand all the paths that we need to the function and it will automatically result in the appropriate status or warning message. Moreover, it will set the variable FOO_FOUND if the package was found.

If this is the case, we set FOO_INCLUDE_DIRS to point to the path that we found before. Notice that it is customary to use the plural form here because a package might conceivably have multiple include paths. Using the plural in all cases makes it simpler for clients to employ our module because they can just issue

TARGET_INCLUDE_DIRECTORIES( example PRIVATE ${FOO_INCLUDE_DIRS} )

somewhere in their code.

As a final step, we hide the variables by marking them as advanced, so that CMake users have to explicitly toggle them. This merely keeps the output of cmake-gui uncluttered.

This is the most basic skeleton for finding a header-only library. To actually use this module, you can now just issue

FIND_PACKAGE( foo REQUIRED )
TARGET_INCLUDE_DIRECTORIES( example PRIVATE ${FOO_INCLUDE_DIRS} )

in your code. Provided that CMake knows where to look for modules, this is all you need to do. To extend the module search path, just create a directory cmake/Modules in your main project folder and add the following lines to the main CMakeLists.txt:

LIST( APPEND CMAKE_MODULE_PATH
  ${CMAKE_SOURCE_DIR}/cmake/Modules
)

A caveat: the FIND_PACKAGE call is one of the few parts in CMake where capitalization matters. If you do FIND_PACKAGE( FOO ), the CMake parser will look for a file named FindFOO.cmake. Hence, in this case, since we are doing FIND_PACKAGE( foo ), the module is named Findfoo.cmake. Notice that I am strongly encouraging you to use uppercase spelling in all the variables that you export, as it makes life easier and developers do not have to think about the proper capitalization.
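
Putting the pieces together, a complete CMakeLists.txt for a client of the module might look like the following sketch. The project name, the minimum version, and the source file main.cc are assumptions made for illustration.

```cmake
CMAKE_MINIMUM_REQUIRED( VERSION 3.1 )
PROJECT( example CXX )

# Make `cmake/Modules/Findfoo.cmake` visible to FIND_PACKAGE
LIST( APPEND CMAKE_MODULE_PATH
  ${CMAKE_SOURCE_DIR}/cmake/Modules
)

# The spelling must match the file name `Findfoo.cmake`
FIND_PACKAGE( foo REQUIRED )

# `main.cc` is a hypothetical source file that includes <foo/foo.h>
ADD_EXECUTABLE( example main.cc )
TARGET_INCLUDE_DIRECTORIES( example PRIVATE ${FOO_INCLUDE_DIRS} )
```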

Finding a shared object or a static library

As a slightly more advanced topic, suppose you are looking for one library called bar that comes with an include directory plus a shared object. This requires some additions to the code above:

INCLUDE( FindPackageHandleStandardArgs )

# Checks an environment variable; note that the first check
# does not require the usual CMake $-sign.
IF( DEFINED ENV{BAR_DIR} )
  SET( BAR_DIR "$ENV{BAR_DIR}" )
ENDIF()

FIND_PATH(
  BAR_INCLUDE_DIR
    bar/bar.h
  HINTS
    ${BAR_DIR}
)

FIND_LIBRARY( BAR_LIBRARY
  NAMES bar
  HINTS ${BAR_DIR}
)

FIND_PACKAGE_HANDLE_STANDARD_ARGS( BAR DEFAULT_MSG
  BAR_INCLUDE_DIR
  BAR_LIBRARY
)

IF( BAR_FOUND )
  SET( BAR_INCLUDE_DIRS ${BAR_INCLUDE_DIR} )
  SET( BAR_LIBRARIES ${BAR_LIBRARY} )

  MARK_AS_ADVANCED(
    BAR_LIBRARY
    BAR_INCLUDE_DIR
    BAR_DIR
  )
ELSE()
  SET( BAR_DIR "" CACHE STRING
    "An optional hint to a directory for finding `bar`"
  )
ENDIF()

The most salient change is the use of FIND_LIBRARY to find, you guessed it, the library. The optional NAMES argument can be used to supply more names for a library, which is useful if a library ships with different flavours, such as bar_cxx or bar_hl.
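
A sketch of such a call with two alternative names, assuming that bar_cxx is an acceptable substitute for bar and that the library lives in a lib or lib64 subdirectory of the hint:

```cmake
# The names are alternatives for one and the same library; the first
# one that is found ends up in BAR_LIBRARY.
FIND_LIBRARY( BAR_LIBRARY
  NAMES bar bar_cxx
  HINTS ${BAR_DIR}
  PATH_SUFFIXES lib lib64
)
```

If a package consists of several libraries that are all required, each of them gets its own FIND_LIBRARY call and its own result variable.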

Similar to what I wrote above, I am also exporting the single library as BAR_LIBRARIES in order to simplify usage. In the best case, clients can just use

TARGET_LINK_LIBRARIES( example ${BAR_LIBRARIES} )

and the code will continue to work even if, some years down the road, bar suddenly starts shipping with two libraries. Again, I advocate for having a sane and simple standard rather than having to think hard about how to use the darn module.

Other than that, this works exactly the same as the previous example!
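
More recent CMake code often goes one step further and wraps the result in an imported target, so that clients do not need to handle include directories and libraries separately. This is a sketch of what could be appended to the module; the target name bar::bar is a hypothetical choice.

```cmake
# Sketch: meant to be placed in the module after BAR_FOUND is known.
IF( BAR_FOUND AND NOT TARGET bar::bar )
  ADD_LIBRARY( bar::bar UNKNOWN IMPORTED )
  SET_TARGET_PROPERTIES( bar::bar
    PROPERTIES
      IMPORTED_LOCATION             "${BAR_LIBRARY}"
      INTERFACE_INCLUDE_DIRECTORIES "${BAR_INCLUDE_DIRS}"
  )
ENDIF()
```

A client would then write TARGET_LINK_LIBRARIES( example bar::bar ) and receive the include directory along with the library.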

Things that frequently need doing

Having written almost exhaustively about how to find libraries, I want to end this post with several common tasks. For each of them, I have seen various kinds of weird workarounds, so I would like to point out a more official way.

Version checks

Sometimes, it is unavoidable to support previous versions of CMake, or detect whether a specific version of a library has been installed. For this purpose, there are special VERSION comparison operators. Do not write your own code to do so but rather do something like this:

IF( CMAKE_CXX_COMPILER_VERSION VERSION_LESS "5.4.1" )
  MESSAGE( STATUS "This compiler version might cause problems" )
ENDIF()

Similarly, there are VERSION_EQUAL and VERSION_GREATER checks. They are tested and bound to work—even when you are comparing packages and their versions.
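
To see why a home-grown comparison is risky, consider the following script-mode sketch. BAR_VERSION is a hypothetical variable that a find module might export.

```cmake
# Run with: cmake -P version_demo.cmake (the file name is arbitrary)
SET( BAR_VERSION "1.10.2" )

# Component-wise comparison: 10 is greater than 9, so this branch is
# not taken.
IF( BAR_VERSION VERSION_LESS "1.9" )
  MESSAGE( STATUS "VERSION_LESS: bar is too old" )
ENDIF()

# A plain string comparison gets this wrong because "1.1" sorts
# before "1.9" lexicographically, so this branch is taken.
IF( BAR_VERSION STRLESS "1.9" )
  MESSAGE( STATUS "STRLESS: bar looks too old, but is not" )
ENDIF()
```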

Detecting an operating system

Your code can be as agnostic with respect to the operating system as you want, but there might still be that one situation where you need to have a way of determining whether your code is being compiled under a certain operating system.

This is easy to accomplish:

IF( APPLE )
  MESSAGE( STATUS "Running under MacOS X" )
# Watch out: this check is also TRUE under MacOS X because it falls
# under the category of Unix-like, hence the APPLE check comes first.
ELSEIF( UNIX )
  MESSAGE( STATUS "Running under Unix or a Unix-like OS" )
# Despite what you might think given this name, the variable is also
# true for 64bit versions of Windows.
ELSEIF( WIN32 )
  MESSAGE( STATUS "Running under Windows (either 32bit or 64bit)" )
ENDIF()
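
Since UNIX covers everything Unix-like, it cannot single out Linux. One way to do so is to compare CMAKE_SYSTEM_NAME directly; the sketch below assumes that it is placed after the PROJECT() call, when the variable has been set.

```cmake
IF( CMAKE_SYSTEM_NAME STREQUAL "Linux" )
  MESSAGE( STATUS "Running under Linux" )
ELSEIF( CMAKE_SYSTEM_NAME STREQUAL "Darwin" )
  MESSAGE( STATUS "Running under MacOS X" )
ELSEIF( CMAKE_SYSTEM_NAME STREQUAL "Windows" )
  MESSAGE( STATUS "Running under Windows" )
ENDIF()
```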

Detecting a compiler

Sometimes, you need to disable or enable certain things depending on the compiler. Suppose you want to figure out the version of the C++ compiler and its type:

IF( CMAKE_CXX_COMPILER_ID MATCHES "GNU" )
  MESSAGE( STATUS "g++ for the win!" )
  MESSAGE( STATUS ${CMAKE_CXX_COMPILER_VERSION} )
ENDIF()

For LLVM/clang, you can use:

IF( CMAKE_CXX_COMPILER_ID MATCHES "Clang" )
  MESSAGE( STATUS "LLVM, yeah!" )
ENDIF()

Please refer to the documentation for more IDs.
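
Note that MATCHES performs a regular expression match, so the pattern "Clang" also matches the ID "AppleClang" that newer CMake versions report for the compiler shipped with Xcode. If the distinction matters, an exact comparison is a possible alternative, sketched here:

```cmake
IF( CMAKE_CXX_COMPILER_ID STREQUAL "Clang" )
  MESSAGE( STATUS "Upstream LLVM/clang" )
ELSEIF( CMAKE_CXX_COMPILER_ID STREQUAL "AppleClang" )
  MESSAGE( STATUS "Clang as shipped by Apple" )
ELSEIF( CMAKE_CXX_COMPILER_ID STREQUAL "MSVC" )
  MESSAGE( STATUS "Microsoft Visual C++" )
ENDIF()
```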

Enabling C++11 or C++14

While it is possible (and also necessary for older versions) to enable C++11 by modifying CMAKE_CXX_FLAGS directly, the standard way involves only two lines:

SET( CMAKE_CXX_STANDARD 11 )
SET( CMAKE_CXX_STANDARD_REQUIRED ON )

This requires CMake 3.1 or later and works for every compiler whose standard flags CMake knows about.
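
The two variables set the default for all targets that are created afterwards. The same can be requested for a single target through the corresponding target properties; in this sketch, example is assumed to be an existing target.

```cmake
SET_TARGET_PROPERTIES( example
  PROPERTIES
    CXX_STANDARD          14
    CXX_STANDARD_REQUIRED ON
    # Ask for -std=c++14 instead of a vendor dialect such as
    # -std=gnu++14.
    CXX_EXTENSIONS        OFF
)
```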

Coda

I hope this article convinced you of the power of CMake and of the need for standardizing its usage. You can find the code of the modules, plus some boilerplate CMake code, in the GitHub repository for this post.

Have fun using CMake, until next time!

Update (2018-05-30): Using TARGET_INCLUDE_DIRECTORIES as suggested on HN. Thanks!

Posted late Monday evening, May 28th, 2018