This is the personal blog of Bastian Rieck. I add posts at irregular intervals about things I consider relevant. Send any feedback, comments etc. to bastian@annwfn.net.

This HOWTO describes the installation of FreeBSD 6.2 on a Thinkpad R50e. For your convenience, I also included parts of the configuration files you might consider useful. All in all, the R50e is a perfect notebook for FreeBSD. Every important device is usable, allowing you to be productive without major annoyances.

Installation

There is a small partition containing something like a rescue system from IBM. To make things simpler, you might decide to keep it. But removing it and using the whole hard disk for your installation should not hurt, either (at least that has been my experience).

The actual installation process is really straightforward. Install FreeBSD any way you want.

WLAN

The WLAN chipset driver (iwi) is now included in the base system. Simply add the following line to /boot/loader.conf in order to load it:

if_iwi_load="YES"

Caveat: The driver version I used had some problems when connecting to hidden access points. I recommend turning the SSID broadcast on. Since you, being security-minded, are using WPA2 anyway, this won't decrease your security.
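For reference, a minimal WPA2 setup could look like the following sketch; the interface name iwi0 matches the driver loaded above, while the SSID and passphrase are placeholders you need to replace:

# /etc/rc.conf
ifconfig_iwi0="WPA DHCP"

# /etc/wpa_supplicant.conf
network={
    ssid="myssid"
    psk="my secret passphrase"
    proto=RSN
    key_mgmt=WPA-PSK
}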

ACPI and powerd configuration

Assuming you want to use ACPI/powerd, there are several things that have to be configured first. Automatic CPU frequency adjustment has to be set up via /etc/rc.conf:

powerd_enable="YES"
powerd_flags="-b adaptive -a max"
performance_cx_lowest="C2"
performance_cpu_freq="1399"
economy_cx_lowest="C3"
economy_cpu_freq="NONE"

These settings seem reasonable to me. However, you might want to change them if you have other objectives. I was told that SpeedStep should be disabled on an R50e, so add hint.ichss.0.disabled="1" to /boot/loader.conf.

The ACPI configuration is quick, too. These lines belong in /etc/sysctl.conf:

kern.timecounter.hardware=i8254
debug.acpi.do_powerstate=1

hw.acpi.lid_switch_state=S3
hw.acpi.standby_state=S1
hw.acpi.suspend_state=S3
hw.acpi.sleep_button_state=S3
hw.acpi.sleep_delay=3
hw.acpi.reset_video=0

This allows your Thinkpad to suspend when you close the lid. To enable the Thinkpad keys that control suspension, add acpi_ibm_load="YES" to your /boot/loader.conf. You might also want to take a look at deskutils/tpb. This program is a port that allows you to redefine the (mostly unused) Thinkpad keys.

Sound

snd_ich_load="YES" in /boot/loader.conf enables the sound card.
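For reference, the complete set of /boot/loader.conf entries mentioned in this HOWTO thus looks like this:

if_iwi_load="YES"          # WLAN chipset
acpi_ibm_load="YES"        # Thinkpad ACPI keys
snd_ich_load="YES"         # sound card
hint.ichss.0.disabled="1"  # disable SpeedStep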

Posted at teatime on Saturday, September 29th, 2018

This short note explains how to use the font "Adobe Minion Pro" with the LaTeX typesetting system.

Luckily, all the work has already been done by other people. Plus, the font is available for free by downloading Adobe Reader.

Preparation

  • Download an older version of Adobe Acrobat Reader. Choose the .tar.gz for Linux. It is vital that you do not download the most recent version. Unfortunately, the font metrics (which we will install later) do not work with the most recent version of the font.
  • Extract the archive; the font is in the file COMMON.TAR in the directory Adobe/Reader8/Resource/Font.
  • Install otfinfo from /usr/ports/print/typetools (for FreeBSD users) or using the package manager of your choice.
  • Use otfinfo -v to inspect the fonts. For MinionPro-Bold.otf, I get Version 2.015;PS 002.000;Core 1.0.38;makeotf.lib1.7.9032. This version works for me.
  • If not already done, create a local texmf structure. For the venerable teTeX distribution, I had to set the environment variable TEXMFHOME to $HOME/.texmf. I also created the directories .texmf, .texmf-config, and .texmf-var.
  • Copy updmap.cfg to $HOME/.texmf-config/web2c/. This file needs to be modified for the new font.
  • Run texhash

Font installation

Follow the excellent documentation from CTAN. You will be provided with detailed step-by-step instructions and some helpful scripts that convert the fonts. After placing everything in the correct directory (as the tutorial suggests), you are good to go: a simple \usepackage{MinionPro} should do the trick.
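If you want to verify the installation, a minimal test document might look like this:

\documentclass{article}
\usepackage{MinionPro}

\begin{document}
A quick test of Adobe Minion Pro with \LaTeX.
\end{document}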

My gratitude goes to Achim Blumensath, Andreas Bühmann, and Michael Zedler for creating and maintaining the aforementioned package. Otherwise, I would have been hopelessly lost in the mazes of my own incompetence.

Posted Saturday afternoon, September 29th, 2018

K3b is a great piece of CD/DVD authoring software. This small tutorial explains how to use it with non-root user accounts. This HOWTO is also applicable if you want to get k9copy working; you can skip setting the SUID flags in this case.

Kernel setup

First, let's check whether you have to do anything about your kernel setup at all. Execute camcontrol devlist in a terminal. If you see your CD/DVD writer, all is well. For example, this is the output of the command when being run on my good old Thinkpad R50e:

/home/bastian % camcontrol devlist
<MATSHITA DVD-RAM UJ-830Sx 1.00>   at scbus1 target 0 lun 0 (pass0,cd0)

If you don't see your hardware, the fix is as easy as it gets: either add atapicam_load="YES" to /boot/loader.conf or compile a new kernel that contains device atapicam.

This should do the trick.

DevFs configuration

You need to set the proper permissions for the CD/DVD drive. If you look at the output from camcontrol devlist from above, you will notice the part (pass0,cd0). If you have multiple CD/DVD drives in your computer, the numbers will be different. Substitute the correct numbers for your system in the lines below.

Open /etc/devfs.conf and add the following lines at a convenient location:

# Allow CD/DVD authoring
perm cd0   0660
perm pass0 0660
perm xpt0  0660

If you are the only user of your computer, simply add your account to the operator group (which owns the devices by default).

If you want to enable burning for multiple users, however, I would strongly suggest creating an appropriate user group (e.g. burn) and adding to it the user accounts that are allowed to burn CDs/DVDs. In this case, in addition to the lines from above, the following lines should also be added to /etc/devfs.conf:

own cd0   root:burn
own pass0 root:burn
own xpt0  root:burn
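Creating the group and adding accounts to it can be done with pw; for example (the user name alice is a placeholder):

# pw groupadd burn
# pw groupmod burn -m alice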

Set SUID flags

Skip this step if you are trying to install k9copy.

I don't like it very much, but it is necessary to endow cdrdao and cdrecord with root permissions. So, su to root and execute:

chmod u+s /usr/local/bin/cdrdao
chmod u+s /usr/local/bin/cdrecord

Enable DMA

Execute sysctl hw.ata.atapi_dma to check whether DMA is enabled for CD/DVD drives:

/home/bastian % sysctl hw.ata.atapi_dma
hw.ata.atapi_dma: 1

If the output is 0, add hw.ata.atapi_dma=1 to /etc/sysctl.conf.

Conclusion

K3b or k9copy should work now. If you want to enable user mounting of CDs/DVDs, add vfs.usermount=1 to /etc/sysctl.conf.

Posted Saturday afternoon, September 29th, 2018

I recently wanted to download some contact data from the address book of my Nokia 6300 mobile. It turned out that this is less painful than it sounds: By using the gnokii package, I was able to get a dump of my address book in VCARD format.

Installation and configuration

  • gnokii is available in the ports collection. Great.
  • Copy /usr/local/etc/gnokiirc.sample to ~/.gnokiirc
  • Modify .gnokiirc

Since the defaults of this file are good, I only had to change the following lines:

# Since I want a USB connection
port = 1
connection = dku2libusb

# Other phones may require other settings, of course
model = series40

Connecting the phone

  • Using a USB-to-mini-USB cable, connect the phone
  • When the phone asks about the connection mode, select Nokia mode (or something similar)
  • Issue a gnokii --identify command to check whether the phone is reachable

Dumping the address book

This required some reading. The manpage of gnokii is written rather well, fortunately. In the end I resorted to:

gnokii --getphonebook ME 1 end -v > contacts.vcf

This takes all contacts (hence the 1 end) from the internal memory of the phone, converts them to the VCARD format, and stores them in contacts.vcf, where they may be updated and processed.

Posted Saturday afternoon, September 29th, 2018

This is a small HOWTO about the BPF device under FreeBSD. I will show you how to access and configure this device. You will also learn how to send and receive ethernet frames. If you want to see an example of possible BPF uses, you might want to consider taking a look at in medias res.

Any C compiler should be able to compile the example code. Thanks to Pedro for pointing out several syntax errors.

What is the BPF?

The Berkeley Packet Filter is one of FreeBSD's most impressive devices. It provides you with full ("raw") access to your NIC's data link layer, i.e. you are totally protocol-independent. In general, you should be able to capture and send all packets that arrive on your network card, even if they are meant to reach other hosts (for example: if you are using a hub instead of a switch, higher-level raw interfaces will probably discard frames that are not for your MAC address. The BPF won't...). To use this really powerful device, you need a kernel that contains device bpf. If you don't know how to create your own kernel, take a look at the excellent FreeBSD handbook.

More information about the BPF is readily available via man 4 bpf.

Creating and configuring a BPF device

In order to create a functional, readable instance of the BPF device, you have to:

  • Open /dev/bpfn, where n depends on how many other applications are using a BPF
  • Associate your file descriptor with one network interface
  • Set the "immediate mode" so that a call to read will return immediately if a packet has been received
  • Request the BPF's buffer size

Let's proceed chronologically. First, we will try to open the next available BPF device:

// Headers needed for this snippet: <cstdio> for sprintf() and <fcntl.h> for open()
char buf[ 11 ] = { 0 };
int bpf = 0;

for( int i = 0; i < 99; i++ )
{
    sprintf( buf, "/dev/bpf%i", i );
    bpf = open( buf, O_RDWR );

    if( bpf != -1 )
        break;
}

Now we are going to associate it with a specific network device, such as fxp0:

const char* interface = "fxp0";
struct ifreq bound_if;

strcpy(bound_if.ifr_name, interface);
if( ioctl( bpf, BIOCSETIF, &bound_if ) == -1 )
    return(-1);

All's well at the moment, so let's enable immediate mode and request the buffer size. The last point is very important, as the BPF is allowed to provide you with more than one packet after issuing a call to read. If you know the buffer size, you can advance to the next packet.

int buf_len = 1;

// activate immediate mode (therefore, buf_len is initially set to "1")
if( ioctl( bpf, BIOCIMMEDIATE, &buf_len ) == -1 )
    return( -1 );

// request buffer length
if( ioctl( bpf, BIOCGBLEN, &buf_len ) == -1 )
      return( -1 );

Reading packets

Now, as we are completely done with the initialization and have a working file descriptor, we want to capture incoming traffic. The good thing about BPF is that you can set up filter rules if you only want to receive specific traffic, such as TCP/IP packets.

In theory, there is no need to do more than making a call to read. The resulting buffer contains a bpf_hdr structure, followed by the packet itself. So one could just do something like this to convert the buffer into a valid ethernet frame:

frame = (ethernet_frame*) ( (char*) bpf_buf + bpf_buf->bh_hdrlen);

Unfortunately, sometimes the kernel likes to add more than one packet to your buffer. Well, the lazy approach would just read one packet per buffer, and wait for the TCP retransmissions that may arrive. But being lazy is not a good solution. Therefore, we need a loop to read all packets that are in the buffer:

int read_bytes = 0;

ethernet_frame* frame;
struct bpf_hdr* bpf_buf = new bpf_hdr[buf_len]; // generous allocation; buf_len bytes would suffice
struct bpf_hdr* bpf_packet;

while(run_loop)
{
    memset(bpf_buf, 0, buf_len);

    if((read_bytes = read(bpf, bpf_buf, buf_len)) > 0)
    {

        // read all packets that are included in bpf_buf. BPF_WORDALIGN is used
        // to proceed to the next BPF packet that is available in the buffer.

        char* ptr = reinterpret_cast<char*>(bpf_buf);
        while(ptr < (reinterpret_cast<char*>(bpf_buf) + read_bytes))
        {
            bpf_packet = reinterpret_cast<bpf_hdr*>(ptr);
            frame = (ethernet_frame*)((char*) bpf_packet + bpf_packet->bh_hdrlen);

            // do something with the Ethernet frame
            // [...]

            ptr += BPF_WORDALIGN(bpf_packet->bh_hdrlen + bpf_packet->bh_caplen);
        }
    }
}

The above loop does the following things:

  • As long as the "distance" between the original bpf_buf and the auxiliary pointer ptr is not bigger than the number of bytes actually read...

  • ...the auxiliary pointer is advanced to the next ethernet frame. BPF_WORDALIGN rounds up to the next even multiple of BPF_ALIGNMENT. This means that you will jump over all bytes that are used for padding purposes. Hence, bpf_packet always points to the next bpf_hdr structure, provided that there is more than one.

Please note that ethernet_frame is my own structure used to describe one ethernet frame (802.3). Read the standard RFCs or use Wireshark if you want to learn more.
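For illustration, a sketch of such a structure could look like the following (this is not necessarily the exact definition used in the snippets above, but it matches the frame layout described in the next section):

#include <stdint.h>

// The "packed" attribute prevents the compiler from inserting padding
// between the fields.
struct ethernet_frame
{
    uint8_t  destination[6]; // destination hardware (MAC) address
    uint8_t  source[6];      // source hardware (MAC) address
    uint16_t type;           // layer-3 protocol type, e.g. 0x0800 for IP
    uint8_t  payload[1500];  // payload: IP header, TCP header, data, ...
} __attribute__((packed));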

Sending (your own) packets

Sometimes, you might want to send your own packets instead of sticking to the analysis of captured ones. If the BPF is initialised as described above, this is really no problem at all. A quick call to write will do the trick:

write(bpf, frame, bpf_buf->bh_caplen);

In this snippet, bpf is the BPF's file descriptor and frame is a pointer to an ethernet frame that has a TCP/IP packet attached (remember the initialization of frame above?). Of course, this by itself is totally useless, but if you want to write a little broadcast router or something like that, you could just change the destination MAC address and write the more or less unchanged frame plus the payload to the BPF. You won't have to care about the source MAC address, as the BPF fills that in for you (look at the man page and search for BIOCGHDRCMPLT if you want to disable this feature).

Ethernet frames

An ethernet frame is the basic structure that is sent through your network cables. You have to use it if you need to access the link layer, i.e. if you want to send your own raw packets. This is what an ethernet frame (802.3, ethernet version 2.0) could look like:

  • destination hardware (MAC) address [6 bytes]
  • source hardware (MAC) address [6 bytes]
  • layer-3 protocol type [2 bytes]
  • payload [46 - 1500 bytes]
  • FCS [4 bytes]

The FCS field is not necessarily needed. The other attributes should be initialised, except the source MAC address (see above for explanation). This is what you should do if you want to send your own packets:

  • Prepare one ethernet frame and supply it with the proper values
  • Pay particular attention to the type field. Otherwise, you might experience errors (for example: IP packets with an ARP type field).
  • Attach the payload. For an arbitrary TCP/IP packet, you would need:
    • IP header
    • TCP header
    • TCP payload
  • Send it!
  • For debugging purposes, you should have a network sniffer which will tell you if something went wrong.

Following the given example, your frame could look like this:

01:02:03:04:05:06 Destination MAC
01:02:03:04:05:06 Source MAC
0x0800 Type: IP
IP header
TCP header
TCP payload
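Putting it all together, a hedged sketch of filling and sending such a frame could look like this; it uses the ethernet_frame sketch from above, and both the destination address and the payload handling are placeholders:

#include <arpa/inet.h> // htons()
#include <string.h>
#include <unistd.h>

ethernet_frame frame;
memset( &frame, 0, sizeof( frame ) );

// Destination MAC address; the source address is filled in by the BPF.
uint8_t destination[6] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 };
memcpy( frame.destination, destination, sizeof( destination ) );

frame.type = htons( 0x0800 ); // 0x0800 means "IP"

// [...] copy the IP header, TCP header, and TCP payload into frame.payload
size_t payload_length = 0;    // number of payload bytes copied above

// 14 bytes of Ethernet header (6 + 6 + 2) plus the payload
write( bpf, &frame, 14 + payload_length );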

Conclusion

The BPF clearly is a very powerful thing. If you know something about the underlying network structure, you can do unbelievable things with it. Of course, you do not need to stick to TCP/IP. For a nice example of using the BPF, take a look at IMR, a man-in-the-middle application that uses ARP and directs traffic between two victim hosts.

You could also take a closer look at the additional BPF flags, for example BIOCGHDRCMPLT. This flag allows you to fill in the link-level source address of an ethernet frame yourself, thus allowing you to create arbitrary spoofed packets that may trick other hosts in your network.

Posted Saturday afternoon, September 29th, 2018

I want to dual-boot Microsoft Windows 7 and FreeBSD. Since the FreeBSD boot manager will be overwritten when installing Windows (I wonder if this will ever change. Probably not. OS evangelism aside, this is one of the things that truly sucks about a Windows installation), I decided to use the Microsoft Windows 7 boot manager to boot FreeBSD. This HOWTO outlines the necessary steps.

The procedure should work for the Windows Vista boot manager, too.

Setup

I am assuming a normal setup here, i.e.:

  • You have a working computer.
  • You want to install Windows 7 and FreeBSD on the same hard disk.
  • This hard disk is the primary disk.

FreeBSD installation

Perform a regular installation of FreeBSD. Create a partition, label the slices, choose the packages you want to install etc.

After the installation has finished, boot into your new system and copy /boot/boot1 to a safe location such as a USB stick.
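If your USB stick uses a FAT file system, copying the loader could look like this (the device name da0s1 is only an assumption; check the output of dmesg for the correct one):

# mount -t msdosfs /dev/da0s1 /mnt
# cp /boot/boot1 /mnt/FreeBSD.mbr
# umount /mnt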

Reboot and insert your Windows 7 DVD.

Windows 7 installation

Install Windows 7. You pretty much don't have any choices during the installation process except for selecting the correct partition. Double-check that you are not overwriting the FreeBSD partition.

After the installation has finished, boot into your new Windows system.

Configuring the Windows boot manager

Copy /boot/boot1 to C:\FreeBSD.mbr. We will now use some arcane magic that I have taken from OpenBSD's tome of answers and adapted for FreeBSD.

Open cmd.exe and create a new entry for the boot manager:

C:\Windows\system32> bcdedit /create /d "FreeBSD 7.2" /application bootsector
The entry {01234567-ABCD-ABCD-ABCD-0123456789AB} was successfully created.

C:\Windows\system32>

Substitute the GUID (i.e. {01234567-ABCD-ABCD-ABCD-0123456789AB}) that you received when executing the command from above for the next commands:

C:\Windows\system32> bcdedit /set {01234567-ABCD-ABCD-ABCD-0123456789AB} device boot
The operation completed successfully.

C:\Windows\system32> bcdedit /set {01234567-ABCD-ABCD-ABCD-0123456789AB} path \FreeBSD.mbr
The operation completed successfully.

C:\Windows\system32> bcdedit /set {01234567-ABCD-ABCD-ABCD-0123456789AB} device partition=c:
The operation completed successfully.

C:\Windows\system32> bcdedit /displayorder {01234567-ABCD-ABCD-ABCD-0123456789AB} /addlast
The operation completed successfully.

Further help is available if you call bcdedit /?. The documentation is also helpful if you want to fine-tune any settings.

Conclusion

This guide works perfectly for my system (FreeBSD 7.2 and Windows 7). There was no data loss or any other problem. However, I did a fresh install and restored my data from a backup. Your mileage may vary when you try to add Windows 7 to an existing installation of FreeBSD or vice versa.

In short: Be a man—take backups.

Posted Saturday afternoon, September 29th, 2018

When I joined my new lab at the beginning of this year, I was very happy to see that researchers in the biomedical sciences already do what every scholar needs to do, namely keeping a lab book. It is not only a good way to structure your thoughts, it will also give you an opportunity to go back in time, track your work, and even provide summaries of your work day to your supervisor.

My setup

I actually keep two lab books. One is a “Moleskine Classic Notebook” that I use to scribble down thoughts, write out proof ideas, sketches, and anything else that tickles my fancy. The advantage of this old-school way of keeping notes is that I am independent of any technology, making it possible to quickly jot down something in almost any situation. Doing anything electronically here would just raise the bar for taking down an idea or a sketch. If I need my notes in a searchable form, I can always scan and OCR them—but so far, there was never a need for that, and I fear that most OCR software would be unable to decipher my handwriting even if I endeavour to write well.

For the second lab book, I use a Markdown document. While there is technically not a formal specification of this format, it is reasonably simple to parse, easy to look at (even if your editor is incapable of parsing it directly) and can be transformed into a variety of other formats. Being a big proponent of LaTeX, the idea of separating content and layout (or rendering) resonates well with me. I keep this second document under version control using git, which is another bonus of text-only formats. Even after a few weeks on the road, I can thus easily synchronize any changes to other computers.

The format

Most of my entries are roughly arranged based on their corresponding project. Hence, all first-level headers in my lab book are project names. I also have a special section “Dormant” for projects that are, well, dormant. The second-level headers are dates, specified in YYYY-MM-DD format. This makes it easy for me to figure out what I did on a specific date. Apart from that, I use sub-sub-sections and so on whenever I deem it appropriate. This is how it might look:

Project 1
=========

2018-08-01
----------

Added `foo` widget. Refactored code for `bar` class, but ran into
troubles with function `baz()`.

Project 2
=========

2018-08-01
----------

Read paper on topic 1. Checked out example.com for some additional
results. Loss term in appendix is not sufficiently explained. Need
to follow up on this.

vim integration

If you are a vim user, there are several plugins that make working with Markdown easier. The obvious choice for syntax highlighting is Tim Pope's vim-markdown. The one I found most useful is VOoM. This plugin gives you a second pane for many document types (not only Markdown!) in which you can easily see the document structure, jump to specific sections, and so on. To make VOoM detect the file type automatically, add this to your .vimrc:

let g:voom_ft_modes = {'markdown': 'markdown'}

This will make the plugin aware of any Markdown document. Use the command :Voom to create an outline of your document. This can be toggled with :VoomToggle.

That’s it—a short and simple way of keeping a lab book. Until next time!

Posted late Wednesday afternoon, August 1st, 2018

CMake arguably makes life a lot easier for developers and users alike. Instead of fiddling around with autotools madness, we ideally now just issue the following sequence of commands to build a piece of software:

$ mkdir build
$ cd build
$ cmake ../
$ make

However, often things are not that easy. I have found that sometimes, anarchy reigns supreme in the world of CMake—different modules, different ways of doing the same thing, and a complete lack of enforced standards are big hurdles in using or integrating software written by other people.

In this article, I want to outline some rules for writing a simple CMake find module for your own software or for unsupported libraries. Moreover, I want to present some snippets for common situations. The code for this blog post is available on GitHub.

Writing a CMake module

CMake modules are the prime reason for its success. Briefly put, they permit you to find other libraries that you need to link against. A good module can make your life easy. A bad module can lead to weird error messages.

Finding a header-only library

For an easy start, assume that we are looking for a header-only library. The same structure will be used when looking for other libraries, by the way—it will just be a little bit longer.

Suppose we want to look for a header-only library foo that has one main header foo.h within some subdirectory foo. Yes, the creativity in names is strong here, but bear with me. The module below will look for this package in a standardized manner:

INCLUDE( FindPackageHandleStandardArgs )

# Checks an environment variable; note that the first check
# does not require the usual CMake $-sign.
IF( DEFINED ENV{FOO_DIR} )
  SET( FOO_DIR "$ENV{FOO_DIR}" )
ENDIF()

FIND_PATH(
  FOO_INCLUDE_DIR
    foo/foo.h
  HINTS
    ${FOO_DIR}
)

FIND_PACKAGE_HANDLE_STANDARD_ARGS( FOO DEFAULT_MSG
  FOO_INCLUDE_DIR
)

IF( FOO_FOUND )
  SET( FOO_INCLUDE_DIRS ${FOO_INCLUDE_DIR} )

  MARK_AS_ADVANCED(
    FOO_INCLUDE_DIR
    FOO_DIR
  )
ELSE()
  SET( FOO_DIR "" CACHE STRING
    "An optional hint to a directory for finding `foo`"
  )
ENDIF()

Let us take a look at this in more detail. The module first imports a CMake standard module, the FindPackageHandleStandardArgs macro, which permits us to delegate and standardize package finding. Next, we check the environment variables of the client for FOO_DIR. The user can specify such a variable to point to a non-standard include directory for the package, such as $HOME, or any directory that is not typically associated with libraries. A classical use case is the local installation on a machine where you lack root privileges.

If the environment variable is set, its value is used to initialize a CMake variable of the same name, FOO_DIR. Next, we supply this variable as a hint to the FIND_PATH function of CMake. This function tries to find a specified path or file (foo/foo.h in our case) while looking in a standardized set of directories. See the CMake documentation for more details.

Information about the path is stored in FOO_INCLUDE_DIR. The nice thing is that we do not need to evaluate this variable, because the function FIND_PACKAGE_HANDLE_STANDARD_ARGS handles it: using some short descriptor (FOO) of the package, we can hand all the paths that we need to the function and it will automatically result in the appropriate status or warning message. Moreover, it will set the variable FOO_FOUND if the package was found.

If this is the case, we set FOO_INCLUDE_DIRS to point to the path that we found before. Notice that it is customary to use the plural form here because a package might conceivably have multiple include paths. Using the plural in all cases makes it simpler for clients to employ our module because they can just issue

TARGET_INCLUDE_DIRECTORIES( example PRIVATE ${FOO_INCLUDE_DIRS} )

somewhere in their code.

As a final step, we hide the variables by marking them as advanced, so that CMake users have to explicitly toggle them. This merely avoids cluttering up the output of cmake-gui.

This is the most basic skeleton for finding a header-only library. To actually use this module, you can now just issue

FIND_PACKAGE( foo REQUIRED )
TARGET_INCLUDE_DIRECTORIES( example PRIVATE ${FOO_INCLUDE_DIRS} )

in your code. Provided that CMake knows where to look for modules, this is all you need to do. To extend the module search path, just create a directory cmake/Modules in your main project folder and add the following lines to the main CMakeLists.txt:

LIST( APPEND CMAKE_MODULE_PATH
  ${CMAKE_SOURCE_DIR}/cmake/Modules
)

A caveat: the FIND_PACKAGE call is one of the few parts in CMake where capitalization matters. If you do FIND_PACKAGE( FOO ), the CMake parser will look for a file named FindFOO.cmake. Hence, in this case, since we are doing FIND_PACKAGE( foo ), the module is named Findfoo.cmake. Notice that I strongly encourage you to use uppercase spelling in all the variables that you export, as it makes life easier: developers do not have to think about the proper capitalization.

Finding a shared object or a static library

As a slightly more advanced topic, suppose you are looking for one library called bar that comes with an include directory plus a shared object. This requires some additions to the code above:

INCLUDE( FindPackageHandleStandardArgs )

# Checks an environment variable; note that the first check
# does not require the usual CMake $-sign.
IF( DEFINED ENV{BAR_DIR} )
  SET( BAR_DIR "$ENV{BAR_DIR}" )
ENDIF()

FIND_PATH(
  BAR_INCLUDE_DIR
    bar/bar.h
  HINTS
    ${BAR_DIR}
)

FIND_LIBRARY( BAR_LIBRARY
  NAMES bar
  HINTS ${BAR_DIR}
)

FIND_PACKAGE_HANDLE_STANDARD_ARGS( BAR DEFAULT_MSG
  BAR_INCLUDE_DIR
  BAR_LIBRARY
)

IF( BAR_FOUND )
  SET( BAR_INCLUDE_DIRS ${BAR_INCLUDE_DIR} )
  SET( BAR_LIBRARIES ${BAR_LIBRARY} )

  MARK_AS_ADVANCED(
    BAR_LIBRARY
    BAR_INCLUDE_DIR
    BAR_DIR
  )
ELSE()
  SET( BAR_DIR "" CACHE STRING
    "An optional hint to a directory for finding `bar`"
  )
ENDIF()

The most salient change is the use of FIND_LIBRARY to find, you guessed it, the library. The optional NAMES argument can be used to supply more names for a library, which is useful if a library ships with different flavours, such as bar_cxx or bar_hl.

Similar to what I wrote above, I am also exporting the single library as BAR_LIBRARIES in order to simplify usage. In the best case, clients can just use

TARGET_LINK_LIBRARIES( example ${BAR_LIBRARIES} )

and the code will continue to work even if, some years down the road, bar suddenly starts shipping with two libraries. Again, I advocate for having a sane and simple standard rather than having to think hard about how to use the darn module.

Other than that, this works exactly the same as the previous example from above!
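For completeness, a client project consumes this module just like the foo example from above (again assuming a target named example):

FIND_PACKAGE( bar REQUIRED )

TARGET_INCLUDE_DIRECTORIES( example PRIVATE ${BAR_INCLUDE_DIRS} )
TARGET_LINK_LIBRARIES( example ${BAR_LIBRARIES} )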

Things that frequently need doing

Having written almost exhaustively about how to find libraries, I want to end this post with several common tasks. For each of them, I have seen various kinds of weird workarounds, so I would like to point out a more official way.

Version checks

Sometimes, it is unavoidable to support previous versions of CMake, or detect whether a specific version of a library has been installed. For this purpose, there are special VERSION comparison operators. Do not write your own code to do so but rather do something like this:

IF( CMAKE_CXX_COMPILER_VERSION VERSION_LESS "5.4.1" )
  MESSAGE( STATUS "This compiler version might cause problems" )
ENDIF()

Similarly, there are VERSION_EQUAL and VERSION_GREATER checks. They are tested and bound to work—even when you are comparing packages and their versions.
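The same operators also work with version variables exported by find modules; here is a brief sketch, in which FOO_VERSION is a hypothetical variable that a Findfoo.cmake module might set:

IF( FOO_FOUND AND FOO_VERSION VERSION_LESS "2.0" )
  MESSAGE( STATUS "Found an old version of foo; disabling some features" )
ENDIF()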

Detecting an operating system

Your code can be as agnostic with respect to the operating system as you want, but there might still be that one situation where you need to have a way of determining whether your code is being compiled under a certain operating system.

This is easy to accomplish:

IF( APPLE )
  MESSAGE( STATUS "Running under MacOS X" )
# Watch out: this check is also TRUE under MacOS X, because it
# falls under the category of Unix-like.
ELSEIF( UNIX )
  MESSAGE( STATUS "Running under Unix or a Unix-like OS" )
# Despite what you might think given this name, the variable is also
# true for 64bit versions of Windows.
ELSEIF( WIN32 )
  MESSAGE( STATUS "Running under Windows (either 32bit or 64bit)" )
ENDIF()

Detecting a compiler

Sometimes, you need to disable or enable certain things depending on the compiler. Suppose you want to figure out the version of the C++ compiler and its type:

IF( CMAKE_CXX_COMPILER_ID MATCHES "GNU" )
  MESSAGE( STATUS "g++ for the win!" )
  MESSAGE( STATUS ${CMAKE_CXX_COMPILER_VERSION} )
ENDIF()

For LLVM/clang, you can use:

IF( CMAKE_CXX_COMPILER_ID MATCHES "Clang" )
  MESSAGE( STATUS "LLVM, yeah!" )
ENDIF()

Please refer to the documentation for more IDs.

Enabling C++11 or C++14

While it is possible (and also necessary for older versions) to enable C++11 by modifying CMAKE_CXX_FLAGS directly, the standard way involves only two lines:

SET( CMAKE_CXX_STANDARD 11 )
SET( CMAKE_CXX_STANDARD_REQUIRED ON )

This is guaranteed to work with all supported compilers.

Coda

I hope this article convinced you of the power of CMake and of the need for standardizing its usage. You can find the code of the modules, plus some boilerplate CMake code, in the GitHub repository for this post.

Have fun using CMake, until next time!

Update (2018-05-30): Using TARGET_INCLUDE_DIRECTORIES as suggested on HN. Thanks!

Posted late Monday evening, May 28th, 2018

When writing Aleph, my library for persistent homology and computational topology, I decided to add a few Python bindings one idle afternoon. Using the magnificent pybind11 library, this was easier than I anticipated. Much to my chagrin, though, it turns out that using such bindings with the Python interpreter is more complicated if you want to do it the right way.

Of course, having built a single .so file that contains the code of your module, the easy way is to modify the PYTHONPATH variable and just point it to the proper path. But I wanted to do this right, and so, together with Max, I started looking at ways to simplify the build process.

The situation

I am assuming that you have installed pybind11 and have written some small example that you want to distribute. If you are unsure about this, please refer to my example repository for this blog post.

The module may look like this:

#include <pybind11/pybind11.h>

#include <string>

class Example
{
public:
  Example( double a )
    : _a( a)
  {
  }

  Example& operator+=( const Example& other )
  {
    _a += other._a;
    return *this;
  }

private:
  double _a;
};

PYBIND11_MODULE(example, m)
{
  m.doc() = "Python bindings for an example library";

  namespace py = pybind11;

  py::class_<Example>(m, "Example")
    .def( py::init( []( double a )
            {
              return new Example(a);
            }
          )
    )
    .def( "__iadd__", &Example::operator+= );
}

Pretty standard stuff so far: one class, with one constructor and one addition operator exposed (for no particular reason whatsoever).

Building everything

Building such a module is relatively easy with CMake if you are able to find the pybind11 installation (the example repository has a ready-to-use module for this purpose). Since we want to do this the right way, we need to check whether the Python interpreter exists:

SET( PACKAGE_VERSION "0.1.1" )

FIND_PACKAGE( pybind11 REQUIRED )

FIND_PACKAGE(PythonInterp 3)
FIND_PACKAGE(PythonLibs   3)

Next, we can build the library using CMake. Some special treatment for MacOS X is required (obviously) in order to link the module properly.

IF( PYTHONINTERP_FOUND AND PYTHONLIBS_FOUND AND PYBIND11_FOUND )
  INCLUDE_DIRECTORIES(
    ${PYTHON_INCLUDE_DIRS}
    ${PYBIND11_INCLUDE_DIRS}
  )

  ADD_LIBRARY( example SHARED example.cc )

  # The library must not have any prefix and should be located in
  # a subfolder that includes the package name. The setup will be
  # more complicated otherwise.
  SET_TARGET_PROPERTIES( example
    PROPERTIES
      PREFIX ""
      LIBRARY_OUTPUT_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/example"
  )

  # This is required for linking the library under Mac OS X. Moreover,
  # the suffix ensures that the module can be found by the interpreter
  # later on.
  IF( APPLE )
    SET_TARGET_PROPERTIES( example
      PROPERTIES
        LINK_FLAGS "-undefined dynamic_lookup"
        SUFFIX     ".so"
    )
  ENDIF()


  # Place the initialization file in the output directory for the Python
  # bindings. This will simplify the installation.
  CONFIGURE_FILE( example/__init__.py
    ${CMAKE_CURRENT_BINARY_DIR}/example/__init__.py
  )

  # Ditto for the setup file.
  CONFIGURE_FILE( example/setup.py
    ${CMAKE_CURRENT_BINARY_DIR}/example/setup.py
  )
ENDIF()

The salient points of this snippet are:

  • Changing the output directory of the library to a subordinate directory. We will later see that this simplifies the installation.
  • Configuring (and copying) the __init__.py and setup.py files and making them available in the build directory.

__init__.py is rather short:

from .example import *

This will tell the interpreter later on to import all symbols from the example module in the current directory.

The setup.py is slightly more complicated:

from distutils.core import setup

import sys
if sys.version_info < (3,0):
  sys.exit('Sorry, Python < 3.0 is not supported')

setup(
  name        = 'cmake_cpp_pybind11',
  version     = '${PACKAGE_VERSION}', # TODO: might want to use commit ID here
  packages    = [ 'example' ],
  package_dir = {
    '': '${CMAKE_CURRENT_BINARY_DIR}'
  },
  package_data = {
    '': ['example.so']
  }
)

The important thing is the package_data dictionary. It specifies the single .so file that is the result of the CMake build process. This ensures that the file will be installed alongside the __init__.py file.

Testing it

First, we have to build our package:

$ mkdir build
$ cd build
$ cmake ../
$ make
$ cd example
$ ls
example.so  __init__.py  setup.py
$ sudo python setup.py install

Afterwards, the package should be available for loading:

$ python
Python 3.6.5 (default, Apr 12 2018, 22:45:43)
[GCC 7.3.1 20180312] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import example
>>> example.Example(0.0)
<example.example.Example object at 0x7f54e7f77308>
>>> example.Example("Nope")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. example.example.Example(arg0: float)

Invoked with: 'Nope'

Everything seems to work as expected.

Conclusion

This is certainly not the easiest or most modern way to install your own Python module. However, it is the easiest one in case you already have a large project that exports Python bindings. In a truly optimal world, we would use setuptools directly and build wheel packages—but this will have to wait for another article.

In the meantime, the code for this example is available on GitHub.

Happy packaging, until next time!

Posted Tuesday afternoon, May 1st, 2018

Even though C++ is still my favourite programming language, I am always amazed by the sheer number of headers I have to include to get something done. Most of my source files could contain #include <vector> at the beginning, because this is most likely what I am going to include anyway. Other headers are not treated this way. In fact, I am displaying a wanton disregard for many of the other headers. For example, I have never consciously used the execution header in real-world code.

I thus started wondering how other projects fared in that regard, so I cloned several of the larger C++ repositories on GitHub.

Counting individual headers

In total, these projects comprise more than 2 million lines of code—a reasonably-sized sample, I would say. To figure out how these projects use headers, I first extracted all STL headers from all files and counted their occurrences. This resulted in the following histogram (the counts are relative):

[Figure: Histogram of STL header occurrences]

Pretty interesting, I would say. This is a nice long-tail distribution for which a few headers are used much more often than the rest. In fact, for these repositories, only four headers make up more than 50% of the usage:

  • vector
  • string
  • memory
  • utility

For vector and string, this is not surprising. Virtually every C++ programmer uses vector for almost anything. The same goes for string. Similarly, memory is not so surprising as it contains the different smart pointer classes—most prominently, shared_ptr. The last one of the list, utility, was slightly unexpected for me. It contains things such as std::make_pair and std::move. At least the latter one is required for any class that does its own memory management.

At the tail of the distribution, the more exotic headers await. The stack header, for example, does not appear to be used too often in these projects, while the future header comes in dead last. I must confess that I have not used it in real-world projects so far because I did not yet have to deal with asynchronous operations. The lack of enthusiasm for the regex header is somewhat sad, but maybe this is to be expected in a language that does not really encourage the use of regular expressions? Also, C++ regular expressions are said to perform worse than their counterparts in other languages. To what extent the unfamiliarity of C++ programmers with regular expressions might contribute to this, I cannot say.
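As an aside, the counting step itself does not require much more than a regular expression. Here is a minimal sketch along those lines; the glob pattern is a placeholder, and this is not the exact code from the repository linked below:

import collections
import glob
import re

# Matches simple include names such as <vector> or <cstdint>; includes
# with a path separator or a '.h' suffix are skipped.
pattern = re.compile(r'^\s*#include\s*<([a-z_0-9]+)>', re.MULTILINE)

counts = collections.Counter()

for filename in glob.glob('repositories/**/*.cpp', recursive=True):
    with open(filename, errors='ignore') as f:
        counts.update(pattern.findall(f.read()))

for header, count in counts.most_common(10):
    print(header, count)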

Counting pairs of headers

Let’s delve into another aspect of the headers. In my code, I noticed that some headers are almost always used together. For example, if there is an algorithm header, there is often also a functional header. Extending this to projects, I thought that it might be interesting to analyse the co-occurrence patterns of STL headers. To this end, I counted how often pairs of headers are included in the same file. This naturally gives rise to a co-occurrence matrix, in which rows and columns indicate headers, and each value indicates how often those headers occur in the same file. If headers are sorted by their counts, this results in a beautiful picture:

[Figure: Co-occurrence matrix of STL headers]

This matrix tells us something about the universality of certain headers. The vector header, for example, co-occurs with almost every other header to some extent because vectors are such fundamental data types. The typeinfo header, on the other hand, is so specific that it only co-occurs with typeindex. In fact, the structure of the matrix, i.e. the many dark cells, indicates that many combinations are highly unlikely to occur “in the wild”.

Some of the combinations really tell a story, though. For example, queue is used in conjunction with thread (possibly to implement patterns for multi-threaded environments), but also with stack (possibly to implement different traversal strategies of implicit or explicit graph structures in these projects). I also see another pattern of my own code, namely the pair unordered_map and unordered_set. I tend to require either both of them (the set for iteration and storage, the map for, well, associating more information with individual objects) or none of them.

Conclusion

As a next step, it would be interesting to see whether the co-occurrence of certain headers makes it possible to guess the domain of a C++ program, just like certain pairs of words (I guess I should rather speak of bigrams here, to use the NLP term) are more indicative of certain genres of literature. Treating code like literature would certainly make for an interesting art project.

The code for this project is available on GitHub. You only have to supply the repositories for scanning.

Happy coding, until next time!

Update (2018-04-06): Changed the title because I was using the term meta-analysis incorrectly. Thanks, HN!

Posted Tuesday evening, April 3rd, 2018