TCG TSS2 async APIs and event driven programming

API design is hard. Over the past year I’ve been contributing to an effort to standardize a set of APIs for interacting with TPM2 devices through the Trusted Computing Group’s (TCG) TPM2 Software Stack (TSS) working group (WG). The available APIs and implementations predate my involvement but I’m hoping to contribute some use cases to motivate the need for various aspects of the API and architecture. According to Joshua Bloch use-cases are core to API design so I figure having some thoughts and examples documented and available would be helpful.

Use-case driven design: asynchronous function calls

After watching Mr. Bloch’s talk a few times in the last year, the API design principle that’s stuck with me most is that the design should be motivated / driven by use cases. A use-case adopted by the TSS WG that I’ve found particularly interesting is support for event driven programming environments / frameworks. On several occasions this use case has come into question and generally the argument against the feature comes from the trade-offs involved: detractors argue that the utility of and demand for the use case does not outweigh the complexity that it introduces to the API.

But even though event driven programming is integral to many languages and frameworks, we don’t have any examples of applications using the TPM in such an environment. There aren’t many applications using the TPM to begin with and since the TSS for TPM 1.2 devices never supported async I/O (AFAIK) there was never an opportunity to do so. Since the TSS2 APIs include support for asynchronous operations I thought it was worth while to write some example code to demonstrate use of this part of the API in a popular programming framework.

You know what they say:

Integrating SAPI with GLib and GIO

The TSS2 APIs are limited to C. While some bindings to higher level languages have been emerging I’m expecting that C will be the primary language for interacting with the TPM for some time. There is no shortage of options for event frameworks in C, but for the purposes of this post I’m going to focus exclusively on GLib and the GIO libraries.

Before we dive any deeper though I want to include a quick note on event driven programming in general: The term “event driven” doesn’t necessarily mean asynchronous or even non-blocking. Further, not all asynchronous I/O operations are truly asynchronous. Many depend on polling file descriptors with some help from the kernel through the select or poll system calls. I don’t intend to get into the merits or purity of the various approaches though. In this post we’re just using the tools available in a way that works with GLib and GIO. If you’re interested in a “deep-dive” into event driven programming in C there are much better resources out there: start here

GLib and GIO primitives

At the heart of the GLib event model are 3 objects: the GMainLoop, the associated GMainContext and the GSource. These objects are very well documented so I’m going to assume general familiarity. For our purposes we only need to know a few things:

  1. GMainLoop is a thread that processes events. If used properly the GMainLoop will handle the asynchronous I/O operations so we don’t have to polling or threading code ourselves.
  2. GMainContext is a container used by the GMainLoop. It holds a collection of GSources that make up the events in the GMainLoop.
  3. GSource is an object that holds a number of function pointers that are invoked in response to some event.

Let’s do a quick example to illustrate before we mix in the TSS2 stuff. The following code is an example of a GMainLoop with a single timeout event source. The event source will invoke a callback after a specified time. The timeout event will occur on the interval so long as the callback returns G_SOURCE_CONTINUE.

#include 
#include 

#define TIMEOUT_COUNT_MAX 10
#define TIMEOUT_INTERVAL  100

typedef struct {
    GMainLoop *loop;
    size_t timeout_count;
} data_t;

gboolean
timeout_callback (gpointer user_data)
{
    data_t *data = (data_t*)user_data;

    g_print ("timeout_count: %zun", data->timeout_count);
    if (data->timeout_count timeout_count; 
        return G_SOURCE_CONTINUE;
    } else {
        g_main_loop_quit (data->loop);
        return G_SOURCE_REMOVE;
    }
}
int
main (void)
{
    GMainContext *context;
    GSource *source;
    data_t data = { 0 };

    source = g_timeout_source_new (TIMEOUT_INTERVAL);
    g_source_set_callback (source, timeout_callback, &data, NULL);
    context = g_main_context_new ();
    g_source_attach (source, context);
    g_source_unref (source);

    data.loop = g_main_loop_new (context, FALSE);
    g_main_context_unref (context);

    g_main_loop_run (data.loop);
    g_main_loop_unref (data.loop);
    return 0;
}

In ~40 LOC we have an event loop triggering a timeout on a fixed interval. After a fixed number of timeout events the program shuts itself down cleanly. Not bad for C. It’s worth noting as well that GLib has some convenience functions (in this case g_timeout_add) that would have saved us a few lines here but I’ve opted to do those bits manually to make future examples more clearly relatable to this one.

Event-driven TSS2 function calls

Now that we have the basic tools required to use a GMainLoop for asynchronous events we’ll play around with adapting them for use with asynchronous calls in the SAPI library. We’ll make this next step a small one though. We won’t go straight to creating GObjects to plug into the GMainLoop. Instead let’s take a look at *how* the TSS2 libraries expose the primitives we need to make async calls and *what* those primitives are.

Asynchronous TSS2

The TCG TSS2 APIs all work together to provide the mechanisms necessary to make asynchronous calls. Each layer in the stack has a purpose and an asynchronous mechanism for carrying out that purpose. From the perspective of an application, the lowest layer is the TCTI library. This API is extremely small and provides transmit / receive functions as an interface to the underlying IPC mechanism. This API can be used by higher layers to send TPM2 command buffers and receive the associated response without having to worry about the underlying implementation.

If the underlying IPC mechanism used by the TCTI library is capable of asynchronous operation then the TCTI may expose non-blocking behavior to the client. The TCTI API provides only one function capable of non-blocking I/O: The `receive` function can be passed a timeout value that will cause the calling thread to block for some time waiting for a response. If no response is received before the timeout then the caller will be returned a response code instructing them to “try again”. If this timeout is provided a timeout value of TSS2_TCTI_TIMEOUT_NONE then the function will not block at all. This is about as simple as non-blocking I/O can get.

Additionally, TCTI libraries provide the getPollHandle function. This function returns the TSS2_TCTI_POLL_HANDLE type which we map to an asynchronous I/O mechanism for various platforms. On Linux it defined as struct pollfd. On Windows it would make sense to map it to the HANDLE type.

But the TSS2_TCTI_POLL_HANDLE isn’t intended for use by the typical developer. Instead it’s intended for use as a mechanism to integrate the TSS2 stack into event driven frameworks. So if the TCTI mechanism works as advertised we should be able to use this type to drive events in the GLib main loop.

Asynchronous SAPI function calls

The mechanisms from the TCTI layer are necessary for asynchronous interaction with TPM2 devices but alone they’re not sufficient. The layers above the TCTI in the TSS2 architecture must participate by providing asynchronous versions of each function. Using the SAPI as an example, each TPM2 function is mapped to a SAPI function call. But the SAPI provides not only a synchronous version of the function call (let’s use Tss2_Sys_GetCapability in this example), but also an asynchronous version.

For our the TPM2 GetCapability function the SAPI header provides:

  1. Tss2_Sys_GetCapability – A synchronous function that will send a command to the TPM2 and block until a response is ready to be passed back to the caller.
  2. Tss2_Sys_GetCapability_Prepare – A function to prepare / construct the TPM2 command buffer from the function parameters before they’re transmitted to the TPM.
  3. Tss2_Sys_GetCapability_Complete – A function to convert the response received by the _ExecuteFinish from the byte stream representation to C structures.

Additionally the utility functions are required:

  1. Tss2_Sys_ExecuteAsync – A generic function to send the previously prepared TPM2 command buffer to the TPM. It will only call the underlying TCTI transmit function.
  2. Tss2_Sys_ExecuteFinish – A generic function to receive the TPM2 response buffer (the response to the command previously sent). The second parameter to this function is a timeout value that can be set to non-blocking.

These functions provide us with all we need to create the TPM2 command buffer, transmit it, receive the response and transform the response buffer into C structures that we can use in our program.
The event mechanism is able to trigger an event for us when there’s data ready on the struct pollfd from the underlying TCTI module.

Combining GSource and the TSS2

We’ve effectively enumerated the tools at our disposal. Now we put them together into a very minimal example. Our goal here isn’t to get perfect integration with the GLib object model, just to use the available tools to hopefully demonstrate the concept.

The TCTI layer and the TSS2_TCTI_POLL_HANDLE type mapping to the struct pollfd was the big hint in this whole thing. Using the pollfd structure we can extract the underlying file descriptor and use this well known IPC interface to learn of events from the fd. But before we set off to implement a GSource to drive events for the SAPI let’s see if there’s an existing one that we might use. File descriptors (fds) are pretty common on *nix platforms so it makes sense that there may already be a tool we can use here.

Sure enough the GLib UNIX specific utilities and integration provides us with just this. All we must do is provide the g_unix_fd_source_new function with the fd from the underlying TCTI while requesting the event be triggered when the G_IO_IN (input data ready) condition becomes true for the fd.

NOTE: Before we get too far though it’s important to point out that the TCTI we use for this example is libtcti-tabrmd. This TCTI requires the tpm2-abrmd user space resource management daemon. The other TCTIs for communicating with the kernel device driver and the TPM2 emulator do not support asynchronous I/O. The kernel driver does not support asynchronous I/O. We use a few utility functions for creating the TCTI and SAPI contexts.

The header:

#include 
#include 
    
/*  
 * Allocate and initialize an instance of the TCTI module associated with the
 * tpm2-abrmd. A successful call to this function will return a TCTI context
 * allocated by the function. It must be freed by the caller.
 * A failed call to this function will return NULL.
 */ 
TSS2_TCTI_CONTEXT*
tcti_tabrmd_init ();
    
/*  
 * Allocate and initialize an instance of a SAPI context. This context will be
 * configured to use the provided TCTI context. A successful call to this
 * function will return a SAPI context allocated by the function. It must be
 * freed by the caller.
 * A failed call to this function will return NULL.
 */
TSS2_SYS_CONTEXT*
sapi_init_from_tcti_ctx (TSS2_TCTI_CONTEXT *tcti_ctx);

/*
 * Allocate and initialize an instance of a TCTI and SAPI context. The SAPI
 * context will be configured to use a newly allocated instance of the tabrmd
 * TCTI. A successful call to this function will return a TCTI and SAPI
 * context allocated by the function. Both must be freed by the caller.
 * A failed call to this function will return NULL.
 */
TSS2_SYS_CONTEXT*
sapi_init_tabrmd (void);

And the implementation:

#include 
#include 
#include 
#include 
    
#include 
#include 
#include 

#include "context-util.h"

TSS2_TCTI_CONTEXT*
tcti_tabrmd_init ()
{   
    TSS2_RC rc;
    TSS2_TCTI_CONTEXT *tcti_ctx;
    size_t size;
    
    rc = tss2_tcti_tabrmd_init (NULL, &size);
    if (rc != TSS2_RC_SUCCESS) {
        g_critical ("Failed to get allocation size for tabrmd TCTI context: "
                    "0x%" PRIx32, rc);
        return NULL;
    }
    tcti_ctx = calloc (1, size);
    if (tcti_ctx == NULL) {
        g_critical ("Allocation for TCTI context failed: %s",
                    strerror (errno));
        return NULL;
    }
    rc = tss2_tcti_tabrmd_init (tcti_ctx, &size);
    if (rc != TSS2_RC_SUCCESS) {
        g_critical ("Failed to initialize tabrmd TCTI context: 0x%" PRIx32,
                    rc);
        free (tcti_ctx);
        return NULL;
    }
    return tcti_ctx;
}
TSS2_SYS_CONTEXT*
sapi_init_from_tcti_ctx (TSS2_TCTI_CONTEXT *tcti_ctx)
{
    TSS2_SYS_CONTEXT *sapi_ctx;
    TSS2_RC rc;
    size_t size;
    TSS2_ABI_VERSION abi_version = {
        .tssCreator = TSSWG_INTEROP,
        .tssFamily  = TSS_SAPI_FIRST_FAMILY,
        .tssLevel   = TSS_SAPI_FIRST_LEVEL,
        .tssVersion = TSS_SAPI_FIRST_VERSION,
    };

    size = Tss2_Sys_GetContextSize (0);
    sapi_ctx = (TSS2_SYS_CONTEXT*)calloc (1, size);
    if (sapi_ctx == NULL) {
        g_critical ("Failed to allocate 0x%zx bytes for the SAPI contextn",
                    size);
        return NULL;
    }
    rc = Tss2_Sys_Initialize (sapi_ctx, size, tcti_ctx, &abi_version);
    if (rc != TSS2_RC_SUCCESS) {
        g_critical ("Failed to initialize SAPI context: 0x%xn", rc);
        free (sapi_ctx);
        return NULL;
    }
    return sapi_ctx;
}
TSS2_SYS_CONTEXT*
sapi_init_tabrmd (void) {
    TSS2_SYS_CONTEXT  *sapi_context;
    TSS2_TCTI_CONTEXT *tcti_context;

    tcti_context = tcti_tabrmd_init ();
    if (tcti_context == NULL) {
        return NULL;
    }
    sapi_context = sapi_init_from_tcti_ctx (tcti_context);
    if (sapi_context == NULL) {
        free (tcti_context);
        return NULL;
    }
    return sapi_context;
}

By combining all of this we can create a simple program that makes an asynchronous SAPI function call. At a high level our goal is to execute a TPM2 command without blocking. When the TPM2 device sends our application a response we want the GLib event loop to invoke our callback so that we can do something with the response. We accomplish this with roughly the following steps:

  1. Create the TCTI & SAPI contexts.
  2. Create a GSource for the fd associated with the TCTI using the g_unix_fd_source_new GSource constructor.
  3. Connect the GSource with the GMainContext and register a callback.
  4. Create the GMainLoop and associate it with the proper context.
  5. Make the asynchronous SAPI function call.
  6. Start the GMainLoop.

When data is ready in the TCTI our callback will be invoked and it will do the following:

  1. Finish the SAPI function call.
  2. Dump out some data returned from the SAPI function call.
  3. Quit the main loop (will terminate the program).

Here’s the code:

#include 
#include 
#include 
#include  
#include 
#include  
    
#include "context-util.h"

typedef struct { 
    TSS2_SYS_CONTEXT *sapi_context;
    GMainLoop *loop;
} data_t;
    
gboolean
fd_callback (gint fd,
             GIOCondition condition,
             gpointer user_data)
{
    data_t *data = (data_t*)user_data;
    TSS2_RC rc;
    TPMI_YES_NO more_data;
    TPMS_CAPABILITY_DATA capability_data = { 0 };

    rc = Tss2_Sys_ExecuteFinish (data->sapi_context, 0);
    rc = Tss2_Sys_GetCapability_Complete (data->sapi_context,
                                          &more_data,
                                          &capability_data);
    g_print ("Capability: 0x%" PRIx32 "n", capability_data.capability);
    g_print ("Capability data command count: %" PRIu32 "n",
             capability_data.data.command.count);
    g_main_loop_quit (data->loop);
    return G_SOURCE_REMOVE;
}
int
main (void)
{
    GMainContext *context;
    GSource *source;
    TSS2_TCTI_CONTEXT *tcti_context;
    TSS2_RC rc;
    TSS2_TCTI_POLL_HANDLE poll_handles[1];
    data_t data;
    size_t poll_handle_count;

    /* setup TCTI & SAPI contexts */
    tcti_context = tcti_tabrmd_init ();
    g_assert (tcti_context != NULL);
    data.sapi_context = sapi_init_from_tcti_ctx (tcti_context);
    g_assert (data.sapi_context != NULL);
    /* get fds to poll for I/O events */
    rc = tss2_tcti_get_poll_handles (tcti_context,
                                     poll_handles,
                                     &poll_handle_count);
    g_assert (rc == TSS2_RC_SUCCESS);
    g_assert (poll_handle_count == 1);
    if (!g_unix_set_fd_nonblocking (poll_handles[0].fd, TRUE, NULL)) {
        g_error ("failed to set fd %d to non-blocking", poll_handles[0].fd);
    }
    /* setup GLib source to monitor this fd */
    source = g_unix_fd_source_new (poll_handles[0].fd, G_IO_IN);
    context = g_main_context_new ();
    g_source_attach (source, context);
    /* setup callback */
    g_source_set_callback (source, (GSourceFunc)fd_callback, &data, NULL);
    data.loop = g_main_loop_new (context, FALSE);
    g_main_context_unref (context);
    /* make initial get capability call */
    rc = Tss2_Sys_GetCapability_Prepare (data.sapi_context,
                                         TPM_CAP_COMMANDS,
                                         TPM_CC_FIRST,
                                         MAX_CAP_CC);
    g_assert (rc == TSS2_RC_SUCCESS);
    rc = Tss2_Sys_ExecuteAsync (data.sapi_context);
    g_assert (rc == TSS2_RC_SUCCESS);
    /* run main loop */
    g_main_loop_run (data.loop);
    g_main_loop_unref (data.loop);
    /* cleanup TCTI / SAPI stuff */
    Tss2_Sys_Finalize (data.sapi_context);
    g_free (data.sapi_context);
    tss2_tcti_finalize (tcti_context);
    g_free (tcti_context);

    return 0;
}

Conclusion & Next Steps

Pretty straight forward all things considered. But this demonstration leaves a few things to be desired: Firstly no application should have to dig directly into the TCTI to pull out the underlying fd. Instead we should have a function to create a GSource from the SAPI context directly much like g_unix_fd_source_new`

Secondly, there is little in the program to convince us that the function call is truly asynchronous. To make our point we need additional tasks that the GMain loop can execute in the background while the TPM is handling our command. A GTK+ application with a chain of slow TPM2 functions that create a key hierarchy or perform some slow RSA key function (encrypt) while the UI remains responsive would be much more convincing. This may be the topic of a future post if I don’t get distracted.

Finally: this is a lot of machinery and it often comes under attack as adding too much complexity to the API. Instead I’d argue that complexity here is the minimum complexity necessary to enable easy integration of the TSS2 APIs into event driven programming frameworks. Most programmers will never have to interact with this machinery. Instead they should be used to create higher level APIs / Objects that integrate into these event-driven programming frameworks.

Haitus followed by wordpress update misery

I crawl under a rock for a few months and my penance is a miserable wordpress upgrade when I return. TL;DR: there’s a bug in Debian Jessie vsftpd that breaks the wordpress upgrade. Thankfully someone filed a bug and moving forward to the vsftp version (3.0.3) is the fix.

I had to do one more thing though: this newer version of vsftpd disables write access for ALL accounts, even authorized accounts. No big deal, just find the write_enable variable in the vsftpd.conf file, uncomment it, and set it to YES. That only took me an hour to figure out :/

But at least now my WP install is up to date, I’m out from under my rock and I can get back to my blogging routine.

TPM2.0-TSS “1.0” release

I had hoped to cut this 1.0 release last week but we received a few last minute contributions that fixed a number of stability issues and a buffer overflow in the resourcemgr. So better late than never, the 1.0 release of the TPM2.0-TSS code has been tagged!

As part of the release I went through and did some maintainer B.S. that was long overdue: adding an AUTHORS and MAINTAINERS file, as well as updating the CHANGELOG and porting it over to markdown (so it’s now CHANGELOG.md).

So now the work on the next release begins … I’m tempted to make a pile of resolutions outlining where I want to focus my efforts initially but given the hectic schedule I know how inaccurate and difficult to live up to they will be. So for now it’s just forward progress on all fronts!

TPM2 Response Codes

For those of you struggling to use the TPM2 SAPI (system API) I’ll put the TLDR here hopefully to save you time and suffering:

  • Decoding TPM2 response codes is a PITA
  • I wrote a tool to make it a bit easier on you. It’s called tpm2_rc_decode and you can find it in the tpm2.0-tools repo.

For those if you interested in the back-story and some general information about TPM2 response codes: read on.

A maintainers priorities

I’ve come to realize that the single most important thing (to me) when working with a new library or tool is having a simple / easy debugging cycle. In the most simple case, like a libc function, returning an integer error code that’s easy to understand is crucial. On Linux, the standard man pages and the obligatory ‘return value’ section are usually sufficient.

So when I took over maintenance of the TPM2.0-TSS project a few months back the first thing I did was try to write a simple program to use the most basic parts of the API. The program I wrote called a single function from the TSS API (Tss2_Sys_GetCapabilities) but of course it didn’t work initially. Worse yet the return value that I got from the function call wasn’t something that could be decoded with a simple look up.

This triggered a sort of revelation where I realized that we’ve made it nearly impossible for people to create meaningful bug reports. Back in June, someone reported the same issue on the tpm2.0-tools project: it’s extremely time consuming to decode TPM response codes by hand. This in turn means that neither the TPM2.0-TSS, or any of the projects consuming it will get good bug reports from their users. Generally this mean that they won’t have many users.

This is a pretty well bounded problem and seemed like an “easy win” that would solve two problems at the same time: make using the TPM2 TSS easier for users, and make for higher quality bug reports from said users.

TPM2 response code encoding

TPM2 response codes (hence forth RCs) aren’t simple integers. They’re unsigned 32bit integers with a pile of information encoded in them. The complexity has a purpose though: a single RC will tell you which part of the TPM software stack (TSS) produced the RC (from the TPM all the way up to the high level APIs), whether it’s an RC from the 1.2 or the 2.0 spec, which format the RC is in, the severity of the RC (warning vs error) and even which parameter caused the RC. Oh yeah, and what the actual error / warning code is. That’s a lot of data in my book.

The hardest part of decoding these RCs is tracking down data on the format and all of the other bits. A bit of reading and searching will turn up the following:

  • Part #1 of the spec (architecture) has a good overview and flow chart for decoding RCs in section 39.
  • Part #2 of the spec has the gory details on the bit fields and what they mean in section 6.6.
  • The TSS spec documents the response code layer indicators and the TSS specific RCs in section 6.1.2 (NOTE: this will change in the next iteration of the spec)

For the sake of keeping this post as accurate as possible for as long as possible I won’t reproduce much data from the specifications. That’s what they’re for. Instead I’ll keep things limited to discussion of the tool that I wrote, some of the major annoyances that I encountered and some of the work I’m doing upstream to fix things.

tpm2_rc_decode

The algorithm for decoding RCs in part 1 of the spec was a good starting point but it’s not sufficient. It omits the details around decoding RCs generated by software outside of the TPM. For this I had to account for the ‘layer indicator’ from the RC. The augmented algorithm is documented in the commit message for the tm2_rc_decode.

Once this algorithm was implemented and documented the tool mostly becomes an exercise in looking up strings in tables that map some integer value (the error code in bits 0 through 5 or 6, or the layer identifier in bits 16 through 23). I had hoped to be able to automate the generation of these tables from the specification but parsing a PDF is a pain in the ass. I’ll probably end up posting more on automating code generation from the TPM2 specs in the future so I won’t say much here.

The neglected TSS spec

This was really my first bit of work that required I dig into the RC portion of the TSS specification. When I started this work it was in pretty bad shape. The structure and definition of all of the TSS RCs and their meaning was well done but the language used in this part of the spec was horribly inconsistent. It interchanged terms like ‘error code’ with ‘response code’ and ‘return code’ seemingly at random. It played similar games with terms like ‘error level’ and ‘layer indicator’.

Now I’ve been called pedantic in the past. And I won’t argue with that label. Attention to detail is a hobby of mine. And I’m of the opinion that, when it comes to technical specifications, pedantry is a virtue. When you’re trying to pick up concepts like this with data densely packed into a single unsigned 32bit integer exactness is paramount.

There’s nothing worse (in my mind) than having loosely defined terms causing confusion between people new to the spec. As a maintainer it’s hard enough to figure out what noobs mean when they report a bug. If the terms they get from the specification can mean any number of things we’re increasingly the likelihood of miscommunication and frustration.

Conclusion

So my first contribution to the TSS specification was the complete rewriting of the RC section. The values of the constants have remained the same but we now use consistent language for ‘layer indicator’, ‘layer shift’ etc and we’ve removed uses of terms like ‘error level’ and ‘error code’ in favor of ‘layer’ and ‘response’.

If you’re developing software using the SAPI, or using the tpm2.0-tools I highly encourage you to use the tpm2_rc_decode utility. It should make your life a lot easier and the lives of those you may end up communicating with when you’re trying to debug your code.

Finally this tool isn’t perfect. None of the RCs that are generated by the resource manager are handled yet so there’s plenty of room for improvement. If you’ve got the cycles and you’re sufficiently motivated I’d gladly take patches to improve the tool.

TPM2 software stack maintainership

My last post left some work in progress on TPM2 event long handling hanging. Sadly the state of that work hasn’t progressed since. An now for the excuse: I’ve taken over maintainership of the TPM2.0-TSS project.

If you’ve ever taken over a large project with minimal time to plan for the “hand-off” then you’ll understand how that can derail pretty much every other technical project that you may have in progress. So I’ve had to back-burner the Grub / event log stuff for the short term. My focus these days is reverse engineering the internals of the TPM2.0-TSS and generally trying to make sense of the thousands of pages of relevant specs.

On the upside this is a great opportunity to get some much needed documentation and refactoring done for the TSS source tree. As the previous maintainer told me, the documentation for this code is the TPM specifications. This is synonymous with “no documentation” in my mind because while the TPM2 specs describe the functioning of a TPM and the associated software stack should behave it says nothing about how the source code is structured. This is the proverbial up hill battle that comes with taking over a “legacy software” project.

I’ve got a few posts in my backlog of lessons learned from trying to get my brain wrapped around this project. Up till now I’d just been making sure things built well and easily on Linux platforms and so my knowledge of the guts of the infrastructure was limited. Now that I’m neck deep in it I’ll do my best to get some lessons learned up here on the blog and unblock the writers block I’ve been suffering.

The first hurdle I had to overcome was making sense of the TPM2 response codes and their inherent complexity. After that I’m hoping to pick apart some of the unit tests that I’ve been pushing these last few weeks since those are mostly how I’m learning about the structure of the source code.

TPM2 UEFI Measurements and Event Log

Has it really been over 6 months since I last wrote something on the blog? Crap. The standard excuses apply: busy busy busy. I got sucked into the OSS TPM2 software stack project @ Intel in a huge way.

I screwed up something fundamental along the way though: I moved on to this new project before finishing off an older project. So I’m here to set things right. The OpenXT summit is in a week so I’m cleaning out my backlog and it’s time to write up the stuff I was working on back before I fell down the TPM2 TSS hole. Come to think of it I’ve got some interesting stuff to write about the TPM2 TSS work but first things first: TPM2 in EFI and Grub2.

This is likely the first post in what will become a series on the subject. I’ll try to keep this post to an overview of the TPM2 EFI protocol, the basics of what I’ve been adding to grub and I’ll do a slightly deeper dive in to the pre-boot TPM2 event log that’s maintained by the TPM2 EFI driver / firmware. Future posts will pick up the remaining bits and get into the details of how this was integrated into meta-measured.

TPM2 UEFI recap

I left off my last post describing some oddities I ran into in the TCG TPM 2.0 EFI Protocol specification. With that issue out of the way I spent some time playing around with the functions exposed by the protocol (in the UEFI sense) and what they would allow us to do in the pre-boot environment. Naturally there’s a purpose here: measuring stuff.

Before we get too far into the post here’s a pointer to the code I’ve been developing: . It’s been a few months since I seriously slowed down work on this to focus on the TSS for Linux but I intend to revive it so that I can finish it off in the near future. Intel has already demonstrated this code @ the TCG event @ RSA 2016 so it’s functional but still just ‘demo quality’.

The Protocol

The stuff we want to measure here is the code and the data that was run as part of booting our system. In this case we want to extend the measurement chain from the firmware up through Grub2 to the OS. Despite past efforts on a Grub2 fork known as “trusted grub” the majority of Linux systems have a gap between measurements taken by the firmware and measurements that are taken by kernel or user space code.

In our efforts to fill this gap, let’s start by taking a quick look at the functions that are exposed by the TPM2 UEFI protocol. In UEFI the concept of a ‘protocol’ isn’t what I typically think of when I hear the word. When I hear ‘protocol’ the first thing that comes to mind is a network protocol like TCP or IP. In the context of UEFI it’s a bit different: it defines the interface between UEFI applications and some set of services offered by the UEFI runtime.

The interaction model is very basic: You provide the UEFI runtime with a UUID identifying the protocol you want to use and it will return to you a structure. This structure is protocol specific and it’s effectively a table of function pointers. For the TPM2 there’s 7 commands and thus 7 entries in this structure, one for each function.

The TPM2 EFI Protocol Specification documents this structure and the functions it exposes in section 6. My implementation pulls the structure directly from the spec as follows:

typedef struct tdEFI_TCG2_PROTOCOL {
  EFI_TCG2_GET_CAPABILITY                      GetCapability;
  EFI_TCG2_GET_EVENT_LOG                       GetEventLog;
  EFI_TCG2_HASH_LOG_EXTEND_EVENT               HashLogExtendEvent;
  EFI_TCG2_SUBMIT_COMMAND                      SubmitCommand;
  EFI_TCG2_GET_ACTIVE_PCR_BANKS                GetActivePcrBanks;
  EFI_TCG2_SET_ACTIVE_PCR_BANKS                SetActivePcrBanks;
  EFI_TCG2_GET_RESULT_OF_SET_ACTIVE_PCR_BANKS  GetResultOfSetActivePcrBanks;
} GRUB_PACKED EFI_TCG2_PROTOCOL;

Each of these types is a function with a prototype / parameters etc just like any other function in C. The spec has all of the gory details, including the updated data about the unpacked structure returned by GetCapability. But given our stated goals above: measuring stuff, we really need just one command: HashLogExtendEvent.

This is probably a good time to reflect on how great UEFI is. In the PC BIOS world we would be writing assembly code to interact with the TPM using memory mapped IO or something. In UEFI we call a function to get a table of pointers to other functions and we then call one of these functions. All of this can be done in C. Pretty great IMHO. And if you’re working in the Grub2 environment Grub already has wrappers for invoking the UEFI command to get access to the protocol structure and to invoke functions from it.

On my Minnowboard Max (MBM) this is pretty easy. Add an FTDI serial cable (the one on sparkfun is way over priced but really nice) and you don’t even need a monitor. Anyways, back to it.

HashLogExtendEvent

The TPM is a pretty complicated little thing but it’s fundamental function: extending data into PCRs, is pretty simple. All you need is a buffer holding data, the size of the buffer, and a data structure describing the extend event. This last bit of data is used by the UEFI firmware to update the log of extend events (aka: “stuff we’ve measured”).

In my grub code I’ve broken this down into two functions. The first is a thin wrapper over the HashLogExtendEvent function that takes the data buffer,size and EFI_TCG2_EVENT structure as a parameter. The second is more of a convenience function that builds up the EFI_TCG2_EVENT structure for the caller using “standard” values. The two prototypes are as follows:

grub_err_t
grub_tpm2_extend (EFI_TCG2_PROTOCOL *tpm2_prot,
                  grub_efi_uint64_t  flags,
                  char              *data,
                  grub_efi_uint64_t  data_size,
                  EFI_TCG2_EVENT    *event)
grub_err_t
grub_tpm2_extend_buf (grub_uint8_t      *buf,
                      grub_efi_uint64_t  buf_len,
                      const char        *desc,
                      grub_uint32_t      pcr)

The EFI_TCG2_EVENT structure is pretty simple but it’s annoying to calculate the size values for each call so the convenience function is a good way to keep code sizes manageable. Take a look at the code here if you’re interested in the details.

When Grub does something like load a module of code, or a command from a config file or a kernel / initrd image we measure it into a TPM PCR using the grub_tpm2_extend_buf function providing it the data to hash, the length of the memory buffer holding said data, a brief description of the event (like “linux kernel” or “loadable module”) and the PCR we want to extend with the measurement.

Probably worth noting is that the current implementation uses the string obtained from the grub.cfg file in the Event field. For now this is a debugging mechanism. For a deployed implementation taking input directly from the user (the grub.cfg file should be considered as such) is pretty risky. Instead it’s probably better to use a generic string that describes what the input data is, but doesn’t contain the data directly.

TPM2 Preboot Event Log

It’s very difficult to extract meaning from the contents of a TPMs PCR. Up till now we’ve covered the bits necessary to extend data into a PCR from within Grub. We can extend value after value into it and at some time in the future read it back. But when someone looks at the value they’ll likely want to recreate it to verify every component that went into generating the final value. The name of the UEFI protocol function that extends a value into the PCR is particularly descriptive because, like it says, it also updates the preboot measurement log. This log is where we go to get the data we need to understand our PCR values.

Being able to parse this log and view the contents is also an integral part of verifying or debugging the work I’ve been doing. In this vein, after I managed to get a call to extend a value into a PCR doing more than returning an error code (yeah it happens a lot to me for some reason) I immediately wrote a bit of code to parse and dump / “prettyprint” the audit log.

Grub has a built in shell and an interface to load code modules and commands. This seemed like the “right” mechanism to load commands that I could execute on demand. If you boot grub without a config file you’ll land immediately at the shell and so I found myself debugging by running grub and then executing commands to extend arbitrary data into the log and then parsing the log and dumping it to the console output. If you’ve got a serial console hooked up you can capture this output using minicom or whatever. The commands that parse the log can be found in a module I’m calling tpm2cmd until I come up with a better name. This module has a few commands in it but to dump the event log the tpm2dumplog command is what you’ll need.

The code that walks through the event log data structure is pretty boring. It’s tightly packed so there’s a bit of work to be done in calculating offsets but it’s not rocket surgery. The interesting part is in interpreting the log and figuring out what to do with the data.

Let’s capture an audit log using these new commands. First we need to be able to boot grub and get to the grub shell. I’ve been integrating this work into the meta-measured OpenEmbedded meta-layer so if you want to see this at work you can build the core-image-tpm image or download one from my meta-measured autobuilder. If you want to build Grub by hand and install it on a thumb drive the results should be the same. Just grab the code from my github. I won’t cover the details for this here though.

Assuming you’ve got the latest core-image-tpm image from meta-measured built, just dd the ISO on to a thumb drive and boot it (on a UEFI system with a TPM2 of course). If you’re grabbing the ISO from my autobuilder the target MACHINE is an MBM running the 32bit firmware so YMMV on any other platform. When you finally get to the grub menu hit ‘c’ to get a shell. From here the tpm2cmd module provides a few functions (tpm2dumpcaps tpm2dumplog and tpm2extend) that you can execute to play around. But before we get too far into that it’s important to note that the firmware can support either the new TPM2 event log format or the old TPM 1.2 format. The capabilities structure will tell you which your firmware supports. On the Minnowboard Max I only get the 1.2 foramt:

grub> tpm2dumpcaps
TPM2 Capabilities:
  Size: 0x1c
  StructureVersion:
    Major: 0x01
    Minor: 0x00
  ProtocolVersion:
    Major: 0x01
    Minor: 0x00
  HashAlgorithmBitmap: 0x00000003
    EFI_TCG2_BOOT_HASH_ALG_SHA1: true
    EFI_TCG2_BOOT_HASH_ALG_SHA256: true
    EFI_TCG2_BOOT_HASH_ALG_SHA384: false
    EFI_TCG2_BOOT_HASH_ALG_SHA512: false
    EFI_TCG2_BOOT_HASH_ALG_SM3_256: false
  SupportedEventLogs: 0x00000001
    EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2: true
    EFI_TCG2_EVENT_LOG_FORMAT_TCG_2: false
  TPMPresentFlag: 0x01 : true
  MaxCommandSize: 0x0f80
  MaxResponseSize: 0x0f80
  ManufacturerID: 0x494e5443
  NumberOfPcrBanks: 0x00000000
  ActivePcrBanks: 0x00000000

The tpm2dumplog function is smart enough to check this structure before walking the log. It will prefer the TPM2 event log format if both are supported. There’s even a --format option so you can select which you want if both are supported but I haven’t had a chance to test this since I’ve not yet found a system with firmware supporting the 2.0 format. Booting straight into the grub shell and executing the tpm2dumplog command produces the log file below. Let’s pick a few example entries and see if we can figure out what they mean.

grub> tpm2
Possible commands are:

 tpm2dumpcaps tpm2dumplog tpm2extend tpm2sendlog
grub> tpm2dump
Possible commands are:

 tpm2dumpcaps tpm2dumplog
grub> tpm2dumplog 
TPM2 EventLog
  start: 0x79373000
  end: 0x79373c98
  truncated: false
prettyprint_tpm12_event at: 0x79373000
  PCRIndex: 0
  EventType: EV_S_CRTM_VERSION (0x00000008)
  digest: 1489f923c4dca729178b3e3233458550d8dddf29
  EventSize: 2
  Event: 
prettyprint_tpm12_event at: 0x79373022
  PCRIndex: 0
  EventType: EV_EFI_PLATFORM_FIRMWARE_BLOB (0x80000008)
  digest: 76bd373351e3531ae3e2257e0b07951a2f61ae42
  EventSize: 16
  Event: 
prettyprint_tpm12_event at: 0x79373052
  PCRIndex: 0
  EventType: EV_EFI_PLATFORM_FIRMWARE_BLOB (0x80000008)
  digest: f46823e31fcb3059dbd9a69e5cc679a4465ea318
  EventSize: 16
  Event: 
prettyprint_tpm12_event at: 0x79373082
  PCRIndex: 0
  EventType: EV_EFI_PLATFORM_FIRMWARE_BLOB (0x80000008)
  digest: 87ce3bc3cd17fe797cc04d507c2f8f3b6c418552
  EventSize: 16
  Event: 
prettyprint_tpm12_event at: 0x793730b2
  PCRIndex: 0
  EventType: EV_EFI_PLATFORM_FIRMWARE_BLOB (0x80000008)
  digest: 8725eb98a49d6227459c6c90699c5187c5f11e16
  EventSize: 16
  Event: 
prettyprint_tpm12_event at: 0x793730e2
  PCRIndex: 7
  EventType: EV_EFI_VARIABLE_DRIVER_CONFIG (0x80000001)
  digest: 57cd4dc19442475aa82743484f3b1caa88e142b8
  EventSize: 53
  Event: a????
prettyprint_tpm12_event at: 0x79373137
  PCRIndex: 7
  EventType: EV_EFI_VARIABLE_DRIVER_CONFIG (0x80000001)
  digest: 9b1387306ebb7ff8e795e7be77563666bbf4516e
  EventSize: 36
  Event: a????
prettyprint_tpm12_event at: 0x7937317b
  PCRIndex: 7
  EventType: EV_EFI_VARIABLE_DRIVER_CONFIG (0x80000001)
  digest: 9afa86c507419b8570c62167cb9486d9fc809758
  EventSize: 38
  Event: a????
prettyprint_tpm12_event at: 0x793731c1
  PCRIndex: 7
  EventType: EV_EFI_VARIABLE_DRIVER_CONFIG (0x80000001)
  digest: 5bf8faa078d40ffbd03317c93398b01229a0e1e0
  EventSize: 36
  Event: ?:=?E????geo
prettyprint_tpm12_event at: 0x79373205
  PCRIndex: 7
  EventType: EV_EFI_VARIABLE_DRIVER_CONFIG (0x80000001)
  digest: 734424c9fe8fc71716c42096f4b74c88733b175e
  EventSize: 38
  Event: ?:=?E????geo
prettyprint_tpm12_event at: 0x7937324b
  PCRIndex: 7
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x7937326f
  PCRIndex: 1
  EventType: EV_EFI_HANDOFF_TABLES (0x80000009)
  digest: ed620fde0f449cec32fc0eaa040d04e1f1888b25
  EventSize: 24
  Event: 
prettyprint_tpm12_event at: 0x793732a7
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: aae4f77a57d91bf7beeee06e053a73eec78cc9ec
  EventSize: 62
  Event: a????
prettyprint_tpm12_event at: 0x79373305
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: d2ab4e0ed4185d5c846fb56980907a65aa8d3103
  EventSize: 168
  Event: a????
prettyprint_tpm12_event at: 0x793733cd
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: 7c7e40269acd5fb401b6f49034b46699bc5c5777
  EventSize: 136
  Event: a????
prettyprint_tpm12_event at: 0x79373475
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: abb0830c9bef09a2c108ceba8f0ff8438991b20a
  EventSize: 206
  Event: a????
prettyprint_tpm12_event at: 0x79373563
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: 79d23fb4ea34f740725783540657c37775fb83b0
  EventSize: 239
  Event: a????
prettyprint_tpm12_event at: 0x79373672
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: b8d59f3cde8b1fc5b89ff0b61f7096903734accb
  EventSize: 117
  Event: a????
prettyprint_tpm12_event at: 0x79373707
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: 22d22e14e623d1714cd605fef81e114ad8882d78
  EventSize: 121
  Event: a????
prettyprint_tpm12_event at: 0x793737a0
  PCRIndex: 5
  EventType: EV_EFI_ACTION (0x80000007)
  digest: cd0fdb4531a6ec41be2753ba042637d6e5f7f256
  EventSize: 40
  Event: Calling EFI Application from Boot Option
prettyprint_tpm12_event at: 0x793737e8
  PCRIndex: 0
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x7937380c
  PCRIndex: 1
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x79373830
  PCRIndex: 2
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x79373854
  PCRIndex: 3
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x79373878
  PCRIndex: 4
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x7937389c
  PCRIndex: 5
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x793738c0
  PCRIndex: 6
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x793738e4
  PCRIndex: 5
  EventType: EV_EFI_ACTION (0x80000007)
  digest: b6ae9742d3936a4291cfed8df775bc4657e368c0
  EventSize: 47
  Event: Returning from EFI Application from Boot Option
prettyprint_tpm12_event at: 0x79373933
  PCRIndex: 4
  EventType: EV_EFI_BOOT_SERVICES_APPLICATION (0x80000003)
  digest: 2d13d06efd0330decd93f992991b97218410688d
  EventSize: 64
  Event: ?kx
prettyprint_tpm12_event at: 0x79373993
  PCRIndex: 4
  EventType: EV_EFI_BOOT_SERVICES_APPLICATION (0x80000003)
  digest: e0fcc7f88f096737f8a95d7a86e7a4b33c365ebc
  EventSize: 145
  Event: Pqx
prettyprint_tpm12_event at: 0x79373a44
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: 5c88780f029068d6f5863b943575556d7b98c558
  EventSize: 76
  Event: Grub2 command: serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1
prettyprint_tpm12_event at: 0x79373ab0
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: 392dc11bf1eea9e5e933e0e401fba80fda90f912
  EventSize: 28
  Event: Grub2 command: default=boot
prettyprint_tpm12_event at: 0x79373aec
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: b993a2e236f21b72a05e401d241a5e7617367738
  EventSize: 26
  Event: Grub2 command: timeout=10
prettyprint_tpm12_event at: 0x79373b26
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: f6afd73c21afe5a5136e7fdf7068d71fb4fc014b
  EventSize: 148
  Event: Grub2 command: menuentry boot {
linux /vmlinuz LABEL=boot root=/dev/ram0 console=ttyS0,115200 console=ttyPCH0,115200 console=tty0 
initrd /initrd
}
prettyprint_tpm12_event at: 0x79373bda
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: 02775bb98cd27a98178eee72a1cd66bec285c839
  EventSize: 158
  Event: Grub2 command: menuentry install {
linux /vmlinuz LABEL=install-efi root=/dev/ram0 console=ttyS0,115200 console=ttyPCH0,115200
console=tty0 
initrd /initrd
}
prettyprint_tpm12_event at: 0x79373c98
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: 69dd51ec4450a2e7b6f3da83370998908aa3370e
  EventSize: 27
  Event: Grub2 command: tpm2dumplog
grub>

The fields here are documented in the specification (version 1.22 of the TPM EFI Protocol spec this time though and in section 3.1.3) so I won’t cover them in detail. Instead let’s look at the first two events:

prettyprint_tpm12_event at: 0x79373000
  PCRIndex: 0
  EventType: EV_S_CRTM_VERSION (0x00000008)
  digest: 1489f923c4dca729178b3e3233458550d8dddf29
  EventSize: 2
  Event: 
prettyprint_tpm12_event at: 0x79373022
  PCRIndex: 0
  EventType: EV_EFI_PLATFORM_FIRMWARE_BLOB (0x80000008)
  digest: 76bd373351e3531ae3e2257e0b07951a2f61ae42
  EventSize: 16
  Event: 

Not a lot of data in these entries. Notably the ‘Event’ field is blank (this is where a textual description of the event should be). The first EventType is the telling bit though (that and that it’s the first thing measrued). This is the first measurement and thus the CRTM, well the CRTM version number at least? The second is much more generic: just some blob of firmware from the platform. What I do find puzzling though is that the first event is listed as 2 bytes long and the second as 16. This length is of the textual event data, the description that looks to be missing. What’s really happening here is that my code treats this field as a NULL terminated string, and the first byte is NULL so the code just doesn’t print anything. This is probably something worth investigating in the future (note to self).

This provides us with an interesting view of the boot process. Neither of these measurements (or the string of events that land in PCR[0]) have much meaning from this angle but the MBM firmware can be built from the EDK2 sources so it may be that with enough code reading we could divine where these values came from. The best thing we have to go on (without additional cooperation from the platform firmware provider) is the EventType and the PCR index where the data was extended. The TCG PC Client Platform Firmware Profile defines “PCR Usage” in section 2.2.4 and PCR[0] is for “SRTM, BIOS, Host Platform Extensions, Embedded Option ROMs and PI Drivers” so basically “firmware”.

For the code that measures the bits that grub loads and depends upon (modules and configuration data) we use PCRs 8 and 9. According to the PC Client PlatformPlatform Firmware Profile 8 and 9 are “Defined for use by the Static OS” which I guess includes the bootloader. According to convention (passed down to me by word of mouth) the even numbered PCRs are where binary data gets hashed and the odd PCRs are for configuration data. So I’m using PCR[8] for Grub modules and binary blobs (kernel, initrd) loaded by grub through the linux command. PCR[9] is where we extend the configuration data loaded by Grub and the commands that Grub executes for the user when they’re in the Grub shell.

The first event in the log for PCR[9] recorded by the core-image-tpm image from meta-measured is the command to set up the serial port. This comes straight from the grub.cfg produced by the openembedded-core recipe:

prettyprint_tpm12_event at: 0x79373a44
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: 5c88780f029068d6f5863b943575556d7b98c558
  EventSize: 76
  Event: Grub2 command: serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1

And you can verify this is the correct value on the console if you want to check my hashing logic:

$ echo -n "serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1" | sha1sum
5c88780f029068d6f5863b943575556d7b98c558  -

Transferring the Pre-Boot Event Log to the Kernel

What’s conspicuously missing though are any measurements of Grub’s components (they would be in PCR[8]). The reason for this: building the TPM2 image from meta-measured produces an “embedded” Grub2 configuration with all modules built directly into the EFI executable so no modules to load. So what about the kernel and initrd you ask? In this example I’ve booted into the Grub shell so we haven’t booted a kernel yet. If we do boot the kernel we end up in a weird situation. You’d expect to see the following entries (that I’ve caused on my system manually by executing linux (hd0,msdos2)/vmlinux and a similar command for the initrd):

prettyprint_tpm12_event at: 0x79373e13
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: bfbd2797ce79ccf12c5e1abbf00c10950e12a8db
  EventSize: 42
  Event: Grub2 command: linux (hd0,msdos2)/vmlinuz
prettyprint_tpm12_event at: 0x79373e5d
  PCRIndex: 8
  EventType: EV_IPL (0x0000000d)
  digest: f50f32b01c3b927462c179d24b6ab8018b66e7bc
  EventSize: 40
  Event: Grub2 Linux initrd: (hd0,msdos2)/initrd

But once the kernel takes over the preboot event log is destroyed. The TPM2 spec is different from the 1.2 version in that there isn’t an ACPI table defined for the preboot audit log. It lives only in the UEFI runtime and when Grub calls the ExitBootServices function the event log is destroyed. So all of this code to measure various bits of Grub will produce data that the kernel and user space will never be able to analyze unless we come up with a mechanism to transfer the log to the kernel. This is surprisingly more difficult than I was hoping.

The two solutions I’ve come up with / run across are:

  1. Create a new UEFI config table and copy the contents of the audit log into it.
  2. Have the kernel copy the table before calling ExitBootServices itself, which requires that Grub be patched to not call this function itself (an idea I got from a conversation with Matthew Garrett).

Both of these approaches have their merits and their drawbacks. We’ll discuss those in more detail in the next post.

TrEE vs TCG

I’ve been learning a bit about UEFI and the new TPM2 specifications these past few weeks. My work in this space isn’t ready for public consumption but I’ve run into some interesting data worth mentioning here. I got really stuck on this on account of my UEFI stills being very weak. Thankfully others ran into the same issue and shared their knowledge. I’m putting this up in the hopes of keeping others from falling down the same hole that took up so much of my time.

Dueling Specifications

When I started poking at the 2.0 TPM on my Minnowboard Max, knowing the interplay between the Microsoft TrEE and the TCG TPM 2.0 EFI Protocol specifications would have saved me a pile of time. AFAIK Microsoft is a very big contributor to the TPM 2.0 specs and Windows 8 and 10 both build on it for security infrastructure. The new TPM specs from the TCG are still in draft form and out for pubic review which is good, but Microsoft is shipping real live systems that use with version 2.0 TPMs.

I don’t participate in any of the TCG stuff other than writing software that uses the TPM so I’ve got no inside knowledge here, but I speculate that it’s hard to ship systems when a core component doesn’t have a published interface specification. So what to do? Microsoft published their own version of the TPM UEFI version 2 Protocol specification under the Trusted Execution Environment EFI Protocol name. You can argue that this isn’t the same as the TCG EFI Protocol Specification for TPM Family 2.0 Revision 1.0 but by UEFI law these two protocols have the same GUID and so they’re the same protocol. Period.

Same GUID, Subtle Differences

So Microsoft published a version of the specification early to support their customers and developer community. I don’t see that they had any other choice. But how does this cause problems you ask? Grab your Minnowboard Max, install the 32bit firmware so you get the Intel PTT (firmware TPM2), code up an EFI application that calls the GetCapability function from the protocol and you’ll see. The short version is that there is a subtle difference in the two specs that causes incompatibility. The TCG spec clearly states that all structures in the spec are packed. The Microsoft spec on the other hand says nothing about struct packing outside of the code samples. Their samples however use macros #pragma pack(1) to show which structures should be packed and the TREE_BOOT_SERVICE_CAPABILITY (synonymous with the TCG EFI_TCG2_BOOT_SERVICE_CAPABILITY structure) is not defined with this macro in effect.

The end result is that since Microsoft is (AFAIK) the only platform currently supporting TPM2, all TPM2 hardware / firmware out there implements their TrEE spec. The UEFI protocol implemented on these platforms will return to you an unpacked struct when you call the GetCapability function and if you’re like me and you’re working from the TCG draft spec you’ll be banging your head against the wall for a bit.

Lessons Learned

Pretty interesting first exposure to UEFI and TPM2. Painful in that my UEFI debugging skills are pretty weak. Interesting caus I’ve learned a ton. Checking structure packing is now at the top of my debugging check list. It’s also very interesting to see dueling specs like this. The reality of having to ship hardware / firmware / software before a specification is finalized has got to be a tricky business so it’s hard to fault Microsoft for this.

The relevant upstreams know about this issue thanks to Matthew Garrett and I’m sure they’ll resolve it before the TCG spec is finalized. Most likely the TCG draft spec will be updated and the capability structure will officially not be packed. Don’t bank on this though. YMMV and every other possible disclaimer.

An additional foot note is that the current TCG spec prescribes conflicting version numbers for the StructureVersion and ProtocolVersion fields in the capability structure too. I’ve reported this as part of the public review process as well.

meta-measured updates

I’ve been grinding down updates to meta-measured slowly but surely over these past few days. This was started a while back but slowed to a stop when I got distracted by work on meta-selinux. Last week my work to upgrade the SELinux toolstack was merged upstream so I’m re-focusing on getting meta-measured back in shape. This is a quick run down on the changes.

General Maintenance and Release Tracking

I haven’t been chasing upstream much so most of this work has just been maintenance. The kernel in OE has moved forward with 4.1 the new stable in master. The read-only root file system class was broken when I tried a fresh master build the other day. When jumping in to start a refresh, finding core features broken can be a bit discouraging. Instead of taking on this fix immediately I figured it would be best to create a branch to track the latest release (fido) and get that working first. I’ve always avoided tracking OE releases because I don’t want to give the impression that meta-measured is in any way “stable”, as if having my master branch broken on a regular basis wasn’t a good enough indicator.

Seriously though, I kinda dropped the “chasing master” ball. Having a branch that follows the latest release is a good way to make getting everything up to date and it’s a good insurance policy against future breakage in master. So win / win on that. Fido still has a functional readonly-rootfs class so, for the 2 other people out there who have kicked the tires on meta-measured, fall back to a release branch. My goal is to keep at least one release branch in working order.

There are two changes between fido and the current master that have an impact on meta-measured. The first is the upgrade to GCC5. For the trousers and tpm-tools packages this requires a patch that I’ve gratefully borrowed from the hardworking people over at the Debian project. The other is meta-intel nuking the meta-nuc layer. This is a pretty minor change and I’m now using the measured-intel-corei7-64 that extends the intel-corei7-64 machine from meta-intel on the master branch. Fido still has the NUC machine so I’ll stick to that for Fido.

systemd

The first external submission to meta-measured came in a few months back. This was to add systemd support to the trousers package. I had done something similar to this as part of some work to get meta-measured working with the Edison but it was in the BSP layer. It really shouldn’t have been in the BSP layer to begin and my implementation was a hack regardless. The patch I received that adds this feature directly to meta-measured was well done so I’ve added it directly and removed my hack from the BSP.

The only bit I didn’t like from the patch was a utility script that loaded every TPM module under /lib/modules/$(uname -r). This seems a bit heavy handed given that modern Linux kernels autoload these modules when the device is detected. Pretty minor change though and this is now in meta-measured proper.

IMA and TPM2

I started playing around with configuring IMA a while back as a way to exercise the TPM2 that’s baked into the Minnowboard Max as part of PTT. I don’t have any specific plans to do much with IMA just yet but with the xattrs patches I pushed into meta-selinux I’m hoping to be able to support IMA appraisal. It would be a great use case and demonstration to show the utility of my xattr work with an OE image booting with IMA appraisal enforced from firstboot.

UEFI Breakage

I’ve been saving the bad news for last unfortunately. Somewhere in the changes that have been going on with OE, my bringing meta-measured up to date and flashing the firmware on my IVB NUC, measured-image-bootimg broke. I’m pretty sure this is a problem with the latest firmware for the NUC but that’s a guess. Grub2 comes up OK but tboot doesn’t seem to take over.

In prevous debugging I learned that tboot won’t log to VGA since UEFI doesn’t setup a text console for use by whatever executes after ExitBootServices() is called. It’s frame buffer only in UEFI world I guess. But tboot would still log to a serial device through direct I/O. Not this time though. Once Grub2 kicks off tboot my NUC is unresponsive and I get no additional output. Booting straight to the kernel from grub works fine but going the tboot route through multiboot2 isn’t working.

I’m pretty much out of debugging ideas here. My test system is getting a bit old and it’s the only UEFI system I have for use here so I’m writing this off as a loss for the time being. If anyone out there is testing this on a UEFI system I’d love to hear about whatever results you have to report.

Next steps

I’ve added the contributed systemd support to trousers but I haven’t enabled or tested systemd support for my reference images. This is next on my list of things to do. I’ve got trousers working on a systemd driven image previously in some work to wire a TPM up to an Edison system (more on this in the future hopefully) but I was working on on the reference Edison images. Stay tuned and hopefully I’ll have proper systemd support in the near future.

xattrs in OE rootfs part 2

This is an update on some work from a previous post. My xattr patches sat on the yocto mailing list waiting for meta-selinux maintainers to review them for about a month. I got a bit testy and even commented to another contributor off list about the possible need to fork given the lack of response from maintainers. Turns out that this contributor was using his gmail account but actually works for the same company as the maintainer. So about 24 hours later the maintainers magically reappeared and merged my patches!

I really have no interest in forking and maintaining meta-selinux so this was a win. Hopefully I didn’t look like too much of a jerk in the process … ¯_(ツ)_/¯ After a quick stint in a feature branch my xattr code was merged into meta-selinux master. This is only a step in the process though. These patches really belong in the openembedded-core repository. They can likely be moved with minimal effort on the technical side. Getting traction for these sorts of fringe features upstream however takes a bunch of time though, as demonstrated by the 2 months it took to get this into meta-selinux.

As a strategy I should probably prove out an additional use case beyond SELinux file labels. I did some work on meta-measured a while back to add basic IMA support. I really want to expand this to support IMA appraisal. I’m not a huge fan of local enforcement policies but this use case is a great example of build-time xattr usage that would demonstrate the utility of my xattr patches. This is all food for thought though at this point.

Building and Booting Grub2 for a UEFI System

I’ve been playing around with Grub2 a bit while trying to learn something about UEFI. Long story short: I’m really interested in the current measurement gap between the UEFI environment and the Linux OS. Before I go too far into those details I just wanted to get a quick note up on building Grub2 for a UEFI target, specifcally the x86_64 (aka amd64) target.

WARNING: This post won’t help you get Linux or any other OS booted from Grub. I’m only covering the steps necessary to build Grub for the efi target and setting up a UEFI boot disk to run it. That’s all.

Build

Doing the actual build is the easy part. First things first: get the source from the GNU project website. Then RTFM in the INSTALL file. For my setup, specifying --target and --with-platform. The build goes something like this:

./autogen.sh
./configure --target=x86_64 --with-platform=efi
make

UEFI bootable USB

UEFI is a huge departure from the legacy BIOS world. To keep this post short we won’t spend any time discussing BIOS but if you’ve ever wrestled with fixing a broken grub-pc install you’ll know what I mean. Installing or fixing a UEFI Grub system by comparison is pretty simple:

  1. Format a disk (USB in my case) with a GPT using your favorite partitioning tool.
  2. Create a partition with the appropriate type for UEFI. Unfortunately the name for this type is different depending on which partitioning tool you’re using. I use fdisk and the type is ‘EFI System’. It doesn’t need to be large, just enough to hold the grub executable. I made mine 100M which is extremely large. Make note of the UUID of the partition.
  3. Format the partition as FAT16 or FAT32.

Create the UEFI Executable

This builds everything but we don’t yet have a UEFI executable. To create one you’ll need to run the grub-mkimage utility. This is the root of the build directory and it’s responsible for stitching the grub kernel, core modules and a builtin config to bootstrap grub. Generally you’d only build the minimally necessary grub modules into the UEFI executable but since this is for my debug setup I’m going to build every module in. This makes the executable very large but it’s easier.

Before we can do this we need to grok the idea of the builtin config. Grub needs a hint when it comes to setting up the system. For my purposes all I care about is getting the FAT partition mounted so my config just tells Grub how to find the root partition and what to set as the prefix.

(
  cat < grub-buildin.cfg

The --fs-uuid is pretty self explanitory. The UUID in the above code fragment isn’t the actual UUID of my UEFI system partition. If you’re following along at home you can get the UUID for the FS you created above using the blkid utility. The prefix tells Grub where to look for the grub.cfg configuration. We won’t even supply one here so the system will boot straight to the grub shell.

Since we’ve specified the prefix in the builtin config the -p option below is useless but grub-mkimage throws an error if it’s not provided. The value we pass is obviously wrong but the image produced will still function.

Finally execute the command:

./grub-mkimage -d ./grub-core -o bootx64.efi -O x86_64-efi -c ./grub-buildin.cfg -p /foobar $(find . -name '*.mod' | tr 'n' ' ' | sed -e 's/.mod//g')

Setup the Boot Disk

By default UEFI firmware will look for the bootloader on a disk at /EFI/BOOT/bootx64.efi. So all we need to do is mount our EFI system partition, create the /EFI/BOOT directory and copy the bootx64.efi executable there. In the above command we built in ever module that Grub has so we don’t need to worry about copying over any modules.

And that should do it. Point your system to this USB device, boot up the grub efi executable and you should go straight to a Grub shell. If you want to get an OS up and running on top of this you’ve still got a long way to go. But if you’re like me and just screwing around with grub (more on this later) then you’ve got a pretty good test environment. Now all you need to do is write a quick config to get Grub logging to some serial hardware and crank up the debug output!