Creating a Varnish module


Besides being a rock-solid HTTP cache, Varnish is an invaluable source of architecture best practices. Here, I'm of course talking about Varnish 3, since it's the first major version providing a modular architecture. So, let's learn how to create a VMOD.

A word on Varnish's architecture

Varnish's famous design principle is to "work with the kernel, not against it". Several aspects of the architecture help Varnish achieve that goal. The child process (which does the caching) is isolated from the management process, which among other things compiles the VCLs. System calls are kept to a minimum thanks to a workspace-oriented memory model for the threads processing requests. These are mere examples and there is a lot more to Varnish, but this is what we need to understand in order to create a module.

Varnish is written in C, and so are the VMODs. I am actually a Java developer, so I know what our usual readers might think about this. It isn't that hard to switch from Java to C, and Varnish's memory model actually has similarities with the Java Memory Model. You don't even need to know C to attend the VMOD training course.

The Varnish Configuration Language

If you know Varnish, you already know about the Varnish Configuration Language. As I mentioned earlier, VCLs are compiled by the management process. Varnish is configured with actual compiled source code, which allows greater performance than an interpreted configuration. VCL is very limited as a programming language: you can't write loops, for instance. This makes configuration limited, but also simpler and safer. It is possible to inline C code in the VCL to overcome these limitations, but depending on your needs you might be better off creating a VMOD.

Where to start?

Varnish Software provides a helloworld module with autotools configuration, an implementation, and a test case out of the box. The easiest way to bootstrap a VMOD is simply by forking libvmod-example.

git clone https://github.com/varnish/libvmod-example.git

It's then easy to find which files you need to edit to rename the VMOD or the hello function.

cd libvmod-example
git grep -e example -e hello

Building the module

In order to build the module, you actually need to build Varnish from source. You can download a source distribution of Varnish or clone the git repository and follow the build instructions:

git clone https://github.com/varnish/Varnish-Cache.git
cd Varnish-Cache
git checkout varnish-3.0.2
./autogen.sh
./configure
make

Once Varnish is built, you can include its headers, link to its libraries and invoke varnishtest (yes, I'm talking about TDD). Building Varnish from source is required by the libvmod-example skeleton, but it isn't mandatory in general[1].

Your module can then be built with:

# go back to the libvmod-example repo
./autogen.sh
./configure VARNISHSRC=/path/to/varnish/sources
make

Developing the module

You only need two things to develop your module:

  • declare the functions
  • write the functions

You also need to know the mapping between C types and VCL types. This is documented at https://www.varnish-cache.org/docs/3.0/reference/vmod.html#vcl-and-c-data-types

Declaring the functions

This part is quite easy: you declare the module name, an initialization function and your module's functions. In this case, we have a single function, hello, that takes a string and returns a new string. If you're only creating stateless functions, you don't need to care about the initialization function.

Module example
Init init_function
Function STRING hello(STRING)

About the STRING type: please note that it is immutable. You are not supposed to edit a string, but rather create a new one (that's one thing I love about the java.lang.String class). It is mapped to const char *.

Implementing the functions

The hello function is very simple: hello("world") returns "Hello, world". If you know your standard C library, you can do that very easily:

const char *
vmod_hello(struct sess *sp, const char *name)
{
       unsigned length;
       char* result;
       
       /* strlen("Hello, ") + strlen(name) + trailing '\0' */
       length = 7 + strlen(name) + 1;
       result = malloc(length);
       
       if (result == NULL) {
               return NULL;
       }
       
       strcpy(result, "Hello, ");
       strcat(result, name);
       
       return result;
}

The problem here is that Varnish can't know how or when to free the string. It is actually possible to return a manually allocated value along with a matching free function, but you don't want to do that unless you're dealing with a third-party API that cannot integrate with Varnish's memory model.
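To give an idea of what "a value and its matching free function" looks like, here is a minimal sketch in plain C. It is not Varnish code: toy_priv, toy_hello_priv and toy_priv_fini are hypothetical names invented for this illustration, loosely inspired by the pointer-plus-free-function pair that Varnish hands to VMODs. The only point is that whoever owns the struct can release the value without knowing how it was allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: a heap pointer paired with the function that frees it. */
struct toy_priv {
	void *priv;			/* the manually allocated value */
	void (*free)(void *);		/* how to release it */
};

/* Release the stored value, if any, and clear the pair. */
void
toy_priv_fini(struct toy_priv *priv)
{
	assert(priv != NULL);
	if (priv->priv != NULL && priv->free != NULL)
		priv->free(priv->priv);
	priv->priv = NULL;
	priv->free = NULL;
}

/* Build "Hello, <name>" on the heap and record how to free it. */
const char *
toy_hello_priv(struct toy_priv *priv, const char *name)
{
	static const char prefix[] = "Hello, ";
	char *result;

	assert(priv != NULL && name != NULL);
	toy_priv_fini(priv);		/* drop the value of a previous call */

	result = malloc(strlen(prefix) + strlen(name) + 1);
	if (result == NULL)
		return (NULL);
	strcpy(result, prefix);
	strcat(result, name);

	priv->priv = result;
	priv->free = free;		/* the matching free function */
	return (result);
}
```

The caller starts with a zeroed struct toy_priv, calls toy_hello_priv as often as needed, and finishes with toy_priv_fini. This is a lot of ceremony compared to a workspace allocation, which is why it is best kept for third-party APIs.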

The workspace memory model

In order to understand the underlying API of Varnish, I had to read the actual source code. Do I miss my usual javadoc? Sometimes, yes. Varnish's code base is not easy, but the workspace API is not that hard to understand.

Headers to include

First of all, let's take a look at the includes:

#include "vrt.h"
#include "bin/varnishd/cache.h"

#include "vcc_if.h"

The vrt.h header provides various VCL data structures and functions, such as regexp functions. In cache.h you will find many data structures and functions, including the workspace functions.

/* cache_ws.c */

void WS_Init(struct ws *ws, const char *id, void *space, unsigned len);
unsigned WS_Reserve(struct ws *ws, unsigned bytes);
void WS_Release(struct ws *ws, unsigned bytes);
void WS_ReleaseP(struct ws *ws, char *ptr);
void WS_Assert(const struct ws *ws);
void WS_Reset(struct ws *ws, char *p);
char *WS_Alloc(struct ws *ws, unsigned bytes);
char *WS_Dup(struct ws *ws, const char *);
char *WS_Snapshot(struct ws *ws);
unsigned WS_Free(const struct ws *ws);

The vcc_if.h header is generated during the build from your module declaration and contains the declarations of your VMOD's functions. This is how you know your functions' signatures.

How it works

Every worker thread has its own workspace, in which it can allocate at will in virtual memory (up to a maximum size, of course). Worker threads are the ones that receive and answer requests. The workspace is a "large" contiguous char array defined as such:

struct ws {
       unsigned                magic;
#define WS_MAGIC               0x35fac554
       unsigned                overflow;       /* workspace overflowed */
       const char              *id;            /* identity */
       char                    *s;             /* (S)tart of buffer */
       char                    *f;             /* (F)ree pointer */
       char                    *r;             /* (R)eserved length */
       char                    *e;             /* (E)nd of buffer */
};

The magic field must contain the WS_MAGIC value to assert we are pointing to an actual workspace (this is one of the sanity checks used by the workspace functions). The overflow field is a counter of buffer overflows (but it won't make you write past the buffer end, sanity checks again). The important part as a workspace user lies in the SFRE fields.

The start and end fields point to the actual start and theoretical end (remember this is virtual memory) of the char array. The free field points to the currently available memory. It can be viewed as a head that moves forward every time memory is allocated. If you ask for more memory than is left, the free pointer would have to move past the end boundary, so the allocation fails instead. As for the reserved field, it allows incremental allocation within the workspace. It also locks all workspace allocation functions until you actually release it, so you must make sure the reservation is always released!
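To make the pointer arithmetic concrete, here is a minimal sketch of such a bump allocator in plain C. It is a toy model and not Varnish's code: toy_ws and the toy_ws_* functions are names invented for this illustration, the reserved pointer is left out, and the real implementation does more (as far as I can tell it also pads sizes for alignment, for instance).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_WS_MAGIC	0x35fac554u	/* plays the role of WS_MAGIC */

/* Toy stand-in for struct ws, without the reserved pointer. */
struct toy_ws {
	unsigned magic;		/* must always be TOY_WS_MAGIC */
	unsigned overflow;	/* number of allocations that did not fit */
	char *s;		/* start of buffer */
	char *f;		/* free pointer: next byte to hand out */
	char *e;		/* end of buffer (one past the last byte) */
};

/* Sanity checks: right magic value, and s <= f <= e. */
void
toy_ws_check(const struct toy_ws *ws)
{
	assert(ws != NULL);
	assert(ws->magic == TOY_WS_MAGIC);
	assert(ws->s <= ws->f);
	assert(ws->f <= ws->e);
}

/* Attach the workspace to a caller-provided buffer. */
void
toy_ws_init(struct toy_ws *ws, char *space, size_t len)
{
	assert(ws != NULL && space != NULL && len > 0);
	ws->magic = TOY_WS_MAGIC;
	ws->overflow = 0;
	ws->s = space;
	ws->f = space;
	ws->e = space + len;
}

/* Bytes still available between the free pointer and the end. */
size_t
toy_ws_avail(const struct toy_ws *ws)
{
	toy_ws_check(ws);
	return ((size_t)(ws->e - ws->f));
}

/* Hand out `bytes` bytes by advancing the free pointer, or return NULL. */
char *
toy_ws_alloc(struct toy_ws *ws, size_t bytes)
{
	char *p;

	if (bytes > toy_ws_avail(ws)) {
		ws->overflow++;		/* count the overflow, move nothing */
		return (NULL);
	}
	p = ws->f;
	ws->f += bytes;
	return (p);
}
```

With a 16-byte buffer, allocating 10 bytes returns the start of the buffer and leaves 6 bytes available. Asking for 7 more returns NULL, bumps the overflow counter to 1 and leaves the free pointer where it was.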

A workspace allocation simply moves the free pointer...

[Figure: workspace allocation]

...unless there isn't enough free space in the workspace.

[Figure: workspace allocation overflow]

Many thanks to Guillaume Gaulard, who made my original illustrations look a lot better.

Using the Varnish API

It looks like we have to break the "hiding internals" principle in order to use the Varnish API. I actually read the whole workspace implementation (surprisingly, just a few lines of code) and learned quite a few tricks from it. So now we can get rid of the obvious memory leak we introduced earlier and simply replace malloc with WS_Alloc:

const char *
vmod_hello(struct sess *sp, const char *name)
{
       unsigned length;
       char* result;
       
       /* strlen("Hello, ") + strlen(name) + trailing '\0' */
       length = 7 + strlen(name) + 1;
       result = WS_Alloc(sp->wrk->ws, length);
       
       if (result == NULL) {
               return NULL;
       }
        
       strcpy(result, "Hello, ");
       strcat(result, name);
       
       return result;
}

Why am I still not satisfied with that? It does a safe allocation in the workspace and produces the expected output, right? Let's look at the VMOD's real implementation:

const char *
vmod_hello(struct sess *sp, const char *name)
{
       char *p;
       unsigned u, v;

       u = WS_Reserve(sp->wrk->ws, 0); /* Reserve some work space */
       p = sp->wrk->ws->f;                /* Front of workspace area */
       v = snprintf(p, u, "Hello, %s", name);
       v++;
       if (v > u) {
               /* No space, reset and leave */
               WS_Release(sp->wrk->ws, 0);
               return (NULL);
       }
       /* Update work space with what we've used */
       WS_Release(sp->wrk->ws, v);
       return (p);
}

It uses the WS_Reserve function (a length of 0 means "reserve all the available space") instead of WS_Alloc. This is useful when you can't predict the final size you need to allocate. If pre-computing the required space is costly, you might want to change your algorithm this way. Remember that workspace allocation or reservation has constant time and space complexity (it is almost free). If the result does not fit in the workspace, you'll have wasted a few CPU instructions writing into it, but those few CPU cycles are not your main problem if you run out of workspace.
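The reserve, write, release sequence can be tried outside Varnish with a toy workspace. The sketch below is only a model written for this post: toy_ws, toy_ws_reserve, toy_ws_release and toy_hello are invented names, and the real WS_Reserve and WS_Release perform more checks. Unlike the VMOD above, it also treats a negative return value from snprintf as a failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for struct ws, reduced to the four S/F/R/E pointers. */
struct toy_ws {
	char *s;	/* start of buffer */
	char *f;	/* free pointer */
	char *r;	/* end of the open reservation, NULL if none */
	char *e;	/* end of buffer */
};

/* Attach the workspace to a caller-provided buffer. */
void
toy_ws_init(struct toy_ws *ws, char *space, size_t len)
{
	ws->s = space;
	ws->f = space;
	ws->r = NULL;
	ws->e = space + len;
}

/* Reserve everything that is left; returns the number of bytes reserved. */
size_t
toy_ws_reserve(struct toy_ws *ws)
{
	assert(ws->r == NULL);		/* only one reservation at a time */
	ws->r = ws->e;
	return ((size_t)(ws->r - ws->f));
}

/* Keep the first `bytes` bytes of the reservation and give the rest back. */
void
toy_ws_release(struct toy_ws *ws, size_t bytes)
{
	assert(ws->r != NULL);		/* there must be an open reservation */
	assert(bytes <= (size_t)(ws->r - ws->f));
	ws->f += bytes;
	ws->r = NULL;
}

/* Format the greeting straight into the workspace, or return NULL. */
const char *
toy_hello(struct toy_ws *ws, const char *name)
{
	size_t avail;
	char *p;
	int n;

	avail = toy_ws_reserve(ws);
	p = ws->f;
	n = snprintf(p, avail, "Hello, %s", name);
	if (n < 0 || (size_t)n + 1 > avail) {
		toy_ws_release(ws, 0);		/* no space: keep nothing */
		return (NULL);
	}
	toy_ws_release(ws, (size_t)n + 1);	/* keep the string and its '\0' */
	return (p);
}
```

Note that both exit paths release the reservation. Forgetting one of them would leave the workspace locked for every later allocation.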

Conclusion

Writing a module for Varnish 3 could have been a real pain, but the libvmod-example module makes it possible to create a working project rapidly. The C language can be a barrier, but if you have a good knowledge of programming and know a little about how a CPU works, it is only a matter of learning the syntax and the API. The tooling can also be a problem when you are not familiar with C programming in Unix environments (I am still using a plain text editor with syntax highlighting for source editing, and bash to run make). Consider the VMOD training course to learn more about Varnish hacking; we even offer training in French!

Last but not least, Varnish comes with a nice test framework. On top of the VCL, varnishtest uses another DSL called VTC (varnish test case). You can do TDD with Varnish and next time I'll explain how to use it!

Notes

[1] For instance, openSUSE has a varnish-devel package which provides what's needed to build a module, but then you have to rewrite your autotools configuration.

