The Behance Network uses a number of nifty little binary files to power some of the useful services that our platform offers. Two of them in particular are wkhtmltoimage and wkhtmltopdf ( both 64-bit static binaries ). These two tools convert HTML to either a PDF or a thumbnail image of a full webpage and dump the output. They work flawlessly on both our sandbox environments ( Ubuntu 11.04 ) and our production image servers ( CentOS 5.5 ). But when we try to execute them on one of our new Imageservice cloud servers ( CentOS 5.4 ), we receive the dreaded segmentation fault.
Let’s start with the basics: what exactly is a segfault?
According to Wikipedia ( Article ):
A segmentation fault (often shortened to segfault) or bus error is generally an attempt to access memory that the CPU cannot physically address. It occurs when the hardware notifies a Unix-like operating system about a memory access violation. The OS kernel then sends a signal to the process which caused the exception. By default, the process receiving the signal dumps core and terminates.
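To make the "sends a signal to the process" part concrete, here is a minimal Python sketch. It is purely illustrative and not part of our Imageservice code. It assumes a POSIX system, where Python's `subprocess` module reports a child killed by a signal as a negative return code. The "crashing binary" is a stand-in: a child Python process that sends SIGSEGV to itself. The helper name `describe_exit` is ours.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # On POSIX, death-by-signal is reported as the negated signal number.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Stand-in for a crashing binary (an assumption for this demo):
# a child Python process that delivers SIGSEGV to itself.
crasher = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]

result = subprocess.run(crasher)
print(describe_exit(result.returncode))
```

A wrapper like this only tells you that the process died from SIGSEGV, not why, which is where tools like strace and gdb come in.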
But surely we can obtain a little information about why this is happening, no? Let’s try a couple things.
First, let’s try to use “strace” ( trace system calls / signals … should help us figure out if we’re missing a dependency or something like that ):
    $ strace ./wkhtmltopdf-amd64
    execve("./wkhtmltopdf-amd64", ["./wkhtmltopdf-amd64"], [/* 24 vars */]) = 0
    mmap(0x26d8000, 34316715, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, 0, 0) = 0x26d8000
    readlink("/proc/self/exe", "/path/to/imageserice/library/wkhtml/wkhtmltopdf-amd64"..., 4096) = 72
    mmap(0x400000, 36536320, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x400000
    mmap(0x400000, 32300155, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x400000
    mprotect(0x400000, 32300155, PROT_READ|PROT_EXEC) = 0
    mmap(0x24cd000, 1950128, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0x1ecd000) = 0x24cd000
    mprotect(0x24cd000, 1950128, PROT_READ|PROT_WRITE) = 0
    mmap(0x26aa000, 188272, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x26aa000
    brk(0x26d8000) = 0x26d8000
    --- SIGSEGV (Segmentation fault) @ 0 (0) ---
    +++ killed by SIGSEGV +++
Well, that was useless. Right when it starts executing, we get hit with a Segfault ( SIGSEGV ).
Okay, let’s try using “gdb” to debug the file and see what’s going on “inside” while it executes. The “bt” command ( backtrace ) should be able to show us where things stood right before it crashed.
    $ gdb
    GNU gdb Fedora (6.8-37.el5)
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    Use the "file" or "exec-file" command.
    (gdb) file ./wkhtmltopdf-amd64
    Reading symbols from /path/to/imageserice/library/wkhtml/wkhtmltopdf-amd64...(no debugging symbols found)...done.
    (gdb) run
    Starting program: /path/to/imageserice/library/wkhtml/wkhtmltopdf-amd64 ./wkhtmltopdf-amd64
    Program received signal SIGSEGV, Segmentation fault.
    0x0000000003166025 in ?? ()
    (gdb) bt
    #0  0x0000000003166025 in ?? ()
    #1  0x0000000003166438 in ?? ()
    #2  0x0000000000000000 in ?? ()
    (gdb)
Again, useless. Crashes before it even starts.
What’s the big difference between our Imageservice in production and the new one we plan on using?
Imageservice ( CentOS 5.5 ) vs _Imageservice.2_ ( CentOS 5.4 )
One word: Xen
What the hell is Xen? Wikipedia ( Article ):
The Xen® hypervisor, the powerful open source industry standard for virtualization, offers a powerful, efficient, and secure feature set for virtualization of x86, x86_64, IA64, ARM, and other CPU architectures
Simply put, Xen is a hypervisor: our new cloud servers run as virtual machines on top of it, on a Xen-flavored Linux kernel rather than a stock one. Rightscale ( our web-based cloud computing management platform ) deploys its cloud servers this way. The creator of the wkhtmltoimage and wkhtmltopdf tools currently does NOT support Xen ( and probably won’t ). As a result, the 64-bit static binaries FAIL upon execution every single time on these servers. Currently the only solution, ironically, is to use the 32-bit versions of these tools. Hopefully this saves someone out there from spending a couple days trying to debug this issue!
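If you deploy the same code to both kinds of servers, one option is to pick the binary at runtime. The sketch below is only an illustration under a few assumptions, not what we actually run. It assumes a Xen guest can be recognized by a `/proc/xen` directory or by `/sys/hypervisor/type` containing `xen`, which may not hold on every setup. It also assumes the 32-bit build is named `wkhtmltopdf-i386`; that file name and the helper names are hypothetical.

```python
import os
import platform

BINARY_64 = "wkhtmltopdf-amd64"  # name used in the transcripts above
BINARY_32 = "wkhtmltopdf-i386"   # hypothetical name for the 32-bit build


def running_under_xen(proc_xen="/proc/xen",
                      hypervisor_type="/sys/hypervisor/type"):
    """Best-effort Xen guest check; the paths are parameters to ease testing."""
    if os.path.isdir(proc_xen):
        return True
    try:
        with open(hypervisor_type) as handle:
            return handle.read().strip() == "xen"
    except OSError:
        # File missing or unreadable: assume we are not on Xen.
        return False


def pick_binary(under_xen, machine):
    """Use the 64-bit build only on non-Xen x86_64 hosts, else the 32-bit one."""
    if machine == "x86_64" and not under_xen:
        return BINARY_64
    return BINARY_32


print(pick_binary(running_under_xen(), platform.machine()))
```

Keeping the detection and the decision in separate functions makes the decision easy to test without a Xen box at hand.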
Now back to your regularly scheduled programming.