If it’s a feature, your pull request had better have tests.
That’s our mantra here at Behance. We’ve found this philosophy to be one of the most reliable ways to enforce strong, stable growth in our applications. But what happens when it’s taken seriously by a team of ten developers? Twenty? More? Every week our team juggles multiple features, improvements, and hot fixes, usually across multiple applications, so new tests are constantly making their way into builds. Our team is also constantly growing, so the number of tests landing in master every day keeps increasing. This became a monster of a problem fast: our test build times started to climb rapidly, and our continuous integration (CI) server became overburdened with queued tests that needed to be run.
Long test times == unreliable CI == sad developers
Our suite of smoke tests was starting to take as long as an hour to run serially (for you PHPUnit folks out there, this was our @small suite of tests). Sure, in reality they were integration tests, and your “small” suite shouldn’t take that long to run, but I wasn’t going to start running from developer to developer asking for test refactors, especially when some of those tests, and the features they covered, were years old.
One solution I came up with to solve this problem was a parallel suite – I wrote a class that would take a suite of tests and distribute them among worker processes in a pool. At first it was nice; build times on our CI server (Jenkins, affectionately named Huboo) went from an hour to a solid range of 10-20 minutes … but that relief ended rather quickly. Our team was growing fast and, consequently, so was our code base. With the explosion of tests, the build times slowly started to creep up again, and so did my levels of sadness. Developers were not getting build results & test metrics in a reasonable amount of time. 10-20 minutes was no longer cutting it.
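The distribution idea can be sketched in a few lines of Python. This is an illustrative reimplementation, not our actual class (which shelled out to PHPUnit from PHP); the round-robin partitioning and the fake `run_bucket` stand-in are assumptions for the sketch:

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def partition(tests, workers):
    """Deal tests round-robin into one bucket per worker."""
    buckets = [[] for _ in range(workers)]
    for i, test in enumerate(tests):
        buckets[i % workers].append(test)
    return buckets

def run_bucket(bucket):
    # Stand-in for shelling out to PHPUnit for each test in the bucket;
    # the real class spawned separate processes to do the work.
    return [(test, "pass") for test in bucket]

def run_parallel(tests, workers=4):
    """Run every bucket concurrently and flatten the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_bucket, partition(tests, workers))
    return list(itertools.chain.from_iterable(results))
```

The key property is that total wall time approaches the slowest bucket rather than the sum of all tests – which is exactly why build times dropped to 10-20 minutes, and also why they crept back up once individual suites grew.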
And that’s just from trying to test our master branch. What made this situation worse was that I introduced a tool into our CI infrastructure that allowed us to automatically test specific branches other than master (which can be a completely separate blog post)…
Huboo & The Marionettes to the rescue! - Best. Band name. Evar.
Luckily, Chris Fortier joined us right around the time this problem hit us at its worst. He and Ko worked together to get an OpenStack cluster – a virtual machine manager – up and running, and immediately started working on scripts that could build stable environments mirroring production. This led to the utilization of one of the many features that I love about Jenkins: slaves! It was a very long and arduous trial-and-error process to get the environments to a stable place, but once Chris & Ko did, it was a simple matter of giving Jenkins SSH access to the machines (The Marionettes).
Having the computational power of a cluster of machines only solved half of the problem, though. Our test suite was still quite large, and at that point probably littered with time-consuming tests. The overarching goal was to run as many of our tests as possible, not just the @small suite, in a short time frame so that our developers had more insight into what their features were doing – simply having a farm of slaves didn’t quite cut it.
The solution: divide and conquer (again). We started to split up our tests into more suites based on features, major components of the application, components of those components, and so forth.
So, in terms of phpunit:
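Each suite maps to a single PHPUnit invocation, and the only thing that changes between builds is the --group argument (the group names below are illustrative, not our real suite list):

```shell
phpunit --group activity   # one matrix build runs the activity suite
phpunit --group search     # another runs the search suite
phpunit --group small      # the catch-all size-based suite
```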
Since the only thing that varies between our Jenkins builds is the --group argument, it became obvious that in order to split up the workload between our Jenkins slaves, each build would have to:
- Build out the application on a specified set of branches
- Build out any services on specified branches (or in our case, other applications and services)
- Perform a database schema diff
- Apply our fixture data (images, DB rows, etc)
- Run PHPUnit on a specific group
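The steps above boil down to roughly this per-build script. Every repo URL, helper script, and variable name here is hypothetical – a sketch of the shape of the job, with $SUITE being the value Jenkins injects for the matrix cell:

```shell
#!/bin/sh
# Hypothetical build script for one matrix cell.
git clone git@git.example.com:behance/app.git app && cd app
git checkout "$APP_BRANCH"             # build the app on a specified branch
./build_services.sh "$SERVICES_BRANCH" # build out dependent services / apps
./schema_diff.sh | mysql app_test      # perform and apply the schema diff
./load_fixtures.sh                     # fixture data: images, DB rows, etc.
phpunit --group "$SUITE"               # run PHPUnit on this build's group
```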
Matrix projects in Jenkins are best suited for this type of situation: you have a series of build steps that you need to replicate across multiple environments, with one command / step that changes slightly on each run. This set of differences (ie: permutations) is called the configuration matrix. In our case the configuration matrix, defined in Jenkins, is a set of machines that we want to restrict the builds to and our list of test suites.
So in our matrix / multi-configuration project we have a user-defined axis for our suites and a label axis which restricts all builds to our cluster of slaves. Jenkins automagically generates permutations of these axes for us and runs a build for each. Each build is distinguished by which suite is injected into it.
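To make the permutation step concrete, here is a small Python sketch of what the matrix expansion amounts to. Jenkins does this for you; the axis names and values below are hypothetical:

```python
from itertools import product

def matrix_builds(axes):
    """Cross product of every axis's values: one build per permutation,
    mirroring what a Jenkins matrix project generates from its axes."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]

# Illustrative axes: a user-defined SUITE axis and a slave label axis.
builds = matrix_builds({
    "SUITE": ["activity", "search", "small"],
    "label": ["marionette-1", "marionette-2"],
})
# Six builds total, e.g. {"SUITE": "search", "label": "marionette-1"}
```

Add a suite to the first axis or a slave to the second and the number of builds grows automatically – which is exactly what makes this setup scale with the team.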
To inject the suite names as properties into each build we use the “Parameterized Build” plugin. To aggregate any reports (jUnit, performance, coverage, git blame, etc.) we use the “Flexible Publish” plugin (which, by the way, gets more awesome as you get more familiar with other build & publisher plugins).
Test suites are awesome.
The simple solution we went with was to use the @group docblock tag and split the tests out into feature-specific suites. In retrospect this is a very scalable solution when combined with matrix projects:
- Given a list of existing suites, our developers can decide for themselves whether their unit tests logically belong to a specific suite or to their own feature; all they need to do is let me know when to edit the list of suites.
- There is no harm in throwing in extra suites – if a developer has no idea where their test(s) should go, they can just throw them into one of PHPUnit’s size groups (small, medium, large) – the tests still get run at some point.
- Organizing our collection of tests into suites allows us to perform feature / component-specific metric analysis. Since PHPUnit can generate xUnit test reports & Clover coverage reports, Jenkins can publish those reports for each axis, and we get a deeper understanding of how each component interacts with the environment.
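For illustration, opting a test into suites is just a matter of docblock annotations. The class, group, and API names below are made up – a sketch of the pattern, not our real code:

```php
/**
 * Lands in both the "search" feature suite and the catch-all @small suite,
 * so it runs in whichever matrix build targets either group.
 *
 * @group search
 * @group small
 */
class SearchResultsTest extends PHPUnit_Framework_TestCase
{
    public function testFiltersByCreativeField()
    {
        $results = Search::query('photography');
        $this->assertNotEmpty($results);
    }
}
```

Running `phpunit --group search` picks this test up automatically; that tag-based selection is all a matrix cell needs to know.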
Slaves are awesome (you know what I mean)
You might be reading this and wondering why we use slaves at all. Why not just do what TravisCI does? Spin up a new slave / VM, clone your repo, switch branches, build dependencies out and run the tests? Why not just use TravisCI?
- We’re kind of insane. We have more integration tests than unit tests – our tests actually require the entire application stack to run (yeah, I know)
- Cloning the repo and running the tests on the master node each time (parallelized only up to its number of executors) is way too CPU-intensive for one machine and obviously not scalable.
- Having slaves with the full application stack on them achieves what TravisCI does and more. You now have a cluster of machines that mirror production that can not only run integration tests but act as application servers for Selenium tests!
- Building on point #3: put your static / dynamically-built machines behind a load balancer. BAM. Full production environment. Run parallel selenium tests on them. Run load tests on them. You’re now swimming in a sea of possibilities!
- Even if you don’t need a full integration environment to run tests against, setting up slaves and dividing up your tests into feature suites gives you a tremendous amount of flexibility – more executors on more machines controlled for you by Jenkins means running more tests in parallel!
Finally, the biggest gain is that you have control over everything. You are not bound to the limitations of a third party service. All of your dev teams, from Backend to Frontend, can help build on the process. That’s probably a key factor in being able to scale out your CI process & infrastructure. Like development, QA / QE has to be a team effort and for us that’s further facilitated by our CI infrastructure because all teams have access to the process at some point in a production-like environment.
…and on a personal note: who wouldn’t want to see all of their tests running, multi-threaded, across multiple machines? Along with Karma, Selenium & screenshot tests? All at the same time?