Posted tagged ‘Build Process’

20 second builds

June 11, 2010

Back in May of 2007, Linus Torvalds gave a talk at Google about git. He kept coming back to the point that your workflow changes dramatically when you can perform a merge or get new code from other people in under a second. Merging multiple branches into your own branch without remembering revision numbers and knowing that your history is intact, being able to view the entire history of a project without an internet connection, and just being able to work with your version control system in sub-second time can change how you work with your code. You become willing to make branches because it’s so cheap and easy. You can try out new things quickly and have an easy way to toss them if they don’t work out, or merge back parts if they do. I think that the philosophy of really fast version control and its effects on your productivity are quite real, and very important. I also think that the same mentality can and should be applied to your build process.

An example of how development might work via a revision control system such as git. I just needed a picture. Image provided by wikimedia commons.

Of course the first thing you might think is “There is no way I’m going to get my build down to under a second.” And for anything non-trivial, you’d be right. However, I think having a goal of a build plus unit tests in under 20 seconds is not an unreasonable demand. Even better if it’s under 10 seconds. But projects can get really large and involve thousands of files and a similar set of thousands of unit tests. How can you build all of that in under 20 seconds? The answer is: you don’t. Do I sound contradictory? I’m not, but you might have to change how you think about your project.

The main idea behind getting a build under 20 seconds is that it will allow you to compile what you’re working on and run all of your unit tests so that you can test your changes and see if anything breaks. If your build takes several minutes you’re not going to want to build and run your unit tests until you’re done. This makes doing Test Driven Development a lot more difficult because if you don’t run your tests constantly, you’re probably not designing around your tests passing. You might write fewer tests, have a less elegant design, and ultimately produce poorer quality code. But still, a large project with hundreds of thousands of lines of code isn’t going to even compile, much less run all of the unit tests, in under 20 seconds, so what do you do?

As with any development problem, you break it down into smaller and more manageable chunks. Instead of writing a giant application with thousands of different files and unit tests, you write many small libraries and services, each holding maybe a few dozen to a few hundred files and tests, and each decoupled from and independent of the others. These smaller libraries and services can have their own separate build processes, as they are all small projects in and of themselves. That way you can run just the unit tests relating to the small project you’re working on and cut your build times down to under 20 seconds.

Now granted that’s only the build time for one module, and all of them together can still take several minutes. But that’s ok. Most bugs and features that you’re working on will only touch one or maybe two modules at a time, so you only need to run the build process for those modules to fix the problem or add the feature. There’s no need to compile and run unit tests for a lot of other systems that are completely unrelated to what you’re working on. As for whether your change will break things in other packages, that’s what integration tests are for.
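To make the "build only what you touched" idea concrete, here is a minimal sketch in Python. It assumes a layout that the post does not specify, where each module lives in its own top-level directory; the function name `modules_to_build` and the module names are made up for illustration.

```python
from pathlib import PurePosixPath


def modules_to_build(changed_files, module_roots):
    """Return the sorted names of modules that own at least one changed file.

    Assumption: every module lives in a top-level directory named after it.
    Files outside any known module directory are ignored.
    """
    touched = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if parts and parts[0] in module_roots:
            touched.add(parts[0])
    return sorted(touched)


# Hypothetical change set: two files in one module, plus a top-level file.
changed = [
    "billing/src/invoice.py",
    "billing/tests/test_invoice.py",
    "README.txt",
]
print(modules_to_build(changed, {"billing", "search", "auth"}))  # ['billing']
```

Only the `billing` module would be rebuilt and tested here. The other modules stay untouched until the integration tests run.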

Besides, if you separate your code into smaller libraries and services, you’ll be providing a natural way of decoupling systems so that they no longer depend on each other’s internals. If you write good strong unit tests that mock any external entities that are called by or would call your service or library, those tests enforce the contract that your other systems rely on. This goes a long way toward making sure that the changes you make will not break other systems. If a change does break another system, you add or modify a unit test so that it fails on that breakage, and the test then documents why your system should output what it does or call what it calls.
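As a rough illustration of a unit test that mocks an external entity, here is a small Python sketch using the standard library’s `unittest.mock`. The `notify_if_overdue` function, the `mailer` collaborator, and the 30-day rule are all invented for this example; the point is only that the test pins down how the module calls something it does not own.

```python
from unittest import mock


def notify_if_overdue(invoice, mailer):
    """Send one reminder through the mailer when an invoice is over 30 days late."""
    if invoice["days_overdue"] > 30:
        mailer.send(invoice["email"], "Your invoice is overdue")
        return True
    return False


def test_overdue_invoice_sends_one_reminder():
    mailer = mock.Mock()  # stands in for the real (external) mail service
    assert notify_if_overdue({"email": "a@example.com", "days_overdue": 45}, mailer)
    # The contract: exactly one call, with these arguments.
    mailer.send.assert_called_once_with("a@example.com", "Your invoice is overdue")


def test_current_invoice_sends_nothing():
    mailer = mock.Mock()
    assert not notify_if_overdue({"email": "a@example.com", "days_overdue": 3}, mailer)
    mailer.send.assert_not_called()


# Run the tests directly so the sketch works without a test runner.
test_overdue_invoice_sends_one_reminder()
test_current_invoice_sends_nothing()
```

Because the mail service is mocked, these tests run in milliseconds and never leave the module, which is what keeps a per-module build inside the 20 second budget.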

By driving down your build times by splitting sections into separate projects, you will have faster turnaround and feedback on the work that you are doing right now. You can run a build right after changing a single line of code just to make sure, and 20 seconds isn’t too long of a wait to get a response. It’s almost like a small break, but not quite long enough that you start trolling Slashdot and forget you’re building something.

I feel this post needs another picture, so here's my cat Mischief playing with some rope.

Dynamic Continuous Integration

October 9, 2009

I have an idea for continuous integration that goes a bit beyond what I’m aware Hudson is capable of.

We work as a team on a project. This project is stored in SVN (like all good projects should be) and is continuously checked out by Hudson to be built on a CI server. However, Hudson is set up to just check out the latest version of trunk on a commit, checking every 5 minutes. The problem is that most developers do their work on development branches, which don’t take part in the CI cycle. These development changes do not gain the benefit of continuous integration until they are merged into trunk, which wastes a lot of developer potential.

In our development system, people create branches for each bug or feature set that they are working on. Now if a bug is trivially small or easy to see and fix, it might not warrant its own branch and will just be fixed on trunk. But for larger changes, a branch is cut so that the development one person is performing won’t adversely affect other developers. It also allows us to postpone features from our monthly release cycle. If you’re working on a feature that is only halfway complete by the time our release branch is cut and code freeze begins, you can just not merge your change into trunk and you won’t dirty up the release branch with half-finished code. Once you finish your change, you merge into trunk and resolve any conflicts that might have come up. It’s not a perfect process, but I don’t know if anyone’s would qualify (git excluded).

Because of these separate branches, developers are not able to harness the utility of Hudson and our CI environments. Hudson is set up with a specific location in SVN in mind when you create a project, usually trunk. If you want to build your branch, you have to set up a new project and go through all of the configuration of your trunk project but with the new location. There’s no real way to just duplicate a project so you can change a setting, and even if there were, the manual steps involved would be enough that most developers wouldn’t bother. There should be a better way.

I think that there should be a different style of project available in Hudson. The project should define a standard build like we did with trunk on our project, but also monitor all of the folders under branches and treat each one as its own project. Whenever a change is made to one of those branches, a build should be performed. If a new branch is created, a new project should dynamically be created for that branch. When the build is made, it should be deployed according to the Staging instructions of the original project, but to a dynamically generated VM or to one from a pool of VMs, with modifiable settings for things like ports and server names. This will allow us to do continuous deployment on each of our development branches so that if a QA tester finds a problem with a new feature a developer is working on in a branch, the developer can quickly provide a fix and have it automatically go through a full set of continuous integration tests and be deployed to the staging environment for the tester to try out again.
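The bookkeeping behind this idea can be sketched in a few lines of Python. This is not Hudson’s API, and nothing here talks to SVN or a CI server. `Project`, `sync_branch_projects`, the `make test` command, and the port numbers are hypothetical names and values chosen for illustration. The sketch also assumes that projects for deleted branches should be dropped, which the proposal above does not spell out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Project:
    """Hypothetical stand-in for a CI project definition."""
    name: str
    svn_path: str
    build_command: str
    staging_port: int


def sync_branch_projects(template, branches, existing, first_port=9000):
    """Return branch name -> Project for every branch currently in SVN.

    Existing projects are kept as they are, new branches get a copy of the
    template with their own SVN path and a free staging port, and projects
    whose branch has disappeared are dropped (an assumption of this sketch).
    """
    projects = {name: p for name, p in existing.items() if name in branches}
    used_ports = {p.staging_port for p in projects.values()}
    port = first_port
    for branch in sorted(branches):
        if branch in projects:
            continue
        while port in used_ports:  # find the lowest free staging port
            port += 1
        used_ports.add(port)
        projects[branch] = replace(
            template,
            name=f"{template.name}-{branch}",
            svn_path=f"branches/{branch}",
            staging_port=port,
        )
    return projects


# The trunk project acts as the template for every branch project.
template = Project("app", "trunk", "make test", 8080)
projects = sync_branch_projects(template, {"bug-12", "feature-x"}, {})
for project in projects.values():
    print(project)
```

Calling `sync_branch_projects` again on each poll, passing in the previous result as `existing`, would keep the set of projects in step with the folders under branches without anyone creating a project by hand.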

This fast turnaround time between bug reporting, bug fixing, and fix deployment could greatly enhance productivity and let us try out different solutions faster. With a fast feedback loop, we can work directly with the QA department or our customers to solve the issue they’re having and let them try it out right away in a way that doesn’t affect the bugs or problems anyone else is working on.

In essence, dynamic continuous integration would broaden the usage of our continuous integration servers to cover all development, even on separate branches, without the need to manually create (or forget to create) projects for them. I think that if we worked on creating such functionality in Hudson, along with some helper tools for utilizing large pools of servers as fast staging environments, we could create stronger bonds between developers and testers, and between developers and their customers, by having a shorter feedback loop and faster turnaround on problems.