Archive for the ‘Uncategorized’ category

New Domain

September 14, 2010

I’ve moved my blog to my own domain name. All of the old posts have been ported over. This one will stay around for historical reasons, but new posts will be made at http://colintmiller.com/. I may eventually change it to blog.colintmiller.com or colintmiller.com/blog, but those changes will be noted on the new site if it happens.

This has all been made possible by Amazon Web Services and EC2 which I talk about in my most recent post: http://wp.me/p14AtB-34


An Idea for Tracer Bullets

August 6, 2010

My favorite software-development book, The Pragmatic Programmer, has a section on tracer bullets. The tracer bullet system Dave and Andy describe really reminds me of the scaffolding system that Rails made popular. It's a sort of loose structure that gives you something to play with but that needs to be fleshed out for your finished application. The idea behind using a tracer bullet, or scaffolding, is to have that basic structure to build on and to help avoid the 'blank page' problem (writers and programmers both know that a blank page is the most difficult thing to write from). So scaffolding is useful, and what Dave and Andy call tracer bullets I'm going to call scaffolding for this piece. I would like to introduce a different idea of what tracer bullets are and how they can help you focus on good Test Driven Development (TDD) and battle the problem of forgetfulness due to complexity.

[Image: book cover of The Pragmatic Programmer] This is an amazing book. My top choice for any developer, especially those just out of college or with limited 'real world' experience. Buy it; it's worth the price.

If you've ever worked on a new system or a larger refactoring of an old system, you'll have noticed that you get into a sort of groove with what you're doing. You look at the big picture and you start making your modifications. Then something comes up: you change a method signature and have to adjust all of the places that use it, which maybe makes you rethink some basic designs and causes another, smaller refactor inside your larger one. Eventually, when that's all done, you go back to what you were originally doing. You're testing all the way, which is good, but eventually you get to a point where you stop and say "I'm done". Or at least that's the idea. Often it comes out more like "Am I done?" and you rack your brain trying to think whether you remembered to put everything into place or whether you're leaving out some small bit that's going to cause a bug later on. If this has never happened to you and you don't understand what I'm talking about, you may stop reading.

The main problem seems to be that you have this grand idea at the beginning, and you know pretty much all of the parts that you're going to have to modify to make things work; you just lose track of what you've done and what you haven't done. You can make a paper ToDo list at the start and check everything off, and that's a viable method. However, I would submit that you should utilize your best programming tool right from the start to keep track of your progress: your tests.

When you first sketch out your refactoring (or basic design) idea, you should know many of the places that you'll need to modify in order for it to function. Create unit tests for each of these major ideas. These tests should fail with an error message describing the idea and a note that it isn't actually implemented yet. As you start to refactor, replace these tests with the actual tests for that piece of functionality. You won't be fully removing the tests (the very existence of a tracer-bullet test means something needs to change, and those changes need to be tested); you'll be turning them into real tests for your changes. When all of your tracer tests have been replaced, you'll know that you've covered everything your design required.
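To make that concrete, here is a minimal sketch using Python's unittest. The refactor it describes (a user-lookup module growing a paginated query API) is entirely made up for illustration; each placeholder test simply fails with a note about the work it stands in for, and gets rewritten into real tests as that work happens.

```python
import unittest


class UserLookupRefactorTracers(unittest.TestCase):
    """Tracer-bullet tests: one deliberately failing test per area the
    refactor is expected to touch. Each failure message doubles as a
    to-do item until the placeholder is replaced by real tests."""

    def test_repository_accepts_query_object(self):
        self.fail("TODO: repository layer should take a Query object instead of raw SQL")

    def test_service_layer_returns_paginated_results(self):
        self.fail("TODO: service layer should return the new paginated result type")

    def test_api_response_includes_next_page_token(self):
        self.fail("TODO: API response should expose a next-page token")


if __name__ == "__main__":
    unittest.main()
```

Running the suite prints one failure per unfinished area, so the test output itself becomes the ToDo list.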

That doesn't mean additional testing shouldn't be done; there are often areas that need to be modified or added that you did not originally envision. However, it should be a good start toward tracking down the areas that need work. It should also prevent you from forgetting any of those key areas that you thought about originally but lost track of during the hectic course of development.

Tracer bullets in this sense are really just empty failing tests that provide guidance as to what you need to work on. They're not real tests, but they're placeholders for where tests should be in the future. They act as a guide to help you figure out what you should be working on next. In addition, since you need to fix those failing tests to get your build passing anyway, they encourage you to work in a more test-driven way by writing your tests first. Your list of failing tests becomes your "ToDo" list, so your first action is to change those tests into something useful, whereas a paper ToDo list encourages you to jump into writing the application code before your tests.

[Image: Storm Trooper helmet] As additional encouragement to practice Test Driven Development, here's the coercive face of failure.

Anything that keeps us more organized and removes unnecessary burdens from our mental flow is a helpful device for a programmer. Any technique that increases the likelihood that you'll properly test your application code is also a great benefit. This is just an idea I've been pondering for some weeks, and I'm sure it's not the only way to combine your design and testing in an up-front manner. If anyone else has any suggestions or practices that they use, please feel free to let me and others know what they are.

2010 Technologies

January 8, 2010

The new year has come and gone like so much glitter on the sidewalk. Looking at the first big technology event of 2010, CES, I realize that I tend to measure the years by which technology focus is most apparent.

For the past three years, ever since the original iPhone was released, there has been a very strong focus on smartphones. Not so much the first year, because other manufacturers were still scrambling to put up a competing product, but certainly in 2008 and 2009 everyone was in a rush to create a knock-off that was better than the original iPhone. Google developed Android, and many vendors used it, along with new buttonless (or at least mostly screen) phones, to capture migrating users.

In late 2007 the Kindle was released, and suddenly eBooks seemed like a viable idea. I know that Sony and probably a few other companies had readers for a while, but the content wasn't there and the usability just didn't feel right. When Amazon made a store large enough to rival physical bookstores (if not quite its own physical catalog) and made accessing it part of the reader itself, that's when I feel 'normal' people took notice. I remember looking at eBook readers back in the late '90s and wishing that they would become mainstream. I knew that it would happen eventually; I just had to wait. Looking at CES this year, 2010 is most likely going to push eBooks and eBook readers into the mainstream, much as the DVD hit mainstream adoption around 2000.

It seems like roughly a third of the products being talked about are eBook readers. Some use eInk; some use lower-power displays that can do things like color and video (at the expense of battery life and probably readability). Some have touch screens; some are low-feature and hopefully low-priced. The plethora of devices makes me believe that more publishers are going to get on board with releasing their content in digital form. Hopefully they won't make the same mistake the music industry originally made with heavy DRM, though they will. In the end they will probably learn, just as the MPAA is starting to, that DRM-free releases still get bought, don't affect piracy (people will always pirate), and leave customers happier; we just have to wait until they relearn the lesson of those that came before them.

Also, almost as interesting as the surge of eBook readers is the sudden push for tablet computers. Tablets are not a new idea at all. They've been made in various forms for a while now, and I remember that Bill Gates always pushed them as one of his favorite forms of computer. Unfortunately the masses weren't very impressed, and no one really bought them. To be fair, there weren't many choices, and the ones that existed weren't very good. Now, with the rumored announcement of a possible Apple tablet, manufacturers are rushing to beat Apple to the punch by releasing their own versions before Apple can once again dominate a market like it did with digital music players and user-friendly smartphones.

This is of course great news. The iPhone, the Kindle, and the non-existent iSlate (or whatever it might be called) have been spurring companies to create new and innovative devices at lower prices to fight over the consumer. That is exactly what we need in these industries, as the customer can only benefit from those efforts.

The last technology I've noticed being pushed, but that I don't think will go mainstream this year, is 3D movies in the home. For the last few years movie theaters have been pushing 3D movies using polarized or shutter glasses. Some of the movies have been pretty gimmicky in that regard (Journey to the Center of the Earth), but some supposedly have enhanced the experience (I've been told Avatar was good in 3D). Now there is a push to make 3D TVs and 3D Blu-rays that you can watch at home with glasses. There will most likely be a limited content selection, as I think only two movies have been announced for release on 3D Blu-ray this year. The prices on those Blu-ray players and 3D TVs will probably be prohibitive as well, so I don't see this becoming a mainstream thing. Maybe in three to five years we'll start seeing some of them pop up in middle-class households, but even then it depends on how well it is received.

The concept of 3D displays at home isn't really all that new; it's just been in a different area. Nvidia has had the ability to turn video games into 3D using special monitors and Nvidia glasses to do basically the same thing. Again, it's an expensive setup and not many people have installed such a thing. However, I think Nvidia has an advantage in that existing games can be put into 3D without any code changes. The graphics hardware can simply convert the 3D scenes generated by the game into images that can be viewed with the glasses. This increases the content to, well, your current game library, assuming you're willing to spend the money on the monitor and glasses (and that you have an Nvidia card).

Overall I think this year will be quite interesting as far as gadgets and technology go. Microsoft will be releasing Natal for the Xbox 360 for home motion capture. Sony will be releasing their wand thing. Ebooks will get a bump in popularity as they become more mainstream.

The thing I'm really waiting for, however, is à la carte TV. Netflix streaming is already showing the potential, and some cable companies have some form of On Demand (which usually isn't that great). However, I know that sometime in the future there will be something like Netflix that covers everything, or at least most, of what's shown on TV, viewable at any time, without commercials, for a flat monthly fee.

Or at least that’s my dream.

The Game of Go

December 7, 2009

This post actually has very little to do with programming, but it is about a topic that some programmers might find interesting anyway. As the title implies, it is about the game of Go, which is also my favorite board game.

As a quick recap for those who don't know, Go is an ancient Chinese game developed around 4,000 years ago and played mostly by Chinese, Korean, and Japanese players. However, there is a growing player base in the US, along with strong players in Germany and other parts of Europe. Go is the oldest board game still played in essentially its original form, which is interesting because the rules are still being tweaked in official play. The game is played on a wooden board, called a goban, with 19 horizontal and 19 vertical lines. Two players place white and black playing pieces called stones on the intersections of the grid in an attempt to surround empty space and/or opponent pieces. Each player is attempting to control more of the board than the other player.

The rules are actually really simple. Players take alternating turns, starting with black, placing stones on the board. Stones of the same color that touch each other are considered a unit and are treated as a single stone for the purpose of capture; to capture one you must capture them all. To capture a stone you must surround it with your own stones so that the unit no longer has any empty spaces next to it. Stones cannot be moved after they are placed unless they are captured, in which case they are removed from the board. You cannot place a stone on the board if placing it would result in its immediate capture. This means you can't place a piece on an intersection that has no free space next to it, unless that piece would capture a neighboring piece of the opponent and therefore free up some space. At the end of the game each player counts up the number of empty spaces their stones surround and subtracts the number of their stones that were captured (under Japanese rules, anyway; there are variations). Whoever has the most points wins.
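For the programmers reading along, here is a small Python sketch of the capture rule just described: a flood fill that collects a unit of same-colored stones and its liberties (the empty intersections next to it). The board representation is only an assumption for illustration, a dict mapping every (x, y) point on a 19x19 goban to 'B', 'W', or None.

```python
def group_and_liberties(board, start, size=19):
    """Collect the unit of stones containing `start` and the set of empty
    points adjacent to that unit. A unit with no liberties is captured."""
    color = board[start]
    group, liberties, frontier = {start}, set(), [start]
    while frontier:
        x, y = frontier.pop()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nx < size and 0 <= ny < size):
                continue  # off the edge of the board
            neighbor = board[(nx, ny)]
            if neighbor is None:
                liberties.add((nx, ny))      # empty point next to the unit
            elif neighbor == color and (nx, ny) not in group:
                group.add((nx, ny))          # same-colored stone joins the unit
                frontier.append((nx, ny))
    return group, liberties

# After a move, any opposing unit whose liberties come up empty is removed:
#   group, libs = group_and_liberties(board, point)
#   if not libs:
#       for p in group:
#           board[p] = None
```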

Now there are a few extra parts to this. Since black goes first, black has first choice of where to place a piece on the board. This gives black an advantage worth a few points. To compensate, white is given a 'komi' at the beginning of the game, currently often 6.5 points. This means that white starts with 6.5 points automatically for having gone second; the half point is there to prevent ties. There is also a rule that the board position cannot repeat itself. Certain configurations result in what is called 'ko'. From Sensei's Library: "[Ko] describes a situation where two alternating single stone captures would repeat the original board position. Without a rule preventing such repetition, the players could repeat those two moves indefinitely, preventing the game from ending. To prevent this, the ko rule was introduced."

The game is fairly easy to learn. Children as young as 2 can pick up the basic concepts and some strategies for the game. Wikipedia has one of my favorite descriptions about Go as it relates to mathematics: “In combinatorial game theory terms, Go is a zero sum, perfect information, partisan, deterministic strategy game, putting it in the same class as chess, checkers (draughts), and Reversi (Othello); however it differs from these in its game play. Although the rules are simple, the practical strategy is extremely complex.” Quite a bit more complex than chess I would say, while easier to learn. A game like that is something I find intriguing.

While Go is more popular in Asia, there is some support for it in the US as well. The American Go Association maintains a ranking system and official rulesets, holds tournaments, and tracks clubs where you can go to find players to learn or play the game. A friend of mine hosts a Promote Go website that has club listings in an easier-to-search form.

There are many websites and books for learning Go. A company called Slate and Shell publishes what is probably the largest set of English books on the topic. You can also play games online for free if you cannot get in contact with any players in your area. A great place to start is KGS, which has a simple Java client that works on all platforms, along with a server that hosts many games for free. There are other programs that use the Internet Go Server (IGS) protocol and can connect to a variety of servers, the most popular of which is Pandanet. There are also computer opponents (stay away from them for the most part), along with computer learning games and versions for other platforms such as the Xbox 360 and most mobile phones.

I encourage everyone to give the game a try as it can be one of the most mentally rewarding board games you’ll ever experience. If you’re ever interested in picking up a game with me, leave a comment or email me at jearil at gmail dot com.

Quick History of OSes

November 4, 2009

In an effort to write more on this blog, I'm posting a note originally meant for a friend who was very confused when I started talking about Linux. This is a lot less technical than usual and is general knowledge for quite a few people. Feel free to send corrections and I'll update this post.

There are a lot of people who don't know about the differences between operating systems, or that choices even exist, and I like to spread that information. I've been a Linux advocate and user since around 1997, and I like to give information on the OS, and OSes in general, when given the chance.

Back in the day (1969), a few guys at Bell Labs, part of AT&T, developed a multi-tasking, networked, multi-user operating system called Unix. Basically that means several people could each run several tasks on the same computer, they could do so from other computers (the networked part), and the computers could talk to each other. It became the most popular platform for business and scientific work because of the way programs could interact with each other and computers could talk to each other over a network; it was what the original Internet was built upon. It was expensive, though, and ran mostly on large mainframes and minicomputers built by IBM and DEC that cost hundreds of thousands to millions of dollars, things that normal people couldn't afford.

Meanwhile, Xerox started working on some experimental ideas at Xerox PARC (Palo Alto Research Center) relating to how users interact with computers. They developed things like the mouse and the Graphical User Interface (GUI); before that, people just typed commands into a text screen with a keyboard. Apple was invited down to Xerox to have a look at some of their experiments and later sort of "borrowed" them and put them into the original Macintosh. The Mac was one of the first personal computers (PCs) bought by regular-joe users for home use. It brought about the use of the mouse and a GUI instead of a command-line interface, and people thought it was cool. The original advertisement, which aired only twice, ran during the Super Bowl ( http://www.youtube.com/watch?v=OYecfV3ubP8 ). The Mac was released in 1984, and the ad played on the vibe of Orwell's book of the same name.

IBM wanted to compete with Apple, but they didn't really want to spend all that much money, so they decided to build a computer by buying off-the-shelf parts from a lot of different places and putting them together. They did that not only with hardware but with software as well. Bill Gates was trying to sell IBM on BASIC, which he and Paul Allen had developed while in college. IBM liked it but also asked if they had an OS to sell. Bill said sure, then proceeded to buy an OS from a small company in Seattle for about $50k, a deal that company probably kicked itself for in retrospect. Microsoft licensed it to IBM but retained the rights to resell it, which set MS on the path to becoming a multi-billion-dollar company.

Later they shoehorned a GUI on top of DOS with Windows 1 through 3.11. The rest of the MS story is pretty well known. Both MS and Apple charged quite a bit for their OS, but less than UNIX cost. People who went to college and worked on UNIX machines were disappointed that, after they left, they couldn't have Unix on their own machines because it was stupidly expensive.

Around this time (the mid '80s to early '90s) there was a group of people, led by Richard Stallman, pushing for Free Software. He founded the Free Software Foundation and pushed the agenda that people should have access to Free software. The 'Free' he talked about wasn't so much 'free' as in beer, but 'Free' as in freedom. He wanted software that you could modify yourself if you chose and that anyone could improve upon. He started the free software movement, in which the source code people write is open for anyone to change. He doesn't like copyright law very much. He also didn't want companies to take all this free software, make changes, and sell it back to people without releasing the source code, so he made a license called the GNU General Public License (GPL) that forces people who change the code and distribute it to also distribute the source code. He also very much wanted to create a complete free operating system that worked like Unix. He called it GNU (GNU's Not Unix) and created a lot of the utilities and programs that form the basis of an OS.

He never finished a kernel (the program at the heart of any OS, basically the software that talks to the hardware on behalf of other software). Fortunately, a Finnish student by the name of Linus Torvalds developed a kernel in 1991 that ran on the x86 processors found in the 386, 486, and later Pentium and other Intel-compatible chips that powered IBM-compatible PCs. He released the kernel as open source and named it Linux because he's a bit of a narcissist. The GNU tools were added in and it became a full operating system called GNU/Linux, or more often just Linux.

The interesting thing about Linux compared to Windows or even Mac OS was that it was free, both as in freedom and as in beer. You could make changes to any of the programs as long as you released the source when you distributed the results. This fostered a large community of hobbyist and volunteer programmers who added constantly to the OS. MS has thousands of developers creating and maintaining Windows; Linux has millions, though granted many are part-time and without as much financial incentive.

It became very popular in the hacker community in the early days, and businesses such as IBM, Google, Yahoo, Amazon, Oracle, and others started to use it to replace expensive UNIX machines. Eventually, around the year 2000, Apple completely redesigned the Mac OS based on NeXTSTEP and BSD (which was a free UNIX). OS X replaced Mac OS 9 and was largely incompatible with previous versions. However, it was a full, recognized UNIX system that used some of the same base commands and infrastructure found in Linux.

OS X and Linux are actually fairly similar (you can run a large majority of Linux apps on OS X just by recompiling). OS X came with a closed UI, though, and was much more polished as a user experience. Apple still contributes parts of the software it updates back to the open source community, however, and Linux has benefited from that.

The usual argument against Linux has been that it's difficult for a new, inexperienced user to get working. That was true for a long while, but it has since become as easy to use as Windows or OS X (some would argue it's easier than Windows). Ubuntu is a distribution of Linux that is especially popular and noted for having a good user experience. You can download and try it out at http://www.ubuntu.com/.

Linux works on a very wide range of hardware, though not everything is supported (the list of unsupported hardware keeps getting smaller). While most games and quite a bit of business software are designed to run under Windows and not Linux, there are common (free) replacements for them. There are technologies such as Wine (Wine Is Not an Emulator) that allow you to run Windows software directly in Linux; some versions even let you play games. For a typical home user's daily tasks such as web browsing (Firefox), email (Thunderbird or Evolution), IM (Pidgin), photo collections (a few of these, I forget the names), and music collections (again a few of these, I forget the names), there are native apps (listed in parentheses) that replace the Windows counterparts. Some of those, such as Firefox, Thunderbird, and Pidgin, are also available on Windows.

Linux is a free OS for those who are light gamers and don't want to pay for an expensive OS, or who want the freedom to heavily modify the OS. The modifications don't always require changing source code and recompiling; many aspects of the OS are simply more configurable, and there is a wide range of free, quality software available. It is also more resistant to viruses, which is one reason many people turn to Linux and away from Windows. For those who want to just try it out, you can download an Ubuntu CD and boot from it into a sort of trial session, or you can download a virtualization tool such as VirtualBox (http://www.virtualbox.org/) and install Linux inside it without messing up your current OS.

Data Scalability – The Write Problem

October 19, 2009

Many serious applications on the web need to scale to some extent. For smaller projects that you make for yourself or for a small audience, a single server box hosting both the web server and the database can be enough. For larger audiences and application demands, often you’ll separate the web server from the database into different boxes. This will allow a small bit of scaling as you can utilize one machine for your processing logic and another for your data store. But what about going further than that?

So you find that your front-end machine can no longer handle all of the traffic arriving. Typically you'll start setting up your application on multiple front-end machines and hide them behind a load balancer so that your traffic is split between different machines. Now this can add some new problems depending on how you architected your application (session data can become tricky), but overall it helps relieve load. In fact, if you engineered things so as not to rely heavily on session data, or if you use a load balancer that hashes the client's IP address so that requests from one client always go to the same server, then you can expand horizontally in a roughly linear fashion. Excellent! But what about your data?
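As an illustration of that IP-hashing idea, here is a minimal Python sketch; the front-end hostnames are hypothetical, and a real load balancer would do this internally.

```python
import hashlib

# Hypothetical pool of front-end application servers behind the balancer.
FRONTENDS = ["web01.example.com", "web02.example.com", "web03.example.com"]

def pick_frontend(client_ip: str) -> str:
    """Hash the client's IP so the same client always lands on the same box."""
    digest = hashlib.md5(client_ip.encode("ascii")).hexdigest()
    return FRONTENDS[int(digest, 16) % len(FRONTENDS)]

# Every request from 203.0.113.7 is routed to the same server, so any
# in-memory session state it left there is still present on the next request.
print(pick_frontend("203.0.113.7"))
```

Note that a simple modulo scheme like this reshuffles most clients whenever the pool changes size; production balancers tend to use stickier schemes, but the routing idea is the same.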

Most people store their data in some sort of RDBMS (Relational Database Management System), basically a database. On the free side there are tools like MySQL, PostgreSQL, SQLite, Berkeley DB, and probably a slew of others. On the not-free side there are a lot of options, but most people end up talking about Oracle. I'm not going to talk about Oracle because it's a very large, expensive, difficult-to-manage piece of software that does a lot but requires a lot of support. I also won't talk about MSSQL because no one should host anything on a Windows machine (personal bias). MySQL is what I'm most familiar with and probably the most popular of the free databases, so it is where I will focus.

If you have all of these front-end machines set up with load distributed between them, you're going to run into a problem where they are all trying to access the same database, and they get bogged down because your DB server's load is too high. What can you do? Well, you can set up replication and make slaves, is what! Each slave reads the commands that were performed on the master and performs them locally so as to keep the data in sync. Basically, a slave receives all of the same write commands the master received so that it can keep an up-to-date¹ copy of your data. From then on you can treat your slave as read-only and send read requests to the slave instead of the master, thereby freeing up processing on your master for just writes (and critical reads, but I won't get into that).
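As a rough sketch of how that read/write split might look at the application layer, here is a small Python class. The master and slave objects are hypothetical connection handles with an execute method; the point is only the routing decision.

```python
import random

class RoutingDB:
    """Route writes to the master and spread ordinary reads across slaves."""

    def __init__(self, master, slaves):
        self.master = master    # single master connection (all writes)
        self.slaves = slaves    # list of read-only slave connections

    def execute_write(self, sql, params=()):
        # INSERT/UPDATE/DELETE always go to the master.
        return self.master.execute(sql, params)

    def execute_read(self, sql, params=(), critical=False):
        # Ordinary reads go to a randomly chosen slave; reads that must see
        # the very latest write (see the replication-lag footnote) go to the master.
        conn = self.master if critical else random.choice(self.slaves)
        return conn.execute(sql, params)
```

Adding another slave to the list scales read capacity without touching the calling code, which is exactly the horizontal read scaling described below.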

You can even add lots of slaves to scale reads horizontally in a linear fashion, which is exactly what we want. But what about writes? While any of your many front-end boxes can choose one of your many slave database boxes to read data from, when it comes time to insert or update any data, those requests all have to go to the master database. In a write-heavy application that can become a bottleneck. Unfortunately you cannot scale writes the way you can scale reads. You could create another DB machine with master-master replication (each machine is a slave to the other), but that doesn't help your write problem at all, because each master has to apply not only its own writes but also the writes replicated from the other, cancelling out any gain you might have hoped for.

The only real solution I've ever heard about is sharding, which isn't a very fun solution at all. Basically you figure out some method for separating the data, say by user_id. All user data with an ID in a certain range gets sent to one master box, and all of the data for IDs in another range goes to another box. Unfortunately, because your data is now in two separate database systems, you can't run SQL joins across them (Oracle RAC can, I believe, but again it's expensive and hard for the regular developer to maintain). Instead you'll have to make your queries separately and massage the data in code, spreading your model logic across multiple places, which is error-prone.
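To make the routing concrete, here is a minimal Python sketch of range-based sharding on user_id. The shard boundaries and names are made up for illustration; the point is that the application has to pick the right database before it can run a query.

```python
# Hypothetical shard map: (lowest user_id, highest user_id, shard name).
SHARDS = [
    (0,        999999,       "db-shard-01"),
    (1000000,  1999999,      "db-shard-02"),
    (2000000,  float("inf"), "db-shard-03"),
]

def shard_for_user(user_id):
    """Return the name of the shard that owns this user's rows."""
    for low, high, shard in SHARDS:
        if low <= user_id <= high:
            return shard
    raise ValueError("no shard configured for user_id %d" % user_id)

# A query touching users 42 and 1500000 cannot be a single SQL JOIN: the
# application must query shard_for_user(42) and shard_for_user(1500000)
# separately and combine the rows in code.
```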

I start to wonder, then, if we need to use a database at all. Is there another data storage engine that would work better? I think about projects like memcached, and I wonder if one could make a similar system but with more integrity. There are some efforts to do so with Hadoop, but they're not quite ready yet, and I'm not sure about the real-time performance of anything involving Hadoop just yet.

A lot of people won't run into the write problem with scalability. But if you do, it can be a real challenge to solve in an elegant fashion. I haven't done so yet, but I think it's one of the next areas of research I shall be looking into.

¹ Technically the data isn't guaranteed to be up to date, as the replication process has some lag. It can take a few seconds, or maybe minutes, depending on the data and the load, for replication to catch up and the data to be mirrored onto the slave. That is why any critical reads that must reflect exactly what's on the master have to be made against the master server.