Archive for January 2010

Client Side Map-Reduce with Javascript

January 29, 2010

I’ve been doing a lot of scalability research lately. The High Scalability website has been fairly valuable to this end. I’ve been thinking about alternate approaches to my application designs, mostly based on services. There was an interesting article about one site’s architecture that described a little of how they put services together.

I started thinking about an application I work on and how it would look if every section talked to the others through web services or sockets passing JSON or Protocol Buffers, rather than the current monolithic design that uses object method calls. Then I had a thought: why limit your services to a set of static machines? There’s only so much room to expand there. What if we harnessed all of the unused power of the client machines that visit the site?

Anyone who’s done any serious work with ECMAScript (aka Javascript) knows that you can do some pretty powerful things in that language. One of its more interesting features is the ability to evaluate plain text in JSON format into Javascript objects using eval(). Now, eval() is dangerous because it will run any text given to it as if it were Javascript. However, that is also what makes it powerful.

Imagine needing to run a fairly intensive series of computations on a large data set when you don’t have a cluster to help you. You could have clients connect to a web site you’ve set up and have their browsers download data in JSON format along with executable Javascript code. The basic Javascript library can be written so that a work unit is very generic: any set of data, the functions to perform on that data, and a finishing AJAX call to upload the results. On the server side, when you have a Map-Reduce operation to perform, you can distribute work units, each containing both a section of the data and the code to execute on it, to any connected clients that have this library and are sending AJAX polling requests asking for work.

A work unit gets placed into a JSON string and sent back as the AJAX reply. The client then evaluates the reply, which calls a Javascript function that processes the data (probably a map job). Once the data is processed, the Javascript automatically makes another AJAX call to send the result data back, most likely with some sort of work unit ID for bookkeeping, and requests the next work unit. The server then just coordinates between the various clients, setting timeouts for each work unit so that if someone closes their browser, the job can be handed out to an active client or a back-end server.
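To make that server-side bookkeeping concrete, here’s a minimal sketch in Java (all names here are hypothetical, not code from any real system): work units go out to polling clients with a deadline attached, and any unit whose client disappears is reclaimed for the next poller.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Sketch of the coordinator described above: units are checked out with a
// deadline; expired units go back into the queue for another client.
class WorkCoordinator {
    static final class Unit {
        final int id;
        final String payload; // JSON data plus the code to run on it
        Unit(int id, String payload) { this.id = id; this.payload = payload; }
    }

    private final Queue<Unit> pending = new ArrayDeque<>();
    private final Map<Integer, Unit> inFlight = new HashMap<>();
    private final Map<Integer, Long> deadlines = new HashMap<>(); // id -> expiry
    private final long timeoutMillis;

    WorkCoordinator(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    void submit(Unit u) { pending.add(u); }

    // A client's polling request lands here; null means no work right now.
    Unit checkout(long now) {
        reclaimExpired(now);
        Unit u = pending.poll();
        if (u != null) {
            inFlight.put(u.id, u);
            deadlines.put(u.id, now + timeoutMillis);
        }
        return u;
    }

    // The client's result upload lands here, keyed by work-unit id.
    void complete(int id) {
        inFlight.remove(id);
        deadlines.remove(id);
    }

    private void reclaimExpired(long now) {
        deadlines.entrySet().removeIf(e -> {
            if (e.getValue() < now) {
                pending.add(inFlight.remove(e.getKey())); // hand out again
                return true;
            }
            return false;
        });
    }

    int pendingCount() { return pending.size(); }
}
```

A real version would sit behind the AJAX endpoints, with submit() fed by the Map-Reduce job and checkout()/complete() driven by client polls and uploads.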

This will work a lot better on CPU-intensive processes than on memory-intensive ones. For example, sorting a large data set requires some CPU, but far more memory, because every element being sorted has to be kept in memory. Sending entire large lists to clients to sort and receiving the results would be slow due to bandwidth and latency constraints. However, for jobs that perform heavy computation on a smaller series of data, such as SETI or brute-force cryptanalysis, where clients can send heartbeats of partial results back, there could be a real benefit.

The limits, of course, will be on how much memory your browser can allocate to Javascript. Also, since this technique focuses on heavy computational functions, the user will probably notice a fairly large hit on browsing speed in browsers without multithreading or multiprocessing. Naturally, from a non-scientific point of view, most people would be outraged if their computer’s resources were being used without their knowledge. However, for applications on an internal network, or applications that state their intentions up front, users might be interested in leaving a browser open to a grid service to add their power to the system. This would probably be easier if you made each work unit worth a set number of points and gave users a running score, like a game.

People like games.

Breaking a monolithic application into services

January 22, 2010

Today I shall be tackling Service Oriented Architecture. That particular buzz phrase has annoyed me a lot in the past. CTOs and middle management read something in a magazine about SOA or SaaS (Software as a Service) being the next big thing and declare that we should switch all of our stuff to that! Meanwhile there isn’t a very good explanation of how or why to do such a thing. I had mostly written it off, as I never saw how independent services could be pulled together into an application. Sure, it sounded neat for some other project that wasn’t mine, but no way did that model fit the project I’m doing.

Recently, however, I’ve been tasked with rewriting the section of code that generates reports in our application. The entire application is a series of Java classes nested together and packaged up as one WAR file that is deployed to a Tomcat server. To scale, we deploy the same WAR file to another server (ideally everything is stateless and shared-nothing, so there are no sessions to worry about). The whole thing sits behind a load balancer that uses a round-robin strategy to send requests to the various machines. Seems like a pretty good way to scale horizontally, I thought.

However, the horizontal scaling of this system is fairly coarse-grained. If we find that a flood of report-generation requests is slowing down a server, our only option is to duplicate the entire application on another server to distribute the load. That distributes not only report generation but every other aspect of the application as well. So now we have an additional front end, service layer, and database layer running just to speed up one area. It seems like a waste of resources.

So instead of scaling a monolithic application, how about we break the whole thing up into services? Instead of the report system being a set of APIs and implementations inside the application’s source code, we make an entirely new application whose whole goal is generating reports. It has a basic interface and API that takes in a request specifying what type of report you want and in what format, and returns that report. It can be a completely stand-alone application that just sits and waits, listening on a port for a request to come in. When one does, it processes the request, generates the report, sends the result back over that port to the client, and waits for the next request. Naturally we make it multi-threaded so it can handle multiple requests at a time, putting a limit on that number and queuing any overflow.
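A bare-bones version of such a stand-alone service fits in a page of Java. This is a sketch, not our actual report generator: the one-line-in, one-line-out “protocol” and all the names are invented, and a real service would parse proper request criteria and stream back a real report. The shape is what matters: listen on a port, handle a bounded number of requests concurrently, queue the overflow.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ReportService implements Runnable {
    // 4 worker threads handle requests concurrently; overflow waits in a
    // bounded queue instead of spawning unbounded threads.
    private final ExecutorService workers = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(64));
    private final ServerSocket server;

    ReportService(int port) throws Exception {
        server = new ServerSocket(port); // port 0 = pick any free port
    }

    int port() { return server.getLocalPort(); }

    public void run() {
        try {
            while (!server.isClosed()) {
                Socket client = server.accept();
                workers.execute(() -> handle(client));
            }
        } catch (Exception e) {
            // accept() throws once the socket is closed during shutdown
        }
    }

    private void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream()));
             PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
            String criteria = in.readLine();          // e.g. "sales,csv"
            out.println("REPORT[" + criteria + "]"); // stand-in for generation
        } catch (Exception e) {
            // a real service would log the failure and keep serving
        }
    }

    void shutdown() throws Exception {
        server.close();
        workers.shutdown();
    }
}
```

Swapping the socket layer for RPC or a web service changes the transport, not the shape.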

The benefits of this are manifold. For starters, you gain fine-grained horizontal scalability. If you find that one service is too slow or receives a lot of requests, you can deploy just that service on additional machines, rather than the whole application. The service can listen for requests via RPC, direct sockets, web services, or whatever else you like. A controller near the front end calls each individual service and gathers up the results to pass back to the user. Put it behind some sort of load balancer with failover and you have fine-grained horizontal scalability.

Second, you gain faster deployment. Since each service is self-contained, it can be deployed independently of the other services. Launching a new version does not have to replace the entire system, just the services you’re upgrading. In addition, since each service can be (and should be) running on multiple machines, you can upgrade them piecemeal and perform bucket tests. For instance, if a service runs on 10 machines, you can upgrade only 2 of them and monitor user interaction with those instances. That way 20% of your users exercise your new code while the rest hit the old code. When you see that everything is going fine, you can upgrade the other 8 servers to the new version of the service.
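The bucket assignment itself can be as simple as hashing a stable user id, so roughly 20% of users land on the upgraded servers and a given user always sees the same version for the duration of the test. An illustrative sketch (the class and method names are made up):

```java
// Deterministic bucket test: hash a stable user id into buckets 0-9;
// buckets 0 and 1 (about 20% of users) get routed to the new code.
class BucketRouter {
    static boolean useNewVersion(String userId) {
        return Math.floorMod(userId.hashCode(), 10) < 2;
    }
}
```

The load balancer (or the front-end controller) would consult this when choosing a back-end instance; widening the rollout is just raising the threshold.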

Because each part of the program is a small, separate program itself, problems become easier to manage and bugs become easier to fix. You can write tests for just the service you’re working on and make sure it meets its contractual obligations. It helps manage the problem of cyclic dependencies and tightly coupled code, and it reinforces the object-oriented strategy of message passing between components.

Furthermore, try to avoid language-specific messages such as Java serialization for communicating between your services. Use a language-agnostic format such as JSON or Google’s Protocol Buffers. I would avoid XML because it’s fairly verbose and slower to transmit over a network and parse than the others. The advantage of a language-agnostic format is that each service can be written in a different language depending on its needs. If you have a service that really needs to churn through a lot of data and you need every ounce of speed out of it, you can write it in C or even assembly. For areas where concurrency matters, you might use Scala, Erlang, or Clojure. The main idea is that each part of your program can be written in the language best suited to solving that service’s problem.

Blog Schedule Modification/Goals

January 15, 2010

I’ve decided to change the schedule on which I update this blog. I’m finding it difficult to make 3 updates a week and still have something relevant to say. To that end I’m going to try a schedule of Friday updates. This will give me a week’s worth of experience to draw upon for every post, and perhaps will lead to more on-topic discussions.

With that said I still have a post to make, and make it I shall. Today I’m going to ramble again, but this time about my own personal career and personal goals; my Developer Aspirations if you will.

From the time my family acquired our first computer, a 486 SX 25MHz beast with 4MB of RAM, a 210MB hard drive, a 2x CD-ROM, and a 2400 baud modem, I was pretty much captivated. When you spend so much time tinkering with a thing purely for the enjoyment of discovery, it makes sense to work it into your career. For anyone still wondering what profession to get into, I highly suggest something you enjoy and spend time doing anyway.

My overall goal had been, and to an extent still remains, to become a video game programmer. However, that dream is not the easiest to achieve: the job market is very competitive, and my skill set and experience don’t reflect what the industry requires. I’m also not a big fan of the 16-hour workdays, 7 days a week for 3-month stretches, that plague the industry. However, the independent game industry is thriving more than ever now due to the increased availability of embedded devices like the iPhone and Android OS: platforms that by design require small, narrowly focused apps that can actually be written by a single person or a small team.

So, a renewed focus on embedded devices and high-performance applications in resource-constrained environments. I have no OpenGL experience, so the 3D stuff isn’t very attainable (I’m not the best artist either). It seems like utility applications, or games oriented around gameplay rather than graphics, are the areas where I can still dabble.

So I suppose my current development goals are to become more focused on Objective-C and the Cocoa APIs. I may also use some of my free time to make small C or C++ text-based games as a refresher, to rebuild my familiarity with the C-based languages and create something simple.

Ideally, if I could manage the commitment, I would dedicate myself to making a small game every week, just to get some code out. That might be a workable goal to put on my todo list, and we’ll see if I can’t make at least one decent attempt before spring hits us.

As far as my current career goes, I’d like to focus more on research and scalability. There are some interesting problems out there that don’t seem to have very good solutions. Just identifying a problem that needs a solution is a good step toward making a useful application or model. I know that our current application does not have a data model that scales beyond a few servers, and it would be good to find a way to build the same application in a distributed fashion, like we have with programs such as Git.

Two Databases, One Hibernate

January 13, 2010

Hibernate is a very interesting ORM product that hides the SQL behind mapping your objects to your database. In general it works pretty well at making database objects easier to deal with. Unfortunately, there seems to be a lack of flexibility in places; for instance, using two databases for the same dataset.

Here’s my issue: I have a MySQL database that acts as the primary master. A slave database is replicated off of it (to make this easier to talk about, I’ll call the master “db1” and the slave “db2”). Db2 has an up-to-date copy of db1 and is set to read-only mode because we only ever want changes to happen in one location (db1). However, due to database load, we want to modify our application to send all read-only requests to db2 while sending write requests to db1. Since Hibernate abstracts the database away fairly well, I’m not sure how to accomplish this.

In Hibernate you can read an object from the database, make modifications to it, and Hibernate will seamlessly save those modifications right back to the database. How, then, do you force Hibernate to read objects from one database but save them to another? You obviously can’t use the same session, which is what Hibernate is centered on. However, attaching a persistent object to a new session against a different database doesn’t seem very feasible either. I suppose there is perhaps some interface or abstract class I could implement or extend to plug in, but I’ve found no documentation that deals with this issue.
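To pin down the behavior I’m after, here is a toy model of the read/write split itself (the maps just stand in for the two databases; this is not Hibernate code). With Hibernate the equivalent would presumably be two SessionFactory instances, one configured per database: read through a session from db2’s factory, then reattach the now-detached object to a session from db1’s factory when it’s time to write. Whether merge() plays nicely with replication lag is exactly the part I haven’t verified.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the routing: every read is served by the replica (db2),
// every write lands on the master (db1), and replication keeps db2 current.
class ReadWriteSplit {
    final Map<Long, String> db1 = new HashMap<>(); // master: all writes
    final Map<Long, String> db2 = new HashMap<>(); // read-only replica

    String read(long id) {              // reads always hit the replica
        return db2.get(id);
    }

    void write(long id, String value) { // writes always hit the master
        db1.put(id, value);
        replicate();                    // MySQL replication, simulated here
    }

    private void replicate() {
        db2.clear();
        db2.putAll(db1);
    }
}
```

The hard part, and my open question, is making Hibernate’s session machinery follow this split transparently instead of hand-routing every call.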

The hassle of losing this flexibility makes me wonder whether dealing with an ORM suite is really a benefit. Portability seems to decrease (however database-agnostic Hibernate’s queries may be) because I’m still tied to the single-database model. The feature list of the framework talks about “extreme scalability” and use in clusters, yet the documentation makes no mention of this supposed feature. It almost seems like a bullet point someone placed on the Hibernate website as a marketing item, expecting no one to really look into it. Perhaps it can be adapted to a cluster, yet the out-of-the-box functionality seems to be missing or hidden.

I’m not really sure ‘cluster support’ would benefit us anyway. We’re not running what would be called a cluster: we’re using multiple front-end machines behind a load balancer that handle stateless requests. There is no coupled connection between the application servers; none of them knows about any other, as there is no clustering software doing something like sharing session data. Having a decoupled, stateless database setup where we can separate reads and writes to different (though content-wise identical) databases does not seem to fit into Hibernate’s current scope.

If anyone has any insight into this matter I’d really appreciate it. I realize there are many aspects of the framework I have not delved deeply into, and it is very possible there are interfaces and/or abstract classes I can implement and/or extend to gain the functionality I desire. However, the pickings will be slim due to the lack of decent documentation in this area. I shall instead have to resign myself to trawling internet forums about Hibernate usage in Java, looking for nuggets of useful truth among the squalid remains of ORM flamewars.

iPhone woes

January 12, 2010

This is a late post because I ran out of time yesterday. It’s also not on any of the subjects I usually cover. Instead, my current iPhone troubles got me thinking about the state of smartphones and some of the general trends the industry seems to be leaning towards.

My iPhone randomly turned off last night and hasn’t turned back on yet. I’ve tried charging it, letting it drain, all to no effect. It’s still under warranty so I’ll go and have it checked out at the Apple store tonight. However, being without the handy little device does make me appreciate it more. I also realize just how far these pieces of technology have changed my behaviors in the last few years.

The iPhone was only the second cell phone I’ve ever had. My first was a Motorola Razr, which had an extremely slow data connection and a very poor web browser. It could make calls and text just fine, and for a while I figured that was the main purpose of a cell phone, so it suited me well enough. I had an iPod (just a color photo one) for music and a PSP for playing videos on the go. The number of devices was never really a problem, since I always carried my bag around with me.

When the iPhone came out, I jumped on board a few months after its release. There was no App Store, nor any apps beyond what came with the phone and web apps, but just having a real web browser in a phone was pretty useful. Maps also came in handy a number of times when trying to find the nearest store or ATM while out wandering. After native apps arrived and the 3GS was released, the phone became even more useful. I can use it to play music, watch video (both on YouTube and copied over), play games, run utilities, and yes, even make phone calls and send text messages (now with MMS!). Often when I’m home I won’t even bother to take out the laptop and turn it on, because I can just as easily reach for my phone and use my home wireless to quickly view web pages or check my email.

I didn’t miss the iPhone before I had one, but now that I have one and it’s broken, being without it drives me crazy. I have to modify my behavior to get information from other sources instead of that small, easy-to-use device. I feel the same way about my Kindle. The ability to carry around new books, sync bookmarks with my phone, and read any of my library anywhere makes regular books seem bland in comparison. I still love physical books, but I’m nearing the point where, given the choice between reading a paper copy and the same book on the Kindle, I’d prefer the Kindle. Having dictionary and Wikipedia lookup is just too useful, and so are automatic bookmarks, which mean I don’t have to keep scraps of paper or worry about losing my place if I need to drop the book quickly.

It is rumored that Apple will announce a tablet later this month, for availability in March or so. I can’t help but ponder and speculate about what they might come up with. I’ve always known that modifications of the computer (which, in all honesty, is what both the iPhone and Kindle are) would be able to replace many aspects of our current lives as far as information consumption goes. I have to wonder whether there will be a device that replaces my Kindle, laptop, and perhaps my iPhone, just like the iPhone replaced my Razr, iPod, and PSP. Granted, the form factor will be the most difficult aspect of replacing something like the phone: if it’s too large to fit in your pocket, you won’t carry it everywhere like you do an iPhone; if the screen is too small, you won’t want to read on it like you do the Kindle, or do any serious writing or run more involved apps like you can on a laptop.

Still, many companies have tried pushing the tablet idea and customers haven’t been biting. Apple seems to be good at customer fishing however with the most tasty of bait around. Maybe it’s the fruit origins.

The cell phone industry, especially in the area of touch smartphones, is in an incredible boom right now. Everyone is trying to outdo each other with the newest, greatest little gadget, faster and with better battery life than the competition. Granted, in the US we still have the problem of carrier lock-in and lock-downs (I’m looking at you, Verizon). However, even some of that is shifting as the carriers lose power to manufacturers like Apple or Google, who are beginning to dictate more and more how their phones can be used, rather than the other way around. I imagine someday it will be more common to see fewer contract-tied plans and slightly more expensive phones with the flexibility to move between networks. It’s already started a little, but the carriers are still too interested in 2-year contracts and don’t seem ready to give them up just yet.

So even though my iPhone is currently acting as a very expensive, lightweight brick, it makes me reflect that the loss of such a piece of technology, and the effect that loss has on one’s day-to-day life, speaks volumes about where we are today and where technology is going.

2010 Technologies

January 8, 2010

The new year has come and gone like so much glitter on the sidewalk. Looking at the first big technology event of 2010, CES, I realize that I tend to measure the years by which technology focus is most apparent.

For the past 3 years, ever since the original iPhone was released, there has been a very strong focus on smartphones. Not as much the first year, because other manufacturers were scrambling to put up a competing product, but certainly in 2008 and 2009 everyone was in a rush to create a better knock-off of the iPhone than the original. Google developed Android, and many vendors used it, along with new buttonless (or at least mostly-screen) phones, to capture the users who migrated.

In 2008 the Kindle was pushed out, and suddenly eBooks seemed like a viable idea. I know Sony had a reader for a while, and probably a few other companies did too, but the content wasn’t there and the usability just didn’t feel right. When Amazon built a store large enough to rival physical bookstores (if not quite its own physical catalog) and made accessing it part of the reader itself, that’s when I feel ‘normal’ people took notice. I remember looking at eBook readers back in the late 90’s and wishing they would become mainstream. I knew it would happen eventually; I just had to wait. Looking at CES this year, 2010 is most likely going to push eBooks and eBook readers into the mainstream, much as the DVD hit mainstream adoption around 2000.

It seems like roughly a third of the products being talked about are eBook readers. Some use eInk; some use lower-power displays that can do things like color and video (at the expense of battery life and probably readability). There are ones with touch screens, and ones that are low-feature and hopefully low-price. The plethora of devices makes me believe that more publishers are going to get on board with releasing their content in digital form. Hopefully they won’t make the same mistake the music industry originally made with heavy DRM, though they will. In the end they will probably learn, just as the MPAA is starting to, that non-DRM releases will still be bought, won’t affect piracy (people will always pirate), and will leave customers happier. We just have to wait until they relearn the lesson of those who came before them.

Almost as interesting as the surge of eBook readers is the sudden push for tablet computers. Tablets are not a new idea at all. They’ve been made in various forms for a while now, and I remember that Bill Gates always pushed them as one of his favorite forms of computer. Unfortunately the masses weren’t very impressed and no one really bought them. To be fair, there weren’t many choices, and the ones that existed weren’t very good. Now, with the rumored announcement of a possible Apple tablet, manufacturers are rushing to beat Apple to the punch by releasing their own versions before Apple can once again dominate a market, as they did with digital music players and user-friendly smartphones.

This is of course great news. The iPhone, the Kindle, and the non-existent iSlate (or whatever it might be called) have been spurring companies to create new and innovative devices at lower prices to fight over the consumer. That is exactly what we need in these industries, as the customer can only benefit from those efforts.

The last technology I’ve noticed being pushed, but that I don’t think will go mainstream this year, is 3D movies in the home. For the last few years movie theaters have been pushing 3D movies using polarized or shutter glasses. Some of the movies have been pretty gimmicky in that regard (Journey to the Center of the Earth), but some supposedly enhanced the experience (I’ve been told Avatar was good in 3D). Now there is a push for 3D TVs and 3D Blu-rays that you can watch at home with glasses. There will most likely be a limited content selection, as I think only 2 movies have been announced for release on 3D Blu-ray this year. The 3D TVs and Blu-ray players will also probably be prohibitively expensive, so I don’t see this becoming mainstream. Maybe in 3-5 years we’ll start seeing some of them pop up in middle-class households, but even then it depends on how well the technology is received.

The concept of 3D displays at home isn’t really all that new; it’s just been in a different area. Nvidia has had the ability to turn video games into 3D, using special monitors and Nvidia glasses to do basically the same thing. Again, it’s an expensive setup and not many people have installed one. However, I think Nvidia has an advantage in that existing games can be put into 3D without any code changes: the graphics hardware simply converts the 3D scene generated by the game into images that can be viewed with the glasses. This increases the content to, well, your current game library, assuming you’re willing to spend the money on the monitor and glasses (and that you have an Nvidia card).

Overall I think this year will be quite interesting as far as gadgets and technology go. Microsoft will be releasing Natal for the Xbox 360 for home motion capture. Sony will be releasing their wand thing. And eBooks will get a bump in popularity as they become more mainstream.

The thing I’m really waiting for, however, is à la carte TV. Netflix streaming is already showing the potential, and some cable companies have some form of On Demand (which usually isn’t that great). But I know that sometime in the future there will be something like Netflix that covers everything, or at least most, of what’s shown on TV, viewable at any time without commercials for a flat monthly fee.

Or at least that’s my dream.


January 6, 2010

I am very thankful for my current manager. I’ve been thinking about this for a while, because managers often have such an effect on the lives of those who work for them. I have only worked for 2 good managers, and I wanted to explain why I thought they were good and what separates a ‘good’ manager from a ‘bad’ one.

Managers, I feel, are a necessary evil thrust upon us by the bureaucracy gods, who, like the gods of old, are both terrible and powerful. And as in olden times, the realm of managers holds beings of both light and dark. Some managers rule through fear and require sacrifices to be made in their honor. Others are more benevolent, helping their subordinates through the natural disasters the bureaucracy gods hurl at them. Working for a benevolent one is among the greater benefits a job position can have.

A good manager will help a subordinate. They will listen and remove higher-level management obstacles. If, when you talk to your manager, they ask “What can I do to make your job easier?” or “What can I do to help you?”, then you might indeed have a good manager. Good ones will often push your ideas through upper management, and will sit in meetings on your behalf so you don’t have to waste your productive time sitting in them yourself.

A bad manager is quite the opposite. A bad manager will ask their subordinates to fill out forms detailing how they spend their time, so that the manager can more easily report to his or her own managers. A bad manager will thrust additional work on those they lead, not to make their subordinates’ lives easier, but to make the manager’s life easier. A bad manager will often not listen when a subordinate brings up an idea or a concern, but will instead shove their own ideas down the throats of those who work for them. A bad manager should never be given a cookie.

If you manage people, you have a responsibility. You should not be in it for glory, and the people you are trying to please should not be your own manager(s), but rather those whom you manage. I’m not saying you should let them slack off; I’m saying you should be making their jobs easier and their work more effective. It might be difficult, and you might be placed in a position where people give you shit for it, but you should take that shit and not pass it down to those who work for you. Shield them from the bureaucratic BS. In the end your team will be more effective and efficient. They will succeed where other teams fail. By making them look good, you will look good in return.

I am lucky to have a manager who does that, and I was lucky to have one at another job once as well. If you find a manager who fits these criteria, promote them to their bosses as much as you can, because you don’t want to lose that shield. If you’re a manager who does what I’ve listed on the good side, keep going; you rock. If you find yourself doing the stuff I’ve listed on the bad side: no cookie.


Profiling

January 4, 2010

I’m not talking about the type that is done at the airport, but rather the type that lets you see how long and how many times each section of your code is called. There is a wide variety of tools for profiling code in a variety of languages. In this post I’m going to focus on Java, using a tool called YourKit, but I assume other tools will give the same basic info.

From time to time, especially when you’re having performance problems, it’s a good idea to run your code through a profiler and get a readout of how long methods take to run and how many times they are called. A good profiler will break this down into a nice tree view showing which methods took the longest, and which methods inside those methods took the most time based on the number of invocations. Looking at these numbers, you can sometimes find places where your code has a nested loop that isn’t required, or is calling an expensive method or function that returns the exact same data multiple times.

A developer is trained to trace code in his or her head while writing it, but sometimes you write a method that uses the functions of an already-written piece of code. If you don’t know how efficient that code is, you may place it in a loop without realizing the performance penalty. When testing locally it might seem to run fine, because you’re a single user generating very little load. A method that calls the DB 10k times might return in seconds and seem to display fine for you, yet slow to a crawl under production load.

While a load-test suite can tell you that a problem exists, profiling can narrow down what’s really causing it. You can profile your code without a commercial or open source profiler by manually adding timestamps before and after method calls. This isn’t practical to do for every single call, but if you have an idea of which areas are causing a slowdown and want to narrow it down, it can be very helpful.
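The manual version is only a few lines. Here’s the sort of wrapper I mean (ManualTimer is just an illustrative name):

```java
// A poor man's profiler: wrap a suspect call in timestamps. System.nanoTime()
// is the right clock here because it's monotonic, unlike
// System.currentTimeMillis(), which can jump when the system clock changes.
class ManualTimer {
    static long timeMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
    }
}
```

You’d log the result around the one or two calls you suspect, e.g. `long ms = ManualTimer.timeMillis(() -> reportDao.generate(criteria));` (names invented), rather than instrumenting everything.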

Recently I ran a profiler on a section of a project I’m working on. I found that loading one fairly simple page caused a database query to be called 22k times. That query was pulling in information related to an item I actually used, but the information itself was never used. While those queries normally return quickly, multiple users hitting the page at the same time multiplies the number of queries. And if the latency between the server and the DB increases, we will definitely see a drop in the responsiveness of the application. If we can eliminate these unneeded DB calls, we can speed up the application for end users and also increase scalability.
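In our case the fix was eliminating the calls outright, since the data was never used. But when the same query legitimately recurs with the same key, memoizing it gives a similar win: the first call per key really hits the database, and the other 21,999 are served from memory. A hedged sketch, with invented names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Memoize an expensive lookup so repeated calls with the same key
// run the underlying query only once.
class QueryCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> query; // stands in for the real DB call
    int misses = 0;                     // how many times the query actually ran

    QueryCache(Function<K, V> query) { this.query = query; }

    V get(K key) {
        return cache.computeIfAbsent(key, k -> { misses++; return query.apply(k); });
    }
}
```

The caveat is staleness: this only makes sense within a single request (or with an eviction policy), which is exactly the kind of trade-off the profiler numbers help you justify.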

Just because you don’t see an issue during a quick visual test doesn’t mean the issue doesn’t exist. Performance is not something to be tacked on at the last minute; code needs to be analyzed regularly to see where inefficiencies have crept in.