Posts Tagged ‘Performance’

Client-Side Map-Reduce with JavaScript

January 29, 2010

I’ve been doing a lot of scalability research lately. The High Scalability website has been fairly valuable to this end. I’ve been thinking of alternative approaches to my application designs, mostly based on services. There was an interesting article about Amazon.com’s architecture that describes a bit of how they put services together.

I started thinking about an application I work on and how it would look if every section of the application talked to the others through web services or sockets passing JSON or Protocol Buffers, rather than the current monolithic design that uses object method calls. Then I had a thought: why limit your services to being deployed on a set of static machines? There’s only so much room to expand that way. What if we harnessed all of the unused power of the client machines that visit the site?

Anyone who’s done any serious work with ECMAScript (aka JavaScript) knows that you can do some pretty powerful things in that language. One of its more interesting features is the ability to evaluate plain text in JSON format into JavaScript objects using eval(). Now, eval() is dangerous because it will run any text given to it as if it were JavaScript. However, that is also what makes it powerful.

Imagine needing to do a fairly intensive series of computations on a large data set when you don’t have a cluster to help you. You could have clients connect to a web site that you’ve set up and have their browsers download data in JSON format along with executable JavaScript code. The basic JavaScript library can be written so that a work unit is very generic: it contains a chunk of data, the functions to perform on that data, and a finishing AJAX call to upload the results. On the server side, when you have a Map-Reduce operation to perform, you can distribute work units, each containing a section of the data along with the code to execute on it, to any connected clients that have this library and are sending AJAX polling requests asking for work.

A work unit gets placed into a JSON string and sent back as the AJAX reply. The client then evaluates the reply, which calls a JavaScript function that processes the data (probably a map job). Once the data is processed, the JavaScript automatically makes another AJAX call to send the result data back, most likely with some sort of work unit ID for bookkeeping, and requests the next work unit. The server then just coordinates between the various clients, setting timeouts for each work unit so that if someone closes their browser the job can be handed out to an active client or a back-end server.
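To make that concrete, here is a rough sketch of what the server-side coordinator might look like using the JDK’s built-in com.sun.net.httpserver package. The WorkCoordinator class, the /work and /result endpoints, and the inline work-unit format are all invented for illustration; a real version would also track which units are outstanding and re-queue the ones that time out.

    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpServer;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical coordinator: hands out work units to polling clients and
    // collects their results. A real one would also track timeouts and
    // re-queue units whose clients have disappeared.
    public class WorkCoordinator {
        private static final AtomicInteger nextId = new AtomicInteger();
        private static final Map<Integer, String> results = new ConcurrentHashMap<>();

        public static void main(String[] args) throws IOException {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

            // Clients poll GET /work; the reply is a JSON work unit containing both
            // a slice of the data and the JavaScript map function to run on it.
            server.createContext("/work", exchange -> {
                int id = nextId.getAndIncrement();
                String workUnit = "{ \"id\": " + id + ","
                        + " \"map\": \"function(d){ return d * d; }\","
                        + " \"data\": [1, 2, 3, 4, 5] }";
                reply(exchange, 200, workUnit);
            });

            // Clients POST their results to /result?id=N when they finish a unit.
            server.createContext("/result", exchange -> {
                String query = exchange.getRequestURI().getQuery();   // e.g. "id=42"
                int id = Integer.parseInt(query.substring(query.indexOf('=') + 1));
                String body = new String(exchange.getRequestBody().readAllBytes(),
                        StandardCharsets.UTF_8);
                results.put(id, body);   // the reduce step would run over these
                reply(exchange, 200, "{\"status\":\"ok\"}");
            });

            server.start();
        }

        private static void reply(HttpExchange exchange, int status, String body)
                throws IOException {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        }
    }

On the client, the polling loop would eval() the reply, apply the embedded map function to the data array, and POST the result back to /result with the work unit’s ID before asking for more work.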

This will work a lot better for CPU-intensive processes than for memory-intensive ones. For example, sorting a large data set requires some CPU but a lot more memory, because every element you’re sorting has to be kept in memory. Sending entire large lists to clients to sort and receiving the results would be slow due to bandwidth and latency constraints. However, for large computations on a smaller amount of data, such as what’s done with SETI or brute-force attacks on cryptography, where you can send heartbeats of partial results back, there could be a real benefit.

The limits, of course, will be on how much memory the browser lets you allocate to JavaScript. Also, since this technique focuses on heavy computational functions, the user will probably notice a fairly large hit on browsing speed in browsers that aren’t multithreaded or multiprocess. Naturally, from a non-scientific point of view, most people would be outraged if their computer’s resources were being used without their knowledge. However, for applications running on an internal network, or applications that state their intentions up front, users might be interested in leaving a browser open to a grid service to add their power to the system. This would probably be easier if you made each work unit worth a set number of points and gave users a running score, like a game.

People like games.

Breaking a Monolithic Application into Services

January 22, 2010

Today I shall be tackling Service-Oriented Architecture. That particular buzz phrase has annoyed me a lot in the past. CTOs and middle management talk about how they read something in a magazine about SOA or SaaS (Software as a Service) being the next big thing and how we should switch all of our stuff to that! Meanwhile there isn’t a very good explanation of how or why to do such a thing. I had mostly written it off, as I never saw how independent services could be pulled together into an application. Sure, it sounded neat for some other project that wasn’t mine, but no way did that model fit the project I’m working on.

Recently, however, I’ve been tasked with rewriting the section of code that generates reports in our application. The entire application is just a series of Java classes nested together and packaged up as one WAR file that is deployed to a Tomcat server. To scale, we deploy the same WAR file to another server (ideally everything is stateless and shared-nothing, so there are no sessions to worry about). The whole thing sits behind a load balancer that uses a round-robin strategy to send requests to the various machines. Seems like a pretty good way to scale horizontally, I thought.

However, the horizontal scaling of this system is fairly coarse-grained. If we find that a flood of report-generation requests is slowing down a server, our only option is to duplicate the entire application on another server to distribute the load. That distributes not only report generation but every other aspect of the application as well. So now we have an additional front end, service layer, and database layer running just to speed up one area. It seems like a waste of resources.

So instead of scaling a monolithic application, how about we break the whole thing up into ‘services’? Instead of the report system being just a set of APIs and implementations inside the application’s source code, we make an entirely new application whose whole goal is to generate reports. It has a basic interface and API that takes in a request describing what type of report you want and in what format, and returns that report. It can be a completely stand-alone application that just sits and waits, listening on a port for a request to come in. When one does, it processes the request, generates the report, sends the result back over that port to the client, and waits for the next request. Naturally, we make it multi-threaded so that we can handle multiple requests at a time, putting a limit on that number and queuing any overflow.
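As a very rough sketch of that shape (the ReportService class and its one-line-in, one-line-out protocol are invented here purely for illustration), the skeleton might look like this: a ServerSocket that accepts connections and hands each one to a bounded thread pool, with any overflow waiting in the pool’s queue.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical stand-alone report service: it listens on a port, handles a
    // bounded number of requests concurrently, and queues any overflow.
    public class ReportService {
        public static void main(String[] args) throws IOException {
            // A fixed pool caps concurrent report generation; extra requests wait
            // in the executor's queue until a worker thread frees up.
            ExecutorService workers = Executors.newFixedThreadPool(8);

            try (ServerSocket listener = new ServerSocket(9090)) {
                while (true) {
                    Socket client = listener.accept();
                    workers.submit(() -> handle(client));
                }
            }
        }

        private static void handle(Socket client) {
            try (Socket socket = client;
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()));
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {

                // One line in = one report request (criteria and format),
                // one line out = the generated report.
                String request = in.readLine();
                out.println(generateReport(request));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        private static String generateReport(String requestJson) {
            // The real service would parse the criteria and build the report here.
            return "{\"report\":\"...\",\"for\":" + requestJson + "}";
        }
    }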

The benefits of this are many. For starters, you gain fine-grained horizontal scalability. If you find that one service is too slow or receives a lot of requests, you can deploy just that service on additional machines, rather than the whole application. The service can listen for requests via RPC, direct sockets, web services, or whatever else you like. A controller near the front end would just call each individual service to gather up the data to pass back to the user (see the sketch below). Put the service behind some sort of load balancer with failover and you have fine-grained horizontal scalability.
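The controller side could start out as simple as this sketch, which rotates report requests across whatever machines the service is deployed on (the ReportClient name and the host list are made up; in practice you would likely put a proper load balancer with health checks and failover in front instead):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical front-end helper that calls the report service, rotating
    // through the machines the service happens to be deployed on.
    public class ReportClient {
        private final List<String> hosts;        // e.g. ["reports1:9090", "reports2:9090"]
        private final AtomicInteger next = new AtomicInteger();

        public ReportClient(List<String> hosts) {
            this.hosts = hosts;
        }

        public String fetchReport(String requestJson) throws IOException {
            // Naive round-robin selection over the deployed service instances.
            String[] hostAndPort =
                    hosts.get(Math.floorMod(next.getAndIncrement(), hosts.size())).split(":");

            try (Socket socket = new Socket(hostAndPort[0], Integer.parseInt(hostAndPort[1]));
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()))) {
                out.println(requestJson);        // send the report criteria
                return in.readLine();            // read back the generated report
            }
        }
    }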

Second, you gain faster deployments. Since each service is self-contained, it can be deployed independently of the other services. Launching a new version does not have to replace the entire system, just the services you’re upgrading. In addition, since each service can be running (and should be running) on multiple machines, you can upgrade them piecemeal and perform bucket tests. For instance, if a service runs on 10 machines, you can upgrade only 2 of them and monitor user interaction with those instances. This way 20% of your users exercise your new code while the rest hit the old code. When you see that everything is going fine, you can upgrade the other 8 servers to the new version of the service.

Because each part of the program is a small, separate program itself, problems become easier to manage and bugs become easier to fix. You can write tests for just the service you’re working on and make sure it meets its contractual obligations. It helps manage the problems of cyclical dependencies and tightly coupled code. It also reinforces the object-oriented strategy of message passing between components.

Furthermore, try to avoid language-specific messages such as Java serialization for communicating between your services. Use a language-agnostic format such as JSON or Google’s Protocol Buffers. I would avoid XML because it’s fairly verbose and slower to transmit over a network and to parse than the others. The advantage of a language-agnostic format is that each service can be written in a different language depending on its needs. If you have a service that really needs to churn through a lot of data and you need every ounce of speed out of it, you can write it in C or even assembly. For areas where it would be good to do things concurrently, you might use Scala, Erlang, or Clojure. The main idea is that each part of your program can be written in the language that is best suited to solving that service’s problem.
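For example, the report request could be a small value class serialized to JSON. This sketch uses the Gson library, but any JSON library, or Protocol Buffers, would do just as well; the ReportRequest fields are only placeholders:

    import com.google.gson.Gson;

    // Hypothetical request message for the report service. Because it travels
    // as plain JSON, the service that consumes it doesn't have to be Java.
    public class ReportRequest {
        String reportType;
        String format;
        String startDate;
        String endDate;

        public static void main(String[] args) {
            ReportRequest request = new ReportRequest();
            request.reportType = "sales-summary";
            request.format = "pdf";
            request.startDate = "2010-01-01";
            request.endDate = "2010-01-31";

            Gson gson = new Gson();
            String json = gson.toJson(request);
            // {"reportType":"sales-summary","format":"pdf","startDate":"2010-01-01",...}
            System.out.println(json);

            // Any service, in any language, can parse this back into its own
            // native type; in Java it is a single call.
            ReportRequest parsed = gson.fromJson(json, ReportRequest.class);
            System.out.println(parsed.reportType);
        }
    }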

Profiling

January 4, 2010

I’m not talking about the type that is done at the airport, but rather the type that lets you see how long each section of code takes to run and how many times it is called. There is a wide variety of tools for profiling your code in a variety of languages. I’m going to focus on Java in this post, using a tool called YourKit, but I assume other tools will give you the same basic information.

From time to time, and especially when you’re having performance problems, it’s a good idea to run your code through a profiler and get a readout of how long methods take to run and how many times they are called. A good profiler will break this down into a nice tree view that shows which methods took the longest, and which methods inside of those took the most time based on the number of invocations. Looking at these numbers you can sometimes find places where your code has a nested loop that isn’t required, or is calling an expensive method or function that returns the exact same data multiple times.

A developer is trained to trace code in his or her head while writing it; however, sometimes you write a method that uses the functions of an already written piece of code. If you don’t know how efficient that code is, you may place it inside a loop without realizing the performance penalty. When testing locally it might seem to run fine, because you’re a single user generating very little load. A method that calls the DB 10k times might return in seconds and display fine for you, but under production load it can slow everything down.

While a load-test suite can tell you that a problem exists in the code, profiling can narrow down what’s really causing it. You can profile your code without a commercial or open source profiler by manually adding timestamp information before and after method calls. This isn’t practical for every single call, but if you have an idea of which areas are causing a slowdown and you want to narrow it down, it can be very helpful.
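As a self-contained sketch of that manual approach (expensiveLookup is an invented stand-in for whatever call you suspect), you just record System.nanoTime() around the call, accumulate the total, and log the count and the average:

    // Poor man's profiling: bracket a suspect call with timestamps and count invocations.
    public class TimedCall {
        private static long totalNanos = 0;
        private static int invocations = 0;

        // Invented stand-in for whatever expensive call you suspect is in a loop.
        private static int expensiveLookup(int key) {
            try { Thread.sleep(2); } catch (InterruptedException ignored) { }
            return key * key;
        }

        public static void main(String[] args) {
            int sum = 0;
            for (int i = 0; i < 500; i++) {
                long start = System.nanoTime();
                sum += expensiveLookup(i);
                totalNanos += System.nanoTime() - start;
                invocations++;
            }
            System.out.printf("expensiveLookup: %d calls, %.1f ms total, %.2f ms average%n",
                    invocations, totalNanos / 1e6, totalNanos / 1e6 / invocations);
            System.out.println("sum = " + sum);
        }
    }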

Recently I ran a profiler on a section of a project I’m working on. I found that loading a fairly simple page caused a database query to be called 22k times. That query was pulling in information related to an item I actually used, but the data it returned was never used itself. While those queries normally return fairly quickly, having multiple users hit the page at the same time multiplies the number of queries. Also, if the latency between the server and the DB increases, we will definitely see a decrease in the responsiveness of the application. If we can eliminate these unneeded DB calls, we can speed up the application for end users and also increase scalability.

Just because you don’t see an issue when you’re running your quick visual test doesn’t mean that the issue doesn’t exist. Performance is not something that should be tacked on at the last minute, and code needs to be analyzed regularly to see where inefficiencies are introduced.