Archive for February 2010

Java’s flaws: Why Primitives Are Bad

February 23, 2010

I use Java every day for the majority of my work. This isn’t uncommon, as Java is one of the most widely used languages in business today. While the language has improved greatly since its 1.0 release, some of its legacy features still make working with it very painful.

Primarily I’m talking about Java primitives. Primitives in Java were one of the features taken from languages such as C and C++ that really should have been left out. If it weren’t for those 8 primitive types, Java would be a truly Object Oriented Language. Because of those primitives, we can’t really call it that, which causes problems when applying some OO methodologies.

The reason primitives are bad is that they are not Objects. A primitive does not extend the Object class the way every other type in Java does, which limits its ability to participate in polymorphism: a primitive value cannot, by itself, be upcast to Object. If a data type cannot be treated as an Object, it cannot participate in generic methods that accept Objects as their arguments.

Of course the first argument someone will bring to me is the availability of autoboxing, which came with Java 1.5. Autoboxing automatically converts a primitive to its Object counterpart, and vice versa, on the fly with no additional programming thought. Problem solved, right? Well, not exactly.
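For instance, here is a quick sketch of autoboxing doing its thing (class and variable names are mine):

    import java.util.ArrayList;
    import java.util.List;

    public class AutoboxingDemo {
        public static void main(String[] args) {
            List<Integer> numbers = new ArrayList<Integer>();
            numbers.add(42);              // the int literal is boxed to an Integer
            int first = numbers.get(0);   // the Integer is unboxed back to an int

            Object boxed = 7;             // boxed to Integer, then upcast to Object
            System.out.println(first + " " + boxed);
        }
    }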

You see, while base primitives can be converted automagically to Objects via autoboxing, primitive arrays cannot. Ok, sure, a primitive array can be created and cast to an Object, but its class cannot be instantiated via reflection the way a normal class can. The ‘class’ of a primitive array exposes no constructors, since it’s not a real Object type.
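You can see this for yourself with a few lines of reflection (a minimal sketch; the class name is mine):

    import java.lang.reflect.Array;

    public class PrimitiveArrayReflection {
        public static void main(String[] args) {
            int[] values = new int[5];
            Object asObject = values;  // upcasting the array itself is fine

            Class<?> arrayClass = asObject.getClass();  // the class "[I"
            System.out.println(arrayClass.getConstructors().length);  // 0 -- no constructors

            // arrayClass.newInstance() would throw InstantiationException here.
            // The only reflective way to build one is the special-case Array helper:
            Object rebuilt = Array.newInstance(int.class, 5);
            System.out.println(rebuilt.getClass() == arrayClass);  // true
        }
    }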

So why would anyone want to create a primitive array, cast it to an object, just to create a similar one again via reflection? Who messes with reflection anyway?

JMock.

If you’re mocking an interface where one of the methods takes a primitive array as a parameter, JMock cannot create an expectation for that method by passing in the class of the parameter, because that class cannot be instantiated via reflection. The best you can do is use an expectation that ignores all calls to the mocked object, but then you can’t verify that a method gets called at all, or gets called a certain number of times.
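To make the scenario concrete, here is roughly the shape of the test I want to write (a sketch against JMock 2’s Expectations DSL; the Transmitter interface is a made-up stand-in for the real thing, and the expectation on the byte[] parameter is exactly the part that falls over):

    import org.jmock.Expectations;
    import org.jmock.Mockery;

    public class TransmitterTest {
        // A hypothetical interface of the kind that causes trouble.
        interface Transmitter {
            void send(byte[] payload);
        }

        public void testSendIsCalledOnce() {
            Mockery context = new Mockery();
            final Transmitter transmitter = context.mock(Transmitter.class);

            context.checking(new Expectations() {{
                // The expectation I want: send() called exactly once with
                // some byte[] -- the primitive array parameter is the problem.
                oneOf(transmitter).send(with(any(byte[].class)));
            }});

            transmitter.send("hello".getBytes());
            context.assertIsSatisfied();
        }
    }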

Why would you even have an interface that used a primitive array rather than an array of the Object wrapper? Well, Java itself has several built in, such as String.getBytes(), which returns a byte[]. There are also third-party transmission libraries that send byte arrays over the wire to a receiver, and those can’t be tested this way because the method signature contains a primitive array.

It’s possible that I haven’t dug far enough into the language to find a workaround for this flaw yet. Perhaps there is some way to use primitive arrays as objects for mocking that I haven’t discovered. Unfortunately, the forums don’t yield much of interest on the topic either.

If anyone knows of a way to create a JMock Expectations object with a primitive array parameter in one of the mocked method calls, let me know. I’m at a loss and not very fond of the language at this point.


Remote Datasources

February 18, 2010

Almost all applications today deal with persisted data. The rare exceptions are usually not very complicated: a calculator, a unit conversion tool, a minor utility. Let us assume that any application we’re working on is going to have some sort of persistent state that must be kept between executions. Typically that state is stored in either a file or a database. File access is useful for many single-user desktop applications: the format can be created easily from the data model, the data is easy to transfer to another user via email or some other file transfer process, and backups involve just storing the file somewhere safe.

Many applications that involve multiple users, including web applications, will instead employ a database. Even single-user desktop applications sometimes use a database. Database access is often performed by a section of the code converting objects or structures into database tuples and storing them in tables. When the data is required again, select statements retrieve it and turn it back into structures. Usually this storage and retrieval of data is done in the same code base as the rest of the application.
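In Java that usually looks something like this (a bare-bones JDBC sketch; the users table and the connection URL are placeholders of my own):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class UserDao {
        // Placeholder URL -- any JDBC-compatible database works the same way.
        private static final String URL = "jdbc:postgresql://localhost/appdb";

        public void save(int id, String name) throws Exception {
            Connection conn = DriverManager.getConnection(URL);
            try {
                PreparedStatement stmt =
                    conn.prepareStatement("INSERT INTO users (id, name) VALUES (?, ?)");
                stmt.setInt(1, id);        // object fields become column values...
                stmt.setString(2, name);
                stmt.executeUpdate();      // ...and get stored as a table row
            } finally {
                conn.close();
            }
        }

        public String load(int id) throws Exception {
            Connection conn = DriverManager.getConnection(URL);
            try {
                PreparedStatement stmt =
                    conn.prepareStatement("SELECT name FROM users WHERE id = ?");
                stmt.setInt(1, id);
                ResultSet rs = stmt.executeQuery();              // a select pulls rows back
                return rs.next() ? rs.getString("name") : null;  // and rebuilds the structure
            } finally {
                conn.close();
            }
        }
    }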

There are a few problems with this approach, however. For one, the database code can become tightly coupled with the business logic of the application itself. The application knows its data is stored in a database and will often be shaped accordingly to facilitate database access. Switching to a different database system can then involve very large rewrites, not only of the database code but also of application code that relies on database-specific functionality. Think of functions that pass along snippets of SQL in order to construct the correct objects; often that SQL will be database-specific.

Besides tying you to a single database, the same coupling ties applications to databases in general. If your application expects its data to live in a relational database, the business logic will make assumptions accordingly, which inhibits your ability to transition to a file store or some other NoSQL-style solution if required. Because of these concerns, my topic for today is abstraction and remote data sources.

If you code your database logic in with your application code, coupling can form. However, if you separate them completely, your application doesn’t need to know where the data comes from or where it goes when it’s saved. All the application has to do is follow a simple, generic interface for storing and retrieving data. This not only eliminates coupling, but has some other scalability benefits that I’ll discuss a bit later.
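A minimal sketch of the kind of generic interface I mean (the names are mine, not from any particular library):

    // The application codes against this interface and nothing else.
    // How and where the data actually lives is someone else's problem.
    public interface DataStore {
        void save(String key, String document);   // document could be JSON, XML, etc.
        String load(String key);
        void delete(String key);
    }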

To really appreciate interface-based design and loose coupling, try removing your data access code from your application code base completely. Instead, use a language-neutral format such as JSON or XML to communicate between your application and your data source, itself a small application. The application sends a request, along with some data if it needs something stored. The data source then transforms that data into a storable format and places it in a database, a file, or any other storage medium you desire.

Later, if you want to change how your data is stored, perhaps because a different DB system is better suited or you’re moving from a file-based system to a database, you can just modify the implementation of the data source application without changing a line of code in your actual application. Loose coupling means that as long as the data source maintains a stable interface for the application to interact with, the application can be developed independently of the data source.
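The messages themselves can be dead simple. As a purely illustrative example, a save request and its response might look like this (the field names are made up, not from any standard):

Request:

    { "action": "save",
      "key": "user:42",
      "document": { "name": "Alice", "email": "alice@example.com" } }

Response:

    { "status": "ok", "key": "user:42" }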

If each interaction between your application and your data source is atomic and independent of the others, in other words stateless, you can horizontally scale your data source using replication. Each data source instance can read from its own replicated, read-only data store, while all writes go to a master data store that is replicated out to the read-only slaves. Read requests from the application can then be distributed across multiple data sources, which is especially useful if the data source does any CPU-intensive filtering or processing.
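Here is a minimal sketch of that read/write split, reusing the hypothetical DataStore interface from above (illustrative only; a real router would also need error handling and some awareness of replication lag):

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    // Routes writes to the master and spreads reads across read-only replicas.
    public class ReplicatedDataStore implements DataStore {
        private final DataStore master;
        private final List<DataStore> replicas;
        private final AtomicInteger next = new AtomicInteger(0);

        public ReplicatedDataStore(DataStore master, List<DataStore> replicas) {
            this.master = master;
            this.replicas = replicas;
        }

        public void save(String key, String document) {
            master.save(key, document);   // all writes hit the master
        }

        public void delete(String key) {
            master.delete(key);
        }

        public String load(String key) {
            // Round-robin reads across the replicated slaves.
            int i = Math.abs(next.getAndIncrement() % replicas.size());
            return replicas.get(i).load(key);
        }
    }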

The main benefit, however, remains that by separating your data source from your application as a remote service, you ensure that your code stays loosely coupled and that your business logic does not rely upon the underlying method by which your data is stored. This leads to greater flexibility in your code, and a more agile platform on which to build.

Scalability Testing with Virtual Machines

February 8, 2010

Making a service-based, distributed application is a good way to further your scalability goals. If you can compartmentalize the different aspects of your application into separate processes that expose services, through either web services or plain sockets, you gain the ability to make asynchronous and parallel calls between those services.
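As a sketch of what that buys you, once services live behind their own interfaces you can fan calls out in parallel instead of waiting on each in turn (fetchProfile and fetchOrders are placeholders for whatever remote calls your services expose):

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelCalls {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);

            // Two independent service calls issued in parallel.
            Future<String> profile = pool.submit(new Callable<String>() {
                public String call() { return fetchProfile(42); }
            });
            Future<String> orders = pool.submit(new Callable<String>() {
                public String call() { return fetchOrders(42); }
            });

            // Both calls are in flight; get() blocks until each completes.
            System.out.println(profile.get() + " / " + orders.get());
            pool.shutdown();
        }

        static String fetchProfile(int userId) { return "profile:" + userId; }
        static String fetchOrders(int userId)  { return "orders:" + userId; }
    }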

When you’re developing an application, you generally use a single development machine. You can run into problems when trying to do integration tests between multiple services, due to the limits on listening ports and because some technology stacks behave differently on one machine than they do across several. While your application might be deployed onto many servers, your development environment might not have the same number of servers available, often due to budget constraints. This is especially true if you’re just hacking something up in your spare time at home or you’re a self-employed programmer, but even in larger companies machine resources can be limited by the cost of those servers.

The good news is that you probably don’t need additional physical servers just for your own testing. Most computers today are powerful enough that you rarely use all of their resources. You can harness those spare resources by running virtual machines and using them as your additional servers. A good place to acquire a high-quality, free virtualization package is VirtualBox, an open source virtual machine product originally developed by Innotek, later acquired by Sun, and now owned by Oracle. It’s quite fast, multiplatform, and of course free.

The main idea is to run multiple virtual machines, each one standing in for a real server you would run your application on. You can then distribute your services onto these VMs and test how the interactions work. Each VM will have fewer resources than a real machine, but still enough for most testing you’ll need to do. In fact, you can run VMs on your real servers if you find that resource utilization on those servers is low but your application design calls for more machines.

This is also useful if you’re building a fault-tolerant distributed application where a single service might be replicated onto many machines. Any distributed application needs to deal with node failure, which is difficult to simulate on a single box. If you run your service on multiple virtual machines, however, you can shut one down during execution and see whether your clients handle the outage and redirect to a running instance. The same setup can be used to test the reintegration of a previously downed server back into a distributed service pool.
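A toy sketch of the client-side behavior you’d be testing this way (the host list and connect logic are simplified placeholders):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Tries each replica in turn until one accepts the connection.
    // In the VM setup, power off one VM and watch the client skip to the next.
    public class FailoverClient {
        private final String[] hosts;  // e.g. the IPs of your VMs
        private final int port;

        public FailoverClient(String[] hosts, int port) {
            this.hosts = hosts;
            this.port = port;
        }

        public Socket connect() throws IOException {
            for (String host : hosts) {
                try {
                    Socket socket = new Socket();
                    socket.connect(new InetSocketAddress(host, port), 1000);
                    return socket;  // first reachable replica wins
                } catch (IOException e) {
                    // Node is down -- fall through and try the next replica.
                }
            }
            throw new IOException("all replicas are unreachable");
        }
    }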

Of course none of this is new. People have been using VMs for years for a variety of purposes. I’d like to mention, however, that if you’ve avoided VMs because you don’t see how they apply to your work, or you think your work is too small for them, you should reconsider. Even a non-developer can benefit from virtual machines. Sometimes you might want the services of a piece of software that doesn’t work in your current OS; you can fire up a VM running the exact OS that software supports and use it there. Some VMs can be fairly lightweight, taking up only a minor amount of RAM, disk space, and CPU.

Another developer use for VMs is to create a completely clean build environment. If you want to make sure your package can build, test, and run on many Linux distributions, you can create automations that spin up a new virtual machine, install the OS, import your code, compile, run the tests and the application, export the results, and then remove the virtual machine. You can use VMs as part of an extended systems test for your application. This way you can find out whether your application relies on some odd configuration that might cause difficulties for your users, and it can help you refine your build process and configurations for multiple distributions or operating systems.

In the current development climate, processors have nearly reached their speed limits. It is said that the next big push is for more parallel processing among multiple cores, and beyond that lies distributed computing: when needs reach beyond the processors of a single machine, we’ll look to the resources of multiple machines. This work is already being done at Yahoo!, Google, Microsoft, Amazon, eBay, Facebook, and the other large tech corporations, but I suspect that in the future it will become fairly common to utilize the resources of many devices for a single application. Virtual machines give you a cheap way to explore the exciting realm of distributed applications from the comfort of your single (hopefully multi-core, multi-processor, or both) machine.