Archive for the ‘Databases’ category

Remote Datasources

February 18, 2010

Almost all applications today deal with persisted data. The rare exceptions are usually not very complicated, such as a calculator, unit conversion tool, or minor utility. Let us assume that any application we’re working on is going to have some sort of persistent state that must be kept between executions of the application. Typically the state is stored in either a file or a database. File access is useful for many desktop applications used by a single user: the format can be created very easily from the data model, the data is easy to transfer to another user via email or some other file transfer process, and backups involve just storing the file somewhere safe.

Many applications that involve multiple users, including web applications, will instead employ a database. Even single-user desktop applications sometimes use a database. Database access is often performed by a section of the code that converts objects or structures into database tuples and stores them in tables. When the data is required again, select statements retrieve the rows and turn them back into structures. Usually this storage and retrieval of data is done in the same code base as the rest of the application.

There are a few problems with this approach, however. For one, the database code can become tightly coupled with the business logic of the application itself. The application knows that its data is stored in a database and will often be changed to facilitate database access. Switching to a different database system can involve very large rewrites of not only the database code but also the application code, since it can rely on database-specific functionality. Think of functions that pass along snippets of SQL in order to construct the correct objects. Often that SQL will be database-specific.

Besides being tied to a single database, the same coupling also ties applications to databases in general. If your application expects the data to be stored in a relational database, the business logic will make assumptions based on that. This inhibits your ability to transition to a file store or a NoSQL solution if required. Because of these concerns, my topic for today is abstraction and remote data sources.

If you code your database logic in with your application code, coupling can form. However, if you separate them completely, your application doesn’t need to know where the data comes from or where it goes when it’s saved. All the application has to do is follow a simple generic interface for storing and retrieving data. This not only eliminates coupling, but has some other scalability benefits that I’ll discuss a bit later.
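To make the idea of a simple generic interface concrete, here is a minimal Java sketch. Every name in it (`DataSource`, `InMemoryDataSource`, `GreetingService`) is hypothetical and invented for illustration; the point is only that the business logic talks to the interface and never learns which backend sits behind it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical storage contract: the application sees only this interface. */
interface DataSource {
    void save(String key, String value);

    Optional<String> load(String key);
}

/** One possible backend; a file- or database-backed class could replace it. */
class InMemoryDataSource implements DataSource {
    private final Map<String, String> store = new HashMap<>();

    @Override
    public void save(String key, String value) {
        store.put(key, value);
    }

    @Override
    public Optional<String> load(String key) {
        return Optional.ofNullable(store.get(key));
    }
}

/** Business logic depends on the interface, never on a concrete backend. */
class GreetingService {
    private final DataSource dataSource;

    GreetingService(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    void rememberName(String userId, String name) {
        dataSource.save("name:" + userId, name);
    }

    String greet(String userId) {
        return dataSource.load("name:" + userId)
                .map(name -> "Hello, " + name + "!")
                .orElse("Hello, stranger!");
    }
}
```

Swapping the in-memory class for one backed by a file or a database would not require touching `GreetingService` at all, which is the decoupling argued for above.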

To really appreciate interface-based design and loose coupling, try removing your data access code from your application code base completely. Instead, use a language-neutral format such as JSON or XML to communicate between your application and your data source, which is itself a small application. The application will send a request, along with some data if it needs something stored. The data source will then transform that data into a format that can be stored and place it in a database, a file, or any other storage medium that you desire. Later, if you want to modify how your data is stored (maybe a different DB system is better suited, or you’re moving from a file-based system to a database), you can just modify the implementation of the data source application without changing a line of code in your actual application. The loose coupling means that as long as the data source maintains a solid interface that the application can interact with, the application can be developed independently of the data source.
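As a rough illustration of what such a request might look like on the wire, here is a small Java sketch that builds a JSON message by hand. The class name `DataRequest` and the `action`, `entity`, and `fields` keys are assumptions made up for this example, not part of any standard; in a real project you would most likely use an established JSON library rather than the hand-rolled encoder shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical request envelope sent from the application to the data source. */
class DataRequest {
    private final String action; // assumed verbs such as "store" or "fetch"
    private final String entity; // a logical name, deliberately not a table name
    private final Map<String, String> fields = new LinkedHashMap<>();

    DataRequest(String action, String entity) {
        this.action = action;
        this.entity = entity;
    }

    /** Adds one field and returns this request so calls can be chained. */
    DataRequest with(String name, String value) {
        fields.put(name, value);
        return this;
    }

    /** Serializes the request as a compact JSON object. */
    String toJson() {
        StringBuilder json = new StringBuilder();
        json.append("{\"action\":").append(quote(action))
            .append(",\"entity\":").append(quote(entity))
            .append(",\"fields\":{");
        boolean first = true;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (!first) {
                json.append(',');
            }
            json.append(quote(field.getKey())).append(':').append(quote(field.getValue()));
            first = false;
        }
        return json.append("}}").toString();
    }

    /** Wraps text in double quotes, escaping characters JSON does not allow raw. */
    private static String quote(String text) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }
}
```

Nothing in the message says whether the data ends up in MySQL, a flat file, or something else entirely; that decision stays inside the data source application.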

If every interaction between your application and data source is atomic and independent of the others (in other words, stateless), you can horizontally scale your data source using replication. Each data source instance can read from a different replicated read-only data store. Meanwhile, all writes go to a master data store, which is replicated to the read-only slaves. Read requests from an application can be distributed across multiple data sources, which is especially useful if the data source does any CPU-intensive filtering or processing.
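The routing rule described here (writes to the master, reads spread across replicas) can be sketched in a few lines of Java. `ReadWriteRouter` is a hypothetical class, and round-robin is just one simple distribution strategy I picked for the example; the type parameter `T` stands in for whatever handle you hold on a data source instance, such as a URL or a client object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical router: writes go to the master, reads rotate over replicas. */
class ReadWriteRouter<T> {
    private final T master;
    private final List<T> replicas;
    private final AtomicInteger nextReplica = new AtomicInteger();

    ReadWriteRouter(T master, List<T> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("at least one read replica is required");
        }
        this.master = master;
        this.replicas = new ArrayList<>(replicas); // defensive copy
    }

    /** Every write is sent to the single master. */
    T forWrite() {
        return master;
    }

    /** Reads are handed out round-robin; floorMod keeps the index valid after overflow. */
    T forRead() {
        int index = Math.floorMod(nextReplica.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }
}
```

Because the requests are stateless, it does not matter which replica serves a given read. One caveat this sketch ignores is replication lag: a read that immediately follows a write may not see the new data on a replica yet.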

The main benefit, however, remains that by separating your data source from your application as a remote service, you can ensure that your code is loosely coupled and that your business logic does not rely on the underlying method by which your data is stored. This will lead to greater flexibility in your code and a more agile platform on which to build.

Two Databases, One Hibernate

January 13, 2010

Hibernate is a very interesting ORM product that hides the SQL behind a mapping between your objects and your database. In general it works pretty well at making it easier to deal with database objects. Unfortunately, there seems to be some lack of flexibility. For instance, I can’t see how to use two databases for the same dataset in Hibernate.

Here’s my issue: I have a MySQL database that acts as a primary master. There is a slave database that is replicated off of this first database (to make it easier to talk about, I’ll call the master db “db1” and the slave “db2”). Db2 has an up-to-date copy of db1 and is set in read-only mode because we only ever want changes to happen in one location (db1). However, due to database load or scaling, we want to modify our application to send all read-only requests to db2 while sending write requests to db1. Since Hibernate abstracts out the database fairly well, I’m not sure how to accomplish this.

In Hibernate you can read an object from the database, make modifications to that object, and Hibernate will seamlessly save those modifications right back into the database. How then do you force Hibernate to read objects from one DB but save them to another? You obviously can’t use the same session, which Hibernate is centered around. However, attaching a persistent object to a new session of a different database doesn’t seem very feasible either. I suppose there is perhaps some sort of interface or abstract class I can implement or extend to plug in, but I haven’t found any documentation that deals with this issue.

The hassle of losing this flexibility makes me wonder if dealing with an ORM suite is really a benefit. Portability seems to decrease (database-agnostic as Hibernate’s queries are) because I’m still tied to the one-database model. The feature list of the framework talks about “extreme scalability” and says that it may be used in a cluster, yet the documentation does not make any mention of this supposed feature. It almost seems like a bullet point that someone placed on the Hibernate website as a marketing point that they expected no one to really look into. Perhaps it can be adapted to a cluster, yet the out-of-the-box functionality seems to be missing or hidden.

I’m not really sure if ‘cluster support’ would really be of any benefit to us anyway. We’re not running what would be called a ‘cluster’. We’re using multiple front-end machines behind a load balancer that handle stateless requests. There is no coupled connection between the application servers. None of them know about any other application server running as there is no clustering software that would do something like share session data. Having a decoupled stateless database system where we can separate reads and writes to different (though content-wise identical) databases does not seem to fit into Hibernate’s current scope.

If anyone has any insight into this matter I’d really appreciate it. I realize that there are many aspects of the framework that I have not delved too deeply into, and it is very possible that there are interfaces and/or abstract classes that I can implement and/or extend to gain the functionality I desire. However, the pickings will be slow due to the lack of decent documentation in this area. I shall instead have to resign myself to trolling internet forums about Hibernate usage in Java, looking for nuggets of useful truth among the squalid remains of ORM flamewars.