Christian Posta

Principal Middleware Architect @ Red Hat, open-source enthusiast, committer @ Apache, Cloud, Integration, Kubernetes, Docker, OpenShift, Fabric8, #blogger

Recently, a reader spotted a comment I made on a different blog about generalizing data-access operations for inappropriate reasons. What follows is his email and my response:

Christian,

You made the following comment on this blog:

I've seen many examples similar to this that try too hard to fight a separation of concerns issue by 'generalizing' their data-access operations way too much.

Could you tell me why this approach is trying too hard?

Thanks,

Bob

Bob,

Here's what I meant by that statement.

Data Access Objects (DAOs) are used for primitive access operations against an abstracted data store (database, web services, RMI, etc.). Those primitive operations are insert, update, delete, and query. I think the DAO should be limited to those operations, and only those operations, with an emphasis on very simple queries for the 'query' operation (e.g., "findByID"). The point of this restriction is to keep the DAO focused on its limited responsibility. Adding methods such as "findByStreetAddressAndZip" or "queryForByAccountBalance" muddies that responsibility.
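To make that concrete, here is a minimal sketch in Java of a DAO limited to the primitive operations. The names (DaoSketch, Customer, CustomerDao, InMemoryCustomerDao) are hypothetical and exist only for illustration, and the in-memory map is an assumption standing in for whatever data store sits behind the DAO.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class DaoSketch {

    // Hypothetical persistent type, used only for illustration.
    static final class Customer {
        private final long id;
        private final String name;

        Customer(long id, String name) {
            this.id = id;
            this.name = name;
        }

        long id() { return id; }
        String name() { return name; }
    }

    // The DAO exposes only the primitive operations plus one very simple query.
    interface CustomerDao {
        void insert(Customer customer);
        void update(Customer customer);
        void delete(long id);
        Optional<Customer> findById(long id);
    }

    // Assumption: an in-memory map stands in for the real data store.
    static final class InMemoryCustomerDao implements CustomerDao {
        private final Map<Long, Customer> rows = new HashMap<>();

        @Override
        public void insert(Customer customer) {
            if (rows.putIfAbsent(customer.id(), customer) != null) {
                throw new IllegalStateException("duplicate id " + customer.id());
            }
        }

        @Override
        public void update(Customer customer) {
            if (rows.replace(customer.id(), customer) == null) {
                throw new IllegalStateException("no row with id " + customer.id());
            }
        }

        @Override
        public void delete(long id) {
            rows.remove(id);
        }

        @Override
        public Optional<Customer> findById(long id) {
            return Optional.ofNullable(rows.get(id));
        }
    }

    public static void main(String[] args) {
        CustomerDao dao = new InMemoryCustomerDao();
        dao.insert(new Customer(1L, "Bob"));
        dao.update(new Customer(1L, "Robert"));
        System.out.println(dao.findById(1L).map(Customer::name).orElse("(none)"));
        dao.delete(1L);
        System.out.println(dao.findById(1L).isPresent());
    }
}
```

Notice there is nothing on the interface like findByStreetAddressAndZip; anything that specific belongs somewhere else.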

However, methods like those will most likely be necessary, and the context in which they are needed helps illuminate the best place to put them. For example, in an architecture where the domain layer is nicely separated from the rest of the supporting software, and all domain logic resides there, the domain objects will need to retrieve data from a data store. Enter the repository pattern (http://thinkddd.com/glossary/repository/). The repository provides the glue between the data store and your domain objects. There you will find methods such as findByStreetAddress or findLastTransaction, but they will be tied directly to the business logic and its operations. These repository methods provide a very explicit "seam" or contract between your business logic and the data store.
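As a rough sketch of that seam, the Java below shows a repository whose query exists only because a piece of domain logic needs it. Everything here is hypothetical: the type names (RepositorySketch, Transaction, TransactionRepository, DuplicateChargeCheck), the "same amount as the last transaction" business rule, and the in-memory list standing in for a real store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class RepositorySketch {

    // Hypothetical domain object.
    static final class Transaction {
        private final String accountId;
        private final long amountCents;

        Transaction(String accountId, long amountCents) {
            this.accountId = accountId;
            this.amountCents = amountCents;
        }

        String accountId() { return accountId; }
        long amountCents() { return amountCents; }
    }

    // The repository speaks the language of the business logic, not of the UI.
    interface TransactionRepository {
        void add(Transaction transaction);
        Optional<Transaction> findLastTransaction(String accountId);
    }

    // Assumption: an append-only in-memory list stands in for the data store.
    static final class InMemoryTransactionRepository implements TransactionRepository {
        private final List<Transaction> log = new ArrayList<>();

        @Override
        public void add(Transaction transaction) {
            log.add(transaction);
        }

        @Override
        public Optional<Transaction> findLastTransaction(String accountId) {
            // Walk backwards so the most recently added match wins.
            for (int i = log.size() - 1; i >= 0; i--) {
                if (log.get(i).accountId().equals(accountId)) {
                    return Optional.of(log.get(i));
                }
            }
            return Optional.empty();
        }
    }

    // Hypothetical domain rule: a charge equal to the previous one looks like a duplicate.
    static final class DuplicateChargeCheck {
        private final TransactionRepository repository;

        DuplicateChargeCheck(TransactionRepository repository) {
            this.repository = repository;
        }

        boolean looksLikeDuplicate(String accountId, long amountCents) {
            return repository.findLastTransaction(accountId)
                    .map(last -> last.amountCents() == amountCents)
                    .orElse(false);
        }
    }

    public static void main(String[] args) {
        TransactionRepository repository = new InMemoryTransactionRepository();
        repository.add(new Transaction("acct-1", 500L));
        DuplicateChargeCheck check = new DuplicateChargeCheck(repository);
        System.out.println(check.looksLikeDuplicate("acct-1", 500L));
        System.out.println(check.looksLikeDuplicate("acct-1", 250L));
    }
}
```

The only caller of findLastTransaction is domain logic, which is what keeps the contract explicit.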

A lot of times a very subtle distinction gets lost: "a seam between business logic and data store" becomes muddied to mean "a seam between your software and data store". This manifests itself when the repositories act as the seam for the business logic and also as the seam for the user interface. This is what leads to the explosion of "findBy...", "findWhenThisIsTrue", and similar methods. The user interface is constantly querying for data to display to the user, but is a repository the best place to put those methods? A repository should act as a seam between the business logic and the data store, not between the entire software (GUI included) and the data store.

When the UI is also being fed by the repository objects, developers then come up with reasons to generalize the repository methods into findBy(Query). This dilutes whatever seam or contract you had between the business logic and the data store, and it opens up the repository/data access to mean anything.

The fact is, retrieving data from a data store for the purpose of painting the UI is a DIFFERENT CONCERN than retrieving data from the data store to support the business logic. Coming to this realization can help simplify a design by putting the appropriate logic in appropriate parts of the architecture and keeping their responsibilities focused.

A simple example would look like this:
The domain module would contain repositories that implement very specific methods for retrieving data from a data store in support of the domain operations. Generalized queries to support any UI functions would be disallowed. The domain layer would know nothing about the UI or what data it displays.

A UI module would have separate data-access classes specifically to support displaying data in the UI, even if this approach *appears* to introduce some "duplication." You may find folks take up crusades against this 'duplication' because they don't understand the separation of concerns in this architecture. The *contexts* in which these objects are used are completely different, so the objects should not be considered the same. The objects and data-access classes that support the UI can change independently of the domain layer. They can be modified endlessly without any worry of breaking the domain. Their data can come from the same database/data store the domain layer uses, or, in much larger applications, from a completely separate one (reporting or read-only DBs). This will allow your app to scale substantially better than if the UI and business logic operations/objects were all sloppily mangled together.
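For illustration only, a UI-side data-access class might look something like the Java sketch below. The names (UiReadSideSketch, CustomerRow, CustomerListQueries) and the prefix-search screen are hypothetical, and the in-memory list is an assumption standing in for the real database or a read-only copy of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class UiReadSideSketch {

    // Flat, display-oriented row. Deliberately not a domain object.
    static final class CustomerRow {
        private final String displayName;
        private final String city;

        CustomerRow(String displayName, String city) {
            this.displayName = displayName;
            this.city = city;
        }

        String displayName() { return displayName; }
        String city() { return city; }
    }

    // UI-facing queries live in the UI module and may grow as screens change.
    interface CustomerListQueries {
        List<CustomerRow> searchByNamePrefix(String prefix, int limit);
    }

    // Assumption: an in-memory list stands in for the (possibly read-only) store.
    static final class InMemoryCustomerListQueries implements CustomerListQueries {
        private final List<CustomerRow> rows;

        InMemoryCustomerListQueries(List<CustomerRow> rows) {
            this.rows = new ArrayList<>(rows);
        }

        @Override
        public List<CustomerRow> searchByNamePrefix(String prefix, int limit) {
            String needle = prefix.toLowerCase(Locale.ROOT);
            List<CustomerRow> matches = new ArrayList<>();
            for (CustomerRow row : rows) {
                if (matches.size() >= limit) {
                    break;
                }
                if (row.displayName().toLowerCase(Locale.ROOT).startsWith(needle)) {
                    matches.add(row);
                }
            }
            return matches;
        }
    }

    public static void main(String[] args) {
        List<CustomerRow> data = new ArrayList<>();
        data.add(new CustomerRow("Bob Smith", "Phoenix"));
        data.add(new CustomerRow("Bobby Jones", "Tempe"));
        data.add(new CustomerRow("Alice Doe", "Mesa"));
        CustomerListQueries queries = new InMemoryCustomerListQueries(data);
        System.out.println(queries.searchByNamePrefix("bob", 10).size());
    }
}
```

Nothing in the domain layer depends on CustomerRow or CustomerListQueries, so screens can add or change queries like this without touching the repositories.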

I realize parts of my explanation may need further examples or commentary, so please let me know where to expound if necessary.