Wednesday, November 12, 2008

Database Independence?!

OK. Call me dense. Call me biased. Call me performance conscious. But I just can't groove on the concept of database independence. You know what I mean? The whole idea of having all your application logic built into your application and the database does basically nothing...well, it just doesn't make sense. Even after hearing all the arguments, I just don't get it.

There are so many optimizations that can't be utilized when the database isn't involved. The database ends up serving back boatloads of "simple" query results (usually max of 2-3 tables per query and mostly single table queries) and the application does everything else. The application basically processes the joins, applies the constraints...everything. What would amount to simple work that could occur in minimal time for Oracle, amounts to seconds of time processing the data at the application layer. Heck...if you're gonna do that, why not just have flat files to store your data and skip the database all together!? It'd sure save you a bundle on Oracle license fees!

Toon Koppelaars recently delivered a presentation at Miracle Oracle Open World that I attended that was about a database centric approach. I hope he'll post it somewhere soon for "public consumption" as it makes an extremely strong argument for going towards a "fat" database and getting the bulk of logic and function out of the application layer. Amen brother!

I'm ranting, so I'll stop. But, I'd love to hear from you about your thoughts. I'm an open-minded kinda gal, but when it comes down to it, I find all the reasons people share with me about why they absolutely have to write database independent code and put all the code in their application layer to be just a bunch of excuses. They are excuses that cover up 1) lack of sufficient knowledge of the databases they want to use and 2) a preference for what they think is easy. I mean, when you have a stable full of developers who can code Java fast and furious, but have been taught to not care about the database, designing for a database centric approach just isn't the forte...so it doesn't happen. But, when performance problems start to occur, they want to blame the database. Again...I just don't get it.

Can anyone help me out here?

19 comments:

Noons said...

Karen,

30 years ago when I first worked with Codasyl databases, the argument was the same.

Basically: a bunch of illiterate coders with no clue about the wider discipline and science of Data Management imposed their views on the world.

With dire consequences.
It took years for that mindset to disappear.

Nowadays, we are doing essentially the same: letting a bunch of ignorant "web coders" under the nice name of "architects" get away with designs that are absolutely demented and were mathematically and empirically proven wrong decades ago.

Let those who ignore the lessons of history be forever condemned to repeat it.

Bundit said...

Retrospect to oracle product itself, Does it exploit the generic code for each platform ?

Some developers'd rather write a standard program as commercial to minimize their operation and maintenance cost.

I agree with you most developers should profoundly use of their dependent products because they have already spent a lot of money. But from my experience, they haven't, that may come from your post "[url=http://karenmorton.blogspot.com/2008/10/courage-to-learn.html]The courage to learn[/url]".

Cristian said...

One of the excuses, in my little experience, was "marketing". I say "was" because it was when our company started developing a new product. Now we are dependent on Oracle RDBMS but some marketing mans loves to say that we can install our application over others RDMBS.
I think that if you put all application logic on application does not make sense pay expensive license fees, you use a free rdbms that makes only RDBMS (as opposed to your extreme example of flat files :)).
The argument "lack of sufficient knowledge of the databases they want to use" is that used by some Oracle competitors that say that Oracle is expensive and complicated. For a lot of little applications where performance is not a problem this is somewhat true. Do you agree?

DomBrooks said...

We're mainly database people so we're biased.

I think the only excuse for database independence is third-party products that have to support N databases - even then there's a likely trade-off in performance.

The problem is, as others have said, that the majority of developers don't know how to write efficient and performant database code and most aren't interested either.

As a result, they either put everything in the middle tier or they write crap code, say that the database is the bottleneck, and then migrate everything to the middle tier. Then they graduate on to be architects as Noons says.

From my experience, a typical team ratio of database expert to c# or java or other language programmer is somewhere between 1:10 and 1:20.

It's not uncommon for an experienced programmer to have seen one or two 18 month projects. The first one might have been a hibernate project doing trivial stuff and it flew, the second one was a mature product with a lot of database work going on, much written by people not knowing how to get the best of the database. In their ignorance, their conclusion is that databases are bad.

Narendra said...

Karen,

Though I completely agree with you over this (and I always try to advocate the database-centric development approach), what do you say when people tell you about success of eBay architecture http://highscalability.com/ebay-architecture?
From management / user point of view, doesn't it satisfy all the criteria of availability, scalability and profitability? How do you counter that?

Brian Tkatch said...

The main excuse heard is regarding switching RDBMs. When i hear about the cost of Oracle, i know why too. Of course, there probably isn't an app that actually is database independent, nor have they weighed the benefits.

I think people are aiming for a goal not worth the cost, and fooling themselves into thinking they are actually accomplishing that.

The fat client is a good idea. The Database doesn't just store the data, it is the authority on the data. You want to store something? You better have rights and fit into the rules. You want to retrieve something? You better have the rights and do it the set way.

If that is the case, the DB should have an API of sorts. Allows queries for the simple things, but use VIEWs to standardize the relationships inbetween entities. Use PACKAGEs with FUNCTIONs and PROCEDUREs to create standards of how things are done. Use the DB for something other than a messy drawer.

Jerry C said...

Tom Kyte talks about this in his "Oracle Expert One-On-One" book - it's Chapter One in my old copy! The counterpoint to the argument that it's too expensive to switch the RDBMS if you use database-specific features, is that you're not using the features you paid for if you don't use them.

Doesn't look like you're getting many arguments for database independence here!

Jerry

Karen said...

Thanks everyone for your comments. I suppose I don't feel alone in my inability to fathom the database independence rabbit hole.

I guess I just think of it as being a bad case of not knowing what you don't know. When decisions are made without knowledge, the shortcomings of those decisions come back to haunt you later. I think too often the decision to strictly adhere to a database independent design doesn't (as several of you have said) given enough time/effort toward understanding the consequences of treating all databases the same. Sadly, by the time the "truth" is found out, the issues can't be addressed easily - if at all. Sad and scary both.

Narenda,
As best I can tell, eBay uses a single database - Oracle. They're not trying to write code that totally disassociates itself from the database. They have utilized the database features and application features as a mix to find their "perfect" strategy. They are not trying to make the database be a trivial part of their application as many who strive for database independence do. So, I don't think their process conflicts with my complaints. Maybe I missed something and they are using multiple database vendors, but I didn't see it in my quick review of the link you gave me.

Joel Garry said...

Take a longer term view.

Say it's 1980. There's this concept fairly new in the business world, "relational databases."

Would you have wanted to be stuck with a particular RDBMS then? How about 1985? 1990? What if you picked the wrong one? (which of course could be, "anything but Oracle or, excuse the blasphemy, SS").

OK, it's 2008. Which is more important, databases... or applications? Which has the greater potential for revenue growth? What do you think Larry Ellison would say?

Most businesses must grow or die (a hard thing in the current economic environment, eh?). A company selling applications in 1980 would likely specialize in a particular data engine, and not likely relational. Same company in 1990 would have to make the choice, hook your wagon to a biggie, or try to layer products such that the company can change direction if events in the future required it. That's where db-independence made sense. Who knew if Sybase would win over Oracle? Or Unify or DB2 or Ingres or Informix or...?

The db-independent stuff I work on started with their own relational db, then expanded on top of RMS hierachical files, then used Rdb, then expanded to a bunch of others. For a while, Oracle was a big growth vector for that vendor. Now, SS is, some of the others have been deprecated. And some of the newer products such as BI are being written specifically for each popular engine.

Things like analytics that are too generic in the db engine for serious users of that sort of thing are written custom in general, anyways.

Grow, or die. What is true now, may not be true in the future. Businesses buy apps.

Apps ought to be written for the ages, but that isn't going to happen until there is some perceived benefit.

I used to strongly advocate putting all the logic in the db. I don't anymore, because it implies the db will both be able to easily handle all the business logic, and do so in a timely manner. It just isn't the case, at a very simple level of business logic. What is needed is to be able to direct the logic to the proper layer - and that is what is missing now, in both design and implementation.

Karen said...

Joel,

To your point about a longer term view, I think if you look forward from today vs a historical view, what we find is that the application tools are now where the RDBMS was 20-25 years ago. Today, what application development environment are you going to pick: Javascript, JDBC, Javaserver Pages, iBatis, Xquery, Spring, PHP, Apex (html-DB), ADF-BC, Webforms, Ruby on Rails, Grails, F#, Groovy, Html/XML, Ajax, Struts, Hivemind, Actionscript, .Net, AspectJ, Cocoon2, Webwork, Javaserver Faces, Adobe/Flex, Drupal, WCF, Java, EJB, Tapestry, Toplink, Hibernate, LINQ, Maverick, UIX, Wicket, JHeadstart, Netbeans, Python????? Whatever you pick, will it last? Will you find yourself changing in a year or two when the next tool dejour comes along and some slick sales guy pitches the boss that his product development tool is "the best" and "the wave of the future", etc. What do you do when you have to change tools? Isn't it the same problem as changing databases is perceived to be?

I think the big dollars that people roll out for enterprise database solutions, like Oracle, commits them to that technology for a good long while. But, I do think what you said in the end "What is needed is to be able to direct the logic to the proper layer" is the key. It's just that I think that when vendors create a piece of software that promotes itself as database-independent, it needs to come with a disclaimer that states very clearly:
"Your Results May Vary (i.e. your performance may suck). But, we apologize in advance and will tell you that it's just too bad for you that you spent a million dollars on Oracle and our product runs like crap on it, but hey...it sure keeps things simple for us and our code base." I mean, how can they direct the logic to the proper layer without considering the specific database the application will run on?

If you want your app to run well on "your" database then I think that the apps you choose to use should have been written specifically for that database and take advantage of as many of the features it provides as possible. If not, you are asking for performance issues and at that point, you have no one to blame but yourself I suppose. Maybe it's just that Oracle is an exception and that other databases all perform admirably using database-independent code. But, since I only see customers on Oracle who are hurting from trying to run these apps, I can't speak to the others.

Just seems like a very sticky wicket.....

Unknown said...

“Call me dense. Call me biased.” Karen, we would never call you those awful things. “Call me performance conscious.” Well, maybe that.

Noons said...

Karen and Joel:

I wrote this many, many years ago.

It used to be part of my old web site when I was running my own show, so please excuse the little "spiel" at the top.

This was written long before the current and demented "web 2.0" cacophony of acronyms so well exemplified in Karen's reply.

Joel: you might remember some of our collective discussions on this? I hear your point and I do agree to a certain extent.

But I submit that database independence is not an attribute of an application's coding, it's rather an intrinsic part of its design.

It is perfectly possible to maintain database independence while supporting advanced "near-the-data" database functionality for those that provide it.

Of course: when one develops systems solely based on "frameworks" designed by folks with less knowledge of data management than fits in a pinhead, problems surface...

SydOracle said...

No-one has mentioned the testing component of database independence. Unless you have tested your application running on those different database platforms, you don't KNOW it is independent. You're just guessing, with your fingers crossed.
Difference in locking/concurrency models, treatment of nulls, case (in)sensitivity, even transactioncal capabilities can differ depending for different storage engines within MySQL.
If you want database independence, you will have to pay for a lot of testing, resolving inconsistencies, determining viable DB configurations, and keep doing that with all future bug-fixes.
That may be reasonable for a COTS package for the most popular DB environments. It won't happen for an in-house application.

robert said...

DomBrooks wrote "I think the only excuse for database independence is third-party products that have to support N databases - even then there's a likely trade-off in performance." - these products are database vendor aware but not independent. Or put it differently: they depend on all the supported databases. :-)

IMHO database vendor independence is a myth as soon as you leave the usual trail of simple SELECT and small datasets - which is pretty fast usually. (If you didn't want to deal with larger data ony major reason for using a relational database goes away.)

Unknown said...

Java and .Net developers are cheap and abundant. The approach is biased in that direction.

Not saying it is right, but it is the way it is.

peter said...

Hi Karen,

In the eBay Architecture link mentioned earlier in the thread, it says

"Move cpu-intensive work moved out of the database layer to applications applications layer: referential integrity, joins, sorting done in the application layer! Reasoning: app servers are cheap, databases are the bottleneck."


How do we counter this argument?

Karen said...

Peter,

Well, I'd say that's perhaps a choice they must have made after testing in their environment (at least I hope they tested it and proved that it was the "right" place for it). I'd sure love to see the results of those tests! I mean, have you ever written join logic on your own? A NESTED LOOPs join would translate to row by row (i.e. slow by slow) comparisons.

But, if they found they could put a gazillion (whatever the real number is goes here) app servers in the middle to do the joins/sorts/RI, then they basically had the database return the full filtered row sources and then they let one of the many app servers do the work. I'd bet that if you were to measure the time it takes a single app server to do a join vs. letting a single join occur on the database, the database would win. But, what if you've got 1000's or tens of 1000's of joins that all need to occur at once (which they certainly do)? The database could very well be the bottleneck.

Perhaps from a scalability standpoint, it was found that the database could service the "simple" queries adequately for lots of requests, but had a hard time with servicing complex joins for lots of requests simultaneously. So, as they said, they found that app servers were cheap and they moved the harder to scale work off their database server and onto cheaper hardware.

I don't think we need to counter this. I think it's precisely in line as an example of how they thoughtfully and methodically determined how they could best leverage their database to scale out as they needed. They're not trying to be database independent with this strategy I don't think. I think they're using "Method R" to the utmost and determining when the cost of improvement outweighs the benefit.

The bottom-line as it appears to me is that they determined that spending more to make their database server handle their load was not cost effective, but that buying more app servers was a more cost effective solution for the scale they needed to accomplish.

I can't disagree with eBay's choice and that it was best for their situation. However, I don't think that agreeing with their choices in any way negates my concerns, frustrations and disagreement with the "typical" database independent choices implemented elsewhere without the care and real tested choices that eBay appears to have used.

Kerry Osborne said...

Interesting discussion. I haven't really heard much about "Database Independence" recently (actually I don't think I've heard anyone use that specific term in the last 10 years). I have seen a number of poorly written (architected) applications that do a ton of work in the app that should be done in the database itself, similar to those already mentioned (i.e. get the whole table and do the joins and the filtering and RI in the app). While these types of applications are, in my humble opinion, very poorly architected and consequently very inefficient, they haven't been justified (to me anyway) as a way to maintain "Database Independence". But perhaps that has been a design goal and I just haven't been made aware of it.

I can understand why a vendor that sells an application would like to be able to tell potential customers that it runs on Oracle or SQL Server or MySQL, and I can understand that a development manager might like to have only one set of code to maintain, but in reality I don't think it is possible unless they are willing to live with less than optimal (in some cases quite abysmal) performance. And from a philosophical standpoint, I don't believe that a vendor should penalize the users in order to make the developers lives easier.

For any shop that's not building an application to sell, I think the whole idea is crazy. It's kind of like my decision to switch from a PC to a Mac a few weeks ago. I could run Windows on it and basically not change any of my habits. That way, if I ever decided to switch back it would be pretty easy. But I wouldn't be getting the value that I paid for when I bought the Mac. There was a reason I spent the money on the Mac and I have committed to use it for a few years at least. I believe I should use it for what it's good for, or else why did I buy it in the first place.

I think it really all comes down to using the right tool for the job. There are things that should quite obviously be done in the database itself, such as enforcing referential integrity. I love Karen's example of writing your own nested loop join. There's no way that something like that can be done "better" in the application than at the database layer.

Keep in mind that I am extremely biased, having never worked with anything but Oracle. But I have to agree with Brian's comment:

"I think people are aiming for a goal not worth the cost, and fooling themselves into thinking they are actually accomplishing that."

eBay tools of eBay software offers said...

nice blog i have visited your blog and it's really nice and use full for me... please add my post thanks

ebay software