Tuesday, January 5, 2010

2000 columns - take 2

Thanks to everyone who commented on yesterday's post. After Randolf's comment about the 1000-column limit, I realized that so far I've not attempted to create the full table in one go. Instead, I've been building 20-ish separate tables that contain the needed raw data plus the computed/aggregated columns. This speaks to Noons' comment about having separate tables that collect and group the data, which can be populated in multiple streams.

The SAS guys want to be able to access a single row that represents all the information they need for one type of data (for example, a shipment). So far, as I build the separate tables, I've been using a couple of views that combine several of the tables. The SAS guys then use the view to build a SAS dataset and merge multiple datasets together.
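The pattern described above (several narrower tables stitched together by a view keyed on a shared identifier, with consumers reading one wide logical row) can be sketched roughly like this. This is just an illustration using Python's built-in sqlite3 rather than Oracle, and the table names and `shipment_id` key are made up, not from the actual project:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two narrow tables, each holding a slice of the raw/computed columns,
# keyed on a shared identifier (a hypothetical shipment_id).
cur.execute("CREATE TABLE ship_base (shipment_id INTEGER PRIMARY KEY, origin TEXT, dest TEXT)")
cur.execute("CREATE TABLE ship_aggs (shipment_id INTEGER PRIMARY KEY, total_weight REAL, avg_cost REAL)")

cur.execute("INSERT INTO ship_base VALUES (1, 'ATL', 'ORD')")
cur.execute("INSERT INTO ship_aggs VALUES (1, 1250.0, 42.5)")

# A view presents the slices as one wide logical row -- the shape the
# downstream consumer (SAS, in the post) wants to read.
cur.execute("""
    CREATE VIEW ship_wide AS
    SELECT b.shipment_id, b.origin, b.dest, a.total_weight, a.avg_cost
    FROM ship_base b
    JOIN ship_aggs a ON a.shipment_id = b.shipment_id
""")

row = cur.execute("SELECT * FROM ship_wide").fetchone()
print(row)  # one wide row assembled from the narrow tables
```

The appeal of the approach is that each narrow table can be built and refreshed independently (in parallel streams), while the view hides the split from the consumer.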

Yesterday's post was prompted by the prospect of having to build a single table in Oracle that does what they're currently doing via SAS. And to Joel's point, SAS is certainly not Oracle! One of the original goals was to ultimately have a table built and maintained in Oracle and only accessed by SAS, not multiple objects in Oracle that have to be merged into SAS datasets.

I've been so busy and focused on getting the data formulated/aggregated, and doing so as efficiently as possible, that I hadn't even made the attempt to build out a full table with all the columns of all the tables included in that one table. When I started thinking about it yesterday, I got all freaked out about doing it, and that's what prompted my post. I hadn't even thought about the Oracle-imposed limits on the number of columns! I've been told more than once that the team had delivered thousands of columns before, so I just hadn't given it much thought. But given that there is an Oracle limit, I'm guessing they've never delivered a final Oracle table like they want to this time... so either it's been done in SAS previously or the Oracle tables didn't exceed the 1000-column limit.
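To see where the limit bites: Oracle caps a relational table (or view) at 1,000 columns, and a CREATE TABLE beyond that fails with ORA-01792. A quick back-of-the-envelope check before attempting the single-table build might look like the sketch below. The per-table column counts here are invented stand-ins; the real numbers would come from the data dictionary (e.g. counting rows in USER_TAB_COLUMNS per table):

```python
# Hypothetical column counts for the ~20 staging tables mentioned in
# the post -- made-up numbers, not the project's actual counts.
staging_column_counts = [110, 95, 120, 88, 102, 97, 130, 105, 90, 115,
                         99, 108, 92, 101, 96, 118, 87, 111, 103, 94]

ORACLE_COLUMN_LIMIT = 1000  # ORA-01792: maximum columns in a table or view

# When the tables are merged on a shared key, the key column is stored
# only once, so subtract the duplicated key from every table after the first.
total = sum(staging_column_counts) - (len(staging_column_counts) - 1)

print(f"combined width: {total} columns")
print("fits in one Oracle table" if total <= ORACLE_COLUMN_LIMIT
      else "exceeds the 1000-column limit -- must stay split")
```

With roughly 2,000 combined columns, the check makes the conclusion in the post concrete: a single physical Oracle table can't hold the full row, so the wide shape has to live in views, multiple tables, or the SAS datasets themselves.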

So, I've got lots to figure out over the next few weeks. This just may be more fun than I can stand! :)

Monday, January 4, 2010

2000 columns

How many columns does the largest table you've ever worked with contain? The current project I'm working on has one table with almost 2000 columns (and it's likely to add more!). This is the most highly denormalized design I've ever encountered, and there's something about it that makes the performance optimizer in me cringe. But the statisticians who have to munch and crunch this data in SAS tell me this format best suits their needs (based on similar designs used successfully in previous projects).

I think I'm really more concerned about the work that has to be done to populate these columns, as most of them contain aggregations or formulations of some sort or another. So perhaps it's not the number of columns that's really niggling at me so much as everything that must occur to produce the values contained in a single row (it's a lot).

What's your experience? Did the number of columns help, hinder, or make no difference in the design and performance of the applications that used such wide tables? This phase of the project is a proof of concept, so it'll be a while before I get hard data on how well (or not) this design really works, but it has certainly made me curious about what other folks have experienced.

Let me hear from you if you have any comments or insights!