Thanks to everyone who commented on yesterday's post. After Randolf's comment about the 1000 column limit, I realized that so far, I've not attempted to create the full table in one go. Instead, I've been building 20-ish separate tables that contain the needed raw data plus the computed/aggregated columns. This speaks to Noons comment about having separate tables that access and group data up that can be executed in multiple streams
The SAS guys want to be able to access a single row that represents all the information they need for one type of data (for example, a shipment). So far, as I build the separate tables, I've been using a couple of views that combine several of the tables. The SAS guys then use the view to build a SAS dataset and merge multiple datasets together.
Yesterday's post was prompted by the prospect of having to build a single table in Oracle that does what they're currently doing via SAS. And to Joel's point, SAS is certainly not Oracle! One of the original goals was to ultimately have a table built and maintained in Oracle and only accessed by SAS, not multiple objects in Oracle that have to be merged into SAS datasets.
I've been so busy and focused on getting the data formulated/aggregated, and doing so as efficiently as possible, that I hadn't even made the attempt to build out a full table with all the columns of all the tables included in that one table. When I started thinking about it yesterday, I got all freaked out about doing it and that's what prompted my post. I hadn't even thought about the Oracle imposed limits on the number of columns! I've been told more than once that the team had delivered 1000s of columns before, so I just hadn't given it much thought. But, given there is an Oracle limit, I'm guessing they've never delivered a final Oracle table like they want to do this time...so, it's either been done in SAS previously or the Oracle tables didn't exceed the 1000 column limit.
So, I've got lots to figure out over the next few weeks. This just may be more fun than I can stand! :)
Tuesday, January 5, 2010
Monday, January 4, 2010
2000 columns
How many columns does the largest table you've ever worked with contain? The current project I'm working on has 1 table with almost 2000 columns (and it's likely to add more!). This is the most highly denormalized design I've ever encountered and there's something about it that makes the performance optimizer in me cringe. But, the statisticians that have to munch and crunch this data in SAS tell me this format best suits their needs (based on similar designs used successfully in previous projects).
I think I'm really more concerned about the work that has to be done to populate these columns as most of the columns contain aggregations or formulations of some sort or another. So, perhaps it's not the number of columns that really is niggling at me as it is everything that must occur to produce the values contained in a single row (it's a lot).
What's your experience? Did the number of columns help, hinder or make no difference in the design and performance of the application that used such wide tables? This phase of the project is a proof of concept so it'll be a while before I get hard data on how well (or not) this design really works, but it has certainly made me curious about what other folks have experienced.
Let me hear from you if you have any comments or insights!
I think I'm really more concerned about the work that has to be done to populate these columns as most of the columns contain aggregations or formulations of some sort or another. So, perhaps it's not the number of columns that really is niggling at me as it is everything that must occur to produce the values contained in a single row (it's a lot).
What's your experience? Did the number of columns help, hinder or make no difference in the design and performance of the application that used such wide tables? This phase of the project is a proof of concept so it'll be a while before I get hard data on how well (or not) this design really works, but it has certainly made me curious about what other folks have experienced.
Let me hear from you if you have any comments or insights!
Monday, November 30, 2009
Compression to the rescue
We've had issues getting enough storage space allocated on the development server our project is intended to use. For this project, our team doesn't have full DBA rights on the box and have very limited privileges. Even before I joined the team, a request had been made for a little over 3.5 TB of space to be used for the development of a proof of concept datamart. The first project deliverable is due this week and we've only been allocated 200 GB so far. As you can imagine, it's difficult to get needed data loaded when there is less than 10% of the space we need available.
The data is supposed to span a 2 year period. A set of base transaction tables have to be extracted from various sources and loaded, then that data will be flattened out into a model that will represent the final design. With only 200 GB currently available, my initial loads for 5 (out of 50) tables took nearly all of what we had available. So, I had to do something to try and fit as much data as possible into the space we had available. Compression to the rescue!
I decided to compress the tables I had extracted using basic compression (just add COMPRESS to the table definition). But first, in order to try and get the most optimal compression possible, I collected statistics on the tables I wanted to compress and reviewed the num_distinct column statistic. Using this statistic, I sorted the data into my new compressed table using a column order from lowest to highest number of distinct column values.
When the new compressed tables were ready, I had achieved 86% compression! Using this method, I was able to load the majority of the key tables I needed to allow progress to continue for our initial project deliverable. I still haven't been able to get everything, but at least I got enough to allow our statisticians to produce some initial findings (the project is intended to create a predictive analytical model).
An additional 2 TB was allocated to us on Friday but of course it was on a different server so the initial data has to be exported and imported into the new location (sigh...). But the lack of available space gave me an opportunity to effectively use a feature I probably wouldn't have used if I had all the space I needed to start with.
The data is supposed to span a 2 year period. A set of base transaction tables have to be extracted from various sources and loaded, then that data will be flattened out into a model that will represent the final design. With only 200 GB currently available, my initial loads for 5 (out of 50) tables took nearly all of what we had available. So, I had to do something to try and fit as much data as possible into the space we had available. Compression to the rescue!
I decided to compress the tables I had extracted using basic compression (just add COMPRESS to the table definition). But first, in order to try and get the most optimal compression possible, I collected statistics on the tables I wanted to compress and reviewed the num_distinct column statistic. Using this statistic, I sorted the data into my new compressed table using a column order from lowest to highest number of distinct column values.
When the new compressed tables were ready, I had achieved 86% compression! Using this method, I was able to load the majority of the key tables I needed to allow progress to continue for our initial project deliverable. I still haven't been able to get everything, but at least I got enough to allow our statisticians to produce some initial findings (the project is intended to create a predictive analytical model).
An additional 2 TB was allocated to us on Friday but of course it was on a different server so the initial data has to be exported and imported into the new location (sigh...). But the lack of available space gave me an opportunity to effectively use a feature I probably wouldn't have used if I had all the space I needed to start with.
Wednesday, October 28, 2009
Book Adverts
I'm very pleased to announce two Oracle related books I have co-authored are to be published shortly by Apress.

The first book is entitled Expert Oracle Practices: Oracle Database Administration from the Oak Table and it will be officially published in December. There are 9 chapters currently available for eBook download via the Apress Alpha program. The Alpha program allows you to purchase (at a substantial discount from the full publish price) an unedited, unfinished pre-release format. The full book isn't available yet, but when it becomes available, you will be able to download the full eBook at no additional cost. It's a good option if you want to get a jump on what's to come! I wrote the chapter on Managing SQL Performance. I'm really excited about the book and am very proud to be among the group of Oak Table folks who are co-authoring it together.
Not far behind the Oak Table book will be Beginning Oracle SQL. The book is a revision of the classic Mastering Oracle SQL and SQL*Plus by Lex de Haan originally published in 2004. Again, I was fortunate to be able to work with a fantastic group of co-authors to revise Lex's work to include many of Oracle's latest developments to the SQL query language. I knew Lex for only a very short time before his passing in 2006 but I knew him to be a brilliant and wonderful man and I'm very honored to be able to contribute to revising his original work.
These books represent my first time as a published author. I've written many whitepapers and articles for various Oracle user group periodicals and such in the past but never attempted the process of writing a book. After having this experience, I'm grateful that my first time was as a co-author. It's comforting to know there are other folks out there sweating the process with you!
The folks at Apress have been awesome and I hope that my first experience as an author won't be my last!

The first book is entitled Expert Oracle Practices: Oracle Database Administration from the Oak Table and it will be officially published in December. There are 9 chapters currently available for eBook download via the Apress Alpha program. The Alpha program allows you to purchase (at a substantial discount from the full publish price) an unedited, unfinished pre-release format. The full book isn't available yet, but when it becomes available, you will be able to download the full eBook at no additional cost. It's a good option if you want to get a jump on what's to come! I wrote the chapter on Managing SQL Performance. I'm really excited about the book and am very proud to be among the group of Oak Table folks who are co-authoring it together.
Not far behind the Oak Table book will be Beginning Oracle SQL. The book is a revision of the classic Mastering Oracle SQL and SQL*Plus by Lex de Haan originally published in 2004. Again, I was fortunate to be able to work with a fantastic group of co-authors to revise Lex's work to include many of Oracle's latest developments to the SQL query language. I knew Lex for only a very short time before his passing in 2006 but I knew him to be a brilliant and wonderful man and I'm very honored to be able to contribute to revising his original work.
These books represent my first time as a published author. I've written many whitepapers and articles for various Oracle user group periodicals and such in the past but never attempted the process of writing a book. After having this experience, I'm grateful that my first time was as a co-author. It's comforting to know there are other folks out there sweating the process with you!
The folks at Apress have been awesome and I hope that my first experience as an author won't be my last!
Tuesday, October 20, 2009
Things that make you go "hmmmmm"
I was reading a blog post entitled How Good is Good?. It was written 8 years ago by a graphic designer by the name of Stefan Sagmeister. The post was speaking about creating designs that were meaningful and make an impact on the world (or the part of the world each design is directed toward). While the post was interesting, it was one of the comments that caught my eye. Most of the commenters were inspired by the post but one commenter, in reference to the idea of making a match between good causes and good design a priority vs the more mundane every day marketing stuff, said:
It was the commenter's question that made me thoughtful: Do people really think that everybody enjoys their job?
I've always felt very fortunate to be able to do work that I really love doing. There may be times when I get frustrated with some of the particulars, but overall I know I'm doing "the thing" I'm supposed to be doing.
My choice of career path came to me in college. I started out as an accounting major but after my first computer-related course, I switched majors and knew that working in the information technology field was for me. So, I got my degree and have worked in the field ever since.
I honestly can't imagine working in a job that I didn't like. So, it was just a bit of an "ah-ha" moment to consider the idea that not everyone enjoys their job. What would that be like? What would it be like to wake up every morning and go do something that you don't enjoy for 8 or more hours? That would seem like torture to me.
But, I suppose it may be more likely that more people don't enjoy their jobs than do. Have you ever seen the TV show "Dirty Jobs"? Some of those jobs are (in my opinion) just horrid. I can't imagine having to do some of those tasks every day. But, there are a lot of jobs out there that I can't imagine doing that many people love.
However, I think that it may be more about "doing what you gotta do" many times vs actually choosing a job you want/enjoy/love. I just find it sad, and somewhat humbling, to think about all the people who do things they don't enjoy just to make enough money to maintain a life for themselves and their families. To think of spending such a large portion of my life doing something I found no enjoyment in really makes my heart ache.
Are you doing what you really enjoy?
Simple, don't go into graphic design, Choose something else. Cause if you don't do "that" work, other designers will. Do people really think that everybody enjoys their job?
It was the commenter's question that made me thoughtful: Do people really think that everybody enjoys their job?
I've always felt very fortunate to be able to do work that I really love doing. There may be times when I get frustrated with some of the particulars, but overall I know I'm doing "the thing" I'm supposed to be doing.
My choice of career path came to me in college. I started out as an accounting major but after my first computer-related course, I switched majors and knew that working in the information technology field was for me. So, I got my degree and have worked in the field ever since.
I honestly can't imagine working in a job that I didn't like. So, it was just a bit of an "ah-ha" moment to consider the idea that not everyone enjoys their job. What would that be like? What would it be like to wake up every morning and go do something that you don't enjoy for 8 or more hours? That would seem like torture to me.
But, I suppose it may be more likely that more people don't enjoy their jobs than do. Have you ever seen the TV show "Dirty Jobs"? Some of those jobs are (in my opinion) just horrid. I can't imagine having to do some of those tasks every day. But, there are a lot of jobs out there that I can't imagine doing that many people love.
However, I think that it may be more about "doing what you gotta do" many times vs actually choosing a job you want/enjoy/love. I just find it sad, and somewhat humbling, to think about all the people who do things they don't enjoy just to make enough money to maintain a life for themselves and their families. To think of spending such a large portion of my life doing something I found no enjoyment in really makes my heart ache.
Are you doing what you really enjoy?
Thursday, October 15, 2009
OOW09 and my favorite 11gR2 feature
I decided to wait until the end of Oracle Open World 2009 before I posted anything. So many people have been tweeting and blogging and shooting video of this year's event, I figured I'd put in my two cents worth at the end.
Man...it's been a l-o-n-g week. I don't know if it's my increasing age or what, but being in a herd of this size for an entire week has left me feeling somewhat bruised (both physically and mentally). I started at 9am on Sunday morning by attending Tom Kyte's keynote for Oracle Develop and have made it through Thursday - which seems like a minor miracle somehow. But I have survived!
So much goes on here, it's hard to keep moving through it given the mass of people and multiple session locations (not to mention the after hours opportunities for fun and frolic). Add a bit of rain (some kind of monsoon as it seemed to me) and, at times, it was like being in an undersized kennel filled with wet dogs. :)
The sessions I attended, and there were many, were all quite excellent. I sat in on most all of the 11gR2 sessions and am really amazed at many of the new features and options 11gR2 offers. My favorite is edition-based redefinition (also referred to as "online application upgrades"). This feature isn't getting nearly as much press time as some of the other more sexy things (for instance, you couldn't move 10 feet without seeing or hearing about Exadata) but as far as I'm concerned, editioning is a hugely beneficial new feature.
By the way, it is a feature, not an option. Wow...for once it's something you don't have to pay extra for! I loved that when Tom Kyte did his "10 - no 11 - top things about 11gR2" session, that he gave editioning two spots on the list. So what is editioning?
1. You can create a new edition to contain any new code changes.
2. Data changes can be made (add new columns or new tables) and the new code can safely write to the new stuff without any of it being seen by the old edition.
3. Different projections of tables are exposed via editioning views so that each edition is allowed to see only its own columns.
4. If the old edition makes data changes those changes are propogated using cross-edition triggers into the new edition's columns (or vice-versa).
The bottom-line is that now with 11gR2, if you want to make changes to procedures, packages, views, synonyms, etc, you can do it using editioning and maintain both the old (original) version as well as the new. It also means a lot less headache. Have you ever tried to CREATE OR REPLACE PACKAGE in production while the instance is in use? What happens? Well, if the package is in use, Oracle has to wait until it is no longer in use in order to replace it. So, Oracle waits. But, if new requests to use the package come in during this time, those sessions must wait also. That doesn't even begin to touch what happens if the new package causes other objects to be invalidated and thus they need recompilation also. It can be a pretty long, arduous and resource-intensive process to get something seemingly so simple done.
With editioning, a new edition is created and the modified package is compiled into that edition. No muss, no fuss...and no waiting! This is just a bit about this great feature, but I recommend you look on OTN for more info and several whitepapers Oracle has available on the topic.
There's so much more I could blabber on about, but I'll save other thoughts for other posts. All in all, it's been a good week, my presentation on Tuesday was well attended and I got great feedback, and I've got lots of great material to take back home for further review and sharing with colleagues.
Man...it's been a l-o-n-g week. I don't know if it's my increasing age or what, but being in a herd of this size for an entire week has left me feeling somewhat bruised (both physically and mentally). I started at 9am on Sunday morning by attending Tom Kyte's keynote for Oracle Develop and have made it through Thursday - which seems like a minor miracle somehow. But I have survived!
So much goes on here, it's hard to keep moving through it given the mass of people and multiple session locations (not to mention the after hours opportunities for fun and frolic). Add a bit of rain (some kind of monsoon as it seemed to me) and, at times, it was like being in an undersized kennel filled with wet dogs. :)
The sessions I attended, and there were many, were all quite excellent. I sat in on most all of the 11gR2 sessions and am really amazed at many of the new features and options 11gR2 offers. My favorite is edition-based redefinition (also referred to as "online application upgrades"). This feature isn't getting nearly as much press time as some of the other more sexy things (for instance, you couldn't move 10 feet without seeing or hearing about Exadata) but as far as I'm concerned, editioning is a hugely beneficial new feature.
By the way, it is a feature, not an option. Wow...for once it's something you don't have to pay extra for! I loved that when Tom Kyte did his "10 - no 11 - top things about 11gR2" session, that he gave editioning two spots on the list. So what is editioning?
1. You can create a new edition to contain any new code changes.
2. Data changes can be made (add new columns or new tables) and the new code can safely write to the new stuff without any of it being seen by the old edition.
3. Different projections of tables are exposed via editioning views so that each edition is allowed to see only its own columns.
4. If the old edition makes data changes those changes are propogated using cross-edition triggers into the new edition's columns (or vice-versa).
The bottom-line is that now with 11gR2, if you want to make changes to procedures, packages, views, synonyms, etc, you can do it using editioning and maintain both the old (original) version as well as the new. It also means a lot less headache. Have you ever tried to CREATE OR REPLACE PACKAGE in production while the instance is in use? What happens? Well, if the package is in use, Oracle has to wait until it is no longer in use in order to replace it. So, Oracle waits. But, if new requests to use the package come in during this time, those sessions must wait also. That doesn't even begin to touch what happens if the new package causes other objects to be invalidated and thus they need recompilation also. It can be a pretty long, arduous and resource-intensive process to get something seemingly so simple done.
With editioning, a new edition is created and the modified package is compiled into that edition. No muss, no fuss...and no waiting! This is just a bit about this great feature, but I recommend you look on OTN for more info and several whitepapers Oracle has available on the topic.
There's so much more I could blabber on about, but I'll save other thoughts for other posts. All in all, it's been a good week, my presentation on Tuesday was well attended and I got great feedback, and I've got lots of great material to take back home for further review and sharing with colleagues.
Friday, October 9, 2009
Memorization vs. Understanding
This graphic was recently pointed out to me (thanks Jared!) and I loved it...so I thought I'd share.

The key point (as noted in the upper right) is:
I believe this is an accurate statement. I've been around many people who seemed to be able to spout out the most arcane command syntax or provide the "textbook" definition of virtually any topic. But, if there was a problem where something more than knowing the definition of something or being able to quote the documentation on a topic was required, they quickly got lost and didn't know what to do or how to proceed.
Certification exams, for instance, ask questions that can be answered correctly if you've memorized a lot of facts. I've always jokingly said that if I gave my mother (sweet southern homemaker that she is) enough material to memorize, that she could likely pass the Oracle certification exams in 1 or 2 tries. And although she'd never indulge me to see if I could prove that theory, I suspect it's close to the truth. She's a bright woman with a great memory so I think she could memorize and do pretty well on the tests.
I once met a kid that had just graduated college and had started his first real job with a client I was doing some work for several years ago. The company paid for him to take a two week "boot camp" course and take the certification exams. So, with about 2 months of on-the-job experience, he went to the two week class, took all of the exams at the end of it, passed them all and returned back to work a "certified expert". Of course there was a big production flame-out right after he got back and he got razzed pretty good about "not knowing" what to do since he was now an "expert".
The bottom-line, as I see it, is that memorized knowledge can (and likely should) be part of your foundation. But, it will only get you so far. It's the ability to really understand something and how it works that will help you solve issues that just aren't solvable with memorized facts alone.

The key point (as noted in the upper right) is:
Understanding how something works can drastically reduce the need to memorize a bunch of seemingly arbitrary facts.
I believe this is an accurate statement. I've been around many people who seemed to be able to spout out the most arcane command syntax or provide the "textbook" definition of virtually any topic. But, if there was a problem where something more than knowing the definition of something or being able to quote the documentation on a topic was required, they quickly got lost and didn't know what to do or how to proceed.
Certification exams, for instance, ask questions that can be answered correctly if you've memorized a lot of facts. I've always jokingly said that if I gave my mother (sweet southern homemaker that she is) enough material to memorize, that she could likely pass the Oracle certification exams in 1 or 2 tries. And although she'd never indulge me to see if I could prove that theory, I suspect it's close to the truth. She's a bright woman with a great memory so I think she could memorize and do pretty well on the tests.
I once met a kid that had just graduated college and had started his first real job with a client I was doing some work for several years ago. The company paid for him to take a two week "boot camp" course and take the certification exams. So, with about 2 months of on-the-job experience, he went to the two week class, took all of the exams at the end of it, passed them all and returned back to work a "certified expert". Of course there was a big production flame-out right after he got back and he got razzed pretty good about "not knowing" what to do since he was now an "expert".
The bottom-line, as I see it, is that memorized knowledge can (and likely should) be part of your foundation. But, it will only get you so far. It's the ability to really understand something and how it works that will help you solve issues that just aren't solvable with memorized facts alone.
Subscribe to:
Posts (Atom)

