Monday, November 30, 2009

Compression to the rescue

We've had issues getting enough storage space allocated on the development server our project is intended to use. For this project, our team doesn't have full DBA rights on the box and has very limited privileges. Even before I joined the team, a request had been made for a little over 3.5 TB of space for the development of a proof-of-concept data mart. The first project deliverable is due this week, and we've only been allocated 200 GB so far. As you can imagine, it's difficult to load the data we need when less than 10% of the required space is available.

The data is supposed to span a 2-year period. A set of base transaction tables has to be extracted from various sources and loaded; that data will then be flattened out into a model that will represent the final design. With only 200 GB currently available, my initial loads for 5 (out of 50) tables took nearly all of it. So, I had to do something to fit as much data as possible into the space we had. Compression to the rescue!

I decided to compress the tables I had extracted using basic compression (just add COMPRESS to the table definition). But first, in order to get the best compression possible, I collected statistics on the tables I wanted to compress and reviewed the num_distinct column statistic. Using this statistic, I sorted the data into my new compressed table using a column order from the lowest to the highest number of distinct column values.
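A minimal sketch of those steps, with hypothetical table and column names (TRANS_STAGE and its columns are made up for illustration):

```sql
-- Gather statistics on the uncompressed staging table
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'TRANS_STAGE');

-- Review distinct-value counts to choose a sort order
SELECT column_name, num_distinct
  FROM user_tab_col_statistics
 WHERE table_name = 'TRANS_STAGE'
 ORDER BY num_distinct;

-- Rebuild the table with basic compression, sorting from the
-- lowest-cardinality columns up to the highest
CREATE TABLE trans_fact COMPRESS AS
SELECT *
  FROM trans_stage
 ORDER BY region_cd, status_cd, acct_id, trans_ts;
```

Sorting the low-cardinality columns first keeps duplicate values adjacent within each block, which is exactly what basic (block-level) compression exploits.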

When the new compressed tables were ready, I had achieved 86% compression! Using this method, I was able to load the majority of the key tables I needed to allow progress to continue for our initial project deliverable. I still haven't been able to get everything, but at least I got enough to allow our statisticians to produce some initial findings (the project is intended to create a predictive analytical model).
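The space savings can be verified against the data dictionary; a quick example, assuming the staged and compressed copies both live in the current schema (table names are hypothetical):

```sql
-- Compare segment sizes of the original and compressed copies
SELECT segment_name,
       ROUND(bytes / 1024 / 1024 / 1024, 2) AS size_gb
  FROM user_segments
 WHERE segment_name IN ('TRANS_STAGE', 'TRANS_FACT');
```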

An additional 2 TB was allocated to us on Friday but of course it was on a different server so the initial data has to be exported and imported into the new location (sigh...). But the lack of available space gave me an opportunity to effectively use a feature I probably wouldn't have used if I had all the space I needed to start with.
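The move itself would be a straightforward Data Pump export/import; a sketch, assuming a directory object dp_dir and a schema named poc_mart (both names invented for the example):

```
# On the old server
expdp poc_mart DIRECTORY=dp_dir DUMPFILE=poc_mart.dmp SCHEMAS=poc_mart

# On the new server, after copying the dump file over
impdp poc_mart DIRECTORY=dp_dir DUMPFILE=poc_mart.dmp SCHEMAS=poc_mart
```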


robert said...

Table compression in Oracle is only a moderately useful feature, IMHO. It is only useful in special cases such as yours, where the table is not updated after an initial load. And, of course, performance figures you might have drawn from that compressed table may be quite different from those for the real thing. (In case someone wonders: index compression is a totally different story, though.)

Karen said...

I agree that basic compression has limited usefulness except in specific cases like this one. But, for what I needed, it was just the ticket. It made me remember that I need to give thought to choosing the right tool for the job. In this case, compression was the perfect tool to do the job I needed.

jan van mourik said...

Just out of curiosity Karen, which Oracle version are you using?

Karen said...

Oracle version 11gR1

Oxnard said...

Assuming I am following you correctly, the compression is being done in a development environment, and reading into this, the current plan is to have non-compressed objects in production. What would be the risk of the dreaded problem of "it runs fast in dev but not in prod"? Any way to account for this? We face a similar issue.