Monday, November 30, 2009

Compression to the rescue

We've had issues getting enough storage space allocated on the development server our project is intended to use. For this project, our team doesn't have full DBA rights on the box and has very limited privileges. Even before I joined the team, a request had been made for a little over 3.5 TB of space for the development of a proof-of-concept data mart. The first project deliverable is due this week, and we've only been allocated 200 GB so far. As you can imagine, it's difficult to load the data we need when less than 10% of the required space is available.

The data is supposed to span a 2-year period. A set of base transaction tables has to be extracted from various sources and loaded, and that data will then be flattened into a model representing the final design. With only 200 GB currently available, my initial loads for 5 (out of 50) tables took nearly all of the space we had. So, I had to do something to fit as much data as possible into the space available. Compression to the rescue!

I decided to compress the tables I had extracted using basic compression (just add COMPRESS to the table definition). But first, to get the best compression possible, I collected statistics on the tables I wanted to compress and reviewed the num_distinct column statistic. Using that statistic, I sorted the data as it was loaded into each new compressed table, ordering by columns from the lowest to the highest number of distinct values so that repeating values would cluster together within each block.
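Here's a minimal sketch of the approach. The table and column names (TXN_DETAIL and friends) are made up for illustration, not the actual project objects:

-- 1. Gather statistics on the uncompressed staging table.
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'TXN_DETAIL');

-- 2. Review the distinct-value counts to pick a sort order (lowest first).
SELECT column_name, num_distinct
  FROM user_tab_col_statistics
 WHERE table_name = 'TXN_DETAIL'
 ORDER BY num_distinct;

-- 3. Build the compressed copy with a direct-path CTAS, ordering rows by the
--    low-cardinality columns first so repeated values land in the same blocks.
CREATE TABLE txn_detail_comp COMPRESS
AS
SELECT *
  FROM txn_detail
 ORDER BY region_cd, txn_type_cd, store_id, txn_dt;  -- lowest to highest num_distinct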

When the new compressed tables were ready, I had achieved 86% compression! Using this method, I was able to load the majority of the key tables needed to keep progress moving on our initial project deliverable. I still haven't been able to load everything, but at least I got enough to allow our statisticians to produce some initial findings (the project is intended to create a predictive analytical model).
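If you want to see the savings for yourself, a quick comparison of segment sizes does the trick (again, the table names here are just placeholders):

SELECT segment_name, ROUND(bytes/1024/1024/1024, 2) AS size_gb
  FROM user_segments
 WHERE segment_name IN ('TXN_DETAIL', 'TXN_DETAIL_COMP');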

An additional 2 TB was allocated to us on Friday, but of course it was on a different server, so the initial data has to be exported and imported into the new location (sigh...). But the lack of available space gave me an opportunity to make effective use of a feature I probably wouldn't have used if I'd had all the space I needed to start with.