Design a table and index compression strategy
May include but is not limited to: row vs. page, update frequency, page compression implementation, compressing individual partitions
SQL Server databases are stored in the file system and their storage costs space. I think this is not some kind of breathtaking news for anyone, but it’s worth to mention. Now what’s happening when you have a massive amount of data (probably being gathered for years) which you don’t need everyday but takes a lot of place to store? Yes, you can compress it – and even better – you can compress it directly in the database server.
SQL Server provides two kinds of compression mechanisms: row compression and page compression. Row compression is the magic when the database engine stores your fixed length column values in variable length ones, thus saving a considerable amount of space. To know exactly how much is that considerable amount, you can use the sp_estimate_data_compression_savings stored procedure, which tells you how much space can be saved. If you’d need that space badly, and you are sure that compressing the data makes more benefits than harms, than use the ALTER TABLE (or ALTER INDEX) statement along with the REBUILD WITH(DATA_COMPRESSION = ROW) command to compress the given table/index on the row level.
In the beginning I told you that there are two types of compression mechanisms. The other one is page compression, which can be further divided into two types: column-prefix compression and page-dictionary compression. Column-prefix looks for the same prefixes in columns on the page level. If it finds some, then it creates an anchor record, and calculates values on the bit level – it replaces the prefix occurred in other values with a number of how many bytes should be used to recreate the original value from the anchor record. Sounds messy, and it is but I don’t think we should go this deep.
The other far easier to understand page compression method is page-dictionary compression. It does what its name implies: creates a dictionary of repeated values and uses an index to indicate their occurrence.
You can use the exact same syntax as shown with row compression, but you should replace the ROW keyword with PAGE. Now when you shouldn’t compress tables or indexes:
- When the table is small, it simply won’t worth the effort.
- Heavy traffic in the given table will result in massive CPU overhead. Use data that you won’t need daily.
- First query the sp_estimate_data_compression_savings stored procedure. It measures good, so if there’s little savings, forget compression.