Design tables

May include but is not limited to: table width, sizing data types, IN_ROW_DATA (BLOBs), overflow data, sparse columns, computed columns, persisted computed columns

In the heart of every relational database system, we find tables. Lots of them. Everybody knows how they look – rows and columns, and in the intersection the data value itself and a nice header with the column titles.

Our first issue is table width. You can calculate it easily by the built-in dynamic management view sys.dm_db_index_physical_stats. The hard decision is how wide your table should be. A wide table is not necessarily the devil itself, and a very short table can be a bad solution, too. Of course it’s not the best idea to store ordered products in the customer table rendered as columns like Order1, Order2… I think the keyword here is clever design. Normalize your database system, ensure referential integrity, and you won’t have to worry about table width. You’d need to store that data anyway, don’t you?

The question of data type sizes is a more interesting one. Here’s a brief table of them:

tinyint 1 byte 0 to 255
smallint 2 bytes -32768 to 32767
int 4 bytes -2.147.483.648 to 2.147.483.647
bigint 8 bytes -2E63 to 2E63-1
Decimal(precision, scale) 5 to 17 bytes, depending on precision -10E38+1 to 10E38-1
Float(n<=24) || float(24<n<=53) 4 or 8 bytes Damn wide range.
datetime 8 bytes 1753-01-01 to 9999-12-31
Datetime2 6 to 8 bytes, depending on fractional seconds precision 0000-01-01 to 9999-12-31
Date 3 bytes 0000-01-01 to 9999-12-31
time 3 to 5 bytes, depending on fractional seconds precision 0-24

 

As a rule of thumb, store what you need. Knowing the time of a registration on a website with the preciseness of 100 nanoseconds is a cool thing, but storing only the date can save you 5 bytes per row. A million users => the save is 5 MBs. OK, this doesn’t seem to be such a saving, but you should allow about 50-70 characters for a first name, for example (I was interested in the issue, and once had the opportunity to work with a database of the subscribers of a broadcasting company – the longest first name was 35 characters long).

Sparse columns are a great way to save some place – they don’t use space to store NULL values. The trade-off is 4 bytes for every field with a value, and a lot of restrictions. You cannot use geography, geometry, text, timestamp, image, UDFs and ntext with them; also forget a PRIMARY KEY constraint, etc.

Because of the 4 byte “penalty” you should consider using sparse columns when the NULL percentage exceeds a given limit. You can view these limits on this site.

You can use computed columns too (I never used one). These are virtual columns which calculated based on other columns, constants, functions, whatever you like, but not on a subquery. There are some restrictions on these columns; such as if you’d like to use them in indexes or primary keys, their values must be deterministic. Also, computed columns cannot be the target of inserts and updates (what a surprise),

Computed columns get recalculated every time you query them. Marking them as PERSISTED overrides this (because they get stored just like any other column), and let’s them participate in indexes, for example. But this is more the subject of indexes than tables, so we’ll look at this later.

Advertisements

Tags: , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: