Finally I slayed the beast and passed 70-451 PRO: Designing Database Solutions and Data Access Using Microsoft SQL Server 2008. Pretty long name for a long exam. In fact, I passed this one back in November, but I can rarely find time to post (even this happy news).
The point is that now I’m an MCITP (for a long time, I pronounced it as MCTIP). The exam process was hard and sweating (I have failed my first attempt with 690 points) but in the end it worth it. I learned a lot of general database design, as a lot of SQL Server specific things, too. I can only suggest this exam to anyone, because you have to learn so many concepts that it helps a lot later.
About the exam itself: a bunch of multiple choice questions, mostly on theory, and underline that word twice, because that was what it’s all about. If you want to pass for the first time (not as I did) then check concurrency a lot, know the lock types, and have an eye for database normalization. The only thing I hated was the questions on row based logic. I never did any, nor will I intend to.
Anyway it’s nice to earn another title, and after I finished my semester (five days from now), I will surely find another exam to pass and post about.
May include but is not limited to: row vs. page, update frequency, page compression implementation, compressing individual partitions
SQL Server databases are stored in the file system and their storage costs space. I think this is not some kind of breathtaking news for anyone, but it’s worth to mention. Now what’s happening when you have a massive amount of data (probably being gathered for years) which you don’t need everyday but takes a lot of place to store? Yes, you can compress it – and even better – you can compress it directly in the database server.
SQL Server provides two kinds of compression mechanisms: row compression and page compression. Row compression is the magic when the database engine stores your fixed length column values in variable length ones, thus saving a considerable amount of space. To know exactly how much is that considerable amount, you can use the sp_estimate_data_compression_savings stored procedure, which tells you how much space can be saved. If you’d need that space badly, and you are sure that compressing the data makes more benefits than harms, than use the ALTER TABLE (or ALTER INDEX) statement along with the REBUILD WITH(DATA_COMPRESSION = ROW) command to compress the given table/index on the row level.
May include but is not limited to: row-based logic vs. set-based logic, batching, splitting implicit transactions
If you try to fulfill this objective following my approach, you’d first look at MSDN, and try to get some info on the topic. You’ll fail. Then you’d google it up for yourself, and find some pretty decent articles about what’s bad in row-based logic, and what’s good in set based. You won’t find anything good about row-based, about cursors, etc.
Let’s put the bigger focus (becoming a better developer) and concentrate on the smaller (take the exam). The information above means that if you stumble into anything questioning you about the use of cursors then the right answer (on the exam) is not to use them, and find the shiny set-based alternative among the answers. I think this will be the key to success (on the exam, again).
Now what is row-based and set-based logic? Thinking row-based is thinking in a for-each loop. You take one entry at a time, examine that and throw it away. The basic entity to work with is a single object (or row, in SQL). This approach focuses on how to solve a problem to gain the results you need. Set-based logic on the other hand works with entities representing result sets. This is some kind of fire and forget style programming. You want results, and you issue an order (a SQL query) to the database engine to get those results, and get them damned fast. You don’t dirty your hands with mundane things like how to get the appropriate data (only when something is bad, then come query hints). I think this little paragraph explained the basic concepts behind row-based and set-based logic. Remember – they want you to think in result sets and simple SQL queries. Because they hate cursors and everything which is considered as row-based. So do what you asked and you get a nice cert to your wall.
As for the other two objectives I found nothing really useful or not known yet. I think you’ll have to google a lot for this objective.
May include but is not limited to: optimizer hints, common table expressions (CTEs), search conditions, temporary storage, GROUP BY [GROUPING SETS|CUBE|ROLLUP]
This one seems to be a rather large objective, but have no fear; we’ll slowly dig through it. Let’s take a look at the optimizer hints first.
As you may already know SQL Server optimizes your queries to perform as best as possible. However even the Query Optimizer can go wrong in some particular cases. As a last resort we have query hints to force some kind of execution. Generally you won’t have to use these kind of beasts, but it won’t hurt you to know about them.
There are three major kinds of hints:
- Join Hints
- Query Hints
- Table Hints
We won’t bother with join hints, because I don’t really see any possibility about them showing up on the exam. So let’s consider query hints first. You can specify query hints in the OPTION clause, at the end of a statement. There are a whole lot of them, and I don’t feel like copying them from MSDN, so search for the link at the end of this post. We’ve met with some query hints already at the transaction part. There are query hints for each locking level: READUNCOMMITTED, REPEATABLEREAD, READCOMMITTED, SERIALIZABLE can be specified in an OPTION clause.
May include but is not limited to: views, FOR XML PATH and EXPLICIT, FOR XML… TYPE
I have a similar post on this, which you can find in the category of the exam 70-433, but here’s a quick refresher. You can retrieve relational data as XML by specifying the FOR XML clause in the end of the query. It has four options:
- AUTO: retrieves data with a default node name of the table. When used with joins, the row names differentiate.
- RAW: mostly for debugging and testing, returns data as xml with the node name as row.
- PATH: lets you query relational data with XPath.
- EXPLICIT: a somewhat complex option which lets you return data with a syntax similar to XPath. I’d recommend using PATH instead, you’d lose only legacy support with it.
There are a bunch of other options which can be used in conjunction with the FOR XML clause, here’s a list:
- XMLDATA: appends an XML-Data Reduced schema to the beginning, but no longer used, consider XMLSCHEMA instead. Works with AUTO, RAW and EXPLICIT.
- XMLSCHEMA: returns an inline XML schema definition. Available for AUTO and RAW.
- ELEMENTS XSNIL/ABSENT: ABSENT is the default, it doesn’t mark NULL values. When using XSNIL, NULLS are included with an xsi:nil attribute set to true. AUTO, RAW and PATH has it.
- BINARY BASE64: when you return binary data in the result set, you must specify this one. Available for every setting.
- TYPE: returns the result set as an instance of type xml, instead of string. Usable with everyone.
- ROOT: lets you specify a root name for the returned xml.
- (‘ElementName’): you can give a name for the default xml node. Only RAW and PATH have this option.
May include but is not limited to: assembly PERMISSION_SET, CLR vs. T-SQL, ordered vs. non-ordered
From SQL Server 2005 we have the nice feature of running managed code right inside SQL Server. This is something similar that Oracle did with Java, but much more cooler. You can think that if you can code in .NET, than T-SQL is certainly dying. This is not true, and because this objective covers the CLR vs. T-SQL topic, here’s a brief when and what:
Consider using SQL CLR when:
- Pulling a lot of data, and executing lots of operations on it, especially string operations and the like.
- Highly procedural code with lots of steps.
- Having existing XPs, or accessing external resources from SQL Server.
- Performing lots of calculations on pulled data.
T-SQL triumphs when:
- When you pull little data over the wire, and do little processing on it.
- You are dealing with set-based operations. T-SQL is always faster in it.
May include but is not limited to: dynamic vs. prepared SQL (CREATE PROCEDURE… WITH EXECUTE AS) procedure, protecting against SQL injection
It won’t be so surprising, but I tell you that SQL Server has some great optimization features. One of these is that it caches query plans. A query plan is a compiled form of what the database engine should do. It stores cached plans in a normalized form, and performs hash lookups when it meets with the same hash to skip compiling the given query, and use an existing execution plan. Marvelous.
Sometimes you need quite a lot of flexibility – let’s say you are dealing with optimal parameters. Optimal parameters tend to result in different query execution plans, thus slower performance. This is the time when dynamic SQL comes into place.
May include but is not limited to: nested transactions, savepoints, TRY/CATCH
A transaction is an atomic, consistent, isolated, durable operation – this is best known as the acronym ACID. There are two kinds of transactions: explicit – this is what you refer as a transaction. It starts with the BEGIN TRANSACTION clause, and ends with a COMMIT / ROLL BACK – what you prefer. There are also implicit transactions. These are enabled when you connect SQL Server with implicit transactions enabled. By this, some clauses automatically start transactions (CRUD operations, DDL statements, etc.) which you can commit or roll back immediately.
To enable implicit transactions, you call the SET IMPLICIT_TRANSCATIONS ON statement. Remember – a transaction either succeeds or fails, so this mode holds some surprises to you.
Nothing can hold you back from nesting transactions. There are some rules, although. The first rule is that the outmost transaction always wins. So if you commit every nested transaction, but roll back the outmost one, every inner transaction is rolled back. The reverse is true.
May include but is not limited to: locking hints, memory consumption
SQL Server has a multigranural locking feature, which lets you lock resources on different levels. Here’s a brief table of them:
|RID||A row identifier to lock rows within a heap.|
|KEY||A row lock within an index.|
|PAGE||A whole page locked in the database (8 KB).|
|EXTENT||8 continuous pages.|
|HoBT||A heap or B-tree, depending on the existence of a clustered index.|
|TABLE||An entire table locked down.|
|FILE||A whole database file.|
|APPLICATION||An application-specified resource.|
|METADATA||Locks in metadata.|
|ALLOCATION_UNIT||An allocation unit.|
|DATABASE||An entire database.|