This article is a continuation of Cursors. Previous Parts can be viewed here:
Cursors : An In Depth Look – Part 1
Cursors : An In Depth Look – Part 2
This section will go into the finer details of the cursor and set-based operations. We will be looking at what is going on behind the scenes to get a better understanding of how these operations work, and to show you how to calculate performance so you can determine the best approach for your query, either cursor or set-based.
We will be looking at the STATISTICS IO and the Query Execution Plan when we execute these queries.
Before we dive into these details, we are going to explore some definitions that you need to know when looking at the Query Execution Plan.
Copyright Information:
The following definitions were taken directly from “Inside SQL Server 2005 T-SQL Programming” by Itzik Ben-Gan.
Estimated I/O Cost and Estimated CPU Cost:
The estimated part of the operator’s cost associated with that particular resource (I/O or CPU). These measures will help you identify whether the operator is I/O or CPU intensive.
Estimated Operator Cost:
The cost associated with the particular operator.
Estimated Subtree Cost:
The cumulative cost associated with the whole subtree up to the current node.
In order to not overload our results window, we need to make sure we get a TOP 1 operation when doing the initial selects. This way we get 1 result set. The results will be slightly skewed but we are only really looking for the basic I/O operations for each type of query.
Before you run these queries, run the following command AND turn on the Actual Execution Plan by hitting Control-M on your keyboard.
SET STATISTICS IO ON
Also, we need to make sure you clear your cache for each type, so we can compare cold cache against warm cache results.
The query for clearing your cache is located in Part 2 of this article.
The Cursor
We are first going to execute the cursor. The query will be the same as our previous cursor query from Part 2 of this article, the only difference is we are going to add TOP 1 to the operation.
DECLARE @keycol AS INT ,@filler AS CHAR(200) DECLARE C CURSOR FAST_FORWARD FOR SELECT TOP 1 keycol ,filler FROM dbo.T1 OPEN C FETCH NEXT FROM C INTO @keycol, @filler WHILE @@fetch_status = 0 BEGIN FETCH NEXT FROM C INTO @keycol, @filler END CLOSE C DEALLOCATE C
When you execute your query, we are going to look at the message window.
Note: your results may vary slightly, however, all in all the results should be very close.
Cold Cache:
Table 'T1'. Scan count 1, logical reads 3, physical reads 2,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
Warm Cache:
Table 'T1'. Scan count 1, logical reads 3, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
As you can see, table T1 did 1 scan, 3 logical reads and 2 physical reads total for the cold cache and the warm cache had the same except for the physical reads because the data was in the cache.
NOTE: If you do not use the FAST_FORWARD Command you will get an ADDITIONAL 3 logical reads from a ‘Worktable’
Logical reads are from memory, physical reads are from disk.
When you look at your execution plan you will see something like the following (this will be the same for both cold and warm cache).
Because we have a clustered index on the primary key and we did not specify actual key columns, we got an Index Scan. If we had specified key columns we would have an index seek (which provides better performance).
Now, the key aspect of this is if you move your mouse over the Clustered Index Scan you will see something like the following:
The scary part about this is the estimated number of rows at 1 million and the Operator Cost and Subtree Cost of 20.5966.
Set-Based Operations
We are going to run our set based operation (query taken from Part 2 but without the loop for demonstration purposes)
DECLARE @keycol AS INT, @filler AS CHAR(200) SELECT @keycol = keycol, @filler = filler FROM (SELECT TOP (1) keycol,filler FROM dbo.T1 ORDER BY keycol) AS D SELECT @keycol = keycol, @filler = filler FROM (SELECT TOP (1) keycol, filler FROM dbo.T1 WHERE keycol = @keycol ORDER BY keycol ) AS D
You will see that the STATISTICS IO is the same on the logical and physical reads. The best part about this is the execution plan:
Here you can see that we only have 1 estimated row versus 1 million and the Operator Cost is significantly lower then before!
I ran the same process using a temporary table and with the results I have created a table to show you the output from the Execution Plan:
|
Cursor |
Set-Based /w Top |
Set-Based /w Temp |
|
|
Estimated I/O |
19.4965 |
19.4965 |
0.006250 |
|
Estimated CPU |
1.10016 |
1.10016 |
0.000336 |
|
Estimated Operator Cost |
20.5966 |
0.0032831 |
0.006586 |
|
Estimated Subtree Cost |
20.5966 |
0.0032831 |
0.006586 |
|
Estimated Number of Rows |
1000000 |
1 |
1 |
As you can see, running the cursor (even with the TOP 1 operator) will still have significant overhead on the machine. Using a set-based operation has less Operator and Subtree cost and significant estimated number of rows. Using the temporary table method, you can see that all numbers are drastically reduced.
So, just by looking at this, you can see that a Set-Based operation using a temporary table for row by row manipulation has the best impact on the machine.
Continue to Cursors: An In Depth Look Part 4
