SQL is a declarative language that returns multi-sets (sets allowing duplicates) of rows. The operations exposed in SQL, such as join, filter, group by, and project, do not inherently have any ordering guarantees. ANSI SQL does expose an ORDER BY clause for the top-most SELECT scope in a query, and this can be used to return rows in a presentation order to the user (or to a cursor so that you can iterate over the results of query). This is the only operation in ANSI SQL that actually guarantees row order.
Microsoft SQL Server provides additional ordering guarantees beyond ANSI, mostly for backwards-compatibility with previous releases of the product. For example, variable assignment in the top-most SELECT list when an ORDER BY is specified is done in the presentation order of the query.
Example:
SELECT @a = @a + col FROM Table ORDER BY col2;
SQL Server also contains a non-ANSI operator called TOP. TOP allows you to limit a result set to a certain number or percentage of the result set. If an ORDER BY is used in the same scope, it qualifies rows based on the ORDER BY. So, TOP 1 … ORDER BY col will return the “first” row from that result set based on the order by list. However, SQL Server does not guarantee that the rows will be returned in that order from the intermediate result set. It only guarantees which rows actually qualify. You’d need to put an ORDER BY at the top of the query to guarantee the output order returned to the client. (In a previous blog entry, I noted how SQL 2005 actually doesn’t bother processing TOP 100 PERCENT … ORDER BY since it is “meaningless” under this definition).
Other operations in SQL Server also have this “which rows qualify” semantic. ROW_NUMBER, RANK, DENSERANK, and NTILE contain an OVER clause in which an ORDER BY can be specified. This order by guarantees the output of the operation, but not the order in which the rows are output. So, the following is valid output for the row_number query – the function outputs row values as if it had been evaluated in a specific order.
SELECT col2, ROW_NUMBER() OVER (ORDER BY col1) FROM T
Col2 Col1 row_num
1 1 1
3 3 3
2 2 2
Now, in practice, we don’t currently generate a plan that would return rows in this order for this specific query. However, different query plans for the same query can return rows in different order, so it is important to understand what is guaranteed and what is not. When you build an assumption in your SQL application during development about one query plan, then deploy it and a customer with their own data in the database gets a different plan, you can very quickly learn about this dependency the hard way. (A more plausible possibility for this example is that we could assign the row numbers backwards if we knew the exact rowcount from a previously completed operation in the SQL query plan).
Essentially, the SQL Server Query Optimizer is guaranteeing that the internal operator in the query tree will process its input in a particular order. There is no corresponding guarantee that the output of that operator will imply that the next operator in the query tree is performed in that order. The reordering rules can and will violate this assumption (and do so when it is inconvenient to you, the developer ;). Please understand that when we reorder operations to find a more efficient plan, we can cause the ordering behavior to change for intermediate nodes in the tree. If you’ve put an operation in the tree that assumes a particular intermediate ordering, it can break.
Thanks,
Conor