SQL Server Storage Basics and Access Methods

Posted on: February 8th, 2012 - Written by: Stephen Marshall

SQL Server Storage

In reviewing some SQL Server Storage Basics and Access Methods, we will cover:

  1. Storage characteristics
  2. Clusters, Heaps, B-Trees, etc.
  3. Access Methods

This post will cover some basics in terms of storage and access methods.

Pages

A Page in SQL Server is the primary storage structure for all data (data, indexes, BLOB/CLOB/LOB, row-overflow/SLOB, etc.). Pages are 8k in size and store records in no particular order (row offset is used to find each logical record on the page) – a very basic structure is as follows:

Extents

An Extent in SQL Server is the primary allocation structure (i.e. what is allocated to objects when they request more space to store stuff). Extents are made up of 8 contiguous pages (64k) and can be either shared or uniform in nature. This means anytime new storage space is needed for data, or an index, or a heap, it is allocated a single, full, contiguous extent (8 contiguous pages). The only exceptions to this are when space is allocated for a new object and the 1118 trace flag is not enabled, or when you have the -E startup option enabled, or if space is being allocated for a very small object that is less than ~64k in size.

HoBT

HoBT (pronounced “hobbit”) stands for “Heap or B-Tree” and is the logical name for how all user data is stored in SQL Server (either in a Heap or a B-Tree).

Heap

A Heap is basically a bunch of pages with no particular structure (aside from that of a page/extent), no order, no linkage. If you have a table that has no Clustered Index defined on it, then the data for that table is stored in a Heap. Heaps are logically a flat structure (i.e. they are made up of only data pages, no root/intermediate pages). Given that there is no order, structure, or page linkage in a Heap, there is no direct support in a heap for singleton lookups or range scans.

B-Tree

A B-Tree (balanced-tree) is a balanced, structured object, and it is used for all indexes in SQL Server (Clustered, Nonclustered, XML, etc.). Think of a B-Tree as a triangle-shaped structure of pages – it has a single root page at the top of the triangle with pointers to the next level down the triangle (which is wider than the root level by the number of pages that the root page can contain pointers to), each of those pages contain pointers to the next level, and so on (these are called intermediate pages) down the triangle until you get to the bottom of the triangle, which is the widest and referred to as the leaf level of the structure. In ASCII x marks, a B-Tree would look like the following if it contained 2 intermediate levels where each page in the root and intermediate levels could each could point to 3 other pages:

x
xxx
xxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxx

What resides at the leaf level of the structure depends on what type of index the B-Tree supports. If it is a clustered index, then the leaf level of the index contains all the data for all the columns for all the rows of the table; if it is a non-clustered index on a clustered table it contains the non-clustered index keys, the included column values, and the keys for the clustered index (uses those as a pointer to the rest of the data); if it is a non-clustered index on a Heap table, it contains the non-clustered index keys, the included column values, and the physical row-id value for the record in the Heap (uses this to get a pointer to the rest of the data).

Data at the leaf-level of a B-Tree is logically ordered in order of the index key – this applies to both clustered and non-clustered indexes. It is a myth that the data is physically sorted this way. This is exactly what logical fragmentation is: The logical ordering of pages in an index not matching the physical layout within the file. This logical ordering is achieved via the “row offset”/”slot array” for records on a page, and via a doubly-linked list for pages in an index. A doubly-linked list is basically a pointer in the header of each page that points to the prior logical page and the next logical page in the index.

Seek

A Seek is an efficient access method that can be fulfilled using an index structure. Seeking touches fewer pages than scanning, and can only occur on an index of some type. Think of a Seek as what you would do with a phone book if you were told to find the phone number of someone with last name Boyd and the first nameChad.

Scan

A Scan is basically an access method whereby all data, or some range of data, in an index or heap must be touched or retrieved in order to fulfill a request. Think of a Scan as what you would do with a phone book if you were told to find the phone numbers for everyone in the book, or the phone number for all people with the first nameChad, or for someone with a phone number of (555) 555-5555.

Singleton Lookup

A Singleton Lookup is a seek operation whereby a single record/page of data is retrieved. This can occur for example when you perform a query that needs to access only a single record, or a few records from a single page. The navigation through a B-Tree in a Singleton Lookup would be similar to the following diagram:

Full Scan – Key Ordered

A key-ordered Full Scan is a scan operation whereby all the leaf pages of a given B-Tree structure are retrieved by following the page linkage (i.e. starting with the first page in the leaf level and following to the next page via the doubly-linked list, and so on and so on). Note that this type of scan can only be performed on a B-Tree structure and NOT against a heap. This occurs for example when you perform a query that must return all the rows and columns in a clustered table, or only a few rows but the filter predicate has no index to seek with, or against a non-clustered index if you want all rows for a table but only values from columns that are covered by the non-clustered index. (navigational diagram follows under Range Scan)

Range Scan

A Range-Scan is a scan operation whereby a range of pages are retrieved – think of this as a full-scan with boundaries. The navigation through a B-Tree in a Range Scan would be similar to the following diagram (a Full Scan operation would be exactly the same, only the entire leaf would be scanned instead of just a portion of it):

Full Scan – Allocation Ordered

An allocation ordered Full Scan is the same as a key ordered Full Scan, only instead of scanning the data via page linkage, the pages from the leaf of the index (or from a Heap) are scanned via page ID values taken from system pages that track which pages are allocated to the given index/heap (pages that track this are called IAM pages). Basically all the physical page ID values are gathered for pages allocated to the given index/heap, and then those pages are retrieved directly – even in a B-Tree structure, the root/intermediate pages for the index would not be touched at all (no reason to do so, since the physical page ID values take you directly to the leaf pages needed). This is the ONLY type of operation that is directly supported against a Heap (though seeks and range-scans can occur indirectly via non-clustered indexes that are built against the Heap table). The navigation through a B-Tree in this type of scan would be similar to the following (notice that the root/intermediate levels aren’t touched). Navigation through a Heap would be similar as well, only a Heap wouldn’t include root/intermediate pages at all in the structure:

SQL Server Read-Ahead

Read-Ahead is an access method mechanism that attempts to intelligently pre-fetch data that resides on-disk into the data cache prior to it being needed for use by the CPU – this is done to try and optimize the IO throughput of a system in order to keep the CPU as busy as possible without waiting on the slower IO system to get needed data. This type of operation is typically reserved for scans of data that fetch medium/large amounts of pages/data, and can issue 1,8,32, or 64 page IOs in a single IO operation – the size of operation that can be performed is very heavily impacted by the contiguity of the data (the more contiguous the data, the bigger the IOs that can performed, the better performance you see). There are 2 kinds of read-ahead operations, 1 that works with allocation ordered scans, and 1 that works with key-ordered scans.

With allocation ordered scan, the IAM pages in the database list the extents used by a heap or index – the engine will read the IAM and build a sorted list of the physical disk addresses that must be read, allowing the engine to optimize it’s IOs as large sequential read performed in sequence based on their location on disk, vs. multiple small, random reads.

With key-ordered scans, the engine uses the information stored in the intermediate index pages 1 level above the leaf level to schedule serial read-aheads for pages that contain the keys found. If a request is made for example for all keys from 1 to 100, the engine will first read the index page above the leaf page for key 1 (on it’s way to traversing to the leaf page); however, instead of simply reading each leaf page in sequence from page 1 to page 100, the engine scans the intermediate level page and builds a list of the leaf pages that must be read to get pages 1 thru 100, then schedules all reads in key order – in addition, the engine will recognize if pages are contiguous and perform a single read to retrieve the adjacent pages in a single operation as opposed to multiple, smaller operations. A similar type of operation is used to pre-fetch data from a base cluster or heap when scanning through a non-clustered index – since the leaf rows of a non-clustered index contain pointers to the data rows in the cluster/heap structure, as the storage engine reads through the leaf of the non-clustered index, it also starts scheduling asynchronous reads for the corresponding data rows whose pointers have already been retrieved. This allows the engine to fetch data efficiently from the underlying cluster/heap before it has completed the scan of the non-clustered index.

The navigation for a Read-Ahead in a key-ordered scan would look something like the following:

This covers the basics around storage, allocation, and access methods in regards to what we need to understand when talking about Fragmentation. In future posts, look for more about fragmentation.

Tags: ,

Leave a Reply