SQL Server Fragmentation Types
what is fragmentation is and the different types of fragmentation that exist in SQL Server?
Fragmentation by and of itself in the SQL Server data storage world could be described as some variation of non-contiguous data, whether logical or physical. This applies to how it relates to data page fragmentation (including clustered indexes, heaps, and non-clustered indexes). Fragmentation comes in a few different flavors. There is really no defined terminology (that I know of) for the different types, so we will use 3 terms for these types:
We will use Logical fragmentation to refer to the most common type of fragmentation – leaf pages of indexes becoming physically unordered within a given file so as to no longer match the logical left-to-right linked-list ordering of the pages. This type of fragmentation (as used here) does not relate to the physical pages being contiguous (i.e. back to back, side by side, etc.) within the given file, but only to the physical order of the pages within the file compared to the logical order in the linked list. The difference between contiguous pages and properly ordered pages is that to be contiguous, the pages must exist physically in side-by-side positions within the file (i.e. pages 1,2,3,4,5,6,…n) with no gaps in between. To be properly ordered, the pages must exist simply in logical order and on ever-increasing physical positions within the file (i.e. pages 10, 17, 22, 23, 28, 42,…n). Contiguous pages (for purposes of this post) are always also properly ordered pages, but properly ordered pages aren’t necessarily contiguous.
Logical fragmentation also encompasses page density (how full a page is). Given that the page is the basic unit of storage for SQL Server, it is also the basic size of data stored in your data cache (a subset of your buffer pool, usually the largest subset); therefore, the more dense the pages are, the less cache that is either used or needed to read the same amount of data. The less dense the pages, the more cache that is needed, the more pages that need to be read/inspected, and the more pages that are needed to store the same amount of data. This particular portion of Logical Fragmentation can actually be quite interesting because it is the only kind of fragmentation that can actually be helpful in some cases. It is also the only kind of fragmentation that will cause performance degradation for data that is in cache and not on disk (all other types of fragmentation only impact performance when actual disk IO is incurred. Once the data is in cache, all other types of fragmentation are totally irrelevant). When referring to reading data or selecting data, Logical Fragmentation (like all others) is bad, for all the reasons mentioned above. However, since most database systems aren’t read-only in nature, at some point data needs to be written to the database. When new data is written to the database and it needs to be placed in a particular location within an index (determined by the index key(s)), or if an existing row is updated to contain larger variable-length column values, that data will need space to reside within the appropriate logical location in the index. If free space exists on the page where the existing record already resides, or on the page where a new record will be inserted into, then the new data is simply written to that page and life is good. If the page is completely full (or full enough to not have the required amount of space for the new data), then a page split will occur, a new page will be created, and then the new data will be written. This latter type of write operation will be significantly slower than the prior write operation where free space existed on the page. This is the case where purposely having a lower page density and actually introducing some fragmentation into the system will help. This is also the only kind of fragmentation that has a possible up-side.
To show some examples, the following graphic displays a perfectly contiguous, perfectly logical ordered index structure:
This next graphic displays what the same structure might look like with pages that aren’t full, and aren’t logically ordered, and also aren’t contiguous:
This terminology refers to the fragmentation of the actual physical location of extents/pages within SQL Server data file(s). This is typically called Extent Fragmentation due to the fact that extents are the primary allocation unit for SQL Server data, and the majority of the time allocation of space within a file will occur as a full extent, which is made up of 8 contiguous pages. Therefore, the majority of the time, this type of fragmentation manifests itself as interleaved extents that are allocated to different structures. The pages within each of these extents are mostly contiguous. If you go back to the discussion on the differences between properly ordered pages and contiguous pages, this type of fragmentation is referring to properly ordered pages that are not contiguous, not extents that are not contiguous. Given that space is allocated in extents and not pages, much of the time it can be easy to think of this type of fragmentation as only referring to out of order extents.
The extent fragmentation that is most common occurs when data is being inserted into the database within multiple different tables. Extents are being allocated for these insertions within a given data file in a round-robin fashion (i.e. extent #1 goes to index #1, extent #2 goes to index #2, extent #3 goes to index #3, then extent #4 goes to index #1, then extent #5 goes to index #3, etc., etc.). This results in properly ordered pages for the given indexes (i.e. no logical fragmentation), but not in contiguous pages, since the pages are interleaved with each other by nature of the extents being interleaved. You end up with 8 contiguous pages, an extent for another object(s) and then 8 more contiguous pages, etc., etc. It is possible to end up with extents that look like the following within the SQL data file(s):
While this describes a common form of Extent Fragmentation, the term “Extent Fragmentation” can be a bit deceiving since it implies that it is only looking at the “extent” level of storage. Since pages in a linked list don’t necessarily have to exist on the same extent, even those that are previous/next in the linked chain, it really comes down to where the pages exist. You could make the argument that in that case, it still has to “jump” between 2 different extents, which is true â€“ as long as you realize that you can have Extent Fragmentation even if all the extents for an object are contiguous but the pages within them are not.
Extent Fragmentation is really just another form of Logical Fragmentation, since it really only manifests itself within the SQL Server data file in which SQL Server interacts with via standard APIs. A SQL Server data file is the container for data that SQL Server knows about, however that data file may reside in all kinds of different sectors on the actual spindles. SQL Server really has no direct control over where new page allocations come from as they reside on-disk, especially with the adoption of SAN systems in today’s environments. Though SQL does do everything it can to try and ensure contiguous data on a given disk, in many systems this isn’t even possible. Â (Think of SANs where a request for an extent of space can come from anywhere the SAN software sees fit, which is most likely from a variety of spindles.) Additionally, think of the case where file-growths occur on a SQL data file from the time the data file was initialized to the current state. It may have gone through many, many file-growths, and in between each of these file-growths, the storage system has allocated space for lots of other reasons, applications, etc. In this case, the actual data physically on-disk is most likely to be spread all over the place.
File-Level Fragmentation is your typical fragmentation of data blocks as they reside physically on-disk. SQL Server has limited control over this type of fragmentation. Much of it is controlled by the disk subsystem and our decision on where to place files, how many to have, choice of storage, etc. All SQL Server can actually do is request space for the given data file(s) when it is told to do so, or when it needs to do so. These requests are then passed to the appropriate APIs, storage drivers, etc. that handle the allocation of new space on-disk. The best SQL can do is request this space be contiguous â€“ assuming you are using a single spindle with a single file. Once you start introducing things like RAID, SANs, etc., you begin to tread into a territory where it is almost out of SQL Server’s control. That doesnâ€™t mean that the introduction of these types of systems is bad, after all, RAID and SANs are both good technologies that are very good at what they do. RAID is all about spreading data across multiple spindles for either performance improvements, data protection, or a combination of both. If data is being spread across multiple spindles, then you are not going to have contiguous data on a single spindle. The benefit here is that multiple spindles can service more requests than a single spindle, and they give you data protection. SANs are even more complex in that they implement RAID (in quite a few varieties) and provide management capabilities, options, etc., etc., making RAID even more complex in understanding exactly where your data resides on-disk. A very simple example of File-Level Fragmentation can happen when running SQL on a laptop with a single LUN that is supported by a single spindle. In this scenario it is a bit easier to understand that you simply have the SQL Server data file(s) stored in blocks on this disk, and if you pre-allocate the appropriate space for the database supported by the file(s), and the contiguous space on disk is available, then you’ll most likely end up with a very contiguous file on-disk. If you took the disk and sliced it into blocks, a single disk with File-Level Fragmentation for SQL Server data files might look something like this:
This has summarized the different types of fragmentation at a high-level.