Nucleus Datakonsult List candidates for sparse columns

Home

About me and Nucleus
Partners

Articles, tips etc.
My blog
Links
Microsoft articles
Forums

Courses and training




Microsoft Most Valuable Professional


Skor
R

Overview
As of SQL Server 2008 you can define a column as SPARSE, which means that a NULL occurence uses 0 byes instead of the size of the defined column datatype. I.e., potential space saving if you have lots of NULLs for that column. The purpose of this procedure is to list candidates for sparse columns.

Versions etc.
SQL Server 2008 and later. See comment block in procedure source code for version history of the procedure.

Details
A quick backgrounder on SPARSE:
  • SPARSE means that a NULL occurance (a row with a NULL) for the column uses 0 bytes instead of the size for the data type.
  • But for rows where you *do* have a value, you instead pay a penalty of 4 bytes fir each occurense.
  • You also pay a penalty of 6 bytes for a row if that row have at least one column defined as SPARSE but that has a value (the "sparse vector").
  • For columns of types char, nchar and binary, SPARSE means that SQL Server doesn't store trailing spaces (or zeroes for binary). They are still there when you SELECT, but the data isn't stored.

Limitations
  • The procedure do not handle partitioned tables and indexes. I just haven't tested the proc with any partitioning and such tables/indexes are likelty to be listed several times, potentially with incorrect space data.
  • Currently, the procedure ignore columns of the following datatypes. These columns can be sparse bus potential space savings are too complex to calculate for various reasons. The types are: varbinary, nvarchar, varchar, sql_variant, bit, hierarchyid, xml.
  • The procedure do not attempt to weigh the cost for the sparse vector, since we would have to look at each row, across all sparse columns to see if there at lease it one column which isn't NULL. Doable, I guess, but not something I wanted to dig into right now.
  • For same reasons as above, I didn't bother to weigh the factor that trailing spaces/zeroes aren't stored for char, nchar and binary.

Parameters:
@MinRowsToBother int
Only list the column if we have more than this number of rows in the table. Optional, with default value of 40.

@PercentagesavingsToBother int
Only list the column if we expect to save at least this much for the column. Optional, with default value of 100.

@NullPercentageIxToBother int
Only list the index as candidate for filtered index if we have at least this number of NULLs for the column in question. Optional, with default value of 10.



The code:
You find the code here.