Statistics are very important in the query optimization. This post will give you some basic idea of stats and how they are created and used. This post is to link with my performance tuning parts.
What is statistics:
Statistics store information about the distribution of data in the column/columns for a table. It has Header, Densities and Histogram. The Histogram stores only 200 steps, and for very large tables VLT, it is hard to hold much data only can hold up-to 200 steps, you have to create manually for the VLT. It is a lightweight and use very low storage.
The only way to know the underlying tables from the Query Optimizer is by using statistics, to estimate how many rows will be returned to the query plan.
How they are creating:
-- step 1 use master alter database stats_check set single_user with rollback immediate drop database stats_check create database stats_check -- USE master GO ALTER DATABASE [stats_check] SET AUTO_CREATE_STATISTICS ON WITH NO_WAIT GO -- step 2 USE stats_check create table tbl_stats_check (id int, name varchar(20), dept varchar(20), age int,join_date datetime) insert into tbl_stats_check values (1,'a','IT',30,'2018-10-25') insert into tbl_stats_check values (2,'b','Civil',35,'2017-10-25') insert into tbl_stats_check values (12,'x','Eng',45,'2013-10-25') -- step 3 -- run this to know the -- stats info SELECT s.name, s.stats_id, s.auto_created, s.user_created, c.name as colname, s.no_recompute,'dbcc show_statistics (''dbo.tbl_stats_check'','''+s.name+''')' AS Tablename_show_statistics FROM sys.stats AS s JOIN sys.stats_columns as sc on s.stats_id=sc.stats_id and s.object_id=sc.object_id JOIN sys.columns as c on sc.object_id=c.object_id and sc.column_id=c.column_id WHERE s.object_id=OBJECT_ID('dbo.tbl_stats_check') --and s.auto_created = 1 ORDER BY sc.stats_column_id; GO -- step 4 select * from tbl_stats_check where id =1 select * from tbl_stats_check where dept ='it' and age =30 -- run the stats info select again (step 3) -- drop the database and run stpe 1 skip the step 4 -- run step 5 and check the stats info select again (step 3) create clustered index cx_tbl_stats_check_id on tbl_stats_check(id) create index cx_tbl_stats_check_dept on tbl_stats_check(dept,age) dbcc show_statistics ('dbo.tbl_stats_check','cx_tbl_stats_check_dept') dbcc show_statistics ('dbo.tbl_stats_check','_WA_Sys_00000004_0EA330E9') dbcc show_statistics ('dbo.tbl_stats_check','_WA_Sys_00000003_0EA330E9') -- The stats created for auto is creating single stats for each column, from the index it is a combination of columns distribution UPDATE statistics Stats_Check.dbo.tbl_stats_check (cx_tbl_stats_check_dept) with fullscan--,NORECOMPUTE
Creating an index – statistics are created automatically created for column/columns. For more than one column of composite index, statistics are only created for first column in the histogram.
By Default AUTO_CREATE_STATISTICS is ON for database – When query is running by where clause or join clause it gets created.
We can manually create it. https://docs.microsoft.com/en-us/sql/t-sql/statements/create-statistics-transact-sql?view=sql-server-2017
Query example: You can run and see how it is created by where clause (By sampling) and index (By Full scan).
How the query using them:
You can see three different categories in the results.
- Head – will have the information of the statistics. When it is last updated, rows and rows sampled – if this differs the stats are not with full scan. Steps – number of steps etc.
- Density – Shows the density values for the column/columns.
SELECT 1.0 / COUNT(DISTINCT Column) FROM Table;
- Histogram – The table data that divide further as histogram
The histogram will only store 200 steps, this is another problem for very large tables.
This is where Query Optimizer will calculate how many rows will be returned for a query, based on that it will create a plan. To create a most accurate plan, we need to have a updated stats.
Take a look on this: https://jbswiki.com/2017/01/31/what-dbcc-show_statistics-tells-me/
Update statistics regularly:
SQL automatically update stats, When 20% + 500 are changed in a table.
The rebuild index will update stats on index column with full scan not for non-index columns.
We can use UPDATE STATISTICS command or can schedule GUI, I prefer to use OLA script.
sp_updatestats – Generally, we will not use this, since it is replaces a full scan stats with random sampling.
Some larger tables, we can use manual statistics update by using NORECOMPUTE option.
Update statistics schedule will be based on the data changes and application mostly weekly once, some application needs a daily stats update.