
TempDB Database Is Full – Troubleshooting and Optimization

What is TempDB, and what are the best practices for it?

TempDB is a system database, one per instance, common to and shared by all other databases. All temporary activity happens there, so TempDB can certainly fill up and use more space depending on the temporary tasks we run. Many kinds of activity can take place in tempDB.

The old best practice was to create TempDB on a separate disk with an estimated initial file size. These days most of us use disk arrays: the spindles and HDDs/SSDs are striped through RAID and shared across LUNs and pools, so check with your infra team about the disk configuration on the storage side and how the disk is mounted or presented to the server. Is it a dedicated drive (common with DAS) or a disk array (common with SAN)? Between DAS and SAN, SAN is the most common industry standard; we had both, depending on the database application. A RAID level will definitely be in use, so check that as well. A common recommendation is RAID 1+0, which is costly. It is always good to know what kind of storage, and which storage vendor, we are using, so we can plan and test properly. Some servers use advanced automated tiered storage, which is good; some have no tiering, etc. DBAs have limited knowledge of storage, but it is good to learn from our SAN & VM admins :-).

Issue 1:

What activities happen in TempDB, and which of them is occupying the space?

The list is long: user objects, internal objects & version stores are all stored temporarily in tempDB. Read the TechNet article for more.

If it gets full often, we need to capture the tasks that are hitting tempDB and plan accordingly.

Find out the Tempdb physical file usage

SELECT SUM(unallocated_extent_page_count) AS [free pages],
       SUM(unallocated_extent_page_count
           + user_object_reserved_page_count
           + internal_object_reserved_page_count
           + mixed_extent_page_count
           + version_store_reserved_page_count) * (8.0/1024.0/1024.0) AS [Total TempDB SizeInGB],
       SUM(unallocated_extent_page_count * (8.0/1024.0/1024.0)) AS [Free TempDB SpaceInGB],
       unallocated_extent_page_count,
       user_object_reserved_page_count,
       SUM(version_store_reserved_page_count * (8.0/1024.0/1024.0)) AS [version_store_GB],
       internal_object_reserved_page_count,
       mixed_extent_page_count
FROM tempdb.sys.dm_db_file_space_usage
GROUP BY unallocated_extent_page_count,
         user_object_reserved_page_count,
         internal_object_reserved_page_count,
         mixed_extent_page_count;

We need to act on the results of the query above: for example, whether the version store or internal objects are what is consuming the space.

TempDB DMV:

Monitor the disk space used by the user objects, internal objects, and version stores in the tempdb files.

sys.dm_db_file_space_usage – Returns space usage information for each file in the database.

Allocation or deallocation activity in tempdb at the session or task level

sys.dm_db_session_space_usage – Returns the number of pages allocated and deallocated by each session for the database.

sys.dm_db_task_space_usage – Returns page allocation and deallocation activity by task for the database.

These views can be used to identify large queries, temporary tables, or table variables that are using a large amount of tempdb disk space.

For more, see the TechNet article on diagnosing tempdb disk space problems.

 

Useful queries to find out who is using my TempDB

The following code is originally from Gianluca Sartori and Deepak Biswal; I have just copied it here. The MSDN article has more code; have a look and use whatever fits your case. This code is really cool and has helped me a lot.

The following query shows only active requests, joining to the sys.dm_exec_requests DMV. Once a query has finished, you cannot capture it this way.

;WITH task_space_usage AS (
    -- SUM alloc/dealloc pages
    SELECT session_id,
           request_id,
           SUM(internal_objects_alloc_page_count) AS alloc_pages,
           SUM(internal_objects_dealloc_page_count) AS dealloc_pages
    FROM sys.dm_db_task_space_usage WITH (NOLOCK)
    WHERE session_id <> @@SPID
    GROUP BY session_id, request_id
)
SELECT TSU.session_id,
       TSU.alloc_pages * 1.0 / 128 AS [internal object MB space],
       TSU.dealloc_pages * 1.0 / 128 AS [internal object dealloc MB space],
       EST.text,
       -- Extract statement from sql text
       ISNULL(
           NULLIF(
               SUBSTRING(
                   EST.text,
                   ERQ.statement_start_offset / 2,
                   CASE WHEN ERQ.statement_end_offset < ERQ.statement_start_offset
                        THEN 0
                        ELSE (ERQ.statement_end_offset - ERQ.statement_start_offset) / 2
                   END
               ), ''
           ), EST.text
       ) AS [statement text],
       EQP.query_plan
FROM task_space_usage AS TSU
INNER JOIN sys.dm_exec_requests ERQ WITH (NOLOCK)
    ON TSU.session_id = ERQ.session_id
   AND TSU.request_id = ERQ.request_id
OUTER APPLY sys.dm_exec_sql_text(ERQ.sql_handle) AS EST
OUTER APPLY sys.dm_exec_query_plan(ERQ.plan_handle) AS EQP
WHERE EST.text IS NOT NULL OR EQP.query_plan IS NOT NULL
ORDER BY 3 DESC, 5 DESC

The following query shows session-level space usage. Once a session is closed, you cannot capture it this way.

SELECT DES.session_id AS [SESSION ID],
Db_name(DDSSU.database_id) AS [DATABASE Name],
host_name AS [System Name],
program_name AS [Program Name],
login_name AS [USER Name],
status,
( user_objects_alloc_page_count * 8 ) AS
[SPACE Allocated FOR USER Objects (in KB)],
( user_objects_dealloc_page_count * 8 ) AS
[SPACE Deallocated FOR USER Objects (in KB)],
( internal_objects_alloc_page_count * 8 ) AS
[SPACE Allocated FOR Internal Objects (in KB)],
( internal_objects_dealloc_page_count * 8 ) AS
[SPACE Deallocated FOR Internal Objects (in KB)],
cpu_time AS [CPU TIME (in millisec)],
total_scheduled_time AS
[Total Scheduled TIME (in millisec)],
total_elapsed_time AS
[Elapsed TIME (in millisec)],
( memory_usage * 8 ) AS [Memory USAGE (in KB)],
CASE is_user_process
WHEN 1 THEN 'user session'
WHEN 0 THEN 'system session'
END AS [SESSION Type],
row_count AS [ROW COUNT]
FROM tempdb.sys.dm_db_session_space_usage AS DDSSU
INNER JOIN sys.dm_exec_sessions AS DES
ON DDSSU.session_id = DES.session_id
ORDER BY [space allocated for internal objects (in kb)] DESC

Once you find the session ID that is using the most temp space, pass it to the following query to find the code that is consuming TempDB. In one of my other cases, a poorly written query used more than 100 GB of space.

SELECT TEXT
FROM sys.dm_exec_connections
CROSS APPLY sys.dm_exec_sql_text(most_recent_sql_handle)
WHERE session_id = 181 -- replace 181 with the session ID you found

 

Issue 2:

How to solve TempDB contention and improve performance

This is another tempDB issue that we commonly get.

An easy way to find tempdb contention is by using Whoisactive, along with some more workout for TempDB.

You can also look at the overall system wait types.

Latch contention can occur on the tempDB allocation pages: GAM, SGAM and PFS. It is common, but with a lot of hits it becomes a performance issue, since the wait queue grows.

If you see many PAGELATCH or PAGEIOLATCH waits on tempDB PFS, GAM or SGAM pages (shown as [wait_type]:[database_name]:[file_id](page_type)), then we have contention that needs to be fixed to improve performance.
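A quick check for that pattern, as a sketch using sys.dm_os_waiting_tasks (for page latches the resource_description is formatted dbid:fileid:pageid, and database_id 2 is tempdb):

SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGE%LATCH%'
AND resource_description LIKE '2:%' -- 2 = tempdb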

 

We can fix the contention by creating more data files of equal size; more data files give more allocation pages (one set of GAM, SGAM and PFS pages per data file).

If you cannot make the files equal in size (the existing file may already be very big), we can use trace flag 1117, which forces all files in the filegroup to grow together. Note that it applies to all other databases as well.

The easiest way to alleviate tempdb allocation contention is to enable trace flag 1118 and to add more tempdb data files. 

http://www.sqlskills.com/blogs/paul/correctly-adding-data-files-tempdb/

 

How many data files? It depends. The recommendation from Paul Randal:

Then if you have less than 8 logical cores, create the same number of data files as logical cores. If you have more than 8 logical cores, create 8 data files and then add more in chunks of 4 if you still see PFS contention. Make sure all the tempdb data files are the same size too.
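A minimal sketch of adding equally sized files (the file names, path and sizes below are assumptions; match them to your environment), plus the pre-2016 trace flags discussed above:

USE master;
GO
-- Assumption: drive T:\ and 10 GB per file are placeholders; keep all files the same size.
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf', SIZE = 10GB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdev3.ndf', SIZE = 10GB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'T:\TempDB\tempdev4.ndf', SIZE = 10GB, FILEGROWTH = 512MB);
GO
-- Trace flags 1117/1118; from SQL Server 2016 this behavior is the default for tempdb.
DBCC TRACEON (1117, 1118, -1);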

 

Temporary and permanent fix

We can reduce the file size by shrinking the files. Try shrinking first; if the files will not shrink, free the procedure cache and shrink again. You can run FREEPROCCACHE more than once, until you see the space used by the files come down to what you want.

Note: FREEPROCCACHE clears the procedure cache, so plans will be compiled and cached afresh.

This is a temporary fix, and it does not need a recycle of the SQL service. Shrinking tempDB is fine, but not too frequently, since it may lead to external disk fragmentation.

USE TEMPDB;
GO
DBCC SHRINKFILE (TEMPDEV, 10000) -- target size 10000 MB (~10 GB)
DBCC FREEPROCCACHE
DBCC SHRINKFILE (TEMPDEV, 10000)

When we run FREEPROCCACHE repeatedly along with a shrink, you can sometimes see the following error. It appears to be related to internal file allocation, and it can be worked around by increasing the file size by a few MB; for example, a 1000 MB file can be increased to 1005 MB.

File ID 1 of database ID 2 cannot be shrunk as it is either being shrunk by another process or is empty.

Msg 0, Level 11, State 0, Line 0

A severe error occurred on the current command. The results, if any, should be discarded.
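A minimal sketch of that workaround (the logical file name and sizes are examples only):

ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 1005MB); -- example: the file was 1000 MB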

Sometimes freeing the cache will not help, if there is no free space inside the file. Use the following query to check the free space.

USE [tempdb]
SELECT
[name]
,CONVERT(NUMERIC(10,2),ROUND([size]/128.,2)) AS [Size]
,CONVERT(NUMERIC(10,2),ROUND(FILEPROPERTY([name],'SpaceUsed')/128.,2)) AS [Used]
,CONVERT(NUMERIC(10,2),ROUND(([size]-FILEPROPERTY([name],'SpaceUsed'))/128.,2)) AS [Unused]
FROM [sys].[database_files]

In another case, row versioning was enabled for a database, which prevented TempDB from shrinking; I disabled it temporarily, shrank the files, and re-enabled it. That helped.

--check which DBs are in snapshot isolation mode
select name, is_read_committed_snapshot_on, snapshot_isolation_state, snapshot_isolation_state_desc, *
from sys.databases
--where is_read_committed_snapshot_on = 1
order by 1
--Disable snapshot isolation
alter database DBname set READ_COMMITTED_SNAPSHOT off with rollback after 30 seconds
go
use tempdb
go
dbcc shrinkfile (tempdev, 10000) --shrink to 10000 MB (~10 GB)
go
select (size*8)/1024.0 as FileSizeMB from sys.database_files --check the new size
go
--Enable snapshot isolation
alter database DBname set READ_COMMITTED_SNAPSHOT on with rollback after 30 seconds
go

The permanent fix is the hard one: we need to find the queries or tasks that hit tempDB and tune them. In my case it was an 8 TB database, and when the predefined maintenance plan ran, it occupied all the tempDB space. I used a different method, Ola Hallengren's scripts. They are fantastic and give far more control than a maintenance plan. I skipped the non-clustered indexes and some historical VLT tables that are not critical, which reduced both the tempDB usage and the job run time. I had only minor contention, so I did not create more data files.
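As a hedged sketch of how an exclusion looks with Ola Hallengren's IndexOptimize (the database and table names here are hypothetical; check his documentation for the exact parameters of your version):

EXECUTE dbo.IndexOptimize
@Databases = 'USER_DATABASES',
@Indexes = 'ALL_INDEXES, -MyDB.dbo.HistoricalVLT' -- exclude a non-critical table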

Note: my case was special; the very large table was pure dump data, so I removed it from CHECKDB. Be sure before you exclude any of your tables. Skipping non-clustered indexes is not always good; instead you can run a split CHECKDB: http://www.sqlskills.com/blogs/paul/checkdb-from-every-angle-consistency-checking-options-for-a-vldb/

In another case, a developer's code used all the tempDB space and there was heavy contention as well; it did improve after creating more data files. Use Whoisactive and the other scripts to find the exact issue, and fix it based on what you see in your system, since I have mixed two or three issues together in this post.

 


Performance Tuning Series – Main Part

 

Performance tuning is tricky. We need to analyze it from all points of view. Sometimes it is an easy fix that can be solved in a minute, and sometimes it takes a day or more to find and fix.

I receive many emails asking for performance tuning help, with a variety of questions: how can we tune a query, and how do we find out what is going on in the server when the application team does not hand over a slow-running query; the query only slows down in certain time periods; my server is banging above 95% of its resources all the time.

Nowadays every business has a tool that triggers an alert when a resource (CPU, RAM) goes beyond its threshold. That mostly comes from the server/infra team, and it is easy to pull a month's or a year's report from the server and ask the DBAs why the server has been running with high resource usage for the past few days or months.

It is always a challenging job for DBAs to tune their systems. Even if we keep all the maintenance jobs, like re-indexing and statistics updates, or add more RAM and CPU, that will not always help; many other things can come up. What we need to know is how to gather information from the server, database and code points of view, and analyze them one by one.

Most of the time, a poorly written and designed query and schema will be the issue. Why do I say so? You will come to know after reading this post.

(By poorly built code I mean: not written in a set-based way, bad schema design, no indexes on the WHERE and JOIN columns, pulling all the data even when there are columns to filter on, high fragmentation, out-of-date stats & parameter sniffing, etc.)

We need to use the right techniques to capture the exact issue.

 

Covering this topic fully will make me write some other posts as well. Please refer to the following posts to get some more ideas on tuning.

Capturing the SQL server queries by using Whoisactive

Capturing the SQL server queries by using server side trace  (Upcoming post: The link will be updated)

Tune query by looking execution plan and effective of the index creation (Upcoming post: The link will be updated)

 

The technique I/we DBAs mostly use:  F&F – Find and Fix.

  1. Whoisactive, a powerful script – This captures everything that is running on the server/DB side; if you do not have much information from the application team, use this.
  2. SQL server execution plan – Once you have found the code that is taking more time, run it in SSMS with the actual execution plan and see what is going on behind the scenes. Whoisactive also has plan info, if you pass the @get_plans = 1 parameter.

I am classifying the work into three parts to get a clear picture:

  1. Server level analysis and tuning
  2. Database level analysis and tuning
  3. Query level analysis and tuning

Before Whoisactive and sp_Blitz, I tended to check perfmon, the size of the database, server- and database-level parameter changes, more DMVs, etc. These two scripts give DBAs much more control.

There is no rule that we should check one thing first and another second, and so on (execution plan, index used or not, fragmentation, count(*) of the table, stats etc.).

It always depends on the environment. In some cases the query might run for more than 10 hours, or the SP has 100 lines of code, or the query behaves differently at runtime. We might take the help of an estimated plan to understand the query without actually running it, or look at the query line by line.

 

Server and Database level analysis and tuning

More or less, the server-level and database-level checks are the same; go deeper and it comes down to the query level only. That is why we keep saying that best practices in T-SQL writing help the system most.

Commonly we start from an unknown: we need to find out why resource usage is peaking, without having any other information.

 

  • Check the SQL server error log

EXEC Sp_readerrorlog 0

The error log is important to check; it records all this information.
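sp_readerrorlog also accepts filter parameters (log file number, log type where 1 is the SQL error log and 2 the Agent log, and search strings); a quick sketch:

EXEC sp_readerrorlog 0, 1, N'error'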

The following is a fantastic script from Paul Randal, which shows the overall wait-type percentages, plus an explanation URL for each wait.

 

WITH [Waits] AS
(SELECT
[wait_type],
[wait_time_ms] / 1000.0 AS [WaitS],
([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
[signal_wait_time_ms] / 1000.0 AS [SignalS],
[waiting_tasks_count] AS [WaitCount],
100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
FROM sys.dm_os_wait_stats
WHERE [wait_type] NOT IN (
N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR',
N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH',
N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',
N'CHKPT', N'CLR_AUTO_EVENT',
N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',
-- Maybe uncomment these four if you have mirroring issues
N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE',
N'DBMIRROR_WORKER_QUEUE', N'DBMIRRORING_CMD',
N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',
N'EXECSYNC', N'FSAGENT',
N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',
-- Maybe uncomment these six if you have AG issues
N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE',
N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',
N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP',
N'LOGMGR_QUEUE', N'MEMORY_ALLOCATION_EXT',
N'ONDEMAND_TASK_QUEUE',
N'PREEMPTIVE_XE_GETTARGETSTATE',
N'PWAIT_ALL_COMPONENTS_INITIALIZED',
N'PWAIT_DIRECTLOGCONSUMER_GETNEXT',
N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP', N'QDS_ASYNC_QUEUE',
N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
N'QDS_SHUTDOWN_QUEUE', N'REDO_THREAD_PENDING_WORK',
N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE',
N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH',
N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP',
N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',
N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP',
N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',
N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT',
N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS',
N'WAITFOR', N'WAITFOR_TASKSHUTDOWN',
N'WAIT_XTP_RECOVERY',
N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',
N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT')
AND [waiting_tasks_count] > 0
)
SELECT
MAX ([W1].[wait_type]) AS [WaitType],
CAST (MAX ([W1].[WaitS]) AS DECIMAL (16,2)) AS [Wait_S],
CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S],
CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S],
MAX ([W1].[WaitCount]) AS [WaitCount],
CAST (MAX ([W1].[Percentage]) AS DECIMAL (5,2)) AS [Percentage],
CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgWait_S],
CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S],
CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S],
CAST ('https://www.sqlskills.com/help/waits/' + MAX ([W1].[wait_type]) as XML) AS [Help/Info URL]
FROM [Waits] AS [W1]
INNER JOIN [Waits] AS [W2]
ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum]
HAVING SUM ([W2].[Percentage]) - MAX( [W1].[Percentage] ) < 95; -- percentage threshold
GO

 


See the results: SOS_SCHEDULER_YIELD – 34%, CXPACKET – 13%. Do these waits mean we need to add more CPU, since the wait types indicate CPU pressure? No, it does not mean we should go and add more CPU. It means further analysis is needed: find the queries that are accumulating those waits, work out why they are waiting, and see how we can make them stop waiting. Yes, the queries can be tuned, and the wait percentages will come down automatically.

  • Check the currently running sessions
Use Exec Whoisactive OR DMVs 
select db_name(r.database_id), r.command, r.wait_resource, s.status, s.session_id,
       r.blocking_session_id, r.last_wait_type, r.open_transaction_count, *
from sys.dm_exec_sessions s
join sys.dm_exec_requests r on s.session_id = r.session_id
where s.session_id >= 51 and s.status <> 'sleeping' and s.session_id <> @@SPID
--and r.blocking_session_id <> 0

 

  • Check the server- and database-level parameters for changes from the defaults

Use, Brent Ozar’s script https://www.brentozar.com/blitz/

It is an awesome script, which helps us quickly analyze the non-default server- and database-level parameters, with suggestions. They have improved it a lot. Check the link and provide your email address and other details; they will send the ZIP file location to your email address.
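Once the procedure is installed, a minimal run looks like this (a sketch; sp_Blitz has many optional parameters):

EXEC dbo.sp_Blitz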

 

  • Check the database level running query by DMVs OR Whoisactive

The best way is to use Whoisactive, or one of the many DMVs; just do some googling.

  • Use the server side traces to capture the information of particular events

Upcoming post: I am going to write about how to collect the queries by using a server-side trace.

 

Query level tuning and code re-write

Query-level tuning is the most important part: most of the time, poorly built queries eat up all the resources, even if you have loads of CPU and RAM, no fragmentation, and updated stats.

 

There are cases in real time where we need to speed up the code without an actual run.

The following steps are not exactly what you should follow every time; I am using them here to show more information for this particular post.

I captured this query by using Whoisactive. There are many like it, and they cannot all be shown here. Note the wait type: it almost matches the server's overall waits.

I took the third query to see how it performs and how it can be tuned and rewritten for the better.

It is a simple UNION ALL joining two tables. They have no clustered index, since there is no candidate (high-selectivity) column for one.

 

SELECT DISTINCT j.* FROM
(SELECT temp81.*, [54-80_table].[Wafer Sort Test Program (High Temp)], [54-80_table].[Wafer Sort Test Program (High Temp)_REV],
        [54-80_table].[Wafer Sort Test Program (Low Temp)], [54-80_table].[Wafer Sort Test Program (Low Temp) Rev]
 FROM temp81 LEFT OUTER JOIN [54-80_table]
   ON temp81.[54-80] = [54-80_table].Component
 WHERE [54-80_table].[BOM_STATUS] IN ('ACT','C-ACT','CPROD','PROD','PPROD','QUAL','RISK')

 UNION ALL

 SELECT temp62.*, Null, Null, [54-80_table].[Wafer Sort Test Program (High Temp)], [54-80_table].[Wafer Sort Test Program (High Temp)_REV],
        [54-80_table].[Wafer Sort Test Program (Low Temp)], [54-80_table].[Wafer Sort Test Program (Low Temp) Rev]
 FROM temp62 LEFT OUTER JOIN [54-80_table] ON temp62.[54-80] = [54-80_table].Component
 WHERE [54-80_table].[BOM_STATUS] IN ('ACT','C-CT','CPROD','PROD','PPROD','QUAL','RISK')) j

 

The general query process is below. The optimization phase is the important one, since we can do things there to help the SQL optimizer make the right choice (updating stats, recompilation, filtered statistics & query hints etc). I will cover that in advanced tuning.

Query parsed –> Normalized as a query tree –> Compiled/optimized –> Executed.

The first two important factors to look at in the plan: 1. Cost. 2. Actual vs. estimated number of rows. I will cover the row differences, data skew etc. in detail, and how to solve them, in an advanced tuning article.

1. Check the execution plan to see how the query runs behind the scenes and get an idea of where the cost is high, which objects are scanned, the join types, etc.

The execution plan shows a high cost for parallelism and sort. If we reduce those, the query could run better.

The normal query run time: 46:57 minutes.

2. Check the tables involved in the code and get the count(*) – the number of records in each table. This helps us target the big tables for fragmentation and statistics freshness.

Total number of records in the tables:

select count(*) from temp81 with (nolock) -- 4984266

select count(*) from temp62 with (nolock) -- 6250470

select count(*) from [54-80_table] with (nolock) -- 217

 

It is a heap with no non-clustered index either. Statistics are almost nil, perhaps some auto-created column stats, and fragmentation is low.

SELECT OBJECT_NAME(f.object_id) AS obj, i.name AS ind,
       f.avg_fragmentation_in_percent,
       f.page_count, table_schema,
       avg_page_space_used_in_percent, forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID('Ops_matrix'), NULL, NULL, NULL, NULL) f
JOIN sys.indexes i
  ON f.object_id = i.object_id AND i.index_id = f.index_id
JOIN INFORMATION_SCHEMA.TABLES s WITH (NOLOCK)
  ON s.table_name = OBJECT_NAME(f.object_id)
WHERE f.database_id = DB_ID('Ops_matrix')
  AND OBJECT_NAME(f.object_id) IN ('temp81','temp62','54-80_table')
/*WHERE f.avg_fragmentation_in_percent > 15 AND f.page_count >= 1000
  AND f.database_id = DB_ID('Ops_matrix')
  AND OBJECTPROPERTY(i.object_id,'ISSYSTEMTABLE') = 0
  AND alloc_unit_type_desc = 'IN_ROW_DATA'
ORDER BY f.avg_fragmentation_in_percent DESC
*/

/*
ALTER TABLE temp81 REBUILD;

UPDATE STATISTICS dbo.temp81
*/

 

 

3. Check the table structure: data types, clustered and non-clustered indexes. Select the top 10 rows and check the values stored in the columns against the data types. In many cases I have seen the data types and the table designed badly: sometimes a LOB data type or a larger size was chosen, but the content stored in the column is very small.

Example: create table T10 (n ntext, n1 char(700))

Note: SQL Server retains the 900-byte limit for the maximum total size of all index key columns.
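A small sketch of hitting that limit (a hypothetical table; note that SQL Server 2016 raised the nonclustered key limit to 1700 bytes, while clustered keys stay at 900):

CREATE TABLE T11 (c1 char(700), c2 char(300));
CREATE CLUSTERED INDEX cix_T11 ON T11 (c1, c2); -- fails: fixed key length of 1000 bytes exceeds the 900-byte limit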

 

sp_help 'tablename'

Almost every column of the table is nvarchar(510); Unicode characters are a business need.

 

4. Re-index the indexes and the heap table.

There was no index to rebuild, so I rebuilt the table/heap and updated the statistics for the table.

ALTER TABLE temp81 REBUILD;
UPDATE STATISTICS dbo.temp81

 

 

5. Create indexes on the WHERE-clause and join columns and see the result.

I created the indexes, but the SQL optimizer is not using them. The optimizer is smart: since the query retrieves all the data (select *), it reckons a table scan, plus a sort for the UNION ALL and DISTINCT, is best. It always chooses the cheapest plan based on cost, using what it has from the query tree and the statistics. This happens in many cases; the optimizer will skip indexes.

create index ix_temp81_54_80 on temp81 ([54-80])
create index ix_temp62_54_80 on temp62 ([54-80])
create index ix_54_80_table_Component on [54-80_table] (Component)
create index ix_54_80_table_BOM_STATUS on [54-80_table] (BOM_STATUS)

 

6. Try some hints, like MAXDOP, an index hint, etc. I am not saying a parallel plan is bad, but there are cases where a serial plan works best, and we have seen that the query is waiting on CXPACKET.

 

Forcing the index to be used. There are still some lookups, and we could create a covering index, but here the case is different: 20+ columns of nvarchar(510).

With the forced index, the run time is 16:11 minutes.

With MAXDOP, the run time is 20:24 minutes.
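A hedged illustration of the two hints used above (the index name comes from step 5; the column list is abbreviated):

SELECT t.[54-80], s.[Wafer Sort Test Program (High Temp)]
FROM temp81 AS t WITH (INDEX (ix_temp81_54_80))
LEFT OUTER JOIN [54-80_table] AS s ON t.[54-80] = s.Component
OPTION (MAXDOP 1);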

 

7. Filter the data by some condition, generally a date column.

This table has no date column, or any other column, to limit the data returned.

 

8. Check whether anything can be changed in the query: select only the required columns, remove sort operations, add a filter option.

Removing DISTINCT – the parallel query run time is 13:14, and with (MAXDOP 1) it is 10:00 minutes.

Removing select * and selecting only the required columns: the run time improves to 02:02 minutes.

9. For a complex query, break it into batches and run them individually: run the two queries separately, run with UNION ALL, then run the full query with DISTINCT, and compare.

Here are the query tuning steps with their execution times, for better visibility. It could be reduced to even less than 2 minutes.

Steps Performed                     Execution Time (minutes)
Query execution time                47
Update stats and heap rebuild       26
Force MAXDOP option                 20
Force index option                  16
Without DISTINCT keyword            10
With minimal columns                 2

If the tables temp62 & temp81 had clustered indexes, it would be a totally different story; I am sure the query would run quickly. See my post: http://www.sqlserverblogforum.com/2016/03/how-important-is-clustered-index-for-a-table/.

As a rule of thumb, every table should have a clustered index, and yes, it speeds up all operations on the table. If you know a table is going to grow big and it has no candidate column for the clustered key, I suggest creating a dummy identity column with a clustered index. The important factor is how effectively we choose the clustered key column: generally a surrogate key, e.g. IDENTITY.
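A minimal sketch of that idea (the table name is reused from above; the column and index names are hypothetical):

ALTER TABLE dbo.temp81 ADD id INT IDENTITY(1,1) NOT NULL;
CREATE UNIQUE CLUSTERED INDEX cix_temp81_id ON dbo.temp81 (id);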

 

I do not think any user is going to read 6,562,117 records, and the query returns 50 columns as output. This is why we always say: do not run dump code in production. If you have no condition to limit the rows, at least take some TOP 1000 records. Think about a query that runs 45 minutes being called 10 times per day.
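For example, a sketch of limiting both rows and columns (column names taken from the tables above):

SELECT TOP (1000) Component, BOM_STATUS
FROM [54-80_table];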

Once we have analyzed everything at the query and database level, make a benchmark of the server resource (CPU, RAM) and storage disk usage, and we can check with the infra team to analyze from their side. In the real world that is not always possible, but if you have a good team that is ready to add resources and make changes in the storage layers, you can test it. It is very easy to change storage layers nowadays, like moving VMDK partitions from one VMFS/LUN to another, or putting them on an advanced array with SSDs; nothing changes from the end user's point of view. I do have a good team, and we tested exactly that.

Again, we need to investigate from all points of view, test everything and make a benchmark of it. The database engine is smart and will pick a good plan and work effectively, if we have good schema and query design.

I cannot add all the images, since this is already a long post. I hope the tuning steps help. There are many great links added throughout the post. Happy learning!


Whoisactive: capturing SQL Server queries for performance tuning

 

The Whoisactive stored procedure is a powerful script to capture SQL Server queries with tons of information. It is written by Adam Machanic, and it makes a DBA's job easier.

It is a great script with lots of parameters, and I am going to show which ones matter most.

Run the stored procedure without any parameters

You can run the procedure without any parameters; that is its default mode.

Exec [sp_WhoIsActive]

 

Lightweight parameters

@get_plans = 1 – Shows the execution plans of the running queries.

@get_task_info = 2 – Shows the wait types of the queries.

Ex: parallel CXPACKET waits, with additional info such as how many times it waited and for how many milliseconds: (Nx: MINms/AVGms/MAXms)[wait_type]

@get_locks = 1 – Shows an XML column with the locks held by each query; useful in blocking situations.

 

Other useful parameters – With little overhead

Sometimes I use these parameters when I want to do further analysis.

@delta_interval = 5 – Captures the usage metrics over a delta interval rather than as cumulative values, for more accurate data. It takes 5 seconds to return the results.

@get_additional_info = 1 – Shows the default output plus several more options that can be dynamically enabled and disabled.

@get_avg_time = 1 – Shows the query's average run time, which helps find queries that run differently each time.

@get_outer_command = 1 – Shows the outer batch query/SP, which helps to get the complete query.

 

Check this for more parameters:

http://sqlblog.com/blogs/adam_machanic/archive/2011/04/06/who-is-active-options-a-month-of-monitoring-part-6-of-30.aspx

 

How effectively use this stored procedure

By default, this stored procedure shows its results in SSMS, which is useful for a run-time check.

In general, the performance data needs to be captured all the time, or we will not know when an issue occurred.

As every DBA does, create a table and insert the results into it. A SQL Agent job is best for capturing the data on an interval; schedule this SP as a job every 15 minutes, as shown in the sketch below. With more parameters the SP takes more time, so the schedule can be changed depending on the execution time of the procedure.
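A hedged sketch of such a job (the job, database and table names are assumptions; adjust the sp_WhoIsActive parameters to your collection needs):

USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Collect_WhoIsActive';
EXEC dbo.sp_add_jobstep @job_name = N'Collect_WhoIsActive',
@step_name = N'Run sp_WhoIsActive',
@subsystem = N'TSQL',
@command = N'EXEC sp_WhoIsActive @get_plans = 1, @get_task_info = 2, @format_output = 0, @destination_table = ''dba_data.dbo.Whoisactive_lightweight'';';
EXEC dbo.sp_add_jobschedule @job_name = N'Collect_WhoIsActive',
@name = N'Every 15 minutes',
@freq_type = 4,           -- daily
@freq_interval = 1,
@freq_subday_type = 4,    -- every N minutes
@freq_subday_interval = 15;
EXEC dbo.sp_add_jobserver @job_name = N'Collect_WhoIsActive';
GO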

 

The table and stored procedure scripts follow.

Updated: I have added this after a reply from Fei Yuan.

 

http://sqlblog.com/blogs/adam_machanic/archive/2011/04/25/capturing-the-output-a-month-of-activity-monitoring-part-25-of-30.aspx

How to create a table for the parameters you pass.

You can get the table-creation code from the @script_table variable (populated through the @schema output parameter). With the PRINT statement commented out, the EXEC creates the table automatically, with the name you provide in SET @script_table = REPLACE(@script_table, '<table_name>', 'dba_data.dbo.Whoisactive_lightweight').

 

It is a very, very long script; please download it from SQLblog:

Most recent Whoisactive

Updated version with bug fixes – released on 18-Oct-2016:

http://sqlblog.com/blogs/adam_machanic/archive/2016/10/18/sp-whoisactive-the-big-fixes.aspx

 

Download the original Whoisactive procedure:

1. First, download the stored procedure and create it in your DBA database, since if you collect a lot of data it will occupy a lot of table/disk space. Pick the parameters you need and add them to the following code block; it will return the table-creation script for the parameters you pass.

2. Create the table:

Pass the parameters you need, along with the DB name and table name, and it will execute the CREATE TABLE for "Whoisactive_lightweight".

 

DECLARE @script_table VARCHAR(MAX)

EXEC sp_WhoIsActive
     @get_plans = 1, @get_task_info = 2, @get_locks = 1,
     @format_output = 0,
     @return_schema = 1,
     @schema = @script_table OUTPUT

SET @script_table = REPLACE(@script_table, '<table_name>', 'dba_data.dbo.Whoisactive_lightweight')

-- PRINT (@script_table)

EXEC (@script_table)

3. Execute the stored procedure to insert data into the table.

It automatically inserts the results from the stored procedure into the given table.

 

EXEC sp_WhoIsActive
     @get_plans = 1, @get_task_info = 2, @get_locks = 1,
     @format_output = 0,
     @destination_table = 'dba_data.dbo.Whoisactive_lightweight'

 

select * from dba_data.dbo.Whoisactive_lightweight

 

You can also run it in SSMS without inserting into a table, by adding multiple parameters.

 

EXEC sp_WhoIsActive
@get_plans = 1,@get_task_info = 2,@get_locks = 1,@delta_interval = 5,@get_additional_info = 1,@get_avg_time = 1,
@get_outer_command = 1,@format_output = 0

 

The important columns we mostly look at:

Runing_time ([dd hh:mm:ss.mss]), wait_info, sql_text, tasks,

blocking_session_id, query_plan, locks, request_id

 

A sample select statement: you can query the table and add filter conditions based on the results. I captured one of my production systems, and there were many queries running for more than one hour. Those were all tuned and rewritten 🙂

 

 

select start_time ST, collection_time as CT, * from master..tbl_whoisactive
where login_name not in ('NT AUTHORITY\SYSTEM', 'DBA', 'sqlserver.service')
--where blocking_session_id <> 0
order by Runing_time desc

 

Once again, thanks to Adam Machanic. Your script has helped me all the time. Managing 100+ DB servers with performance issues all week is a tough job; your script gathers all the data every 15 minutes, making it easy for me to review later, give recommendations, and fine-tune.

 


Lost administrator (SA) password – SQL Server 2012


There are many blog posts about how to recover the SysAdmin password if it is lost, or when the server is new to you and nobody knows who owns it and has access to it. In my case it was the second option 🙂

In my opinion, it is much easier to work in the GUI than at the command line (CMD) 🙂 so I used SSMS.

 

Steps:

1. Stop the SQL Server Agent
2. Put SQL Server into single-user mode (in earlier versions this was a bit harder)

All Programs –> SQL Server 2012 –> Configuration Tools –> SQL Server Configuration Manager

Stop the agent and take SQL Server into single-user mode.

 

3. Restart the SQL service
4. Launch SSMS but do not click the Connect button; cancel it, then click the "New Query" window at the top of SSMS and connect.

If you click the Connect button, Object Explorer will take one connection, and single-user mode allows only one connection. That is why we cancel and connect through the New Query window.

Note: You need to be an OS administrator.

 

 

5. Create a new login and add it as a sysadmin

 

CREATE LOGIN XXX WITH PASSWORD = 'Complex password'
GO
EXEC sp_addsrvrolemember 'XXX', 'sysadmin'

 

Hope this helps someone.


"An application error occurred on the server" – a SQL Server Browser issue

Recently, one of my application websites went down.

 

I checked all the basic connectivity troubleshooting, and everything seemed to be working fine. Finally I found the problem was with the Browser service, even though it was in a running state.

Error from the event viewer:

 

 

The quick solution is to restart the SQL Browser service (Start –> All Programs –> Microsoft SQL Server 200X –> Configuration Tools –> SQL Server Configuration Manager) without restarting the SQL service.

I searched and found a couple of MS links (KB 2526552 and "SQLBrowser unable to start"), but I did not apply them. I used another way that is also a permanent fix.

 

Troubleshooting ways and a permanent fix:

In my case it is a named instance listening on a dynamic port, and DBAs know the Browser service exists mainly for named instances.

From the local machine we can connect to the server through SSMS using the server name, or the server name + port number. But from anywhere other than the local machine, you cannot connect using the server name alone. (You can test this by connecting from some other server or, better, by installing only SSMS on the application server and trying to connect.) So I went to the application server and opened the connection string; as expected, the data source had only the server name. We changed it from Data Source=Servername\Instance to Data Source=Servername\Instance,port. Ex: Muthu1\SQL1,5432.

The application team made it a standard to always include the port number in the connection string block, i.e. a fully qualified data source.

 

Basic SQL connectivity checks:

  • Check whether the SQL service is running, and try to connect through SSMS both locally and remotely
  • Check that the TCP/IP protocol is enabled in SQL Server Configuration Manager, and find the port number
  • Connect using the server name + port number from SSMS, locally and remotely
  • For a firewall block / port not opened, test from the command prompt: TELNET server port, e.g. TELNET server 1433
  • Check that remote connections are enabled & the SQL Browser service is running (for a named instance that does not include the port in the data source)
  • Check whether you have any alias/DNS name


Create and setup a SQL clustering on your Desktop or Laptop


 

I thought of writing this post after a discussion in a forum. SQL clustering is a bit complicated; most DBAs have not configured or tested it, because we rarely get a chance to do it or to learn SQL clustering. It also needs a bit of knowledge from outside SQL (Windows AD, network and storage (SAN)).

In this post I will give you an idea of how to configure SQL failover clustering yourself. We do everything in a virtual environment (a single machine); you can do the same physically, but then you need everything physical, as in a production environment (machines, domain, storage & network). It is not possible to show all the steps and screenshots here, because there are more than 40 screenshots. I have prepared a document and will share it if needed.

What is SQL clustering?

In general, cluster means more than one.

SQL clustering is a collection of two or more servers accessing data from the same single shared storage (SAN); if one server fails, another server takes over the data from the storage and the application keeps running without any problem.

I suggest readers read Kendra Little's article. It is a good article, and she explains it with diagrams.

http://www.brentozar.com/archive/2012/02/introduction-sql-server-clusters/

 

The following software is used to set up the cluster.

Software              Free/Trial      URL location
VMware Workstation    30-day trial    VMware Workstation
StarWind              30-day trial    StarWind iSCSI SAN
Windows Server 2008   60-day trial    Windows Server 2008
SQL Server 2008       180-day trial   SQL Server 2008

 

Planning the IP address

Component Name        IP address
Domain controller     xxx.xxx.1.6
Node1 / Machine1      xxx.xxx.1.7
Node2 / Machine2      xxx.xxx.1.8
Windows cluster       xxx.xxx.1.9
MSDTC                 xxx.xxx.1.10
Virtual SQL server    xxx.xxx.1.11
Private IP            xxx.xxx.2.11
Public IP             xxx.xxx.2.12

 

 

Installing VMware

 

Installing VMware Workstation is easy: just click Typical and Next.

 

 

Creating VMware

 

Create three virtual machines using the Windows trial edition, assigning the right amount of RAM, CPU & disk space to each guest host. Configure the network adapters for the public and private networks and the internet connection.

 

1. Domain controller (DC)

 

2. Machine1 (Node1)

 

3. Machine2 (Node2)

Creating Domain DC

 

The domain controller is the server that runs Active Directory.

http://en.wikipedia.org/wiki/Active_Directory

 

  • Install and create a domain.

Creating Shared disk

 

We know SQL clustering needs a shared disk, typically a SAN. Using StarWind's iSCSI software we can present a local disk as a shared disk. Because we do not have a physical SAN storage server, I am creating the shared storage on the DC machine itself using StarWind's iSCSI.

 

Format the disk on each guest and use it.

Creating and installing windows Failover cluster

  • Install the Windows failover cluster feature.
  • Validate and add the nodes.

 

  • Assign the IP and Cluster the Nodes
  • Add the MSDTC

 

 

Installing SQL Failover cluster

Install the SQL Server failover cluster on both nodes. It is easy: just give the network IP address and choose the shared disk.

The second node will automatically display the SQL cluster name, and we can just add it.

 

To view the cluster administrator, go to Run –> type CLUADMIN.MSC

Node1 (Muthu1) is the primary host, which takes care of the SQL Server and the database applications.

 

Use this command to check which node the clustered SQL Server is currently running on.

SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS')

 

 

How do I know whether the cluster automatically fails over the server if anything goes wrong on the running node?

Just test it. Power off Node1 (Muthu1), and you can see SQL Server automatically fail over to Node2 (Muthu2).

 

When will the SQL cluster be down? Is clustering disaster recovery?

SQL failover clustering is not disaster recovery. If the storage (SAN) is not available, the cluster will not start.

 

Testing: power off the DC. In our case the DC machine hosts the shared disk, so when the shared disk (SAN) goes offline, SQL clustering will not work.

 

My special thanks go to my VMware team.

