AlwaysON database NOT SYNCHRONIZED and RECOVERY PENDING

AlwaysON database NOT SYNCHRONIZED and RECOVERY PENDING

 

I have recently come across an issue that, one of my alwaysON secondary replica databases are went into “NOT SYNCHRONIZED and RECOVERY PENDING” state. It is a geo cluster alwaysON with 4+2 nodes configured in both synchronous and asynchronous mode. 

The issue is when the database removed from the primary replica, with the secondary disconnection the higher database IDs on the secondary went into “NOT SYNCHRONIZED and RECOVERY PENDING” state, but the lower database IDs are good and synchronous state only.

I just checked with one of the Microsoft MSFT and he said it considers as a bug. “Since, it occurs a DDL change on the Availability Group as the primary, while a secondary replica server is down”.

I just reproduced the same. It is a two node synchronous replica server.

Version: Microsoft SQL Server 2012 (SP1) – 11.0.3000.0

I have not tested it on SQL 2014.

 

Testing:

1. Create five databases.

CREATE DATABASE dba

CREATE DATABASE dba1

CREATE DATABASE dba2

CREATE DATABASE dba3

CREATE DATABASE dba4

SELECT * FROM SYS.SYSDATABASES

2. Created a test availability group and added a five databases in synchronous mode.

3. Stop SQL Instance for Secondary Replica.

4. On the Primary Replica issue TSQL to remove a database from availability group.

 

ALTER AVAILABILITY GROUP [AG-Test] REMOVE DATABASE dba2;

 

5. Start SQL Instance for the Secondary Replica.

You can see that, databases whose IDs are higher than the one removed dba2 are going to “NOT SYNCHRONIZED / RECOVERY PENDING” mode.

DB ID – 9, 10 for dba,dba1 – Good

DB ID – 11 for dba2 – Removed DB

DB ID – 12,13 for dba3, dba4 went into SYNCHRONIZED / RECOVERY PENDING.

 

SELECT DATABASE_ID,DB_NAME(DATABASE_ID),NAME,REPLICA_SERVER_NAME,

SYNCHRONIZATION_HEALTH_DESC,SYNCHRONIZATION_STATE_DESC,S.*

FROM SYS.DM_HADR_DATABASE_REPLICA_STATES S JOIN  SYS.AVAILABILITY_GROUPS G ON (S.GROUP_ID=G.GROUP_ID)

JOIN SYS.AVAILABILITY_REPLICAS   GR ON (GR.GROUP_ID=G.GROUP_ID)

–WHERE SYNCHRONIZATION_HEALTH_DESC <>’HEALTHY’

–AND REPLICA_SERVER_NAME =”

 

The fix and work around is resume other databases that are stuck in secondary replica.

6. On the Secondary replica that has stuck databases, for each stuck database remove (will make it DB into restoring mode) and add (will make it DB into synchronized mode).

Remove a Secondary Database from an AG –> https://msdn.microsoft.com/en-us/library/hh231120.aspx

 

ALTER DATABASE [DBA3] SET HADR OFF

ALTER DATABASE [DBA4] SET HADR OFF

Join a Secondary Database to an AG –> https://msdn.microsoft.com/en-us/library/ff878535.aspx

 

ALTER DATABASE [DBA3] SET HADR AVAILABILITY GROUP = [AG-TEST]

ALTER DATABASE [DBA4] SET HADR AVAILABILITY GROUP = [AG-TEST]

Repeat for each stuck database.

The database that had been removed <dba2> could be in any number of states, but after it finishes its recovery process, it should return to a RESTORING state.

7. On primary replica, Add the removed database –> primary replica –> right click AG databases–>select database –> join only –> connect the secondary –> finish.

In case, if you get any following error, check the backup table and restore a log backup with norecovery mode in all the secondary replica. Which are all missed or run during that time.

Msg 1478, Level 16, State 211, Line 1

The mirror database, ‘DBA2’, has insufficient transaction log data to preserve the log backup chain of the principal database.  This may happen if a log backup from the principal database has not been taken or has not been restored on the mirror database.

 

Added: Thanks Amit Banerjee for your reply. I just reported to the connect.

https://connect.microsoft.com/SQLServer/feedback/details/3022019

 

 

Muthukkumaran kaliyamoorthy

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

More Posts - Website

AlwaysON move database without breaking HADR

Move database without breaking alwaysON

 

This post is going to show the database movement from one drive to another drive, without breaking the database from alwaysON configuration. An application has created the many databases to both primary and secondary replica servers to the default location of C drive.

There are methods like detach/attach, backup/restore & alter database. For alwaysON HADR servers, the best method is Alter database. Since it is in the mirror/sync mode. Detach/attach will not work and backup/ restore, we need to break databases from the HADR.

Note:

It’s a two node synchronous replica, if you have more replica, you should plan each well.

If you have any standalone database (Without adding DBs into HADR), you need to plan for the downtime. Since, the secondary replica servers need to take down.

In primary, for high transaction system, make sure you have good space for transaction files.

Steps:

Before going to start the database movement, write the script for each step and you can write a dynamic SQL for larger number databases. That’s what I did, since I had many databases. It will minimize the time.

 

ALTER DATABASE <DB name> SET HADR suspend

GUI: Expand the AG group and right click the DB –>suspend data movement

  • Change the Readable secondary to ‘NO’ for all the secondary replicas, otherwise you will get an error.

Right click the primary replica alwaysON group –> properties –> Readable secondary  –> No

To use ALTER DATABASE, the database must be in a writable state in which a checkpoint can be executed.

 

  • Note down all the files and file location from the system tables.

 

select db_name(a.database_id),a.name,a.physical_name,size/128.0 AS CurrentSizeMB,*

from sys.master_files a join sys.databases b

on a.database_id =b.database_id

–where a.physical_name like ‘c%’

–and a.database_id >4

–and type_desc <>’rows’

order by a.name

 

  • On the mirror server run the “ALTER DATABASE <DB name> MODIFY FILE” command. You need to run this for each database.

 

use master

ALTER DATABASE <DB_name> MODIFY FILE (NAME =<logical name>

,FILENAME =’F:\SQL_DATA\DB_name.mdf’)

 

  • On the secondary replica server stop the SQL Server instance.

 

  • Move the database file (MDF & LDF) files to the changed location (Cut & Paste).
  • Start the SQL Server instance and check the file locations using the above query.

 

  • In primary replica server resume the database by using the following ALTER DATABASE statement:

 

ALTER DATABASE <DB name> SET HADR Resume

GUI: Expand the AG group and right click the DB –>resume data movement

 

  • Change the Readable secondary to ‘Yes’

Fail over and repeat the steps for the partner server.

Additionally, if you add any files in the primary and the folder name is incorrect in the secondary, the database will go into suspend mode.

Just check the error log, you can get more info on, why the database is suspended mode.

Error: 5123, Severity: 16, State: 1.

CREATE FILE encountered operating system error 3(The system cannot find the path specified.) while attempting to open or create the physical file ‘L:\SQL_log\DB_name.ndf’.

 

Created the folder in secondary replica and resume the database, the resume database command will create a data file.

ALTER DATABASE <db name> SET HADR RESUME;

 

Added:

After a reply from Richard L. Dawson – I thought to add this as well. Thanks for that. This will help others as well.

You cannot take the database offline in both primary and secondary replica, unless we remove from alwaysON. Removing the database in the primary will make your secondary database into the recovery mode it is a default configuration.

Same as for secondary, you cannot remove it, it will make the databases into recovery mode and moreover the database will not remove from AG group.

Msg 1468, Level 16, State 1, Line 1

The operation cannot be performed on database “DB” because it is involved in a database mirroring session or an availability group. Some operations are not allowed on a database that is participating in a database mirroring session or in an availability group.

Msg 5069, Level 16, State 1, Line 1

ALTER DATABASE statement failed.

So moving physical database files needs a down/offline, we cannot move database and it requires the database need to be standalone not in the AG. We can run the alter database command and maybe can try this “DBCC SHRINKFILE (A, EMPTYFILE)”. We know, how tough this is. http://www.bobpusateri.com/archive/2013/03/moving-a-database-to-new-storage-with-no-downtime/

 

We can remove the database from AG and can do this like, the process we follow for the normal database. If you have only one database and can afford downtime. Yes, we can do this. (OR) we can completely start from the scratch. There are many methods.

(1.Remove database from AG 2. Alter database and modify the file 3. Set offline 4. Cut & paste the physical file to the new location 5. Set back to the database online 6. Add the database into AG)

Note: If any log backup run during this process the LSN will not match with all the secondary replica, you need to restore with norecoery and need to add the database.

Muthukkumaran kaliyamoorthy

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

More Posts - Website

Performance Tuning Series – Main Part

 

Performance tuning is a tricky part. We need to analysis all points of view. Sometimes it is an easy fix and can be solved in a minute and sometimes it take a day or more to find and fix.

I received many emails for performance tuning help. It has a variety of questions, that how can we tune a query and how to find out what is going on in the server, since my application team is does not give a slow running code. The query only slowdown in some time period. My server is banging all time above 95% of resources.

Nowadays, every business has a tool that tiger an alert that resource (CPU, RAM) goes beyond the threshold, that’s mostly from OS team and it’s easy to pull a month or a year report from the server and can ask DBAs, why would the server running always with high resources for the past few days/ months.

It’s always a changeling job for DBAs to tune their system, we even keep all maintenance jobs like re-indexing and statistics update OR adding more RAM, CPU will not help, there are many things can come up. All we need to know, how to gather the information from server, database and code point of view and analysis them one by one.

Most of the time poorly written and designed query and schema will be the issue. Why I am saying so, you will come to know after reading this post.

(I mean the poorly build code is, it is not written in a set based, bad schema design, no indexes used in the where, join clause, pulling all the data even if it has a some column for filtering, high fragmentation, out of stats & parameter sniffing etc.)

We need to use techniques and need to capture the exact issue.

 

To cover this post, will make me to write some other post as well. Please refer the following post as well to get some more idea on the tuning.

Capturing the SQL server queries by using Whoisactive

Capturing the SQL server queries by using server side trace  (Upcoming post: The link will be updated)

Tune query by looking execution plan and effective of the index creation (Upcoming post: The link will be updated)

 

The technique I/we DBAs mostly use:  F&F – Find and Fix.

  1. Whoisactive powerful script – This will capture everything, what are all running on the server/ DB side, if we did not have much information from the application team use this.
  2. SQL server execution plan – Once, you have found the code, which is taking more time, go and run in the SSMS with actual execution plan and see what is going on behind the screen. Even, the Whoisactive has plan info, if you pass the @get_plans = 1 parameter.

I am classifying in three parts to get a clear picture on this.

  1. Server level analysis and tuning
  2. Database level analysis and tuning
  3. Query level analysis and tuning

Before Whoisactive and sp_blitz, I tend to check perfmon and the size of the database, server and database level parameter changes, more DMVs etc. The two scripts give more control to DBAs.

There is no rule that, we should check first something and second something so on. Like (Execution plan, index used or not, fragmentation, count (*) table, stats ect..)

It’s always depends on environment. In some cases, the query might run for more than 10 hours or SP has 100 lines of code or the query might behave differently in runtime etc. We might take a help of an estimated plan to know the query without actual run OR take a look at query line by line.

 

Server and Database level analysis and tuning

More or less the server and database level checks are same. In the deeper it will take to query level only. That is what we keep saying the best practice of t-SQL writing will help the system at the top.

In common, we mostly get unknown parameter, I mean we need to find why the resource usages are peak, without having any information.

 

  • Check the SQL server error log

EXEC Sp_readerrorlog 0

The error log is important to check, which logs all the information.

This is a fantastic script from Paul Randal, which shows the overall wait type percentage and additionally the explanation of each wait URL.

 

WITH [Waits] AS
(SELECT
[wait_type],
[wait_time_ms] / 1000.0 AS [WaitS],
([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
[signal_wait_time_ms] / 1000.0 AS [SignalS],
[waiting_tasks_count] AS [WaitCount],
100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
FROM sys.dm_os_wait_stats
WHERE [wait_type] NOT IN (
N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR',
N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH',
N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',
N'CHKPT', N'CLR_AUTO_EVENT',
N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',
-- Maybe uncomment these four if you have mirroring issues
N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE',
N'DBMIRROR_WORKER_QUEUE', N'DBMIRRORING_CMD',
N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',
N'EXECSYNC', N'FSAGENT',
N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',
-- Maybe uncomment these six if you have AG issues
N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE',
N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',
N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP',
N'LOGMGR_QUEUE', N'MEMORY_ALLOCATION_EXT',
N'ONDEMAND_TASK_QUEUE',
N'PREEMPTIVE_XE_GETTARGETSTATE',
N'PWAIT_ALL_COMPONENTS_INITIALIZED',
N'PWAIT_DIRECTLOGCONSUMER_GETNEXT',
N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP', N'QDS_ASYNC_QUEUE',
N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
N'QDS_SHUTDOWN_QUEUE', N'REDO_THREAD_PENDING_WORK',
N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE',
N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH',
N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP',
N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',
N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP',
N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',
N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT',
N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS',
N'WAITFOR', N'WAITFOR_TASKSHUTDOWN',
N'WAIT_XTP_RECOVERY',
N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',
N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT')
AND [waiting_tasks_count] > 0
)
SELECT
MAX ([W1].[wait_type]) AS [WaitType],
CAST (MAX ([W1].[WaitS]) AS DECIMAL (16,2)) AS [Wait_S],
CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S],
CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S],
MAX ([W1].[WaitCount]) AS [WaitCount],
CAST (MAX ([W1].[Percentage]) AS DECIMAL (5,2)) AS [Percentage],
CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgWait_S],
CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S],
CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S],
CAST ('https://www.sqlskills.com/help/waits/' + MAX ([W1].[wait_type]) as XML) AS [Help/Info URL]
FROM [Waits] AS [W1]
INNER JOIN [Waits] AS [W2]
ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum]
HAVING SUM ([W2].[Percentage]) - MAX( [W1].[Percentage] ) < 95; -- percentage threshold
GO

 


See the results: SOS_SCHEDULER_YIELD – 34%, CXPACKET - 13% waits, Do you think which means we need to add more CPU, Since the wait type indicates there is a CPU pressure, No, it does not mean that we need to go and add more CPU, It means further analysis needed to find the query which is waiting for those waits and why it is waiting and how can we make that do not wait. Yes, it can be tuned and the wait type will automatically reduce.

  • Check the currently running sessions
Use Exec Whoisactive OR DMVs 
select db_name (r.database_id),r.command,r.wait_resource,s.status,s.session_id,
r.blocking_session_id,r.last_wait_type,r.open_transaction_count,
* from sys.dm_exec_sessions s join sys.dm_exec_requests r
on s.session_id =r.session_id
where s.session_id >=51 and s.status <>'sleeping'and s.session_id <>@@SPID
--and r.blocking_session_id<>0

 

  • Check the server and database level parameter default and changes

Use, Brent Ozar’s script https://www.brentozar.com/blitz/

It is an awesome script, which will help us to quickly analysis the server and database level non default parameter checks with suggestion etc. They have improved a lot. Please check the link and provide the email address and other details. They will send you the ZIP file location to your email address.

 

  • Check the database level running query by DMVs OR Whoisactive

The best way is to use Whoisactive or there are many DMVs, just do some googling.

  • Use the server side traces to capture the information of particular events

Upcoming post: I am going to write, how to collect the queries by using server side trace.

 

Query level tuning and code re-write

Query level tuning is more important, most of the time poorly built queries eats up all the resources, even if you have loads of CPU and RAM and no fragmentation and with updated stats.

 

There are some cases that, we get in real time we need to speed up the code without an actual run.

The following step is not exactly same that, you should follow in all the time. I just used this to show more information for this particular post.

I captured this query by using “Whoisactive”. And there are many like this. And all cannot be shown here. See the wait type and it is almost matching with server over all wait.

I just took the third query, see how it performs and can be tuned and rewrite better.

It is a simple union all with join two tables. It does not have a clustered index, since there is no candidate/ high selectivity column for a clustered index.

 

SELECT DISTINCT j.* FROM
(SELECT temp81.*,[54-80_table].[Wafer Sort Test Program (High Temp)], [54-80_table].[Wafer Sort Test Program (High Temp)_REV],
[54-80_table].[Wafer Sort Test Program (Low Temp)], [54-80_table].[Wafer Sort Test Program (Low Temp) Rev]
FROM temp81 LEFT OUTER JOIN [54-80_table]
ON temp81.[54-80] = [54-80_table].Component
WHERE [54-80_table].[BOM_STATUS]  IN (‘ACT’,'C-ACT’,'CPROD’,'PROD’,'PPROD’,'QUAL’,'RISK’)

UNION ALL

SELECT temp62.*,Null,Null, [54-80_table].[Wafer Sort Test Program (High Temp)], [54-80_table].[Wafer Sort Test Program (High Temp)_REV],
[54-80_table].[Wafer Sort Test Program (Low Temp)], [54-80_table].[Wafer Sort Test Program (Low Temp) Rev]
FROM temp62 LEFT OUTER JOIN [54-80_table] ON temp62.[54-80] = [54-80_table].Component
WHERE [54-80_table].[BOM_STATUS]  IN (‘ACT’,'C-CT’,'CPROD’,'PROD’,'PPROD’,'QUAL’,'RISK’)) j

 

1. Check the execution plan, at least we get some idea, where the cost is high and scan of the objects, join types etc.

Execution plan shows high cost for parallelism and sort. If we reduce that, it could be run better.

The normal query run time: 46:57 minutes.

2. Check the tables involved in the code and get the count (*) – number of records of the table. This will help us to target the big table regarding the fragmentation and statistics up to date.

Total number of records in the table.

select count(*) from temp81 with (nolock) –4984266

select count(*) from temp62 with (nolock) — 6250470

select count(*) from [54-80_table] with (nolock) – 217

 

It is a heap and no non-clustered index as well – Statistics is nil maybe, some auto stats for column and fragmentation is low.

SELECT OBJECT_NAME(F.OBJECT_ID) obj,i.name ind,
f.avg_fragmentation_in_percent,
f.page_count,table_schema,
avg_page_space_used_in_percent, forwarded_record_count
FROM SYS.DM_DB_INDEX_PHYSICAL_STATS (DB_ID('Ops_matrix'),NULL,NULL,NULL,null) F
JOIN SYS.INDEXES I
ON(F.OBJECT_ID=I.OBJECT_ID)AND i.index_id=f.index_id
JOIN INFORMATION_SCHEMA.TABLES S with (nolock)
ON (s.table_name=OBJECT_NAME(F.OBJECT_ID))
where f.database_id=DB_ID('Ops_matrix')
and OBJECT_NAME(F.OBJECT_ID) in ('temp81','temp62','54-80_table')
/*WHERE f.avg_fragmentation_in_percent>15 and f.page_count>=1000
AND f.database_id=DB_ID('Ops_matrix')
AND OBJECTPROPERTY(I.OBJECT_ID,'ISSYSTEMTABLE')=0
and alloc_unit_type_desc = 'in_row_data'
order by f.avg_fragmentation_in_percent desc
*/

/*
ALTER TABLE temp81 REBUILD;

UPDATE STATISTICS dbo.temp81
*/

 

 

3. Check the table structure of the data type, clustered and non-clustered indexes. Select some top 10 rows and check the values stored in the column, match the data type as well, there are many cases I have seen the data type and table designed badly. Sometimes, the LOB datatype or the larger size has chosen, but the content stored in the column is very small.

Example: create table T10 (n ntext, n1 varchar( 7000))

Note: SQL Server retains the 900-byte limit for the maximum total size of all index key columns.

 

sp_help ‘tablename’

Almost all of the column of the table is nvarchar(510) and it’s a Unicode characters for business needs.

 

4. Re-index the index and heap table.

There is no index to rebuild. I rebuilt the table/heap and updated the stats for table.

ALTER TABLE temp81 REBUILD;
UPDATE STATISTICS dbo.temp81

 

 

5. Create the index for the where clause, join columns and see the result.

Created index, but SQL optimizer is not using it. Since, the optimizer is smart and it thinks to retrieve all the data (select *), the table scan is best with some sort for “union all” and “distinct”. Thus, it always chooses a best plan based on cost, what it has from query tree and the statistics. This can happen many cases, the optimizer will skip the indexes.

create index ix_temp81_54_80 on temp81 ([54-80])
create index ix_temp62_54_80 on temp62 ([54-80])
create index ix_54_80_table_Component on [54-80_table] (Component)
create index ix_54_80_table_BOM_STATUS on [54-80_table] (BOM_STATUS)

 

6. Try using some hint like, MAXDOP, Index hint etc. I am not saying the parallel plan plan is bad, but there are cases that serial plan works best. And we have seen that, the query is waiting for CXPACKET.

 

Forcing the Index to use. Still, there are some lookups we can create some covered index, but here the case is different 20+ columns with nvarchar(510).

See the query run time is 16:11 minutes.

See the MAXDOP run time is 20:24 minutes.

 

7. Filter the data by some condition, generally by date column.

It does not have date column or any other column to limit the data return.

 

8. Check anything can be changed in the query like, select required column, remove some sort operation and add some filer option.

Removed the distinct – Parallel query run time is 13:14 and by using (MAXDOP 1) – 10:00 minutes.

Removed select * - select only a required columns. The run time is better 02:02 minutes.

 

Query tuning steps with the execution time for better visibility.

Steps Performed Execution Time in Minutes
Query execution time 47
Update Stats and Heap Rebuild 26
Force MAXDOP option 20
Force Index option 16
Without Distinct keyword 10
With Minimal column 2

If the tables temp62 & temp81 has the clustered index, then that is totally different. I am sure that it will run quickly. See my post: http://www.sqlserverblogforum.com/2016/03/how-important-is-clustered-index-for-a-table/

As a rule of thumb, every table should have a clustered index and yes, it will speed up all operations are on the table. The important factor is how effectively we choose the column for the clustered index. Generally a Surrogate key – Ex: IDENTITY

Example: For complex query, take out it as a batch and run it individually. Like, run the two queries separately, run with “union all” and run the full query with “distinct” and see the difference.

I do not think that, the user is going to read “6562117” records. And it returns 50 columns as an output. This is what we always say that , do not run some dump code in the production. If you have no condition to limit, at least get some top 1000 records. Think about the query runs 45 minutes and per day it is called 10 times.

I cannot add all the images, since, already it is a long post. I hope the tuning steps will help. And there are many great links, I have added to the post. Happy learning!

Muthukkumaran kaliyamoorthy

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

More Posts - Website

Whoisactive capturing a SQL server queries Performance Tuning

 

Whoisactive stored procedure is a powerful script to capture the SQL server queries with tons of information. It’s written by Adam Machanic. It makes DBAs job easier.

It’s a great script with lots of parameter and I am going to show, which are all more important.

Run the stored procedure without any parameter parameters

You can run the procedure without any parameter. It’s a default mode.

Exec [sp_WhoIsActive]

 

Lightweight parameters

@get_plans = 1 – It will show the execution plans of the running query.

@get_task_info = 2 – It will show the wait type of the query.

Ex: Parallel CXPACKET waits. With the additional info like, 3 times waited for CXPACKET number os milliseconds: (Nx: MINms/AVGms/MAXms)[wait_type]

@get_locks = 1 – It will show an XML column output locks each query, useful for blocking situation.

 

Other useful parameters – With little overhead

Sometimes, i use these parameters, when I want to do further more analysis.

@delta_interval = 5 – It will capture the delta interval time period of the usage, metrics for more accurate data, not the cumulative one. It will take 5 seconds to return the results.

@get_additional_info = 1 – It will show the default output plus several more than can be dynamically enabled and disabled options.

@get_avg_time = 1 – It will show the query average run time, which will help to find the query runs in each time differently.

@get_outer_command = 1 – It will show the outer batch query/SP, which will help to get the complete query.

 

Check this for more parameters:

http://sqlblog.com/blogs/adam_machanic/archive/2011/04/06/who-is-active-options-a-month-of-monitoring-part-6-of-30.aspx

 

How effectively use this stored procedure

By default, this stored procedure will show the results in the SSMS. It is useful in the run time check.

In general, the performance reports need to be captured all the time or we do not know, when the issue is occurring.

As every DBAs do, create a table and insert the results in the table. SQL agent job is best for capturing the data in the interval and schedule this SP job every 15 minutes. Sometimes with more parameters which take more time for this SP, can be changed depends on the execution time of the procedure.

 

Following is the table and stored procedure script.

Added: I have added this after a reply from Fei Yuan.

 

http://sqlblog.com/blogs/adam_machanic/archive/2011/04/25/capturing-the-output-a-month-of-activity-monitoring-part-25-of-30.aspx

How to create a table for the parameters you pass.

There are two ways to do this. You can get the table code from the “@script_table” parameter OR you can pass another one parameter “@destination_table” to insert the results into the table automatically, with the name you provided.

The second one is better for data collection.

It is a very, very long script and please download from SQLblog:

Most recent Whoisactive

Download the original Whoisactive procedure:

1. First download the stored procedure and run in your DBA database. Since, if you collecting more data, it will occupy more disk table space. Get the parameter you needed and add into the following code block and it will return the table schema of the code.

Table creation script:

2. Pass the parameter that you need with the DB name, table name, it will generate the create table script OR you can un-comment the EXEC line. Create the table.

 

DECLARE @script_table VARCHAR(MAX) 

EXEC sp_WhoIsActive

@get_plans = 1,@get_task_info = 2,@get_locks = 1,

@format_output = 0,

@return_schema = 1,

@schema = @script_table OUTPUT 

SET @script_table = REPLACE(@script_table, ‘<table_name>’, ‘dba_data.dbo.Whoisactive_lightweight’)

PRINT (@script_table)

– EXEC(@script_table)

Execute the  stored procedure to insert data to table.

3. It will insert the results from the stored procedure to the given table.

 

EXEC sp_WhoIsActive

@get_plans = 1,@get_task_info = 2,@get_locks = 1,

@format_output = 0,

@destination_table = ‘dba_data.dbo.Whoisactive_lightweight’

 

select * from ‘dba_data.dbo.Whoisactive_lightweight

 

The important column that we mostly look:

Runing_time ([dd hh:mm:ss.mss]),wait_info,sql_text,tasks,

blocking_session_id,query_plan,locks,request_id

 

A sample select statement: You can select the table and filter conditions based on, the results. I captured one of my production system and there are many quires running more than one hour. Those are all tuned and rewritten :-)

 

 

select start_time ST,collection_time as CT,* from master..tbl_whoisactive

where login_name not in (‘NT AUTHORITY\SYSTEM’,'DBA’,'sqlserver.service’)

–where blocking_session_id <>0

order by Runing_time desc

 

Once again, Thanks to Adam Machanic. Your script helped me all the time. Managing 100+ DB servers which has more performance issue all the week, are tough job and you script gather all the data every 15 minutes and easy for me to review later and give recommendations and fine tuning.

 

Muthukkumaran kaliyamoorthy

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

More Posts - Website

How important is clustered index for a table

 

I am not sure, how to name this topic. May be, start by following questions.

 

How important and effective is clustered index for a table?

How to solve the heap fragmentation?

Why my query is running slow most of the time for this particular table?

Are the primary key and clustered index are same?

 

There are developers still they understand that, we must need a primary key to create a clustered index. No, that’s not. Both are different used for different purpose.

The primary key is to enforce the business logic. I.e. enforce the uniqueness. By default it will create a clustered index, if we are not changing manually to non-clustered.

You can create a clustered index without a primary key. In two ways, create [unique] clustered index. The unique is an optional keyword.

A heap table is “without a clustered index is called heap”

 

Coming into the picture.

Recently a developer came to me for an application slowness. The query is in more environment (QA, DEV & Prod) each run differently. DBAs know where to start analysis the query.

 

I cannot show all the codes. But it’s a simple search character code like ‘’ with more conditions.

The table has massive fragmentation. Since, it’s a heap, but small. A 50K data table, with a simple query runs more than 5 minutes to return the data.

1. After a de-fragmentation only of all non-clustered indexes, it still runs 1 minute to return the data.

2. After a clustered index creation, it runs boom, less than a second. This is a best practice and the table has clustered index will be good and it enforce the logical ordering and every record in a non-clustered index has to link to the clustering index key.

Think about the heap it an unordered data, so it needs travel all the pages to get data. Probably, a full scan. But clustered index will be ordered logically with matching of index columns and it’s easy to get the data by clustering key.

From 2008 onwards, the same result we can get by rebuilding the table ALTER TABLE REBUILD. But internally it changes the heap records and ask all non-clustered indexes to rebuild it to match with rebuilt heap, for a big table it is worse.

SELECT OBJECT_NAME(F.OBJECT_ID) obj,i.name ind,
f.avg_fragmentation_in_percent,
f.page_count,table_schema,
avg_page_space_used_in_percent, forwarded_record_count
FROM SYS.DM_DB_INDEX_PHYSICAL_STATS (DB_ID(‘Regression’),NULL,NULL,NULL,null) F
JOIN SYS.INDEXES I
ON(F.OBJECT_ID=I.OBJECT_ID)AND i.index_id=f.index_id
JOIN INFORMATION_SCHEMA.TABLES S
ON (s.table_name=OBJECT_NAME(F.OBJECT_ID))
WHERE f.avg_fragmentation_in_percent>15 and f.page_count>=1000
AND f.database_id=DB_ID(‘Regression’)
AND OBJECTPROPERTY(I.OBJECT_ID,’ISSYSTEMTABLE’)=0
order by f.avg_fragmentation_in_percent desc

 

Before clustered index creation: IX_mtProduct_ProductID & heap has 2372493 pages

Table Index Fragmentation Page count
mtProduct IX_mtProduct_Description

0.342466

292

mtProduct IX_mtProduct_Mat_Seg_Desc

97.4359

819

mtProduct IX_mtProduct_MaterialType

96.81529

471

mtProduct IX_mtProduct_MCG

98.17987

934

mtProduct IX_mtProduct_ProdType

92.49423

866

mtProduct IX_mtProduct_ProductFamily

96.52568

662

mtProduct IX_mtProduct_ProductID

99.07121

646

mtProduct IX_mtProduct_ProductLine

98.75

880

mtProduct NULL

11.40322

2372493

 

After clustered index creation: CX_mtProduct_ProductID Clustered index has 1954 pages

Table Index Fragmentation Page count
mtProduct IX_mtProduct_Description

0.869565217

345

mtProduct IX_mtProduct_Mat_Seg_Desc

1.604278075

187

mtProduct IX_mtProduct_MaterialType

0.591715976

169

mtProduct IX_mtProduct_MCG

1.754385965

171

mtProduct IX_mtProduct_ProdType

2.580645161

310

mtProduct IX_mtProduct_ProductFamily

1.435406699

209

mtProduct CX_mtProduct_ProductID

0.204708291

1954

mtProduct IX_mtProduct_ProductLine

1.369863014

219

 

Now, see the number pages for heap- 2372493 and clustered index – 1954 and it can change over a time, when there is a modification but, not like a heap.

It’s always good to have a clustered index to search the record effectively.

Hope now you have understood the effectiveness of the clustered index and we should create a clustered index, when there is a candidate column available or sometimes create a dumpy column for the very big table. Generally a Surrogate key – Ex: IDENTITY.

 

There are already many posts available to choose the cluster index.

http://www.sqlskills.com/blogs/kimberly/ever-increasing-clustering-key-the-clustered-index-debate-again/

https://www.simple-talk.com/sql/learn-sql-server/effective-clustered-indexes/

 

 

 

 

 

 

Muthukkumaran kaliyamoorthy

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

More Posts - Website

Lost Administrator SA password SQL server 2012

by Muthukkumaran kaliyamoorthy Published on: July 27, 2014
Comments: 10 Comments
Tags:
Categories:DBA

There are many blog posts talking about, how to recover “SysAdmin” password in case if it’s lost OR the server is new to you and no idea that has own and access to it. In my case the second option J

In my opinion, it’s very easy to work on GUI instead of command line CMD :-) I used a SSMS.

 

Steps:

1. Stop the SQL server agent
2. Take the SQL server in single user mode, earlier version, it’s a bit hard

All programs –> SQL server 2012 –> Configuration Tools –> SQL server configuration manager

Stop the agent and take the SQL server into single user mode.

 

3. Reboot the SQL service
4. Launch the SSMS and don’t click the connect button, you need to cancel it and then click the “New query” window on top of the SSMS and then connect.

If you click the connect button the object explorer will take one connection, Single user mode will allow only one connection. Because of that we need to cancel and click the new query window.

Note: You need to be an OS administrator.

 

 

5. Create a new login and add as an admin

 

CREATE LOGIN XXX WITH PASSWORD=Complex password'

SP_ADDSRVROLEMEMBER XXX,'SYSADMIN'

 

Hope this would help someone.

Muthukkumaran kaliyamoorthy

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

More Posts - Website

page 1 of 7»

Welcome , today is Sunday, August 28, 2016