Sql server Blog Forum Helping SQL server DBAs and Developers

31Aug/164

How to solve the LSN mismatch in SQL server

 

There are many times, we face the LSN mismatch issue in alwaysON and other HA technologies. It is a bit hard to find the missing transaction log backup to apply. Since, there are hundreds of thousands log generated, depends on the transaction log frequency and it can be run in any secondary alwaysON database server.

Think about “the VLDB” and “the backup is in different data center” and database are out of sync in DR site because of LSN mismatch. For VLDB 8 TB database, we cannot take a full or differential backup to fix this. Since, it will take more and more time. Backup is in different data center in CIFS share restore over the WAN will kill the performance and time.

 

LSN: Every record in the SQL Server transaction log is uniquely identified by a log sequence number (LSN). The restore database will work with the sequence of LSN order, no break in log chain.

It is easy if we understand the LSN chains a bit internally. Let me try to show in this post.

Example:

For the first log backup, the first LSN is 100 and Last LSN is 200, second log backup the first LSN should be 200 and last LSN is 300 and third log backup the first LSN should be 300 and last LSN will be some number and the chain goes on.

How to track, where the LSN breaks out and how to fix it. There are many methods, I used. All it depends on the issue and situation. Let me show all the methods and try it out.

Try is in testing not in production: Easy way to break the LSN in AlwaysON database.

Remove a database from an AlwaysON group in a secondary.

ALTER DATABASE [dba3] SET HADR OFF;

Take couple more log backup in primary.

DECLARE @MyFileName varchar(200)
SELECT @MyFileName='\\Sharepatht\dba3_' + REPLACE(convert(nvarchar(20),GetDate(),120),':','-') + '.trn'
--select @MyFileName
BACKUP log dba3 TO DISK=@MyFileName

Try to add the database back to secondary.

ALTER DATABASE [dba3] SET HADR AVAILABILITY GROUP = [AG-Test];

Error:

Msg 1478, Level 16, State 211, Line 1

The mirror database, has insufficient transaction log data to preserve the log backup chain of the principal database.  This may happen if a log backup from the principal database has not been taken or has not been restored on the mirror database.

Use the backup table in MSDB to retrieve the backup history with dynamic SQL.

Note: The filter where clause is important and it will be modified based on my steps. I just uncommented all conditions.

Dynamic Backup script:

 

SELECT  'restore database dba3 from disk= ''' +f.physical_device_name+''' with norecovery',
b.backup_finish_date,b.first_lsn,b.backup_size /1024/1024 AS size_MB,b.type,b.recovery_model,
b.server_name ,b.database_name,b.user_name, f.physical_device_name,b.*
FROM MSDB.DBO.BACKUPMEDIAFAMILY F
JOIN MSDB.DBO.BACKUPSET B
ON (f.media_set_id=b.media_set_id)
WHERE database_name like'dba3'
--and b.backup_finish_date>='2016-08-29 00:00:00.000'
--and b.first_lsn<=40000000007600001
-- and last_lsn= 39000000047800001
--and b.checkpoint_lsn =39000000047800001
--and database_backup_lsn =39000000047800001
--AND B.type='L'
--ORDER BY b.backup_finish_date desc
--order by b.first_lsn  desc

 

First method – A little tough, but good to understand the first and last LSN and the restore failures of LSN mismatch. This can be used for any other DR LSN issues, not only for alwaysON.

Step 1:

Run the dynamic backup script to get the latest transaction log backup from the server. Run this in all alwaysON group replica servers and make sure, you got the latest one.

Uncomment “ORDER BY b.backup_finish_date desc” to display in first line. Get the backup file and pass on it to the following restore command.

restore database dba3 from disk= '\\share\dba3_2016-08-29 09-43-58.trn' with norecovery

It will error and display the required LSN - “includes LSN 41000000017300001 can be restored”, which means, we need the one before the LSN of includes this.

 

To understand better, following is the example of restoring error, which is a very recent log backup and some old log backup restore.

An Old log backup error:

Msg 4326, Level 16, State 1, Line 1

The log in this backup set terminates at LSN 39000000023900001,

which is too early to apply to the database. A more recent log backup that includes LSN 41000000017300001 can be restored.

Msg 3013, Level 16, State 1, Line 1

RESTORE DATABASE is terminating abnormally.

The Latest log backup error:

Msg 4305, Level 16, State 1, Line 1

The log in this backup set begins at LSN 41000000017600001,

which is too recent to apply to the database. An earlier log backup that includes LSN 41000000017300001 can be restored.

Msg 3013, Level 16, State 1, Line 1

RESTORE DATABASE is terminating abnormally. 

I just run the two restore command with the latest and old log backup, one says it is “too early to apply” which means it is a before LSN and other says “too recent to apply” which means it is a after LSN.

Example:

39000000023900001 – Before/early LSN

41000000017300001Required/includes LSN

41000000017600001 – After/Recent LSN

 

It is a little hard to find which backup set holds the required LSN which is included.

If we know the time when it breaks, you can run restore command one by one from that time or before and can find out the backup file. OR run “restore headeronly from disk = '\\share\dba3_2016-08-28 07-22-32.bak'” to match closer to the required LSN comparing with first LSN.

Again, it is tough when it generated lots of transaction log backup. The above example the LSN are from the “restore headonly” and “restore database” both are showing the same.

Step 2:

We know the required LSN set”41000000017300001” from the restore error.

Use the dynamic backup script add the condition “and b.first_lsn<=41000000017300001” to get the LSN which is early or smaller than the needed (OR) which includes the required LSN with “order by b.first_lsn  desc” clause.

Copy the first_LSN in the first row “41000000016700001”. It is the breaking point. (Error: log backup that includes LSN "41000000017300001" can be restored.)

 

Step 3:

Use the dynamic backup script add “and b.first_lsn>=41000000016700001” with “ORDER BY b.backup_finish_date”. It will take all the required backup files to be applied in a sequence.

 

Another Method: It is a bit easy one for alwaysON.

Use the following command it will display the exact required LSN in the “s.truncation_lsn”, which is same as “First_LSN” =41000000016700001. The “s.recovery_lsn & s.last_hardened_lsn” will show the recovery LSN which is same as “includes LSN” = 41000000017300001.

Find the required LSN of using “DMV of alwaysON” and note the truncation_lsn.

 

select database_id,db_name(database_id),name,replica_server_name,synchronization_health_desc,synchronization_state_desc,
s.truncation_lsn,s.recovery_lsn,s.last_hardened_lsn,s.last_sent_time,s.last_received_time,s.last_redone_lsn,
s.end_of_log_lsn,s.last_commit_lsn,s.last_sent_lsn,s.last_received_lsn,s.last_commit_lsn
from sys.dm_hadr_database_replica_states S join  sys.availability_groups G on (s.group_id=g.group_id)
join sys.availability_replicas   GR on (GR.group_id=g.group_id)
--where synchronization_health_desc <>'HEALTHY'
and db_name(database_id)='dba3'
and synchronization_health_desc ='NOT_HEALTHY'
-- change the DB name and health status

For more about columns read the MS DMVs link:

https://msdn.microsoft.com/en-us/library/ff877972.aspx

 

Run the dynamic backup script with the filter clause of the “s.truncation_lsn” “41000000016700001”

SELECT  'restore database dba3 from disk= ''' +f.physical_device_name+''' with norecovery',
b.backup_finish_date,b.first_lsn,b.last_lsn,b.database_backup_lsn,b.checkpoint_lsn,
b.type,b.is_copy_only,b.recovery_model,b.backup_size /1024/1024 AS size_MB,
b.server_name ,b.database_name,b.user_name, f.physical_device_name,b.*
FROM MSDB.DBO.BACKUPMEDIAFAMILY F
JOIN MSDB.DBO.BACKUPSET B
ON (f.media_set_id=b.media_set_id)
WHERE database_name like'dba3'
--and b.backup_finish_date>='2016-08-29 05:50:00.000'
AND b.first_lsn>=41000000016700001
AND B.type='L'
ORDER BY b.backup_finish_date

Copy the restore script and execute in the LSN break server, if you know the log backup only configured and run on the server. Mostly in two node alwaysON, it is configured and run more server see the following and use it.

Join the database into the alwaysON group.

ALTER DATABASE [dba3] SET HADR AVAILABILITY GROUP = [AG-Test];

 

How to fix if the log backup runs more than one secondary replica, when the break happens.

I had a case in more than one replica server, which are two in the primary data center and two are in DR data center.

We can use the above command and steps and can run on all the available replica server and compare the results and can prepare the restore command. But, it will take a little time and maybe some confusion as well. I used “Xp_cmdshell” to get the files from the backup folder and insert into a table to prepare the restore command.

To find the required transaction backup file, we need the required LSN:

1. Get the required LSN either from “DMV of alwaysON” OR from restore error 2. Get the backup file name by using “Dynamic backup script” by passing the required LSN. Filter and compare it with the “Xp_cmdshell” results.

It is a different test, so the LSN will differ from the previous one. 

1. You can use the above command “DMV of alwaysON” to find the required LSN, but it will show all the replica server, you just note the unhealthy of database server of required LSN from “s.truncation_lsn”

2. Use “Dynamic Backup script” and filter it “AND b.first_lsn>=41000000021500001”  with “ORDER BY b.backup_finish_date”. Note the “f.physical_device_name”. That is “\\share\dba3_2016-08-30 05-10-00.trn

Step 1: Enable the xp_cmdshell

 

EXEC sp_configure 'show advanced options', 1; 
RECONFIGURE; 
EXEC sp_configure 'xp_cmdshell', 1; 
RECONFIGURE;

Step 2: Create table to export the backup files

--drop table tbl_backup_filename
create table tbl_backup_filename (id int identity, Bak_filename varchar(500))
insert into tbl_backup_filename
exec xp_cmdshell 'dir \\share\SQL-Backup\QA\DBA_Test /b /O:D' -- Bare format sort by date

Step 3: Compare the backup file name with the export backup file name and note the identity ID.

select * from tbl_backup_filename
--\\share\dba3_2016-08-30 05-10-00.trn
select * from tbl_backup_filename where Bak_filename like '%dba3_2016-08-30 05-10-00%'

Step 4: Run the dynamic SQL restore command to generate the restore script with the condition of identity number, which you got it from above step.

select 'restore headeronly from disk=''\\sclfilip13\MFGProcess\SQL-Backup\QA\DBA_Test\'+Bak_filename+'''', *
from tbl_backup_filename where id >=76
-- DB
select 'restore database dba3 from disk=''\\sclfilip13\MFGProcess\SQL-Backup\QA\DBA_Test\'+Bak_filename+''' with norecovery', *
from tbl_backup_filename where id >=76

Step 5: Disable the xp_cmdshell

 

EXEC sp_configure 'xp_cmdshell', 0; 
RECONFIGURE; 
EXEC sp_configure 'show advanced options', 0; 
RECONFIGURE;

 

If you have a backup folder separate for a replica server, then you need to export all and can compare it with all replica servers.

If anyone interested in the test script, please drop me a note. It is always great to share the my findings and learning with you all.

Happy learning!

 

Muthukkumaran Kaliyamoorthy

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

More Posts

7Aug/160

AlwaysON database NOT SYNCHRONIZED and RECOVERY PENDING

AlwaysON database NOT SYNCHRONIZED and RECOVERY PENDING

 

I have recently come across an issue that, one of my alwaysON secondary replica databases are went into “NOT SYNCHRONIZED and RECOVERY PENDING” state. It is a geo cluster alwaysON with 4+2 nodes configured in both synchronous and asynchronous mode. 

The issue is when the database removed from the primary replica, with the secondary disconnection the higher database IDs on the secondary went into “NOT SYNCHRONIZED and RECOVERY PENDING” state, but the lower database IDs are good and synchronous state only.

I just checked with one of the Microsoft MSFT and he said it considers as a bug. “Since, it occurs a DDL change on the Availability Group as the primary, while a secondary replica server is down”.

I just reproduced the same. It is a two node synchronous replica server.

Version: Microsoft SQL Server 2012 (SP1) - 11.0.3000.0

I have not tested it on SQL 2014.

 

Testing:

1. Create five databases.

CREATE DATABASE dba

CREATE DATABASE dba1

CREATE DATABASE dba2

CREATE DATABASE dba3

CREATE DATABASE dba4

SELECT * FROM SYS.SYSDATABASES

2. Created a test availability group and added a five databases in synchronous mode.

3. Stop SQL Instance for Secondary Replica.

4. On the Primary Replica issue TSQL to remove a database from availability group.

 

ALTER AVAILABILITY GROUP [AG-Test] REMOVE DATABASE dba2;

 

5. Start SQL Instance for the Secondary Replica.

You can see that, databases whose IDs are higher than the one removed dba2 are going to “NOT SYNCHRONIZED / RECOVERY PENDING” mode.

DB ID – 9, 10 for dba,dba1 - Good

DB ID – 11 for dba2 - Removed DB

DB ID – 12,13 for dba3, dba4 went into SYNCHRONIZED / RECOVERY PENDING.

 

SELECT DATABASE_ID,DB_NAME(DATABASE_ID),NAME,REPLICA_SERVER_NAME,

SYNCHRONIZATION_HEALTH_DESC,SYNCHRONIZATION_STATE_DESC,S.*

FROM SYS.DM_HADR_DATABASE_REPLICA_STATES S JOIN  SYS.AVAILABILITY_GROUPS G ON (S.GROUP_ID=G.GROUP_ID)

JOIN SYS.AVAILABILITY_REPLICAS   GR ON (GR.GROUP_ID=G.GROUP_ID)

--WHERE SYNCHRONIZATION_HEALTH_DESC <>'HEALTHY'

--AND REPLICA_SERVER_NAME =''

 

The fix and work around is resume other databases that are stuck in secondary replica.

6. On the Secondary replica that has stuck databases, for each stuck database remove (will make it DB into restoring mode) and add (will make it DB into synchronized mode).

Remove a Secondary Database from an AG --> https://msdn.microsoft.com/en-us/library/hh231120.aspx

 

ALTER DATABASE [DBA3] SET HADR OFF

ALTER DATABASE [DBA4] SET HADR OFF

Join a Secondary Database to an AG --> https://msdn.microsoft.com/en-us/library/ff878535.aspx

 

ALTER DATABASE [DBA3] SET HADR AVAILABILITY GROUP = [AG-TEST]

ALTER DATABASE [DBA4] SET HADR AVAILABILITY GROUP = [AG-TEST]

Repeat for each stuck database.

The database that had been removed <dba2> could be in any number of states, but after it finishes its recovery process, it should return to a RESTORING state.

7. On primary replica, Add the removed database --> primary replica --> right click AG databases-->select database --> join only --> connect the secondary --> finish.

In case, if you get any following error, check the backup table and restore a log backup with norecovery mode in all the secondary replica. Which are all missed or run during that time.

Msg 1478, Level 16, State 211, Line 1

The mirror database, ‘DBA2’, has insufficient transaction log data to preserve the log backup chain of the principal database.  This may happen if a log backup from the principal database has not been taken or has not been restored on the mirror database.

 

Added: Thanks Amit Banerjee for your reply. I just reported to the connect.

https://connect.microsoft.com/SQLServer/feedback/details/3022019

 

 

Muthukkumaran Kaliyamoorthy

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

More Posts

3Aug/166

AlwaysON move database without breaking HADR

Move database without breaking alwaysON

 

This post is going to show the database movement from one drive to another drive, without breaking the database from alwaysON configuration. An application has created the many databases to both primary and secondary replica servers to the default location of C drive.

There are methods like detach/attach, backup/restore & alter database. For alwaysON HADR servers, the best method is Alter database. Since it is in the mirror/sync mode. Detach/attach will not work and backup/ restore, we need to break databases from the HADR.

Note:

It’s a two node synchronous replica, if you have more replica, you should plan each well.

If you have any standalone database (Without adding DBs into HADR), you need to plan for the downtime. Since, the secondary replica servers need to take down.

In primary, for high transaction system, make sure you have good space for transaction files.

Steps:

Before going to start the database movement, write the script for each step and you can write a dynamic SQL for larger number databases. That’s what I did, since I had many databases. It will minimize the time.

 

ALTER DATABASE <DB name> SET HADR suspend

GUI: Expand the AG group and right click the DB -->suspend data movement

  • Change the Readable secondary to ‘NO’ for all the secondary replicas, otherwise you will get an error.
  • To use ALTER DATABASE, the database must be in a writable state in which a checkpoint can be executed.

Right click the primary replica alwaysON group --> properties --> Readable secondary  --> No

 

  • Note down all the files and file location from the system tables.

 

select db_name(a.database_id),a.name,a.physical_name,size/128.0 AS CurrentSizeMB,*

from sys.master_files a join sys.databases b

on a.database_id =b.database_id

--where a.physical_name like 'c%'

--and a.database_id >4

--and type_desc <>'rows'

order by a.name

 

  • On the mirror server run the “ALTER DATABASE <DB name> MODIFY FILE” command. You need to run this for each database.

 

use master

ALTER DATABASE <DB_name> MODIFY FILE (NAME =<logical name>

,FILENAME ='F:\SQL_DATA\DB_name.mdf')

 

  • On the secondary replica server stop the SQL Server instance.

 

  • Move the database file (MDF & LDF) files to the changed location (Cut & Paste).
  • Start the SQL Server instance and check the file locations using the above query.

 

  • In primary replica server resume the database by using the following ALTER DATABASE statement:

 

ALTER DATABASE <DB name> SET HADR Resume

GUI: Expand the AG group and right click the DB -->resume data movement

 

  • Change the Readable secondary to ‘Yes’

Fail over and repeat the steps for the partner server.

Additionally, if you add any files in the primary and the folder name is incorrect in the secondary, the database will go into suspend mode.

Just check the error log, you can get more info on, why the database is suspended mode.

Error: 5123, Severity: 16, State: 1.

CREATE FILE encountered operating system error 3(The system cannot find the path specified.) while attempting to open or create the physical file 'L:\SQL_log\DB_name.ndf'.

 

Created the folder in secondary replica and resume the database, the resume database command will create a data file.

ALTER DATABASE <db name> SET HADR RESUME;

 

Added:

After a reply from Richard L. Dawson – I thought to add this as well. Thanks for that. This will help others as well.

You cannot take the database offline in both primary and secondary replica, unless we remove from alwaysON. Removing the database in the primary will make your secondary database into the recovery mode it is a default configuration.

Same as for secondary, you cannot remove it, it will make the databases into recovery mode and moreover the database will not remove from AG group.

Msg 1468, Level 16, State 1, Line 1

The operation cannot be performed on database "DB" because it is involved in a database mirroring session or an availability group. Some operations are not allowed on a database that is participating in a database mirroring session or in an availability group.

Msg 5069, Level 16, State 1, Line 1

ALTER DATABASE statement failed.

So moving physical database files needs a down/offline, we cannot move database and it requires the database need to be standalone not in the AG. We can run the alter database command and maybe can try this “DBCC SHRINKFILE (A, EMPTYFILE)”. We know, how tough this is. http://www.bobpusateri.com/archive/2013/03/moving-a-database-to-new-storage-with-no-downtime/

 

We can remove the database from AG and can do this like, the process we follow for the normal database. If you have only one database and can afford downtime. Yes, we can do this. (OR) we can completely start from the scratch. There are many methods.

(1.Remove database from AG 2. Alter database and modify the file 3. Set offline 4. Cut & paste the physical file to the new location 5. Set back to the database online 6. Add the database into AG)

Note: If any log backup run during this process the LSN will not match with all the secondary replica, you need to restore with norecoery and need to add the database.

http://www.sqlserverblogforum.com/2016/08/how-to-solve-the-lsn-mismatch-in-sql-server/

 

 

Muthukkumaran Kaliyamoorthy

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

More Posts