Saturday 28 March 2015

Failover or switchover to another mailbox database copy in a DAG for maintenance / DR


In this post I wanted to recreate a scenario where a mailbox database has been dismounted due to corruption, user error or whatever, and let the other (passive) DAG member replicating the database take over (become the active copy).

There are two ways a mailbox database copy can take over within a DAG:

- Failover: An automatic process that switches the active database when it detects the current active database has failed.

- Switchover: A process that is manually invoked by the administrator to change the active mailbox database copy (usually in pre-determined / scheduled situations, e.g. maintenance / backups).

In my environment I have a mailbox database that is replicated as part of a two-node DAG. I will forcefully dismount the active copy of the database:

Dismount-Database -Identity "MailboxDatabase1"

Doing this (not unexpectedly!) causes any Outlook clients with a mailbox on the affected DB to show messages such as "Trying to connect..." and finally to give up and report a "Disconnected" state. We want to move the active mailbox database copy so we can get our users back up and running again. Because the dismount was invoked manually, it does not trigger a failover (automatic), so we will use the Move-ActiveMailboxDatabase cmdlet to perform a switchover (manual):

Move-ActiveMailboxDatabase <mailbox-database> -ActivateOnServer <mailbox-server> -MountDialOverride:None

** Note: For the above procedure to work correctly the target database copy must be in a "Healthy" state. **
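
As a rough sketch of how that state could be checked before attempting the move, the copy status can be listed for every copy of the database. The database name is the one used in the dismount example above, the selected columns are just a convenient subset, and this assumes it is run from the Exchange Management Shell:

```powershell
# List every copy of the database together with its replication health.
# The passive copy we want to activate should report Status "Healthy"
# and, ideally, ContentIndexState "Healthy" as well.
Get-MailboxDatabaseCopyStatus -Identity "MailboxDatabase1" |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize
```

Non-zero copy or replay queue lengths on the passive copy suggest it has not yet caught up with the active copy.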

Since I had not gracefully dismounted the mailbox database, content indexing did not finish on the target server and its Content Index State was therefore "Failed". Due to this I received the following error:

An Active Manager operation failed. Error: The database action failed. Error: An error occurred while trying to validate the specified database copy for possible activation. Error: Database copy <mailbox-database> on server <server-name> has content index catalog files in the following state: 'Failed'. If you need to activate this database copy, you can use the Move-ActiveMailboxDatabase cmdlet with the -SkipClientExperienceChecks parameter to forcibly activate the database.

Repairing the content index with Update-MailboxDatabaseCopy "DB Name\Server Name" -CatalogOnly will not work here, as the catalog would have to be reseeded from the other server and we do not have access to the database there! So we will use the -SkipClientExperienceChecks switch instead:

Move-ActiveMailboxDatabase <mailbox-database> -ActivateOnServer <mailbox-server> -MountDialOverride:None -SkipClientExperienceChecks

We will then have to mount the database:

Mount-Database <mailbox-database>

The Content Indexing will begin (since this is now the Active copy) and repair itself automatically.
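
To keep an eye on that repair, something along these lines could be re-run every so often. This is only a sketch: the database name is taken from the dismount example above, and my understanding is that the state passes through a value such as "Crawling" before returning to "Healthy".

```powershell
# Show the content index state of each copy of the database.
# Re-run until the newly activated copy reports ContentIndexState "Healthy".
Get-MailboxDatabaseCopyStatus -Identity "MailboxDatabase1" |
    Format-List Name, Status, ContentIndexState
```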

We can verify that the DAG member is now holding the active copy of the database:

Get-MailboxDatabaseCopyStatus -Identity <MailboxDatabase> | FL *ActiveDatabaseCopy*

We can also get Exchange to perform an automatic failover by imitating a database failure. The simplest way is to kill the store workers by stopping the Microsoft Exchange Information Store service:

sc.exe stop MSExchangeIS
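
Once the service has been stopped, one way to confirm that the failover actually happened is to ask which server currently has the database mounted. A minimal sketch, again assuming the database name from the dismount example and the Exchange Management Shell on the surviving DAG member:

```powershell
# -Status is needed for Mounted and MountedOnServer to be populated.
# After a successful failover, MountedOnServer should name the other DAG member.
Get-MailboxDatabase -Identity "MailboxDatabase1" -Status |
    Format-List Name, Mounted, MountedOnServer
```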

Finally, verify that Outlook is now connected successfully. We can also make sure that the appropriate CAS is being used with:

Get-MailboxDatabase | fl Identity, RpcClientAccessServer
