Wednesday, 13 December 2017

vSphere Replication 6.5 Bug: 'Not Active' Status

This happened to myself when setting up a brand new vSphere lab with vSphere 6.5 and the vSphere Replication Appliance 6.5.1.

After setting up a new replicated VM I was presented with the 'Not Active' status - although there was no information presented in the tool tip.

So to dig a little deeper we can use the CLI to query the replicated VM status - but firstly we'll need to obtain the VM id number:

vim-cmd vmsvc/getallvms

and then query the state with:

vim-cmd hbrsvc/vmreplica.getState <id>

Retrieve VM running replication state:
        The VM is configured for replication. Current replication state: Group: CGID-1234567-9f6e-4f09-8487-1234567890 (generation=1234567890)
        Group State: full sync (0% done: checksummed 0 bytes of 1.0 TB, transferred 0 bytes of 0 bytes)

So it looks like it's at least attempting to perform the replication - however is stuck at 0% - so now devling into the logs:

cat /var/log/vmkernel.log | grep Hbr

2017-12-13T10:12:18.983Z cpu21:17841592)WARNING: Hbr: 4573: Failed to establish connection to []:10000(groupID=CGID-123456-9f6e-4f09-
8487-123456): Timeout
2017-12-13T10:12:45.102Z cpu18:17806591)WARNING: Hbr: 549: Connection failed to (groupID=CGID-123456-9f6e-4f09-8487-123456): Timeout

It looks like the ESXI host is failing to connect to (the Virtual Replication Appliance in my case) - so we can double check this

cat /dev/zero | nc -v 10000


However if we attempt to ping it:


we get a responce - so it looks like it's a firewall issue.

I attempt to connect to the replication appliance from another server:

cat /dev/zero | nc -v 10000

Ncat: Version 7.60 ( )
Ncat: Connected to

So it looks like the firewall on this specific host is blocking outbound connections on port 10000.

My suspisions were confirmed when I reviewed the firewall rules from within vCenter on the Security Profile tab of the ESXI host:

Usually the relevent firewall rules are created automatically - however this time for whatever reason they have not been - so we'll need to
proceed by creating a custom firewall rule (which unfortuantely is quite cumbersome...):

SSH into the problematic ESXI host and create a new firewall config with:

touch /etc/vmware/firewall/replication.xml

and set the relevent write permissions:

chmod 644 /etc/vmware/firewall/replication.xml
chmod +t /etc/vmware/firewall/replication.xml

vi /etc/vmware/firewall/replication.xml

<!-- Firewall configuration information for vSphere Replication -->
<rule id='0000'>

Revert the permsissions with:

chmod 444 /etc/vmware/firewall/replication.xml

and restart the firewall service:

esxcli network firewall refresh

and check its there with:

esxcli network firewall ruleset list

(Make sure it's set to 'enabled' - if not you can enable it via the vSphere GUI: ESXI Host >> Configuration >> Security Profile >> Edit Firewall

and the rules with:

esxcli network firewall ruleset rule list | grep vrepl

Then re-check connectivity with:

cat /dev/zero | nc -v 10000
Connection to 10000 port [tcp/*] succeeded!

Looks good!

After reviewing the vSphere Replication monitor everything had started syncing again.



Post a Comment