This happened to me when setting up a brand new vSphere lab with vSphere 6.5 and the vSphere Replication Appliance 6.5.1.
After setting up a new replicated VM I was presented with the 'Not Active' status - although there was no information presented in the tool tip.
To dig a little deeper we can use the ESXi CLI to query the replicated VM's status - but first we'll need to obtain the VM ID:
vim-cmd vmsvc/getallvms
and then query the state with:
vim-cmd hbrsvc/vmreplica.getState <id>
Retrieve VM running replication state:
The VM is configured for replication. Current replication state: Group: CGID-1234567-9f6e-4f09-8487-1234567890 (generation=1234567890)
Group State: full sync (0% done: checksummed 0 bytes of 1.0 TB, transferred 0 bytes of 0 bytes)
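If you want to poll this rather than eyeball it, the progress figure can be pulled out of the output with a small script. This is only a sketch: it assumes the "Group State" line always has the shape shown above, and the function name parse_group_state is my own, not part of any VMware tooling.

```python
import re

# Matches the "Group State" line as it appears in the output above,
# e.g. "Group State: full sync (0% done: checksummed ...)".
GROUP_STATE = re.compile(
    r"Group State:\s*(?P<state>[^(]+?)\s*\((?P<pct>\d+)% done"
)


def parse_group_state(output):
    """Return (state, percent_done) from getState output, or None if absent."""
    match = GROUP_STATE.search(output)
    if match is None:
        return None
    return match.group("state"), int(match.group("pct"))


sample = (
    "Retrieve VM running replication state:\n"
    "Group State: full sync (0% done: checksummed 0 bytes of 1.0 TB, "
    "transferred 0 bytes of 0 bytes)\n"
)
print(parse_group_state(sample))
```

A percentage that stays at 0 across several polls is a good hint that the sync is stuck rather than just slow.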
So it looks like it's at least attempting to perform the replication - however it is stuck at 0% - so now delving into the logs:
cat /var/log/vmkernel.log | grep Hbr
2017-12-13T10:12:18.983Z cpu21:17841592)WARNING: Hbr: 4573: Failed to establish connection to [10.11.12.13]:10000(groupID=CGID-123456-9f6e-4f09-8487-123456): Timeout
2017-12-13T10:12:45.102Z cpu18:17806591)WARNING: Hbr: 549: Connection failed to 10.11.12.13 (groupID=CGID-123456-9f6e-4f09-8487-123456): Timeout
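On a busy host the log can contain a lot of these, so it can help to count the failures per target address. The sketch below assumes the two message formats shown above are the only ones of interest; failed_targets is a hypothetical helper name, and the patterns are based purely on these two sample lines.

```python
import re
from collections import Counter

# Covers both warning formats seen in the log excerpt above.
HBR_FAILURE = re.compile(
    r"WARNING: Hbr: \d+: "
    r"(?:Failed to establish connection to \[(?P<a>[\d.]+)\]:\d+"
    r"|Connection failed to (?P<b>[\d.]+))"
)


def failed_targets(lines):
    """Count Hbr connection failures per target IP address."""
    counts = Counter()
    for line in lines:
        match = HBR_FAILURE.search(line)
        if match:
            counts[match.group("a") or match.group("b")] += 1
    return counts


sample_lines = [
    "2017-12-13T10:12:18.983Z cpu21:17841592)WARNING: Hbr: 4573: Failed to "
    "establish connection to [10.11.12.13]:10000"
    "(groupID=CGID-123456-9f6e-4f09-8487-123456): Timeout",
    "2017-12-13T10:12:45.102Z cpu18:17806591)WARNING: Hbr: 549: Connection "
    "failed to 10.11.12.13 (groupID=CGID-123456-9f6e-4f09-8487-123456): Timeout",
]
print(failed_targets(sample_lines))
```

In practice you would feed it the lines of /var/log/vmkernel.log instead of the sample list.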
It looks like the ESXi host is failing to connect to 10.11.12.13 (the vSphere Replication Appliance in my case) - so we can double check this with:
cat /dev/zero | nc -v 10.11.12.13 10000
(Fails)
However if we attempt to ping it:
ping 10.11.12.13
we get a response - so it looks like it's a firewall issue.
I attempted to connect to the replication appliance from another server:
cat /dev/zero | nc -v 10.11.12.13 10000
Ncat: Version 7.60 ( https://nmap.org/ncat )
Ncat: Connected to 10.11.12.13:10000.
So it looks like the firewall on this specific host is blocking outbound connections on port 10000.
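If nc isn't available on the machine you are testing from, the same check can be done with a few lines of Python. This is a minimal sketch rather than a replacement for nc: tcp_port_open is my own helper name and the 3 second timeout is an arbitrary choice.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example (address and port taken from the log above):
#   tcp_port_open("10.11.12.13", 10000)
```

Running it from both the affected host and a known-good server gives the same comparison as the two nc attempts: False from the first and True from the second points at the host's own firewall.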
My suspicions were confirmed when I reviewed the firewall rules from within vCenter, on the Security Profile tab of the ESXi host.
Usually the relevant firewall rules are created automatically - however this time, for whatever reason, they had not been - so we'll need to proceed by creating a custom firewall rule (which unfortunately is quite cumbersome...):
SSH into the problematic ESXi host and create a new firewall config with:
touch /etc/vmware/firewall/replication.xml
and set the relevant write permissions:
chmod 644 /etc/vmware/firewall/replication.xml
chmod +t /etc/vmware/firewall/replication.xml
Then open the file in vi and add the following:
vi /etc/vmware/firewall/replication.xml
<!-- Firewall configuration information for vSphere Replication -->
<ConfigRoot>
  <service>
    <id>vrepl</id>
    <rule id='0000'>
      <direction>outbound</direction>
      <protocol>tcp</protocol>
      <porttype>dst</porttype>
      <port>
        <begin>10000</begin>
        <end>10010</end>
      </port>
    </rule>
    <enabled>true</enabled>
    <required>false</required>
  </service>
</ConfigRoot>
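Since a typo in this file is easy to make in vi, it can be worth sanity-checking a copy of the XML before refreshing the firewall. The sketch below only checks that the file is well-formed and prints what each rule opens; it does not validate against whatever schema ESXi actually uses, and summarize_rules is a name I made up for illustration.

```python
import xml.etree.ElementTree as ET

# A copy of the ruleset shown above.
RULESET_XML = """<ConfigRoot>
  <service>
    <id>vrepl</id>
    <rule id='0000'>
      <direction>outbound</direction>
      <protocol>tcp</protocol>
      <porttype>dst</porttype>
      <port>
        <begin>10000</begin>
        <end>10010</end>
      </port>
    </rule>
    <enabled>true</enabled>
    <required>false</required>
  </service>
</ConfigRoot>"""


def summarize_rules(xml_text):
    """Return (service, direction, protocol, begin, end) for every rule."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if malformed
    rules = []
    for service in root.findall("service"):
        name = service.findtext("id")
        for rule in service.findall("rule"):
            rules.append((
                name,
                rule.findtext("direction"),
                rule.findtext("protocol"),
                int(rule.findtext("port/begin")),
                int(rule.findtext("port/end")),
            ))
    return rules


print(summarize_rules(RULESET_XML))
```

An unclosed tag raises a ParseError here, which is a lot easier to spot than a ruleset that silently fails to appear after the refresh.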
Revert the permissions with:
chmod 444 /etc/vmware/firewall/replication.xml
and refresh the firewall so that it picks up the new ruleset:
esxcli network firewall refresh
and check it's there with:
esxcli network firewall ruleset list
(Make sure it's set to 'enabled' - if not you can enable it via the vSphere GUI: ESXi Host >> Configuration >> Security Profile >> Edit Firewall Settings.)
and the rules with:
esxcli network firewall ruleset rule list | grep vrepl
Then re-check connectivity with:
cat /dev/zero | nc -v 10.11.12.13 10000
Connection to 10.11.12.13 10000 port [tcp/*] succeeded!
Looks good!
When I went back to the vSphere Replication monitor, everything had started syncing again.
Sources:
https://kb.vmware.com/s/article/2008226
https://kb.vmware.com/s/article/2059893