NSX 6.2 Communication Channel Health Check

 
In NSX 6.2 we introduced a new feature that lets users check the communication channels between NSX Manager, the control plane agent (netcpa), the firewall agent and the controllers. If a channel is broken, NSX Manager performs a sync operation to attempt to recover. The following communication channels are checked at the intervals listed below.
 

  • NSX Manager to Firewall Agent – A heartbeat is sent every 3 minutes; if two heartbeats are missed, a sync occurs.
  • NSX Manager to Control Plane Agent – A heartbeat is sent every 2 minutes; if two heartbeats are missed, a sync occurs.
  • Host to Controller – A heartbeat is sent every 30 seconds; if three heartbeats are missed, a sync occurs.
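The time before a lost channel is acted on is roughly the heartbeat interval multiplied by the number of missed heartbeats. The small POSIX shell sketch below just does that arithmetic for the three channels, using the intervals from the list above. `detection_time` is a hypothetical helper name, and the result is an approximation rather than an exact timer value.

```shell
#!/bin/sh
# Approximate time (in seconds) before a channel is treated as lost:
# heartbeat interval (seconds) multiplied by the number of missed heartbeats.
detection_time() {
    interval_s=$1
    missed=$2
    echo $((interval_s * missed))
}

# Intervals and thresholds taken from the list above.
echo "NSX Manager to Firewall Agent:      $(detection_time 180 2) s"
echo "NSX Manager to Control Plane Agent: $(detection_time 120 2) s"
echo "Host to Controller:                 $(detection_time 30 3) s"
```

This prints 360 s, 240 s and 90 s respectively, which is why a stopped control plane agent takes around 4 minutes to show up as Down and a stopped firewall agent around 6 minutes.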

 
To check the channels, log into the vSphere Web Client and navigate to Networking & Security -> Installation -> Host Preparation -> Actions -> Communication Channel Health.
 
 
As you can see from the image, everything in this specific cluster is healthy and all channels are up.
 
 
Let’s stop the control plane agent and the firewall agent to see how the status changes. Log into the ESXi host and run the following commands.
 

[root@esx-01a:/var/log] /etc/init.d/netcpad status
netCP agent service is running
[root@esx-01a:/var/log] /etc/init.d/netcpad stop
watchdog-netcpa: Terminating watchdog process with PID 35036
Memory reservation released for netcpa
netCP agent service is stopped

 
As you can see above, I have stopped the netcpad service. If I run the channel health check again, the NSX Manager to Control Plane Agent channel is reported as Down.
 
Note: As mentioned earlier, the status takes approximately 4 minutes to update, because two heartbeats must be missed at 120 seconds each.
 
 
Next, I stop the firewall agent, after which the NSX Manager to Firewall Agent channel is also reported as Down (allow roughly 6 minutes: two missed heartbeats at 3 minutes each).
 

[root@esx-01a:/var/log] /etc/init.d/vShield-Stateful-Firewall status
vShield-Stateful-Firewall is running
[root@esx-01a:/var/log] /etc/init.d/vShield-Stateful-Firewall stop
watchdog-vShield-Stateful-Firewall: Terminating watchdog process with PID 35474
vShield-Stateful-Firewall stopped
watchdog-dfwpktlogs: Terminating watchdog process with PID 35454
Resource pool 'host/vim/vmvisor/vsfwd' released.

 
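Once you are done experimenting, start both agents again so the channels can recover. The sketch below uses the same init scripts as above, with `start` instead of `stop`. The `start_nsx_agents` function name and the `INITD` variable are my own additions (hypothetical), the latter only so the loop can be tried against stub scripts off-host.

```shell
#!/bin/sh
# Directory holding the ESXi init scripts. Hypothetical override: setting
# INITD lets you exercise the function against stub scripts off-host.
INITD=${INITD:-/etc/init.d}

# Start the control plane agent and the firewall agent, then print each
# one's status. Stops at the first agent that fails to start.
start_nsx_agents() {
    for svc in netcpad vShield-Stateful-Firewall; do
        "$INITD/$svc" start || return 1
        "$INITD/$svc" status
    done
}
```

Run `start_nsx_agents` in the ESXi shell; after a few heartbeat intervals the Communication Channel Health view should show both channels as Up again.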
 
If you would like to check communication channel health with a REST API call instead, use GET https://nsxmanager/api/2.0/vdn/inventory/host/host-28/connection/status, where host-28 is the vCenter managed object ID of the host being checked.
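From a workstation with `curl`, the same call can be made roughly as sketched below. The manager name `nsxmanager` and host ID `host-28` come from the example above; the `admin` account, the `NSX_PASSWORD` environment variable and the `conn_status_url`/`check_host_channels` helper names are assumptions for illustration. The NSX Manager API uses HTTP basic authentication, and `-k` skips certificate checks, which is only appropriate in a lab with a self-signed certificate.

```shell
#!/bin/sh
# Build the connection status URL for one host.
#   $1 = NSX Manager address (for example, nsxmanager)
#   $2 = vCenter managed object ID of the host (for example, host-28)
conn_status_url() {
    echo "https://$1/api/2.0/vdn/inventory/host/$2/connection/status"
}

# Query the channel health for one host. Hypothetical defaults: the admin
# account and a password supplied through the NSX_PASSWORD variable.
#   -k  skip TLS certificate validation (lab/self-signed certificates only)
#   -s  silent mode, no progress meter
#   -u  HTTP basic authentication
check_host_channels() {
    curl -k -s -u "admin:${NSX_PASSWORD:?set NSX_PASSWORD first}" \
        "$(conn_status_url "$1" "$2")"
}
```

For example, after `export NSX_PASSWORD='<password>'`, running `check_host_channels nsxmanager host-28` prints the XML response for that host.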
 
 
You can change the default interval at which the host sends heartbeats by editing the file and line shown below; the value is in seconds. The other intervals can also be changed, but that requires root shell access to the NSX Manager, so you will need to contact support. Sorry 🙂
 

[root@esx-01a:/var/log] cat /usr/lib/vmware/netcpa/etc/netcpa.xml | grep Heart
      120

 
To troubleshoot, or to find out whether you have had any connection issues or forced sync events, look for the following entries in the logs below.
 
NSX Manager Log (show log manager): Messages lost for application: , sequence number from it: , but on VSM

netcpa.log (/var/log/netcpa.log): Got mismatched VSM seqNum with host seq
 
If you see these error messages, you should see a sync shortly afterward. The sync message reported in netcpa.log is “Received full sync notification message”.
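To pull these entries out of netcpa.log on the host in one go, a `grep` along the lines below should work. It prints matching lines with their line numbers, in order, so you can see whether a sequence-number mismatch was followed by a full sync. The function name `netcpa_sync_events` is hypothetical; it defaults to /var/log/netcpa.log but accepts another path, for example a copy taken from a support bundle.

```shell
#!/bin/sh
# Print seqNum mismatch and full sync entries from a netcpa log, with
# line numbers, in the order they were written.
#   $1 = log file (defaults to /var/log/netcpa.log on the ESXi host)
netcpa_sync_events() {
    grep -n -e "Got mismatched VSM seqNum with host seq" \
            -e "Received full sync notification message" \
            "${1:-/var/log/netcpa.log}"
}
```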
 
I have had customers ask me whether there was a way to check the health of the NSX components, and sure enough, there is! I hope these checks help point everyone in the right direction when something is not working. Please feel free to let me know if you have any questions or comments!
 

Posted by:

Sean Whitney
