Just the other night, I received a call from the guys about NetApp changes that affected our Hyper-V virtual machines and triggered automatic migrations. There was a lot of confusion, but our Hyper-V clusters proved to be mostly resilient: most guest virtual machines (VMs) were back online within about 10 minutes. However, some were still showing as unreachable in Nagios, so I manually consoled into each of them to fix them. Generally, there were three main issues keeping machines offline:
- Windows 7 blue screens
- Windows 7 machines whose reboots got stuck in the automatic repair wizard
- Windows 10 machines with static IP addresses flagged as duplicates because of a conflict with another machine on the network. When Windows detects an IP address conflict, it marks the address as a duplicate and stops using it, which effectively takes the NIC offline. I traced the collisions and forced one machine to be authoritative for its assigned IP. The command below sets the ArpRetryCount registry value (the number of gratuitous ARP requests Windows sends for its own address at startup to check whether it is already in use) and then reboots, so the ARP table is refreshed and the address is no longer marked as a duplicate.
Set-ItemProperty -Path "REGISTRY::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" -Name "ArpRetryCount" -Value 1 -Type DWord; Restart-Computer
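After the reboot, it helps to confirm that the address is no longer flagged and that the registry value stuck. The sketch below assumes Windows 8 / Server 2012 or later, where Get-NetIPAddress reports an AddressState of Duplicate for conflicting addresses. Select-DuplicateAddress is a hypothetical helper written for this example, not a built-in cmdlet.

```powershell
# Hypothetical helper (not a built-in cmdlet): passes through only the address
# objects whose AddressState is 'Duplicate'.
function Select-DuplicateAddress {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $Address
    )
    process {
        # Comparing an enum to a string works because PowerShell converts the
        # right-hand string to the type of the left-hand operand.
        if ($Address.AddressState -eq 'Duplicate') { $Address }
    }
}

# Only run the live checks on Windows, where the NetTCPIP module and the
# Tcpip\Parameters registry key exist.
if ($env:OS -eq 'Windows_NT') {
    # 1. List any IPv4 addresses Windows still considers duplicates.
    Get-NetIPAddress -AddressFamily IPv4 |
        Select-DuplicateAddress |
        Format-Table InterfaceAlias, IPAddress, AddressState

    # 2. Read back ArpRetryCount; prints nothing if the value was never set.
    Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters' -Name 'ArpRetryCount' -ErrorAction SilentlyContinue |
        Select-Object -ExpandProperty ArpRetryCount
}
```

An empty table from the first check means no address on the machine is currently marked as a duplicate.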