IIS Server Troubleshooting

Issue:

There was a time when a production IIS server became unreachable via RDP, WinRM, and RPC. Here’s the story of that scenario:

1. List Dynamic Ports
SHELL> netsh int ipv4 show dynamicport tcp
Protocol tcp Dynamic Port Range
---------------------------------
Start Port : 1025
Number of Ports : 64511

2. Show existing connections (plain netstat lists active connections only; use netstat -a to include listening ports)
SHELL> netstat | findstr -i "ESTABLISHED LISTEN CLOSE_WAIT TIME_WAIT"
TCP 127.0.0.1:80 192.168.2.250:4845 TIME_WAIT
TCP 127.0.0.1:80 192.168.2.250:29244 TIME_WAIT
TCP 127.0.0.1:80 192.168.2.250:31519 TIME_WAIT
TCP 127.0.0.1:80 192.168.2.250:39922 TIME_WAIT
TCP 127.0.0.1:80 192.168.2.250:55248 TIME_WAIT
TCP 127.0.0.1:80 192.168.2.250:63718 TIME_WAIT
TCP 127.0.0.1:42708 EAFBL:http ESTABLISHED
TCP 127.0.0.1:42709 EAFBL:http ESTABLISHED
TCP 127.0.0.1:43974 EAFSQL:ms-sql-s ESTABLISHED
TCP 127.0.0.1:47371 edwin1:http ESTABLISHED
###################### Truncated for brevity #####################
TCP 127.0.0.1:8402 CONTRA001:42673 ESTABLISHED
TCP 127.0.0.1:42652 CONTRA001:42653 ESTABLISHED
TCP 127.0.0.1:42653 CONTRA001:42652 ESTABLISHED
TCP 127.0.0.1:42673 CONTRA001:8402 ESTABLISHED
TCP 127.0.0.1:47001 CONTRA001:53423 ESTABLISHED
TCP 127.0.0.1:53423 CONTRA001:47001 ESTABLISHED
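To see at a glance what dominates a connection table like the one above, the rows can be tallied by state and, for TIME_WAIT, by foreign host. This is a minimal, hypothetical helper (the function names are mine, not part of any Windows tool) that assumes the four-column layout shown above; the sample rows are copied from that output.

```python
from collections import Counter


def count_tcp_states(netstat_text: str) -> Counter:
    """Count TCP connection states in plain `netstat` output.

    Assumes each data row has four whitespace-separated fields, as above:
    PROTO LOCAL_ADDRESS FOREIGN_ADDRESS STATE. Other lines are skipped.
    """
    states: Counter = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "TCP":
            states[fields[3]] += 1
    return states


def time_wait_by_foreign_host(netstat_text: str) -> Counter:
    """Count TIME_WAIT rows per foreign host (port stripped off)."""
    hosts: Counter = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "TCP" and fields[3] == "TIME_WAIT":
            hosts[fields[2].rsplit(":", 1)[0]] += 1
    return hosts


# A few rows copied from the output above (banner line included on purpose).
SAMPLE = """\
TCP 127.0.0.1:80 192.168.2.250:4845 TIME_WAIT
TCP 127.0.0.1:80 192.168.2.250:29244 TIME_WAIT
TCP 127.0.0.1:80 192.168.2.250:31519 TIME_WAIT
TCP 127.0.0.1:42708 EAFBL:http ESTABLISHED
TCP 127.0.0.1:43974 EAFSQL:ms-sql-s ESTABLISHED
###################### Truncated for brevity #####################
"""

if __name__ == "__main__":
    print(count_tcp_states(SAMPLE))
    print(time_wait_by_foreign_host(SAMPLE))
```

A large TIME_WAIT count concentrated on one foreign host is the pattern worth chasing in the next steps.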

3. Find the PID of the offending process(es) on the suspect ports (here, any line containing :808)
SHELL> netstat -aon | findstr :808
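The netstat -aon rows carry the owning PID in the last column. Because findstr :808 also matches the foreign address, a small parser that checks only the local port cuts down false hits. This is a sketch under assumptions: it expects the five-column TCP row layout, and the sample rows below are hypothetical, not output from this server.

```python
from __future__ import annotations


def pids_by_local_port_prefix(netstat_aon_text: str, prefix: str = "808") -> dict[int, set[str]]:
    """Map PID -> local endpoints for TCP rows of `netstat -aon`.

    Assumes rows look like: TCP LOCAL FOREIGN STATE PID.
    Only the *local* port is matched, unlike `findstr :808`,
    which also matches the foreign address.
    """
    result: dict[int, set[str]] = {}
    for line in netstat_aon_text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] != "TCP":
            continue  # header, UDP rows, blank lines
        local, _foreign, _state, pid = fields[1:]
        port = local.rsplit(":", 1)[-1]  # works for [::]:8080 too
        if port.startswith(prefix):
            result.setdefault(int(pid), set()).add(local)
    return result


# Hypothetical sample rows for illustration only.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:8080           0.0.0.0:0              LISTENING       3008
  TCP    10.0.0.5:8080          10.0.0.9:51234         ESTABLISHED     3008
  TCP    0.0.0.0:443            0.0.0.0:0              LISTENING       4
  TCP    10.0.0.5:52000         10.0.0.9:8081          TIME_WAIT       0
  TCP    [::]:8080              [::]:0                 LISTENING       3008
"""

if __name__ == "__main__":
    print(pids_by_local_port_prefix(SAMPLE))
```

The TIME_WAIT row would match findstr :808 through its foreign address but is correctly ignored here.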

4. Find the process name for that PID
SHELL> tasklist /fi "pid eq 3008"
Image Name PID Session Name Session# Mem Usage
========================= ======== ================ =========== ============
w3wp.exe 3008 Services 0 1,400,800 K
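A Mem Usage of 1,400,800 K for a single w3wp.exe is roughly 1.34 GiB (treating K as 1,024 bytes). To watch that value over time, tasklist /fo csv gives machine-readable output. The parser below is a hypothetical sketch that assumes English column headers ("PID", "Mem Usage"); the sample is the row above re-typed in CSV form.

```python
from __future__ import annotations

import csv
import io
import re


def mem_usage_k(tasklist_csv: str) -> dict[int, int]:
    """Parse `tasklist /fo csv` output into {PID: Mem Usage in K}.

    Keeps only the digits of "Mem Usage", so thousands separators and
    the trailing " K" are dropped regardless of locale punctuation.
    """
    usage: dict[int, int] = {}
    for row in csv.DictReader(io.StringIO(tasklist_csv)):
        usage[int(row["PID"])] = int(re.sub(r"\D", "", row["Mem Usage"]))
    return usage


SAMPLE = '''"Image Name","PID","Session Name","Session#","Mem Usage"
"w3wp.exe","3008","Services","0","1,400,800 K"
'''

if __name__ == "__main__":
    for pid, k in mem_usage_k(SAMPLE).items():
        # Assumes K = 1,024 bytes, so K / 1024**2 = GiB.
        print(f"PID {pid}: {k:,} K = {k / 1024**2:.2f} GiB")
```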

5. Check the event logs and address the various errors/warnings:

Event ID 56
1. Disable IPv6
2. Disable all SNP (Scalable Networking Pack) features:
netsh int tcp set global chimney=disabled
netsh int tcp set global rss=disabled
netsh int ip set global taskoffload=disabled
netsh int tcp set global autotuninglevel=disabled
netsh int tcp set global congestionprovider=none
netsh int tcp set global ecncapability=disabled
netsh int tcp set global timestamps=disabled
3. Disable IPv4 Large Send Offload, Checksum Offload, and TCP Connection Offload in the network adapter's advanced properties
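After running the netsh commands, netsh int tcp show global can confirm which settings took effect. The checker below is a hypothetical sketch: the label text in EXPECTED and in the sample is my assumption of that command's output and differs between Windows versions, so adjust it to what your server actually prints.

```python
from __future__ import annotations

# Assumed labels from `netsh int tcp show global`; adjust per Windows version.
EXPECTED = {
    "Receive-Side Scaling State": "disabled",
    "Chimney Offload State": "disabled",
    "Receive Window Auto-Tuning Level": "disabled",
    "Add-On Congestion Control Provider": "none",
    "ECN Capability": "disabled",
    "RFC 1323 Timestamps": "disabled",
}


def parse_global(text: str) -> dict[str, str]:
    """Turn 'Label : value' lines into a dict; other lines are ignored."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            settings[label.strip()] = value.strip()
    return settings


def not_as_expected(text: str) -> dict[str, str | None]:
    """Settings that are missing (None) or not yet in the expected state."""
    settings = parse_global(text)
    return {
        label: settings.get(label)
        for label, want in EXPECTED.items()
        if (settings.get(label) or "").lower() != want
    }


# Hypothetical, abbreviated output for illustration.
SAMPLE = """\
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : disabled
Chimney Offload State               : disabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : none
ECN Capability                      : disabled
RFC 1323 Timestamps                 : disabled
"""

if __name__ == "__main__":
    print(not_as_expected(SAMPLE))
```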

Event ID 4227
1. TcpTimedWaitDelay is located at:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters. Add a REG_DWORD named TcpTimedWaitDelay; the value is in seconds. We may set it to 30 seconds (the default is 240 seconds, i.e., 4 minutes).
Detailed information about TcpTimedWaitDelay:
https://technet.microsoft.com/en-us/library/cc938217.aspx
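2. Then we may use the command netsh int ipv4 set dynamicport tcp start=10000 num=20000 to change the dynamic port range. Note that this range (ports 10000-29999) is larger than the Windows default (49152-65535, 16,384 ports) but smaller than the 1025-65535 range that step 1 shows on this server, so choose start and num for the range you actually need.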
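Why shortening TcpTimedWaitDelay and widening the dynamic range both help can be seen with back-of-the-envelope arithmetic: if every outbound connection is closed by this server, each local port sits in TIME_WAIT for TcpTimedWaitDelay seconds, so the sustained rate of new connections is bounded by roughly ports / delay. The sketch below applies that rough model (my simplifying assumption, not a Windows-documented formula) to the Windows Server 2008+ default range, the range from step 1, and the start=10000 num=20000 command.

```python
def port_range_end(start: int, num: int) -> int:
    """Last port in a dynamic range given netsh's start/num values."""
    return start + num - 1


def max_new_conns_per_sec(num_ports: int, time_wait_s: int) -> float:
    """Rough upper bound on sustained new outbound connections per second.

    Assumes every connection is closed by this server, so its local port
    stays in TIME_WAIT for the full TcpTimedWaitDelay before reuse.
    """
    return num_ports / time_wait_s


SCENARIOS = [
    # (label, start, num, TcpTimedWaitDelay in seconds)
    ("Windows default range, delay 240 s", 49152, 16384, 240),
    ("Range from step 1, delay 240 s", 1025, 64511, 240),
    ("Range from step 1, delay 30 s", 1025, 64511, 30),
    ("start=10000 num=20000, delay 30 s", 10000, 20000, 30),
]

if __name__ == "__main__":
    for label, start, num, delay in SCENARIOS:
        end = port_range_end(start, num)
        rate = max_new_conns_per_sec(num, delay)
        print(f"{label:36} ports {start}-{end}: ~{rate:,.0f} new conns/s")
```

Under this model, the default range with the default 240 s delay sustains only about 68 new connections per second, while the step 1 range with a 30 s delay allows about 2,150.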

Event ID 5719
The system can't find a domain controller (DC).
Set the Netlogon registry value below to a delay safely longer than the time required to establish DC connectivity. Note that this is only effective if the machine already has an IP address; it applies to scenarios where a NAP solution puts the machine into a quarantine network. Use the following settings as guidelines:
Registry subkey: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Netlogon\Parameters
Value Name: ExpectedDialupDelay
Data Type: REG_DWORD
Data Value is in seconds (default=0)
Data Range is between 0 and 600 seconds (10 minutes)
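If this registry change has to be repeated on several servers, one option is to generate a .reg file and import it with regedit. This is a hypothetical sketch: the helper names and the 60-second value are illustrative, and the 0 to 600 second check comes from the range listed above. Review the generated file before importing it.

```python
from __future__ import annotations


def reg_dword_file(key: str, values: dict[str, int]) -> str:
    """Build the text of a .reg file that sets REG_DWORD values under `key`."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, data in values.items():
        if not 0 <= data <= 0xFFFFFFFF:
            raise ValueError(f"{name}={data} does not fit in a REG_DWORD")
        # .reg files write DWORD data as 8 hex digits.
        lines.append(f'"{name}"=dword:{data:08x}')
    return "\r\n".join(lines) + "\r\n"


NETLOGON_KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Netlogon\Parameters"


def expected_dialup_delay_reg(seconds: int) -> str:
    """ExpectedDialupDelay .reg text, enforcing the 0-600 s range."""
    if not 0 <= seconds <= 600:
        raise ValueError("ExpectedDialupDelay must be between 0 and 600 seconds")
    return reg_dword_file(NETLOGON_KEY, {"ExpectedDialupDelay": seconds})


if __name__ == "__main__":
    print(expected_dialup_delay_reg(60))  # 60 s is an illustrative value
```

The same reg_dword_file helper can produce a TcpTimedWaitDelay entry under the Tcpip\Parameters key.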

Event ID 36871: nothing to do.
The FSMO role holders must be accessible from this server.

One suggestion is to set the IIS application pool’s “Maximum Worker Processes” to 1. Source documentation: https://docs.microsoft.com/en-us/iis/configuration/system.applicationhost/applicationpools/add/cpu and https://serverfault.com/questions/563140/the-relationship-between-iis-application-pool-maximum-worker-processes-and-compo
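In IIS Manager, “Maximum Worker Processes” sits in the application pool’s Advanced Settings under Process Model, and it is stored as the maxProcesses attribute of the pool’s processModel element in applicationHost.config. The sketch below only illustrates that mapping on a trimmed, hypothetical fragment (ExampleAppPool is a made-up name). Because ElementTree discards XML comments, it should not be used to rewrite a live applicationHost.config.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed applicationHost.config fragment.
SAMPLE = """<configuration>
  <system.applicationHost>
    <applicationPools>
      <add name="ExampleAppPool">
        <processModel maxProcesses="4" />
      </add>
    </applicationPools>
  </system.applicationHost>
</configuration>"""


def set_max_processes(config_xml: str, pool_name: str, max_processes: int = 1) -> str:
    """Set processModel/@maxProcesses for one application pool.

    Returns the modified XML as a string; raises KeyError if the pool
    is not found. Comments in the input are lost (ElementTree default).
    """
    root = ET.fromstring(config_xml)
    for pools in root.iter("applicationPools"):
        for pool in pools.findall("add"):
            if pool.get("name") == pool_name:
                model = pool.find("processModel")
                if model is None:
                    model = ET.SubElement(pool, "processModel")
                model.set("maxProcesses", str(max_processes))
                return ET.tostring(root, encoding="unicode")
    raise KeyError(f"application pool {pool_name!r} not found")


if __name__ == "__main__":
    print(set_max_processes(SAMPLE, "ExampleAppPool"))
```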

Resolution:

From the System Admin / DevOps perspective, this issue is very elusive. The only clue we have in this scenario, which has not been mentioned previously, is that new code had been deployed to the application. If I were to guess, this may have to do with SQL connections not being managed properly (for example, being opened faster than they are released) and overflowing memory. There are tools that can validate that theory. However, it’s sometimes not appropriate to arrive at a root cause and open “a can of worms” by highlighting a developer’s possible mistake. We all make mistakes on occasion. Hence, the resolution here is to skip any further discovery and proceed with a code roll-back plan. Another day has been saved, and all will be forgiven as well as forgotten.
