Thursday 22 September 2016

RDP re-connection / disconnection events when running behind haproxy

In my case clients were being disconnected during RDP sessions. The disconnections appeared to be random, but sometimes there was a noticeable pattern (e.g. a disconnection every x minutes), which led me to believe that the problem might be related to server / client configuration (e.g. keep alive / timeout issues).

This kind of issue could also be caused by problems such as an unreliable network connection somewhere between the client and the backend server.

We should first verify that the RDP keep alive / timeout configuration is set up as needed on the RDS server:

https://technet.microsoft.com/en-us/library/cc754272(v=ws.11).aspx

There are also several settings in haproxy itself that could potentially cause this kind of problem:

timeout server
timeout client

To further isolate the issue we should monitor the RDP session when accessing the RDP server directly. Using the 'Applications and Services logs >> Microsoft >> Windows >> TerminalServices-LocalSessionManager >> Operational' event log we can monitor disconnection / re-connection events on the local RDS server by filtering on the following events:

Event ID 25: Re-connection
Event ID 24: Disconnection

Since in my case the RDP session is actually reestablishing itself part way through a user session this information should come in handy when diagnosing issues affecting the application layer.
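To check whether the disconnections really do follow a fixed interval (which would point towards a timeout rather than a flaky network), the gaps between consecutive Event ID 24 entries can be computed. The following is only a rough sketch: it assumes the event timestamps have already been exported as Unix epoch seconds, one per line and oldest first. The function name and the input file name are hypothetical.

```shell
# Print the gap, in whole minutes (truncated), between consecutive
# disconnection timestamps.
# Input: one Unix epoch timestamp (seconds) per line, oldest first,
# read from the named file(s) or from stdin.
disconnect_gaps() {
    awk 'NR > 1 { printf "%d\n", ($1 - prev) / 60 } { prev = $1 }' "$@"
}

# Hypothetical usage: count how often each gap occurs.
#   disconnect_gaps event24_epochs.txt | sort -n | uniq -c
```

If one gap value dominates the output, a timeout of roughly that length is a good suspect.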

We can use a utility such as mtr (or smokeping) to monitor latency and a utility such as iperf to measure the available bandwidth. Together these help us gain an overall picture of the connection quality across the different paths:

Client >> Reverse Proxy
Reverse Proxy >> RDP Server

We should firstly use mtr (winmtr for Windows machines) to monitor a connection between the client and reverse proxy - such as:

mtr 1.2.3.4

Next we will monitor the bandwidth throughput between the client and reverse proxy. On the reverse proxy we should issue something like:

iperf3 -s -p 5003 &

and on the client:

iperf3 -c 1.2.3.4 -i 1 -t 720 -p 5003

* Where -t defines how long (in seconds) to run the test for and -i defines how often data should be reported to the user.

Now if we can recreate the problem (or at least wait for it to happen) we can use tcpdump (or something similar) to capture the traffic on the above paths while it is happening, to get a deeper view of exactly what is going on. For example, on the server we could issue something like:

tcpdump -i ethX -w out.pcap host 1.2.3.4 and port 3389

* where 1.2.3.4 is the client.

Interestingly, it appeared (at least from the pcap dump) that RDP only sent data while my RDP session window was active and stopped sending data once a visible RDP window had been idle for more than 60 seconds, which I imagine is part of RDP's bandwidth optimization techniques. After several minutes I noticed a TCP segment being sent with the FIN flag set, which was closing the TCP session and hence forcing the end-user to re-establish the connection to the reverse proxy.
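To find those FIN segments in a saved capture without scrolling through it, tcpdump can read the file back with a flag filter. This is a small sketch rather than the exact command I ran: the function name is made up, and out.pcap is the file written by the capture command earlier.

```shell
# pcap filter that matches any TCP segment with the FIN flag set.
FIN_FILTER='tcp[tcpflags] & tcp-fin != 0'

# Read a saved capture and print only the FIN segments.
# -nn   : do not resolve host names or port names
# -tttt : print a full date and time on each line
# -r    : read packets from a file instead of an interface
show_fin_segments() {
    tcpdump -nn -tttt -r "$1" "$FIN_FILTER"
}

# Usage:
#   show_fin_segments out.pcap
```

The source address of the first FIN shows which side closed the connection, and its timestamp can be compared with the Event ID 24 entries on the RDS server.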

The solution involved increasing the 'timeout server' and 'timeout client' directives to something a little higher than the defaults. This does however carry a heavy risk (especially with public facing endpoints), since idle connections are held open for longer and clients could potentially perform a Slowloris style attack.
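As a rough illustration of where these directives live, the snippet below writes a minimal TCP mode configuration to a scratch file and syntax checks it if haproxy is installed. Everything in it is a placeholder rather than my actual configuration: the 30m values, the 'rdp' section name, the 'rds1' server name and the 192.0.2.10 address are all assumptions to be replaced with your own.

```shell
# Write an illustrative TCP mode configuration to a scratch file.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
defaults
    mode tcp
    timeout connect 5s
    # Placeholder values: choose something longer than the idle
    # period you expect from an RDP session.
    timeout client  30m
    timeout server  30m

listen rdp
    bind *:3389
    # Placeholder address for the backend RDS server.
    server rds1 192.0.2.10:3389
EOF

# Syntax check only (-c checks the configuration and exits).
if command -v haproxy >/dev/null 2>&1; then
    haproxy -c -f "$cfg" || echo "haproxy reported a problem with $cfg"
fi
```

Keeping 'timeout client' and 'timeout server' at the same value avoids one side of the proxied connection timing out before the other.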

Another (more suitable) alternative is to set a keepalive on the RDS server side to ensure the connection remains open.
