This article discusses the <AuthBy ROUNDROBIN> feature of Radiator. That feature allows requests to be proxied to multiple RADIUS servers, distributing the load equally among all configured servers. Because the distribution of packets is not session aware, however, it is usable only for simple RADIUS requests that are answered in a single request/response exchange, not for EAP streams.
Misuse of this technique seems to be quite common on eduroam sites running Radiator. I hope to show why it is bad and why it should not be used.
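To make "not session aware" concrete, here is a minimal Python sketch (Python is used only for illustration; Radiator itself is written in Perl) contrasting a round-robin selector with a session-aware ("sticky") one. The sticky selector, which hashes a per-session key such as the client's Calling-Station-Id, is a generic technique shown only for contrast and is not meant to describe any Radiator option; the host names and the key are illustrative assumptions.

```python
import zlib
from itertools import cycle

# Illustrative host names, borrowed from the analysis in this article.
HOSTS = ["r1.orgB.de", "r2.orgB.de"]


def round_robin(hosts):
    """Selector that ignores the packet and simply takes the next host."""
    ring = cycle(hosts)

    def select(session_key):
        return next(ring)  # session_key is deliberately unused

    return select


def sticky(hosts):
    """Selector that always maps the same session key to the same host."""

    def select(session_key):
        # crc32 is stable across runs, unlike Python's salted hash() for str.
        return hosts[zlib.crc32(session_key.encode()) % len(hosts)]

    return select


if __name__ == "__main__":
    # Four packets belonging to one EAP conversation (hypothetical key).
    key = "00-11-22-33-44-55"
    rr, st = round_robin(HOSTS), sticky(HOSTS)
    print("round robin:", [rr(key) for _ in range(4)])
    print("sticky:     ", [st(key) for _ in range(4)])
```

Run as a script, the round-robin selector alternates r1.orgB.de, r2.orgB.de, r1.orgB.de, r2.orgB.de for the packets of a single conversation, while the sticky selector returns the same host four times. A real sticky scheme would also have to decide what happens when the chosen host is down, which this sketch ignores.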
The eduroam infrastructure is usually drawn as a hierarchical tree, with the top showing the interconnection of NRENs and the leaves representing organisations. The image below shows the actual links between the RADIUS servers of the hypothetical orgA.cz and orgB.de, assuming that each NREN has two servers, the top level also has two servers and orgB.de has two as well.
An EAP session begins on the notebook of user@orgB.de visiting the organisation orgA.cz. The user's request for network access starts an exchange of a number of Access-Request/Access-Challenge pairs, which terminates with an Access-Accept or Access-Reject packet.
Let us study which servers take part in the communication. Suppose the communication begins right after all systems have been reset and that no other communication runs in parallel to it. These assumptions cannot be met in the real world, but in a lab they can, even though it is hard to synchronise all events.
In the table below, A-Req stands for Access-Request and A-Ch for Access-Challenge.
step | visited organisation | | CZ NREN | | TOP level | | DE NREN | | home organisation
1. | orgA.cz (sends A-Req) | -> | r1.cz | -> | r1 | -> | r1.de | -> | r1.orgB.de
2. | orgA.cz | <- | r1.cz | <- | r1 | <- | r1.de | <- | r1.orgB.de (sends A-Ch)
3. | orgA.cz (sends A-Req) | -> | r2.cz | -> | r1 | -> | r2.de | -> | r1.orgB.de
4. | orgA.cz | <- | r2.cz | <- | r1 | <- | r2.de | <- | r1.orgB.de (sends A-Ch)
5. | orgA.cz (sends A-Req) | -> | r1.cz | -> | r2 | -> | r1.de | -> | r2.orgB.de (???)
6a. | | | | | | | r1.de | -> | r1.orgB.de
| | | | | | | ... | <- | r1.orgB.de (sends A-Ch)
6b. | | | | | r2 | -> | r2.de | -> | r2.orgB.de (???)
6c. | | | r1.cz | -> | r1 | -> | r1.de | -> | r2.orgB.de (???)
6d. | orgA.cz (re-sends A-Req) | -> | r2.cz | -> | r2 | -> | r1.de | -> | r1.orgB.de (???)
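The routing in the table can be reproduced with a small Python simulation, under the assumption the table itself seems to make: every proxy keeps its own round-robin pointer, starting at its first configured host and advancing on every packet it forwards, retransmissions included. The topology dictionary and the function name simulate() are hypothetical; only Access-Requests are modelled, not the replies.

```python
from itertools import cycle

# Upstream servers of each proxy, in configuration order, for the
# hypothetical topology of the analysis. Home servers have no upstreams.
UPSTREAMS = {
    "orgA.cz": ["r1.cz", "r2.cz"],
    "r1.cz": ["r1", "r2"],
    "r2.cz": ["r1", "r2"],
    "r1": ["r1.de", "r2.de"],
    "r2": ["r1.de", "r2.de"],
    "r1.de": ["r1.orgB.de", "r2.orgB.de"],
    "r2.de": ["r1.orgB.de", "r2.orgB.de"],
}

# (step, server that sends the Access-Request), in the order of the table.
# Steps 6a-6d are retransmissions by the server that timed out.
TABLE_STEPS = [
    ("1", "orgA.cz"),
    ("3", "orgA.cz"),
    ("5", "orgA.cz"),
    ("6a", "r1.de"),
    ("6b", "r2"),
    ("6c", "r1.cz"),
    ("6d", "orgA.cz"),
]


def simulate(senders):
    """Return the path of each Access-Request, one per sender, in order.

    Every proxy has its own round-robin pointer, starting at its first host
    and advancing on every forwarded packet (retransmissions included).
    Nothing is shared between proxies and no session state is consulted.
    """
    pointers = {name: cycle(hosts) for name, hosts in UPSTREAMS.items()}
    paths = []
    for sender in senders:
        path = [sender]
        while path[-1] in pointers:  # stop once a home server is reached
            path.append(next(pointers[path[-1]]))
        paths.append(path)
    return paths


if __name__ == "__main__":
    steps = [step for step, _ in TABLE_STEPS]
    paths = simulate([sender for _, sender in TABLE_STEPS])
    for step, path in zip(steps, paths):
        print(f"{step:>3}  " + " -> ".join(path))
```

Running it prints the Access-Request rows of the table (steps 1, 3, 5 and 6a-6d) in order. In particular, step 5 ends at r2.orgB.de although nothing about the conversation has changed, purely because each proxy's pointer has moved on independently.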
A possible source of trouble begins in step 5. At that moment the Access-Request reaches r2.orgB.de, which knows nothing about the session r1.orgB.de is working on. Both servers query the same user database, but they do not share EAP state. r2.orgB.de should ignore this packet, but in some cases it sends an Access-Reject (see the function response() in the source file Radiator/Radius/EAP_25.pm). I don't know whether this is correct according to RFC 3748 or not; analysis of this problem is out of the scope of this document.
If I assume that r2.orgB.de ignores the packet, then the infrastructure will face numerous retransmissions from every RADIUS server taking part in the communication. In steps 6a-6d the packet does get to the right server (r1.orgB.de), but also, again, to the wrong one.
I'm aware that my "analysis" above is not very exact. Maybe I missed some important points, which could be used to pick holes in it and, in the end, to distract from the main subject of this article: convincing readers that the Round Robin feature should not be used in an EAP proxy network. It is better to show the results of lab experiments.
For the tests I used virtual servers running on VMware Server, five servers in the configuration shown in the image below. All tests were performed with rad_eap_test doing PEAP-MSCHAPv2 authentication.
All tests were done by submitting queries from another computer simulating an AP of r1orgC. There were 100 attempts with the identity user@orgA, simulating a visitor.
test | access-accepts [count / avg. duration in s] | access-rejects [count] | timeouts [count]
1. | 99 / 0.85 | 0 | 1
2. | 100 / 11.8 | 0 | 0
3. | 0 / n.a. | 48 | 52
4. | 0 / n.a. | 0 | 100
Test no. 1: Round Robin was defined only on r1orgC. All servers were up, and 100 tests were run with user@orgA on r1orgC. Authentication succeeded in 99 of 100 attempts (one timed out). The logs showed a huge number of unnecessary retransmissions, but for the user it worked.
Test no. 2: Round Robin was defined only on r1orgC, as in test no. 1, but r2nren was down; the rest of the servers were up. The purpose of this test is to show how much a dead backup server slows down the authentication process. The whole process was slowed down more than 10 times (11.8 s vs. 0.85 s on average), and there was again a very large number of retransmissions. The user was still able to get online.
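The "more than 10 times" figure follows directly from the average durations in the results table; the short Python check below just does the division (the unit is presumably seconds, and the constant names are illustrative).

```python
# Average durations of successful authentications, from the results table
# (presumably in seconds).
AVG_ALL_UP = 0.85       # test no. 1: all servers up
AVG_R2NREN_DOWN = 11.8  # test no. 2: r2nren down


def slowdown(baseline, degraded):
    """How many times longer the degraded run took than the baseline."""
    return degraded / baseline


if __name__ == "__main__":
    print(f"slowdown: {slowdown(AVG_ALL_UP, AVG_R2NREN_DOWN):.1f}x")
```

It prints `slowdown: 13.9x`.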
Test no. 3: Round Robin was defined on r1orgC, r1nren and r2nren, and every server was up. The user never got online. Servers r1orgA and r2orgA usually logged one of the following messages:
Deleting session for semik@orgA.etest.cesnet.cz, 127.0.0.1, Handling with Radius::AuthFILE: CheckFILE Handling with EAP: code 2, 104, 6 Response type 25 EAP PEAP Nothing to read or write AuthBy FILE result: IGNORE, EAP PEAP Nothing to read or write
Deleting session for semik@orgA.etest.cesnet.cz, 127.0.0.1, Handling with Radius::AuthFILE: CheckFILE Handling with EAP: code 2, 104, 6 Response type 25 TLS not initialised Authy FILE result: IGNORE, TLS not initialised
Deleting session for semik@orgA.etest.cesnet.cz, 127.0.0.1, Handling with Radius::AuthFILE: CheckFILE Handling with EAP: code 2, 3, 13 Response type 25 EAP TLS SSL_accept result: -1, 1, 8576 EAP TLS error: -1, 1, 8576, 14840: 1 - error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number EAP result: 1, EAP PEAP TLS error AuthBy FILE result: REJECT, EAP PEAP TLS error Access rejected for semik@orgA.etest.cesnet.cz: EAP PEAP TLS error
Test no. 4: Round Robin was defined on r1orgC, r1nren and r2nren, as in test no. 3, but r2nren was down; the rest were up. 100 tests were run with user@orgA on r1orgC. The user never got online.
Round Robin is not intended for use with EAP; this was stated by one of the Radiator developers on their support mailing list.
There is a trick that makes it possible to use Round Robin: an aggregating server has to be put in front of the end servers. By end servers I mean r1orgA and r2orgA in my test lab, or r1.orgB.de and r2.orgB.de in the analysis section. However, the other results seen in the tests show that it is not worth investing in another server.
The result of test no. 2 shows that a Round Robin infrastructure slows down seriously even when only a backup server fails. The results of tests no. 3 and 4 show that the infrastructure is unstable when multiple servers use a Round Robin configuration.
It is fair to say that any site using a Round Robin configuration is abusing the eduroam infrastructure with unnecessary retransmissions, which can lead to failed user authentications.
I show a proper configuration of an EAP proxy with Radiator in a separate article.