Why you should design for a worst-case scenario

Disaster-proof your critical infrastructure

If you live in an earthquake zone, it’s important to engineer buildings to survive an earthquake. You don’t know when an earthquake will happen, or where exactly, or how big it’s going to be, but you know that it will happen at some point during the lifetime of the building. And the consequences of not earthquake-proofing can be deadly.

The same goes for your critical network infrastructure. At some point, some part of your network will go down. The consequences are not usually deadly, but they can feel that way when it’s happening to you.

Maybe there will be a major power outage in your area.

Maybe a junior sysadmin will type the wrong thing into a terminal window in a moment of inattention.

Maybe an OS upgrade will break some library dependencies.

Maybe a ransomware attack will hold your data hostage.

Maybe a file somewhere will become corrupted.

Maybe someone will accidentally unplug the wrong thing in the server room.

You don’t know what will happen, or how bad it will be. But you know it will happen. If you think that it won’t happen to you, stay in the business for a few more years.

When you are designing your RADIUS infrastructure, you should consider:

How bad would it be for your business if people couldn’t get onto the network?

For most businesses, the answer is somewhere between “really bad” and “catastrophic”.

The good news is that with just a little bit of planning and forethought, you can design your system to be a lot more resilient to failures.

Most of these solutions don’t cost anything other than some extra hardware and some additional disk space. In our experience, the upfront investment in this infrastructure pales in comparison to what you can expect to pay in emergency rates for network specialists when disaster strikes. Not to mention the loss to your business and reputation when your network goes down.

Disaster-proof your RADIUS infrastructure

1) Put your RADIUS server on a virtual machine - by itself. When something goes wrong, all you have to do is revert to a previous snapshot, and you can be up and running again in a few minutes. Using a VM for your RADIUS server is incredibly easy to do and has virtually no downside.

2) Give your RADIUS server enough resources to withstand unexpected surges in demand. In most organizations, the volume of authentication requests follows a fairly predictable pattern - until something bad happens and your network goes down. When you bring your network back up, all your users will try to authenticate at the same time. If you haven’t sufficiently resourced your RADIUS server, this can bring your network down again. As a rule of thumb, we recommend keeping the RADIUS VM at no more than 5-10% CPU usage in the “normal” case. Any more than that, and there might not be enough room to deal with spikes in traffic.

3) Put your databases on separate hardware from your RADIUS server. Separate hardware for separate components means that a single hardware failure will be less catastrophic. Maintaining dedicated RADIUS hardware also means that authentication performance won’t be affected by resource-heavy database queries.

4) In multi-site systems, secondary RADIUS servers should be simple clones of the primary one. When RADIUS policies and configuration files are cloned across all sites, a RADIUS server failure at any given satellite location is almost trivial to recover from. A new server can simply be cloned again from the primary RADIUS server within minutes. See our design blueprint for multi-site RADIUS systems for more detail.

5) In multi-site systems, consider deploying two primary instances of the database. Losing access to the database is generally a catastrophic failure for your network. Our recommended design strategy to ensure redundancy for this critical component is to deploy two primary instances of your database. Bear in mind that some up-front engineering effort will be required to ensure that the two primary instances are kept in sync. However, the extra network resilience (and peace of mind!) gained by providing database redundancy far outweighs this additional effort.
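To see why the 5-10% CPU rule of thumb in point 2 leaves room for a surge, here is a rough back-of-the-envelope sketch in Python. It assumes that CPU usage grows roughly linearly with the authentication request rate, which real servers only approximate, and the function names are made up for illustration.

```python
# Rough sizing sketch. Assumption: CPU usage scales linearly with the
# authentication request rate. Real servers only approximate this.


def surge_multiple(normal_cpu_pct: float, ceiling_pct: float = 100.0) -> float:
    """Return how many times the normal request rate fits under the CPU ceiling."""
    if not 0 < normal_cpu_pct <= ceiling_pct:
        raise ValueError("normal_cpu_pct must be greater than 0 and at most ceiling_pct")
    return ceiling_pct / normal_cpu_pct


def max_normal_cpu_pct(expected_surge: float, ceiling_pct: float = 100.0) -> float:
    """Return the highest normal CPU usage that still absorbs the expected surge."""
    if expected_surge < 1:
        raise ValueError("expected_surge must be at least 1")
    return ceiling_pct / expected_surge


if __name__ == "__main__":
    for cpu in (5.0, 10.0, 50.0):
        print(f"{cpu:.0f}% normal CPU: room for about {surge_multiple(cpu):.0f}x normal traffic")
```

Under that linear assumption, a server idling at 5% CPU can absorb roughly twenty times its normal load, while one that already runs at 50% can only absorb about double. That is very little margin when every user reconnects at once.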

By following these simple design best practices, you will set up your network infrastructure to recover much more quickly from unexpected failures, which are inevitable. None of these solutions are expensive in either time or money. They only require some forethought and planning when initially configuring your RADIUS infrastructure.
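The clone-based recovery in point 4 only works if the secondary servers really are still identical to the primary. One possible way to check this is to compare file digests of the two configuration trees. The sketch below is a generic Python illustration, not a FreeRADIUS tool. It assumes both configuration directories are reachable as local paths (for example, after copying the secondary's directory over), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 digest of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def config_drift(primary: Path, secondary: Path) -> list[str]:
    """Return relative paths that are missing on one side or differ in content."""
    a = tree_digests(primary)
    b = tree_digests(secondary)
    # A file counts as drifted if its digest differs or it exists on one side only.
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling `config_drift(Path("primary_config"), Path("secondary_config"))`, where both directory names are placeholders, returns an empty list when the two trees match. Any path in the result is a file that has drifted and should be cloned again from the primary.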

Need more help?

Network RADIUS has been helping clients around the world design and deploy their RADIUS infrastructure for 20 years. We specialize in complex systems and have seen pretty much every variation and problem out there. If you want help from the people who wrote FreeRADIUS, contact us for a consultation.
