Two-factor authentication system AUTH.AS: Technical overview
Let’s start with a question: what is a two-factor authentication system? Without going into details, the answer is quite simple: it’s a system that helps protect data. But, as you know, the devil is in the details, and the details are that such a system is subject to very specific requirements that are not easy to meet.
We defined the main requirements for our system:
- Scalability and Resiliency
- Comfort, Functionality and Ease of use
These requirements sound simple but turn out to be complicated in the details. We relied on each of them when designing and building our system, and we used what we consider the best solutions to meet them.
Let’s briefly look at the “technology stack” we are using.
- We use an open-source operating system (OS) from the Linux family as the platform, with CentOS as the basis. This OS is actively developed and supported.
- Since the system has to store data, we chose a sufficiently mature and actively developed solution: Apache Cassandra. It is a NoSQL database with unique features that are reviewed later.
- Any system that is a link in the chain of data access should have a good, scalable logging system. For that, we use a specialized database called “ElasticSearch”.
- We use a web-based interface as the main working tool, which makes the required operations easy and convenient to perform. No client software, only a browser! We have also created a simple, easy-to-use API for integration with other software.
The system architecture within a single node is relatively simple: a Linux server hosting two databases, each with its own purpose. Cassandra is the primary data store, and ElasticSearch stores logs and events. Several other services also run on the server to handle system tasks:
- Web-service – handles the web-based interface.
- API-service – handles integration and access to the main system functions via a software interface. This service is accessed over HTTPS.
- Radius-service – handles integration with external systems via the RADIUS protocol.
- SSL Offload-service – offloads the heavy SSL workload from the main applications to a dedicated NGINX-based utility.
The service architecture within a cluster is not much more complicated: it is a set of similar nodes that perform the same functions and hold redundant copies of the data to ensure performance and availability. Each node can replace another one, taking over its neighbour’s duties.
Security is a set of requirements about everything and nothing in particular. Here is what security means for us and for the AUTH.AS system:
1. Operating System
- The open-source Linux-family operating system (CentOS) with a mandatory access control module (SELinux) and file integrity monitoring. The access control module continuously controls which system processes access which files and which system methods are used. This is enforced by security policies, and no process can bypass them. All process access to any resource is logged. The file integrity monitoring module calculates checksums of the base system files daily and compares them with reference values. The audit results are also logged.
- The file subsystem. On-the-fly data encryption is available as an option. This feature is extremely useful when the system is hosted in a third-party datacenter and/or when the data needs to be physically secured. Encryption is provided by the OpenSSL libraries, including the GOST module.
- The network subsystem. A preconfigured firewall is mandatory. By default, only trusted connections are allowed, and only to whitelisted ports. Administrative access is available only via the console or SSH, and only a special command-line interface (CLI) is available to the administrator. By default, the system interfaces are accessible only over encrypted connections (HTTPS).
- The logging subsystem. The local log service stores all events from all subsystems, such as SELinux, the database, the kernel and others. Remote logging redirects all event logs to specialized control and monitoring systems and log storage. Forwarding via rsyslog and Logstash is currently supported.
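The file integrity check described above (daily checksums compared with reference values) can be illustrated with a minimal Python sketch. It is not the module shipped with AUTH.AS: the choice of SHA-256 and the function names `build_baseline` and `find_changes` are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record reference checksums for the monitored files (hypothetical helper)."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def find_changes(baseline):
    """Compare current checksums with the baseline; return changed or missing paths."""
    changed = []
    for name, reference in baseline.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != reference:
            changed.append(name)
    return sorted(changed)
```

In such a scheme the baseline would be built once on a known-good system, and `find_changes` would run on a schedule, with every non-empty result written to the audit log.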
2. Regular software updates
- We release AUTH.AS updates on a regular basis. We are constantly working to improve the system’s reliability and performance in order to protect our customers’ investments.
3. Backup operations
- Full backups of all databases are available and are created simultaneously on all nodes of the system. If necessary, a database backup can also be created on a single node. Incremental database backups are available as well. Backup files are created on the fly, without stopping the system, and can easily be archived by centralized enterprise backup solutions.
- The logging database is backed up in the same way as the primary database. Its backup files can also be easily archived.
4. Logging of all actions and events
- All administrator actions are logged;
- Successful and unsuccessful token usage by users is audited;
- System errors are logged;
- Event logs are available from any system node;
- Event logs can be forwarded to external storage, monitoring and archive systems.
5. User rights and permissions management is available.
6. The system doesn’t store user passwords.
- There is no need to: the system stores only usernames and token keys, and that data is sufficient for it to operate.
7. For user convenience, there is a set of time-based passwords that remain valid:
- for a certain time period
- for N uses
- until a specified date (for cases when a user has lost their token or the token doesn’t work properly).
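The three validity rules listed above (a time period, a number of uses, a final date) can be modelled with a small Python sketch. The `TemporaryPassword` class and its field names are hypothetical and only illustrate how such limits might be combined; they do not describe the actual AUTH.AS data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TemporaryPassword:
    """Hypothetical record for a temporary password and its validity limits."""
    issued_at: datetime
    valid_for: Optional[timedelta] = None   # valid for a certain time period
    max_uses: Optional[int] = None          # valid N times
    valid_until: Optional[datetime] = None  # valid until a specified date
    uses: int = 0

    def is_valid(self, now: datetime) -> bool:
        """Check every limit that is set; unset limits are ignored."""
        if self.valid_for is not None and now > self.issued_at + self.valid_for:
            return False
        if self.max_uses is not None and self.uses >= self.max_uses:
            return False
        if self.valid_until is not None and now > self.valid_until:
            return False
        return True

    def consume(self, now: datetime) -> bool:
        """Count one use if the password is still valid; return whether it was accepted."""
        if not self.is_valid(now):
            return False
        self.uses += 1
        return True
```

For example, `TemporaryPassword(issued_at=now, max_uses=2)` would be accepted twice by `consume` and rejected on the third attempt.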
8. The system is protected against brute-force attacks.
9. The system supports access whitelists and blacklists.
Network access to domains is restricted by internal firewall rules for each type of query source.
10. The mobile application stores all its data encrypted.
It has its own PIN-based protection and supports biometric user authentication (Touch ID on iOS). The same functionality will also be available on Android devices in the near future.
Scalability and Resiliency
Scalability and resiliency are two terms that mean a lot, and every developer understands them in their own way. Let’s see what these words mean in our case.
We are convinced that there is only one fair way to scale: horizontally, with a linear performance increase. This means that if one node processes 1000 requests per second, then 2 nodes must process 2000 requests per second, and 10 nodes 10 000, and so on until the customer is satisfied with the performance. We followed this logic when we built our system. The major resiliency factor is the absence of a single point of failure.
1. The key system element is the NoSQL database Cassandra. Its main advantages:
- High performance;
- Automatic distribution and redundant data storage;
- Automatic database clustering.
2. Another key element is the two-factor authentication system itself: the software package that implements all of the system’s logic. It is self-sufficient and session-independent. Its performance is limited only by the network and database performance.
3. All system nodes are unified and identical in function. This makes scaling convenient and planning easy and accurate.
4. Single-node performance is approximately 1000 authentications per second (depending on the server configuration and integration with third-party systems).
5. If the system is set up to interact with the corporate Active Directory, it checks AD server availability every 5 minutes. If a server fails, the system stops interacting with the faulty AD server and notifies the administrator of the failure. AD server accessibility is also checked on each authentication event.
6. Load balancing is an important factor when scaling systems. AUTH.AS processes queries on all available nodes. DNS balancing (DNS round-robin) is the best fit for multi-node operation; in other words, the system does not require any external hardware or software load balancers when scaling. At the same time, the system supports external load balancers and works fine with them. We use fairly standard protocols: HTTP(S) and RADIUS. Most modern load balancers, such as Citrix NetScaler, Riverbed Zeus and Microsoft Network Policy Server, work fine with our system.
7. The ability to increase system performance within a short period is another important aspect of scalability. As AUTH.AS nodes are largely identical, scaling comes down to simply increasing the number of nodes.
The most common delivery option is a system image packaged for a virtualization platform (VMware, KVM, Hyper-V, Xen). In this case, upscaling is just a matter of launching additional virtual machines and adding them to the Cassandra database cluster. This can be done on the fly and should not take much time. Once automatic data replication to the newly added machines has finished, all that remains is to balance the load.
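Under the linear scaling model described in this section, a rough node count for a target load is simple arithmetic. The sketch below assumes the approximate figure of 1000 authentications per second per node quoted above; the `nodes_needed` function and its `spare_nodes` allowance are hypothetical planning aids, not part of AUTH.AS.

```python
import math


def nodes_needed(target_rps: float, per_node_rps: float = 1000.0, spare_nodes: int = 1) -> int:
    """Estimate the node count assuming linear horizontal scaling.

    per_node_rps defaults to the approximate single-node figure from the text.
    spare_nodes is an assumed allowance so that one node can fail
    without dropping below the target capacity.
    """
    if target_rps <= 0 or per_node_rps <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(target_rps / per_node_rps) + spare_nodes
```

With these assumptions, a target of 2500 authentications per second gives three working nodes plus one spare. Real capacity depends on the server configuration and third-party integrations, so measured throughput should replace the default before planning.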
Convenience, usability and ease of use
A few words about our understanding of simplicity and functionality:
- A simple, intuitive and good-looking interface. The web-based frontend is built with modern web technologies. AJAX is used as much as possible, so pages are not reloaded on every click. All data is loaded asynchronously and transparently for the user. The web interfaces are built with the Bootstrap 3 framework and look neat and tidy. Custom corporate styling of the interface is available to our customers.
- When developing the interface, we stick to one common rule: any action must be accomplished in at most 3 clicks, while the interface remains simple and intuitive.
- A good-looking and convenient mobile application with extra means of protection.
- The ability to import hardware token data from another two-factor authentication system using the LazySync function. This feature works on the fly, in the background, and allows Event-Based tokens to be synced.
- TrueOTP increases the security of Time-Based passwords by blocking a one-time password once it has been used. In other systems, a Time-Based password remains valid until it expires, even after being used.
- The SMS Messaging module confirms the user’s identity by sending messages to their phone. It can be used to let users bind, disable, lock or unlock a token by themselves.
- The system includes many monitoring functions: tracking of system parameters and system health, a “self-restore” function, and load balancing to healthy nodes.
- A convenient software delivery option: a system image packaged for a virtualization platform (VMware, KVM, Hyper-V, Xen).
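The idea behind TrueOTP, rejecting a Time-Based password once it has been used, can be sketched in Python as follows. This is an illustration only and not the AUTH.AS implementation: it assumes standard RFC 6238 passwords (HMAC-SHA1, 30-second step), and the `SingleUseVerifier` class is a hypothetical name. It does not handle clock drift between token and server.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time password (HMAC-SHA1)."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation as in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


class SingleUseVerifier:
    """Accept the password of each time step at most once per token."""

    def __init__(self, step: int = 30):
        self.step = step
        self.last_used_step = {}  # token id -> last accepted time step

    def verify(self, token_id: str, secret: bytes, candidate: str, unix_time: int) -> bool:
        current_step = unix_time // self.step
        if self.last_used_step.get(token_id, -1) >= current_step:
            return False  # a password for this step was already used
        expected = totp(secret, unix_time, self.step)
        if not hmac.compare_digest(expected.encode(), candidate.encode()):
            return False
        self.last_used_step[token_id] = current_step
        return True
```

A second call to `verify` with the same password inside the same 30-second window returns `False`, whereas a plain time-based check would accept it until the window expires. In a multi-node deployment the last accepted step would have to live in shared storage rather than in process memory.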
Working and tested solutions
- A desktop virtualization solution based on VMware Virtual Desktop Infrastructure. Currently, over 1000 desktops are tested and working with the AUTH.AS two-factor authentication system. RADIUS is used as the integration protocol, without intermediate load balancing.
- A Citrix Metaframe virtual applications solution with more than 10 000 concurrent users. RADIUS is used as the integration protocol, with an intermediate balancer based on Microsoft Network Policy Server. The solution has been tested and operates in limited production mode. Full-scale use is planned for the near future. It was also successfully tested with Citrix NetScaler as an alternative load balancer.
- A corporate portal security solution based on Microsoft SharePoint Portal with more than 10 000 concurrent users. RADIUS is used as the integration protocol, with an intermediate authentication server based on Microsoft Forefront Threat Management Gateway and an intermediate load balancer based on Microsoft Network Policy Server. The solution has been successfully tested and is now in limited production operation. Full-scale use is planned for the near future.
- A security solution for 1C-Bitrix: Corporate portal 24 with more than 1 000 users. The HTTP API is used as the integration protocol. The solution has been successfully tested and now operates in a production environment.
- Remote access to the corporate network over the Internet, based on OpenVPN, with more than 1 000 users. RADIUS and the HTTP API are used as integration protocols. The solution has been successfully tested and now operates in a production environment.
- Integration with Cisco Identity Services Engine has been tested and approved. It secures access to Cisco network equipment with two-factor authentication. Implementation has been launched at one telecom operator.
- A Linux-based server security solution where administrator access is allowed only with one-time passwords. RADIUS is used as the integration protocol, through the Linux PAM authentication subsystem.