Avoiding Split-Brain Scenarios for Cybersecurity Solutions

Morey Haber, Chief Technology Officer
May 16th, 2018

When one half of your cybersecurity solution is not aware of what the other half is doing.

When architecting a fault-tolerant or high-availability solution, one of the conditions that stymie designers is a split-brain configuration.

If you’re not familiar with the terminology, “split-brain” is what happens when two or more resources that are supposed to be synchronized lose referential integrity, operate independently, and begin storing and processing information without synchronizing content. The result is a split-brain configuration in which the data differs, is not easily reconcilable, and no single resource is the authoritative record.
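To make the definition concrete, here is a minimal Python sketch, assuming a hypothetical two-node store that mirrors each write to its peer while a sync link is healthy. `Replica` and `divergent_keys` are illustrative names, not any product's API. When the link fails and both nodes keep accepting writes, the copies drift apart.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical key-value node; `peer` is the node it mirrors writes to."""
    name: str
    data: dict = field(default_factory=dict)
    peer: "Replica | None" = None
    link_up: bool = True

    def write(self, key: str, value: str) -> None:
        # Apply the write locally, then mirror it to the peer only while the link is up.
        self.data[key] = value
        if self.link_up and self.peer is not None:
            self.peer.data[key] = value


def divergent_keys(a: Replica, b: Replica) -> set[str]:
    """Return keys whose values differ between the nodes or exist on only one side."""
    keys = a.data.keys() | b.data.keys()
    return {k for k in keys if a.data.get(k) != b.data.get(k)}


primary = Replica("node-a")
secondary = Replica("node-b")
primary.peer, secondary.peer = secondary, primary

primary.write("svc-account", "pw-v1")        # replicated to both nodes
primary.link_up = secondary.link_up = False  # the sync link fails
primary.write("svc-account", "pw-v2")        # each side keeps accepting writes
secondary.write("svc-account", "pw-v3")
secondary.write("db-admin", "pw-x")

print(sorted(divergent_keys(primary, secondary)))  # ['db-admin', 'svc-account']
```

Once the link is restored, nothing in the data itself says whether `pw-v2` or `pw-v3` is correct. Neither node is the record of authority, which is exactly the reconciliation problem that makes recovery difficult.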

Recovering from a split-brain scenario is typically difficult. It requires either merging database records from transaction logs or making a judgment call to lose some information while the solution is recovered and returned to a high availability state. Neither option is desirable to the end user, and either one signals a potential flaw in the vendor’s design or the client’s implementation. That brings us to the subject of this story for clients and vendors alike: how to avoid split-brain scenarios in cybersecurity solutions.

Minimizing split-brain problems

The first problem we need to address is when a split-brain scenario can occur. We do not typically think of security solutions as Tier 1 applications, but modern implementations of identity and access management (IAM), privileged access management (PAM), firewalls, intrusion prevention systems (IPS), and similar tools have become just as important as any other information technology (IT) resource.

If they fail open, then threats can circumvent your defenses and potentially own the environment. If they fail closed, then environments can experience an unexpected outage disrupting operations. Neither scenario is acceptable. To that end, these technologies have been elevated to Tier 1 status and must be operational all the time, with minimal to no downtime, and have the infrastructure in place to stay fault tolerant and highly available. It is important to note that fault tolerance and high availability have different requirements for a deployment.

Therefore, a Tier 1 service may need multiple databases, may require special disaster recovery considerations, and cannot allow a single point of failure, anywhere from a service to a network switch, to cause an outage. These requirements must be designed in as part of the solution, and any technology deployed must be immune to a split-brain condition.

The second problem to consider when minimizing the risk of a split-brain scenario is the enterprise readiness of resources. While this may seem like opening a “can of very expensive worms,” choosing a database or hardware from a third-party vendor with no formal support is not the best decision for a Tier 1 application.

For example, with fault tolerance, if you depend on a server with a single power supply, you obviously have a single point of failure. Realistically, it is cost-prohibitive to cover every use case, but this is why server manufacturers typically put dual power supplies in servers. Similarly, for high availability, if the design uses multiple database instances for replication (normally the root cause of split-brain scenarios, outside of solutions that use file replication), why would you consider an application that does not natively support recovery, backup, and reconciliation, and that lacks technical support in a critical situation?

To be blunt, many open source databases are not ideal for Tier 1 applications because of these limitations, unless someone can provide technical support to match or you have the in-house expertise to manage the requirements. This is where a vendor’s design and promises typically exceed the client’s real-world expectations. While a single fault can introduce a split-brain scenario, designs should consider the use cases for both fault tolerance and high availability that can lead to this predicament. The trick is balancing both against the cost and resiliency of the underlying technology and its support.

Ultimately secure

Finally, cybersecurity solutions themselves need to be secure from potential threats. This means that data at rest, in a lab, and live in operations needs to be protected. While this may seem outside of a split-brain scenario, let’s explore how it is extremely relevant. Consider a backup of a Tier 1 database.

For many solutions, restoring the database may cause operational issues if runtime changes stored in the database affect business continuity. For example, consider an enterprise-ready password management solution. A backup is a snapshot in time of all the current passwords, and as the backup ages, it drifts further from the credentials used in production. If a full restore is performed, the stored passwords no longer match the operational values, and you have a split-brain problem that needs to be reconciled.

Typically, password managers rectify this problem by changing all the passwords so that they are back in sync, but the backup still reflects the problem. This implies that databases used in Tier 1 applications cannot rely on simple backups, since a restoration may itself create a split-brain problem. They need to replicate in real time, back up in real time, and rely on time-based versioning to avoid this problem. And if the data is used for anything else, its encryption, protection, and prevention from misuse must be considered so that static copies do not become a source of data for threat actors to exploit if a vulnerability is found.
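The time-based versioning idea can be sketched in Python. This is a minimal illustration, not any vendor's mechanism, and it rests on two assumptions: each credential record carries the time it was last rotated, and a journal of rotations made after the backup survives the incident. `CredentialVersion` and `reconcile` are hypothetical names. Rather than trusting the restored snapshot, the sketch rolls it forward to the newest version of each account and reports which restored values were stale.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CredentialVersion:
    """A hypothetical versioned password record."""
    account: str
    secret: str
    changed_at: datetime  # when this password was set in production


def reconcile(restored: dict[str, CredentialVersion],
              journal: list[CredentialVersion]) -> tuple[dict[str, CredentialVersion], set[str]]:
    """Roll a restored snapshot forward using time-versioned change records.

    Returns the reconciled store and the accounts whose restored value was stale.
    """
    store = dict(restored)
    stale: set[str] = set()
    for change in sorted(journal, key=lambda c: c.changed_at):
        current = store.get(change.account)
        # Newest timestamp wins for each account.
        if current is None or change.changed_at > current.changed_at:
            store[change.account] = change
        # Flag accounts whose backed-up value was superseded after the snapshot.
        snapshot = restored.get(change.account)
        if snapshot is not None and change.changed_at > snapshot.changed_at:
            stale.add(change.account)
    return store, stale


def ts(day: int) -> datetime:
    return datetime(2018, 5, day, tzinfo=timezone.utc)


backup = {
    "svc-account": CredentialVersion("svc-account", "pw-v1", ts(1)),
    "db-admin": CredentialVersion("db-admin", "pw-a", ts(1)),
}
changes = [
    CredentialVersion("svc-account", "pw-v2", ts(10)),  # rotated after the backup
    CredentialVersion("svc-account", "pw-v3", ts(14)),
]
store, stale = reconcile(backup, changes)
print(store["svc-account"].secret, sorted(stale))  # pw-v3 ['svc-account']
```

Newest-timestamp-wins is only one possible policy, and it depends on trustworthy, synchronized clocks. Returning the stale accounts separately leaves room to verify or rotate those credentials rather than assuming the merge is correct.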

If you need an example of this, think of the Uber breach. The compromised production data was out of date, but it still held plenty of relevant information that posed a high risk to consumers. Yes, it was a backup database, but it was not properly protected, and all data from a Tier 1 application should be protected regardless of how it is used. While this is not a split-brain problem, detaching a production Tier 1 database for testing or recovery can certainly create one.

Failures in your high availability cybersecurity solution can lead to split brains

As with any story, there is a moral. When considering Tier 1 security applications, consider the use cases that will create split-brain scenarios. They are undesirable, and the architecture and technology choices made by your business and by the supplying vendor need to avoid these problems. The expectations of the client and the vendor can also fall out of sync and create a split-brain problem of their own, so those expectations need to be analyzed up front, before a real production problem occurs.

Editor’s note: This article was originally published on CSO Online as part of the IDG Contributor Network.

Morey Haber, Chief Technology Officer

With more than 20 years of IT industry experience, Mr. Haber, author of Privileged Attack Vectors, joined BeyondTrust in 2012 as part of the eEye Digital Security acquisition. He currently oversees BeyondTrust technology for both vulnerability and privileged access management solutions. In 2004, Mr. Haber joined eEye as the Director of Security Engineering and was responsible for strategic business discussions and vulnerability management architectures for Fortune 500 clients. Prior to eEye, he was a Development Manager for Computer Associates, Inc. (CA), responsible for new product beta cycles and named customer accounts. Mr. Haber began his career as a Reliability and Maintainability Engineer for a government contractor building flight and training simulators. He earned a Bachelor of Science in Electrical Engineering from the State University of New York at Stony Brook.