Ethereum Has A Client Diversity Problem

The launch date for “the Merge” (merging the Beacon Chain with Ethereum Mainnet) is rapidly approaching. Per the latest stats, the Beacon Chain has over 300,000 validators and more than 10 million ether (ETH) staked.

Ethereum’s switch from a proof-of-work to proof-of-stake consensus mechanism has several implications for its future. With the Beacon Chain as a consensus layer, Ethereum PoS promises better network security, energy efficiency, scalability, and decentralization.

However, it also introduces new challenges that could threaten Ethereum’s functionality.

One of those challenges is achieving ideal client diversity. In a decentralized network like Ethereum, diversity of client software is necessary for enhancing network security, functionality, and censorship resistance.

However, current stats show that Ethereum's PoS consensus layer (the Beacon Chain) is lacking in the client diversity department. As we'll find out in this article, centralization of software implementation has critical implications for the Ethereum ecosystem.

ELI5: What Does Client Diversity Mean?

Ethereum is a decentralized peer-to-peer network sustained by a globally distributed collection of nodes (i.e., computers). Each node runs a “client”—this is software that helps nodes to interact with the Ethereum blockchain. Without clients, nodes cannot broadcast and verify transactions, execute smart contracts, or reach consensus on the blockchain state.

Clients are a critical part of Ethereum’s infrastructure, which is why its developers opted for a multi-client arrangement. As opposed to running a single client, nodes are incentivized to run different client implementations—the end goal being that no client should hold a supermajority.

Client diversity simply means having a fair distribution of client software on network nodes. Ideally, no client should power more than 33% of nodes in a decentralized system like Ethereum. Otherwise, several bad things (which we explore later) can start to happen.

Currently, Ethereum's Beacon Chain doesn't have an equitable spread of client software across nodes running the network. As the chart below shows, the majority client controls > 68% of validators, which can put the system at risk for different reasons.

Figure: Ethereum consensus client diversity. [Image source]

Client diversity is a hotly debated topic in the Ethereum community. And that's due to its importance to Ethereum's performance, especially in the context of the Merge. The debate also exists because of disagreements on how client diversity should be realized.

So, why exactly is client diversity so important? That's what we're about to find out.

The Importance of Client Diversity

As a distributed system, Ethereum relies on the network of peer-to-peer nodes to function optimally. In turn, those nodes rely on clients to help them contribute to the network.

Any flaws or bugs in client software will affect a node's participation in the network. If the problem is constrained to one client, then the network can continue to function. Other nodes running alternative clients will manage the blockchain, while the affected nodes can switch to a non-affected client.

The above scenario only works if a small number of nodes are using the affected client software. A case where the majority of nodes run the faulty software could create a single point of failure and affect the network significantly.

Here are some potential problems that could happen in this case:

Restrictions on Transaction Finality

Finality in blockchain lingo is a guarantee that transactions cannot be reversed, altered, or canceled. If a blockchain finalizes, committed blocks cannot be revoked.

Finality is what gives Ethereum, and other blockchain networks, their "immutable" quality. Anyone can use Ethereum knowing transactions will be recorded permanently on the chain and cannot be tampered with.

Typically, blockchain networks need the majority of nodes to reach consensus before achieving transaction finality. In the Beacon Chain, at least 2/3 of staked ETH is required to finalize the chain. Anything lower than that, and the chain cannot finalize—putting the system in jeopardy.

So, how does this relate to flawed client software? Well, you have to consider what can cause a situation where nodes are unable to reach consensus.

We could think of different reasons, but the most realistic one is simple: validating nodes are malfunctioning. If more than 1/3 of validators are inactive, the remaining validators cannot form the 2/3 supermajority required for Beacon Chain finality.
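To make the threshold concrete, here is a minimal sketch of the finality check described above. The function name and the stake figures are illustrative assumptions, not part of any client's code.

```python
from fractions import Fraction

# Share of staked ETH that must be online and attesting for finality.
FINALITY_THRESHOLD = Fraction(2, 3)

def can_finalize(total_staked_eth: int, offline_staked_eth: int) -> bool:
    """Return True if the online stake still meets the 2/3 threshold."""
    online = total_staked_eth - offline_staked_eth
    return Fraction(online, total_staked_eth) >= FINALITY_THRESHOLD

# Illustrative numbers only: 10 million ETH staked in total.
print(can_finalize(10_000_000, 3_000_000))  # 70% online -> True
print(can_finalize(10_000_000, 4_000_000))  # 60% online -> False
```

Using exact fractions avoids rounding surprises right at the 2/3 boundary.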

To remedy the problem, the Beacon Chain activates the Inactivity Leak Mechanism. This mechanism is designed to gradually reduce the stakes of offline validators, allowing the remaining nodes to re-form a two-thirds majority.

If all goes well, the remaining nodes can achieve consensus and keep the chain running. In this scenario, the heaviest losers are the offline validators, whose ETH stakes are drained by the leak.

Network Split

The preceding section explored what could happen if a faulty client powers more than 1/3 of validating nodes. But, that’s only the beginning—worse things can happen, especially if > 1/2 of validators are running the bad client software.

In this case, the Beacon Chain network can split into two separate chains. Both chains will be unable to finalize because half their validators are missing; hence, the Inactivity Leak mechanism will activate.

Deposits belonging to affected validators will be gradually destroyed over a period of, say, three to four weeks, until they control less than 1/3 of staked ETH. The result is that the old and new chains will finalize independently, making it hard to merge them later.

To merge both chains, the community would need to agree on the canonical chain—a process likely to be fraught with politics. The reason is simple: asking validators to join another chain would cause them to lose their entire stakes.

A network split would force over half of the community to suffer huge economic losses. To avoid this fate, validators on the discarded chain may decide to continue running the forked chain, splitting the Ethereum community.

This is reminiscent of the events surrounding the DAO hack of 2016 when a faction—disgruntled with the handling of the situation—forked the network and created Ethereum Classic. Not only would a split weaken Ethereum, but it could depreciate the price of ether on the market.

To prevent a network split, developers would need to quickly patch the consensus bug before each chain finalizes. Otherwise, it’d become difficult, if not impossible, to ever merge both chains into one canonical chain.

Reversing Transactions

If a faulty client running > 1/2 of Beacon Chain validators is catastrophic, then a situation where > 2/3 of nodes use the same vulnerable software is Armageddon. A new chain comprising nodes with the affected client will split from the Beacon Chain and—because it has a 2/3 supermajority—finalize independently. At most, developers will have ~ 13 minutes to patch the software before the split chain achieves finality.

This is assuming the developers are nice guys who genuinely love Ethereum. If the developers go rogue—or malicious actors infiltrate the organization—they can hijack the chain and do considerable damage. They could very well rewrite the entire blockchain history (and double-spend funds), censor transactions, or even reverse old transactions.

Even if the bug were an honest mistake, the situation would still be less than ideal. Let’s look at some hypothetical scenarios in this case:

Fix #1

One way to fix this problem is to have nodes accept the bug as the normal behavior of Ethereum. What that means is that unaffected nodes will reproduce the bug and try to join the other chain. Afterward, developer teams can coordinate on a fix, and stakers can update their client software to reflect the change.

But this isn’t an easy solution to implement. First, unaffected nodes will be punished for inactivity even if they acted honestly. Second, this fix only covers trivial errors; severe consensus bugs would make the corrupted chain incompatible with the rest of the network.

This brings us to the second solution.

Fix #2

Another option in this event would be to patch the vulnerable client. However, there’s no way validators running the once-faulty client can safely rejoin the canonical chain without getting penalized. Here’s why:

Due to a large number of validators exiting their stakes on the compromised chain, the Inactivity Leak mechanism will activate. Which means affected validators can lose a lot of money, especially because exits can be painfully slow.

Now, this is the point where I say, “but wait, there’s more!”. Let’s imagine the affected validators successfully exit the bad chain and attempt to rejoin the correct chain. Because they already attested to one chain, any attempt to attest to another unfinalized chain would lead to a slashing of their ETH stakes.

In other words, we have a dilemma on our hands:

If the affected validators voluntarily exit their stakes, they’ll likely get caught up in the inactivity-related slashing. If they try to join the ‘good’ chain immediately (when it hasn’t finalized), their stakes get slashed per protocol rules.

In the end, no one benefits from having this problem happen. Which is why a client controlling more than 2/3 of validating nodes is cause for alarm.
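The three failure scenarios above can be summarized as a simple lookup on a client's share of validators. This is only a sketch of the article's thresholds (1/3, 1/2, and 2/3); the function name and the wording of the outcomes are ours.

```python
from fractions import Fraction

def worst_case_if_client_fails(share: Fraction) -> str:
    """Map a client's validator share to the worst outcome of a consensus bug."""
    if share > Fraction(2, 3):
        return "faulty chain can finalize on its own"
    if share > Fraction(1, 2):
        return "network can split into two chains"
    if share > Fraction(1, 3):
        return "chain stops finalizing until the inactivity leak kicks in"
    return "chain keeps finalizing"

# Hypothetical client shares, in percent of validators.
for pct in (20, 40, 55, 68):
    print(f"{pct}% -> {worst_case_if_client_fails(Fraction(pct, 100))}")
```

A client at 68% lands in the first branch, which is why a share above 2/3 is the case to worry about most.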

Can Ethereum Client Diversity Improve?

By now, the importance of diversity to the health of the Ethereum network should be obvious—especially if we want to keep Ethereum decentralized. But, if the numbers are anything to go by, Ethereum’s Beacon Chain is yet to achieve a fair distribution of clients.

The dominant client, Prysm, controls > 68% of nodes on the Beacon Chain, putting us firmly in Armageddon territory. It may seem alarmist to raise concerns over a single client having a supermajority, but these problems are far from being hypothetical.

In August 2020, Medalla (a Beacon Chain testnet) suffered a major crash. The reason? Nodes running Prysm software malfunctioned due to a clock bug in the client. As a result, Prysm nodes went offline, leaving the chain unable to finalize.

Developers finally fixed the problem hours later, but not before validators lost their stakes due to slashing. If the majority of nodes hadn’t been using the same software, then the problem would have been corrected easily.

Figure: Medalla testnet outage. The Medalla testnet fluctuated below the 2/3 majority needed for finality. [Image source]

History also shows why client diversity is important. For example, in 2016, hackers targeted Ethereum with a distributed denial-of-service (DDoS) attack by exploiting vulnerabilities in the Go Ethereum (Geth) client. The network survived because nodes could switch to the Parity client which was safe from the attack.

Client diversity is hard to achieve because nodes cannot be unilaterally forced to run a particular client. Moreover, dominant clients usually enjoy first-mover advantages and network effects, making them hard to displace.

Nonetheless, there are various possible solutions for improving software diversity:

Protocol Incentives

Ethereum’s Casper PoS mechanism, which controls the Beacon Chain, is designed to discourage majority use of a particular client. It incentivizes client diversity through anti-correlation penalties and quadratic inactivity leak fees.

Quadratic Inactivity Leak

If you’ve read the article up until this point, you may be familiar with the quadratic inactivity leak. Typically, validators on the Beacon Chain can get penalized for dishonest activity or failing to perform their duties (e.g. going offline).

Standard inactivity penalties are low (around 1 ETH) and validators will still have most of their stake intact. However, if > 1/3 of validators are inactive, things start to look different. The chain cannot finalize in this situation, which threatens liveness—an indispensable quality for all distributed systems, not just blockchains.

In a quadratic leak scenario, inactivity fees will increase quadratically until offline validators have their stakes reduced to less than 1/3 of the network. The rationale is that, if anything knocks a large number of validators offline, the chain can still achieve finality.
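To get a feel for how such a leak plays out, here is a toy simulation. It is a sketch under loud assumptions: the per-epoch penalty formula and the `LEAK_QUOTIENT` constant are illustrative choices of ours, not values taken from this article, and online validators are assumed to keep their full stake. The epoch length (32 slots of 12 seconds) is the Beacon Chain's.

```python
SECONDS_PER_EPOCH = 32 * 12  # 32 slots of 12 seconds each
LEAK_QUOTIENT = 2**24        # assumed, illustrative constant

def epochs_until_finality(offline_share: float, max_epochs: int = 100_000) -> int:
    """Count epochs of leaking until online stake is at least 2/3 of the total."""
    online = 1.0 - offline_share
    offline = offline_share
    for epoch in range(max_epochs):
        if online / (online + offline) >= 2 / 3:
            return epoch
        # The penalty rate grows with the time since finality was lost,
        # so cumulative losses grow roughly quadratically.
        offline -= offline * (epoch + 1) / LEAK_QUOTIENT
    raise RuntimeError("no finality within max_epochs")

for share in (0.4, 0.5):
    epochs = epochs_until_finality(share)
    days = epochs * SECONDS_PER_EPOCH / 86_400
    print(f"{share:.0%} offline: about {days:.1f} days to regain finality")
```

The exact durations depend entirely on the assumed constant. The point is the shape: the larger the offline share, the longer and more expensive the leak.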

Remember that we need 2/3 of staked ETH to reach consensus for the Beacon Chain to finalize? The Inactivity Leak mechanism will ensure that offline nodes hold less than 1/3 of the stake, allowing the remaining validators to form the required two-thirds supermajority.

The implication is that if you run the dominant client and it fails, your node will be among the offline validators hit by the inactivity leak. This means you’ll lose ETH like crazy until the bug gets fixed and finality is restored to the chain.

Anti-Correlation Penalties

In addition to encouraging software diversity, anti-correlation penalties are designed to prevent the possibility of malicious collusion. The quadratic inactivity leak is a type of anti-correlation penalty, although the punishments discussed in this section are heavier.

The first anti-correlation penalty touches on validators producing bad attestations, i.e., attesting to conflicting or invalid data. If a majority client experiences a bug that causes validators to give false attestations, they can lose up to 100% of their stake. In other words, the more nodes that misbehave at the same time, the higher the penalty.
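The idea that the penalty scales with the number of offenders can be sketched as follows. This is a simplified model, not the protocol's exact formula: the `multiplier` parameter and its default of 3 are assumptions chosen so that the penalty reaches 100% once roughly a third of the stake misbehaves together.

```python
def correlated_slashing_fraction(slashed_share: float, multiplier: float = 3.0) -> float:
    """Fraction of stake lost by each slashed validator in this toy model."""
    if not 0.0 <= slashed_share <= 1.0:
        raise ValueError("slashed_share must be between 0 and 1")
    # The penalty grows linearly with the share of stake slashed together,
    # capped at the validator's entire stake.
    return min(multiplier * slashed_share, 1.0)

# Hypothetical shares of stake slashed in the same period.
for share in (0.01, 0.10, 0.34, 0.68):
    lost = correlated_slashing_fraction(share)
    print(f"{share:.0%} slashed together -> each loses {lost:.0%} of stake")
```

Under this model, an isolated fault on a minority client costs a small slice of stake, while the same fault on a client with a 68% share wipes out the whole deposit.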

The second anti-correlation penalty touches on validators attesting to an invalid block. This is something we’ve explained earlier, but we’ll look at it again.

If an invalid block is committed to the Beacon Chain, other nodes will reject it and form a new chain that doesn’t have the block in it. The only option for nodes on the invalid chain is to switch to the correct chain; however, this attracts a penalty, as a significant percentage of their staked ETH gets slashed.

These incentives are put in place to promote a multi-client culture in Ethereum. As such, validators are encouraged to run minority clients.

Running Minority Clients

The more obvious step to achieving better client diversity is getting nodes to run different clients. By running different software, validators can ensure Ethereum remains robust, secure, and functional.

Moreover, running a minority client is in your best interests as a validator—especially with anti-correlation penalties and inactivity leak fees. You won’t necessarily escape punishment by running a minority client, but the penalties will be lower.

These are some minority consensus clients you can run:

Lodestar
Teku
Nimbus
Lighthouse

We’ve focused squarely on consensus clients, i.e. Beacon Chain nodes, in this article. However, Ethereum Mainnet has a similar problem. Per stats, Go Ethereum (Geth) controls > 80% of nodes, with Nethermind, OpenEthereum, and Erigon trailing with low figures.

Figure: Ethereum execution client diversity chart. [Image source]

Since the execution layer is as important as the consensus layer, it needs client diversity as well. To solve the problem, nodes on Ethereum Mainnet can run the following execution clients:

Hyperledger Besu
Nethermind
Erigon
CoreGeth

Final Thoughts

Beyond the immediate security benefits, client diversity can create a richer Ethereum ecosystem. Each client works with different languages, design features, and architectures, inspiring greater innovation and exchange of ideas.

As Ethereum undergoes the biggest change since its inception, it is important to encourage diversity across the network. Only then can it remain a robust and highly functional decentralized system.

References

Cover image courtesy of Getty Images

Emmanuel Awosika