Ethereum underwent an "unannounced hard fork" today that prevented firms in the cryptocurrency space from properly servicing their users.
Many exchanges and service providers rely on Infura or other API services that did not correctly upgrade their nodes to align with the majority of users.
News of this hard fork began to spread between 7:00 and 8:00 (UTC) on Wednesday, when the Ethereum blockchain suddenly split.
Infura, a tool that allows developers and service providers to interface with the Ethereum blockchain and IPFS networks, began to investigate a service outage at 8:12.
Withdrawals of ether and ERC-20 tokens began to shut down at the same time. Binance chief executive Changpeng Zhao wrote on Twitter:
"There was a possible ETH chain split at block 11234873. Etherscan and Blockchair are showing two different chains and data after this block. We’re resolving now but have temporarily closed withdrawals. Funds are #SAFU."
According to Zhao, Binance runs an Ethereum node, but halted withdrawals because it was unclear whether that node was synchronizing an orphaned chain. If it was, continuing withdrawals while following the wrong chain could have resulted in sizeable monetary losses for the exchange.
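The kind of check an exchange might run before reopening withdrawals can be sketched as follows. This is a minimal illustration, not Binance's actual procedure: it assumes the block hash at a given height (for example, block 11234873) has already been collected from the exchange's own node and from independent sources, for instance through the standard `eth_getBlockByNumber` JSON-RPC method. The function names, source names, and hashes below are hypothetical.

```python
from collections import defaultdict


def group_by_hash(reports):
    """Group source names by the block hash each reports at one height."""
    groups = defaultdict(list)
    for source, block_hash in reports.items():
        # Hex hashes may differ in letter case between APIs; normalize first.
        groups[block_hash.lower()].append(source)
    return dict(groups)


def withdrawals_allowed(own_source, reports):
    """Allow withdrawals only if every source agrees with our own node."""
    if own_source not in reports:
        return False  # No data from our own node: stay closed.
    return len(group_by_hash(reports)) == 1


# Placeholder hashes for illustration only, not real block hashes.
HASH_A = "0x" + "aa" * 32
HASH_B = "0x" + "bb" * 32

# Hypothetical reports for the same block height from three sources.
reports = {"own_node": HASH_A, "explorer_1": HASH_A, "explorer_2": HASH_B}
print(group_by_hash(reports))
print(withdrawals_allowed("own_node", reports))
```

The check is deliberately conservative: any disagreement keeps withdrawals closed, because hashes alone cannot show which side of a split is the canonical chain.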
Other leading exchanges reported similar issues around the same time.
MetaMask, an Ethereum web wallet, also began to experience outages as it relies on Infura nodes. MetaMask is the most popular Ethereum wallet, servicing a million monthly active users. Both Infura and MetaMask are owned by ConsenSys.
After three hours of investigation, Infura revealed what had happened: "several components" within Infura's infrastructure were "locked" to an older, stable version of the go-ethereum (Geth) client, which did not include code changes shipped in newer releases. Several versions of the Geth client were affected.
According to Infura, the root cause was a consensus bug affecting the versions of Geth (v1.9.9 and v1.9.13) used for some internal systems, which caused block syncing to stall across several of those subsystems.
This version discrepancy exposed the consensus bug and split the chain between entities running updated clients and those running older clients.
Notably, had Infura's systems been running the latest version of Geth, Infura would not have experienced a service outage; a chain split, however, could still have occurred among other entities running older versions of Geth.
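Operators who want to know whether a node is pinned to one of the versions named in Infura's report could run a check along these lines. It is a rough sketch that assumes the node's version string has already been retrieved (for example, via the standard `web3_clientVersion` JSON-RPC method) and follows the usual `Geth/vX.Y.Z-...` layout. The set below contains only the two versions the article names; since several Geth versions were affected, it should not be read as a complete list. The function names and sample strings are hypothetical.

```python
import re

# Only the versions named in Infura's report; NOT a complete list of
# affected releases (assumption for illustration).
AFFECTED_VERSIONS = {(1, 9, 9), (1, 9, 13)}


def parse_geth_version(client_version):
    """Extract (major, minor, patch) from a Geth client version string.

    Returns None if the string does not look like a Geth version.
    """
    match = re.match(r"Geth/v(\d+)\.(\d+)\.(\d+)", client_version)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_named_affected(client_version):
    """True if the client reports one of the versions named in the report."""
    return parse_geth_version(client_version) in AFFECTED_VERSIONS


# Hypothetical version strings in the format Geth typically reports.
print(is_named_affected("Geth/v1.9.13-stable/linux-amd64/go1.14.2"))
print(is_named_affected("Geth/v1.9.23-stable/linux-amd64/go1.15.3"))
```

Versions are compared as integer tuples rather than strings so that, for example, v1.9.9 and v1.9.13 are told apart correctly.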
Blockchair's lead developer Nikita Zhavoronkov called this an "unannounced hard fork" that left those with older clients in the dust. Blockchair itself was affected, along with a small contingent of miners that processed around 30 blocks over the span of two hours.
Not all service providers and applications running on Ethereum were affected, as other API services had updated their nodes.
The High Cost of Running Ethereum Nodes
The issue appears to be related to the high cost of running Ethereum nodes, especially archive nodes, which store a "full history of interim states each block resulted in," according to Arcane Assets CIO Eric Wall.
As the BTC Times reported previously, running an archive node can be costly, both in terms of time and the capital investment required. Igor Artamonov, the founder of Ethereum Classic development consortium ETCDEV, said it took him one day to sync 20 days of Ethereum blocks.
SideShift.ai founder Andreas Brekken explained to the BTC Times that his attempt to run an archive node cost him over $10,000, along with "a few weeks" of syncing. Even after this investment, his node never finished syncing.
The lack of archive nodes presents a problem to Ethereum, according to Brekken.
To enable automated "deposits and withdrawals for exchanges in a secure manner," archive nodes with a setting called "tracing" are essential, he explained.
Reliance on centralized services that act as a single point of failure could cause more problems down the line if further consensus issues arise.
Don't Underestimate This Issue: Blockchair Lead Developer
Developers are picking up on the gravity of the situation. Zhavoronkov concluded the aforementioned thread by writing that this consensus "failure" should not be underestimated:
"In my opinion, today’s consensus failure in #Ethereum shouldn’t be underestimated and should be considered as the most serious issue Ethereum has faced since the DAO debacle 4 years ago. An investigation is in order."
The situation also shows how important it is for service providers to run up-to-date clients to mitigate the risk of outages.
Already, it appears that Ethereum users must prepare for another update. The "Go Ethereum" Twitter account reports that there will soon be a "critical" update to the Geth client:
"Tomorrow (12th Nov) Google will publish a security release (CVE-2020-28362) for #golang, in the form of Go v1.15.5 and v1.14.12. This is a critical release for #Ethereum! We will push a new Geth release with it, but if you use an older version, you'll need to rebuild yourself!"