The Ethereum cloud vs. on-premises nodes conundrum
Is there a way to identify how many Ethereum nodes run in the cloud and how many run on-premises? There is, and it’s easy enough.
In this article, we are going to get the Ethereum mainnet node data and have a look at it.
- To check the results of analyzing the data right away, head right to the Results section.
- To understand the mechanics of the analysis, check the Cracking the data section.
- Before jumping to conclusions, however, read the introductory Seeing through the data section.
Seeing through the data
Data is impartial, but interpretation never is.
Back in 2012, when the Winklevoss brothers started investing in Bitcoin and accumulated as much as 1% of the circulating supply, they had to take proper measures to store their wealth. What is proper, however, when the access to your wealth is in a private key?
What they did was split the private key into three shards, print each shard on a piece of paper, and put it in a plastic envelope. Then they proceeded to put each of the envelopes in safety-deposit boxes of different banks. Not just different branches of the same bank, but different banks and in different locations.
The Winklevoss brothers repeated the process four times across different geographic regions. And thus they had a secure and fault-tolerant system that had a high chance of withstanding even the real world cataclysms, let alone theft attempts.
They didn’t put the shards in chests and bury them in various places. Instead, they used the existing, time-proven system of bank safe-deposit boxes to their advantage and built their own secure and redundant system on top of it.
The topic of decentralization is complicated and multifaceted, and to some it is a matter of splitting hairs. There are good projects like DAppNode that offer on-premises boxes with Ethereum nodes. There are miners, and there are many community members running their own nodes. And there are projects like Infura that often get a disapproving look from the proponents of “true decentralization.”
And at the same time, pretty much anyone who has ever spent more than a few hours with Ethereum welcomes any news that says enterprise adoption. One of the major organizations even has the word in its name—Enterprise Ethereum Alliance. The more adoption, the better.
The website ethereum.org has recently been redesigned from the ground up and populated with more documentation—all in an effort to be less alien and more welcoming to newcomers. An adoption effort.
There’s Vyper, a Pythonic smart contract language under development that aims to be more auditable and human-readable than Solidity. Everything about Vyper says it should bring in more developers who are otherwise reluctant to dip their toes into Solidity.
The list is rich and can go on and on, but the main point is there’s an incredible amount of effort in the Ethereum space to make it mainstream. And all of this effort is building on top of the existing technology. Just like the Winklevoss brothers did with their private key.
The vast majority of businesses and enterprises use cloud computing. It’s easy, it saves costs, and it has been proven by time. When faced with the decision of whether to hire a team to launch and maintain their own node or to use a platform, they will go for the optimal choice, the one that saves costs: the cloud and a service to deploy a node.
Anyone running a node in the cloud is certainly not a miner, but is almost always a person or an entity interested in onboarding more users to Ethereum and putting their processes on the blockchain, be it a DApp developer building for end users or a B2B enterprise. Does this spell adoption? You’d be hard-pressed to say it doesn’t.
So, would knowing the actual numbers of cloud and on-premises Ethereum nodes give an insight? And is there a way at all to get the numbers?
Yes, there is. And it’s easy to do.
All you need to do is crawl the IP addresses of the Ethereum nodes currently running on the mainnet and match them against the autonomous system numbers (ASNs) of their providers. Cloud hosting providers, just like any other provider, have their own ASNs, so it’s easy to see not only who is in the cloud but also which cloud they are in.
And we have done it for you. Read on for the methodology and the results.
Cracking the data
A short primer on the Ethereum discovery protocol
Having a brief understanding of the Ethereum discovery protocol is crucial to interpreting the data in this research.
The Ethereum discovery protocol is the mechanism through which Ethereum nodes find each other and join the global peer-to-peer network.
The protocol is based on Kademlia, and this is how the Ethereum peer-to-peer discovery works in brief:
- Each node keeps a table of other nodes on the network.
- When you start a new node, the table has a few hardcoded bootstrap nodes. These bootstrap nodes are maintained by the Ethereum Foundation.
- The bootstrap nodes keep a table of all nodes that connected to them in the past 24 hours.
- Once the new node retrieves the node data from the bootstrap nodes, it connects to these newly discovered nodes and goes through the same discovery process with them.
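The discovery steps above boil down to a breadth-first walk over node tables. Here is a minimal sketch of that idea in Python. It is a toy model, not the real discovery protocol: there is no UDP, no Kademlia distance metric, and no bucket logic. The `crawl` function, the `get_neighbors` callback, and the `TOY_TABLES` data are all hypothetical names made up for illustration.

```python
from collections import deque


def crawl(bootstrap_nodes, get_neighbors):
    """Breadth-first walk: ask each known node for its table, then visit the newcomers."""
    discovered = set(bootstrap_nodes)
    queue = deque(bootstrap_nodes)
    while queue:
        node = queue.popleft()
        for peer in get_neighbors(node):
            if peer not in discovered:
                discovered.add(peer)
                queue.append(peer)
    return discovered


# Hypothetical in-memory "network": each node maps to the nodes in its table.
TOY_TABLES = {
    "boot-1": ["a", "b"],
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["d"],
    "d": [],
}

found = crawl(["boot-1"], lambda node: TOY_TABLES.get(node, []))
print(sorted(found))
```

Starting from a single bootstrap node, the walk eventually reaches every node that is reachable through someone's table, which is roughly what a network crawler relies on.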
Data used for the research
- ethernodes.org — a third-party Ethereum network explorer.
- A maintained open-source list of autonomous system numbers (ASN) of cloud hosting providers.
The ethernodes.org service is a privately maintained Ethereum network explorer that was started in 2016 and has been operating since then.
The service runs several Ethereum nodes to crawl the network using the Ethereum discovery protocol. The crawled data is processed to map the node IP addresses to their geographical locations. The processed data is then published on the ethernodes.org website.
Autonomous system numbers (ASN)
An autonomous system number is a globally unique number for an autonomous system.
An autonomous system is a group of IP networks operated by an organization. The organization must have a clearly defined external routing policy to be registered as an autonomous system with a unique autonomous system number.
An autonomous system number is required to route requests from within one autonomous system to other autonomous systems. In short, whenever you send a request over the public Internet, it goes through your ISP’s autonomous system, which is identified by its ASN.
All cloud hosting providers, managed hosting providers, and colocation facilities have a unique ASN. Any IP address can be mapped to an ASN.
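To make the IP-to-ASN mapping concrete, here is a small sketch that picks the most specific announced prefix containing an address. The prefix table is entirely made up: it uses documentation-only IP ranges and ASNs from the range reserved for documentation, and the `ip_to_asn` helper is a hypothetical name. A real lookup would rely on routing data or a lookup service rather than a hand-written dictionary.

```python
import ipaddress

# Hypothetical prefix-to-ASN table built from documentation-only ranges.
PREFIX_TO_ASN = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "198.51.100.128/25": 64502,
}


def ip_to_asn(ip, table=PREFIX_TO_ASN):
    """Return the ASN of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(ip)
    best = None  # (network, asn) of the most specific match so far
    for prefix, asn in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None


print(ip_to_asn("198.51.100.200"))  # falls in both the /24 and the /25
print(ip_to_asn("203.0.113.1"))     # not covered by the toy table
```

The linear scan is fine for a handful of prefixes. For a full routing table you would likely want a trie or a dedicated library instead.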
With the proper data sources identified, the actual mechanics of crunching the numbers are trivial.
1. Get the Ethereum mainnet host data of all nodes
You can get all the Ethereum mainnet data from ethernodes.org.
2. Deduplicate the IP addresses
The number of nodes that ethernodes.org reports fluctuates. On some days, the fluctuation may be as high as 25%.
The reason for the fluctuation is that the service reports duplicate IP addresses. You need to remove the duplicates from the list to get the actual numbers.
One way to do this is to extract the IP addresses from the data dump and then remove the duplicates.
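As a rough illustration of the deduplication step, the sketch below assumes the host column has already been pulled out of the dump into a list of strings. The actual dump format is not described here, so the `sample` list and the `unique_ips` helper are hypothetical.

```python
import ipaddress


def unique_ips(raw_hosts):
    """Keep the first occurrence of every valid IP address, in input order."""
    seen = set()
    result = []
    for host in raw_hosts:
        try:
            ip = ipaddress.ip_address(host.strip())
        except ValueError:
            continue  # skip anything that is not a valid IP address
        if ip not in seen:
            seen.add(ip)
            result.append(str(ip))
    return result


# Hypothetical sample using documentation-only addresses.
sample = ["192.0.2.10", "198.51.100.7", "192.0.2.10 ", "not-an-ip", "198.51.100.7"]
print(unique_ips(sample))
```

Parsing each entry with `ipaddress` rather than comparing raw strings means stray whitespace does not produce false "unique" entries.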
3. Map the host IP addresses to ASNs
Run the IP addresses through any paid or free service that maps the IP address to its originating ASN.
4. Compare your list of ASNs against the list of ASNs of cloud hosting providers
You can use any tool you are comfortable with to compare the two lists—anything from Python to Microsoft Excel.
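If you go the Python route, the comparison can be as small as a set membership test. The sketch below assumes you already have a mapping of IP addresses to ASNs and a set of cloud provider ASNs. All values are placeholders from documentation-only ranges, and the `classify` helper is a hypothetical name.

```python
from collections import Counter

# Hypothetical inputs: IP -> ASN mapping and the set of cloud provider ASNs.
NODE_ASNS = {
    "192.0.2.10": 64500,
    "192.0.2.11": 64500,
    "198.51.100.7": 64501,
    "203.0.113.5": 64510,
}
CLOUD_ASNS = {64500, 64501}


def classify(node_asns, cloud_asns):
    """Count nodes whose ASN is in the cloud list versus everything else."""
    counts = Counter()
    for asn in node_asns.values():
        counts["cloud" if asn in cloud_asns else "on-premises"] += 1
    return counts


counts = classify(NODE_ASNS, CLOUD_ASNS)
print(counts["cloud"], counts["on-premises"])
```

Note that anything not on the cloud list is counted as on-premises, so the quality of the result depends on how complete that list is.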
Results
The data used for the results is from the September 20, 2019 snapshot.
- Total Ethereum nodes in the cloud and on-premises: 8933 nodes
- Ethereum nodes running in the cloud: 61.6% (5499 nodes)
- Ethereum nodes running fully on-premises: 38.4% (3434 nodes)
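The percentages follow directly from the node counts above. Here is a quick sanity check in Python, using only the three numbers from the list. The variable names are ours.

```python
total_nodes = 8933
cloud_nodes = 5499
on_prem_nodes = 3434

# The two groups should add up to the total.
assert cloud_nodes + on_prem_nodes == total_nodes

cloud_share = cloud_nodes / total_nodes
on_prem_share = on_prem_nodes / total_nodes
print(f"cloud: {cloud_share:.1%}, on-premises: {on_prem_share:.1%}")
# prints "cloud: 61.6%, on-premises: 38.4%"
```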
Top ten cloud hosting providers
The top ten cloud hosting providers account for a total of 57% of Ethereum nodes.
Top ten cloud hosting providers by country
A total of 37% of all Ethereum nodes hosted in the cloud run in U.S. datacenters.
Top ten on-premises Ethereum node locations
A total of 45% of all Ethereum nodes running on-premises are located in China.
All of the data is on GitHub gists, so you can poke through it:
- Raw data with mapped ASNs
- Cloud and on-premises matched against ASNs
- Cloud and on-premises geos
- On-premises geos