Implementing a Merkle tree in Python
Unless you’ve been living under a rock for the past few years you have probably heard of Bitcoin, the decentralized, peer-to-peer, public, trustless, cash system protocol. Regardless of whether or not you think bitcoin’s current valuation is in a bubble or worse that it’s a Ponzi scheme, it’s undeniable that the protocol solves a real and old problem in distributed computing, a.k.a the Byzantine’s General Problem. One of the underlying data structures ensuring the immutability of the blockchain is the Merkle tree.
A Merkle tree is a binary tree in which each leaf node is a hash of a block of data and each internal node is a hash of its children. In the bitcoin protocol it ensures the immutability of the set of transactions by taking the hash of each data transaction and generating a single unique root hash representing the hash of the entire set.
In this post we’ll look at a possible implementation of a Merkle tree in python using the typing module. If you’re not familiar with typing you can take a look at my last post where I go a little more in depth on how to use this module. The API for the Merkle tree data structure is very straightforward and consists of a single method which exposes the root hash of the tree. The constructor takes a list of objects from which we wish to build our tree.
Our Node class stores its hash and a reference to the left and right child nodes. We’ll add two static methods to do the hashing using the SHA-256 algorithm. We’ll apply the hashing two times for an extra level of security. Because of the way the Merkle tree is structured we need an even number of leaf nodes to build our tree. If we have an odd number of data blocks then we just hash the last one two times, duplicating the last leaf node.
Then the Merkle tree class itself comprised of two methods for recursively building the tree, two others for printing the nodes in prefix order and the one we mentionned earlier for getting the root hash.
We can then test our data structure by sending in a list of strings and getting its corresponding merkle tree root hash.
That’s it! Feel free to try this code and improve it at will.
Bye for now!