Implementing a Merkle tree in Python
Unless you’ve been living under a rock for the past few years you have probably heard of Bitcoin, the decentralized, peer-to-peer, public, trustless, cash system protocol. Regardless of whether or not you think bitcoin’s current valuation is in a bubble or worse that it’s a Ponzi scheme, it’s undeniable that the protocol solves a real and old problem in distributed computing, a.k.a the Byzantine’s General Problem. One of the underlying data structures ensuring the immutability of the blockchain is the Merkle tree.
A Merkle tree is a binary tree in which each leaf node is a hash of a block of data and each internal node is a hash of its children. In the bitcoin protocol it ensures the immutability of the set of transactions by taking the hash of each data transaction and generating a single unique root hash representing the hash of the entire set.
In this post we’ll look at a possible implementation of a Merkle tree in python using the typing module. If you’re not familiar with typing you can take a look at my last post where I go a little more in depth on how to use this module. The API for the Merkle tree data structure is very straightforward and consists of a single method which exposes the root hash of the tree. The constructor takes a list of objects from which we wish to build our tree.
Our Node class stores its hash and a reference to the left and right child nodes. We’ll add two static methods to do the hashing using the SHA-256 algorithm. We’ll apply the hashing two times for an extra level of security. Because of the way the Merkle tree is structured we need an even number of leaf nodes to build our tree. If we have an odd number of data blocks then we just hash the last one two times, duplicating the last leaf node.
from typing import List import typing import hashlib class Node: def __init__(self, left, right, value: str)-> None: self.left: Node = left self.right: Node = right self.value = value @staticmethod def hash(val: str)-> str: return hashlib.sha256(val.encode('utf-8')).hexdigest() @staticmethod def doubleHash(val: str)-> str: return Node.hash(Node.hash(val))
Then the Merkle tree class itself comprised of two methods for recursively building the tree, two others for printing the nodes in prefix order and the one we mentionned earlier for getting the root hash.
class MerkleTree: def __init__(self, values: List[str])-> None: self.__buildTree(values) def __buildTree(self, values: List[str])-> None: leaves: List[Node] = [Node(None, None, Node.doubleHash(e)) for e in values] if len(leaves) % 2 == 1: leaves.append(leaves[-1:]) # duplicate last elem if odd number of elements self.root: Node = self.__buildTreeRec(leaves) def __buildTreeRec(self, nodes: List[Node])-> Node: half: int = len(nodes) // 2 if len(nodes) == 2: return Node(nodes, nodes, Node.doubleHash(nodes.value + nodes.value)) left: Node = self.__buildTreeRec(nodes[:half]) right: Node = self.__buildTreeRec(nodes[half:]) value: str = Node.doubleHash(left.value + right.value) return Node(left, right, value) def printTree(self)-> None: self.__printTreeRec(self.root) def __printTreeRec(self, node)-> None: if node != None: print(node.value) self.__printTreeRec(node.left) self.__printTreeRec(node.right) def getRootHash(self)-> str: return self.root.value
We can then test our data structure by sending in a list of strings and getting its corresponding merkle tree root hash.
def test()-> None: elems = ["Hello", "mister", "Merkle"] mtree = MerkleTree(elems) print(mtree.getRootHash())
~$ python3.6 merkletree.py 8617d7a42c4ef170227e97fc407a9af29e1d52db4a0dcff56a039fd16cde0f69
That’s it! Feel free to try this code and improve it at will.
Bye for now!