Building blocks of Blockchain
To protect a document or any other data against unauthorised changes, we have several methods from the public key infrastructure (PKI) domain, such as hashing, public key encryption and digital signatures.
Hashing
A hash is essentially a fixed-size digest of a chunk of data. A hash can be created for any type of data, and a small change in the data leads to a large change in the resulting hash.
Popular algorithms such as SHA-256 are used to calculate hashes.
So any change to a document produces a different hash, which means a hash can be used to detect changes to the document.
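As a small illustration of that property, the sketch below hashes two nearly identical messages with SHA-256 from Python's standard `hashlib` module. The sample messages and variable names are made up for this example.

```python
import hashlib

# Two messages that differ by a single character (illustrative data).
original = b"Alice paid 5$ to Bob"
modified = b"Alice paid 6$ to Bob"

hash_original = hashlib.sha256(original).hexdigest()
hash_modified = hashlib.sha256(modified).hexdigest()

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(hash_original, hash_modified))

print(hash_original)
print(hash_modified)
print(f"{differing} of {len(hash_original)} hex characters differ")
```

Comparing a stored digest with a freshly computed one is enough to notice that the document has changed, although it does not say what changed.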
Public key encryption
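Encryption is used to protect data from unauthorised access. In public key encryption, data encrypted with one key of a key pair can only be decrypted with the other key of that pair.
Hashing combined with encryption provides decent protection against changes.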
Digital Signatures
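To sign a document, we first compute its hash and then encrypt that digest with the signer's private key. The encrypted digest is attached to the document, and others can decrypt it with the signer's public key and compare it with their own hash of the document to check for modifications.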
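The hash-then-encrypt idea can be sketched with textbook RSA using only the standard library. This is a toy under loud assumptions: the key pair is built from two small, publicly known primes, there is no padding, and the names `sign`, `verify` and `digest_as_int` are invented for this example. Real systems should use a vetted cryptography library instead.

```python
import hashlib

# Toy key pair built from two known Mersenne primes. Illustrative only:
# these primes are public knowledge, so this key offers no security.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(document: bytes) -> int:
    """Hash the document and reduce the digest so that it is smaller than N."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Encrypt the digest with the private key; the result is the signature."""
    return pow(digest_as_int(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Decrypt the signature with the public key and compare the digests."""
    return pow(signature, E, N) == digest_as_int(document)


document = b"Alice paid 5$ to Bob"
signature = sign(document)
print(verify(document, signature))                 # signature matches
print(verify(b"Alice paid 500$ to Bob", signature))  # document was altered
```

Only the holder of the private exponent can produce the signature, while anyone who knows the public pair (N, E) can check it.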
But to verify a digital signature you need the signer's public key, so how do you get it? You cannot store the public key of each and every organisation yourself.
Generally, a certificate authority (CA) issues digital certificates, which contain information about the certificate owner along with the owner's public key.
NIST Definition of Blockchain
A blockchain is a collaborative, tamper-resistant ledger that maintains transactional records. The transactional records (data) are grouped into blocks. A block is connected to the previous one by including a unique identifier that is based on the previous block’s data. As a result, if the data is changed in one block, its unique identifier changes, which can be seen in every subsequent block (providing tamper evidence).
Here, a typical block contains a number of transactions that varies with how quickly new data arrives.
previous identifier = identifier of previous block
current identifier = hash( transactional data, previous identifier, some constraints )
This is how data is stored in a blockchain. Suppose a transaction is to be modified: a new hash has to be calculated for its block, which in turn starts a chain reaction, because the hash of the next block and of every block after it has to change as well.
If many blocks follow the modified one, this chain reaction makes the change difficult to carry out.
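The chaining of identifiers can be sketched in a few lines. This is a simplified illustration and not how any particular blockchain serialises its blocks: the "some constraints" input is left out, and the names `block_id`, `build_chain`, `is_valid` and the all-zero genesis identifier are assumptions made for this example.

```python
import hashlib

GENESIS_PREVIOUS_ID = "0" * 64  # assumed placeholder for the first block


def block_id(transactions: list, previous_id: str) -> str:
    """current identifier = hash(transactional data, previous identifier)."""
    payload = previous_id + "|" + "|".join(transactions)
    return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(blocks: list) -> list:
    """Link each list of transactions to the identifier of the block before it."""
    chain = []
    previous_id = GENESIS_PREVIOUS_ID
    for transactions in blocks:
        current_id = block_id(transactions, previous_id)
        chain.append({"transactions": transactions,
                      "previous_id": previous_id,
                      "id": current_id})
        previous_id = current_id
    return chain


def is_valid(chain: list) -> bool:
    """Recompute every identifier and check that the links still match."""
    previous_id = GENESIS_PREVIOUS_ID
    for block in chain:
        if block["previous_id"] != previous_id:
            return False
        if block_id(block["transactions"], block["previous_id"]) != block["id"]:
            return False
        previous_id = block["id"]
    return True


chain = build_chain([["Alice paid 5$ to Bob"], ["Bob paid 2$ to Carol"]])
print(is_valid(chain))                          # untouched chain
chain[0]["transactions"][0] = "Alice paid 50$ to Bob"
print(is_valid(chain))                          # tampering is detected
```

To hide the tampering, an attacker would have to recompute the identifier of the modified block and of every block that follows it.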
Basic Blockchain Terminologies
A blockchain involves two kinds of entities: nodes and participants.
Nodes are the machines that perform the computations needed to store data on the blockchain.
Participants are the people or organisations that use the blockchain to achieve some goal.
A blockchain may be permissionless (public, such as Bitcoin, where any new participant can join) or permissioned (private, where only pre-approved participants can store data).
Typical structure of blocks of blockchain
Block size: total size of the block in bytes, excluding this field (4 bytes)
Block header: a range of metadata related to this block (80 bytes). It contains the version, the previous block’s hash, the Merkle root, the timestamp, the difficulty and the nonce.
Transaction counter: number of transactions in this block (1–9 bytes)
Transactions: the transaction data (variable size). A transaction is the unit of data stored on the blockchain; it can look like "Alice paid 5$ to Bob". In a decentralised system, trust in the legitimacy of transactions is provided through PKI.
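To see how six header fields add up to 80 bytes, the sketch below packs them with Python's `struct` module. The individual field widths (4 bytes each for version, timestamp, difficulty and nonce, 32 bytes for each hash) and the little-endian byte order are assumed from Bitcoin's layout, which this structure resembles; `pack_header` and the sample values are hypothetical.

```python
import struct

# "<" = little-endian with no padding, I = 4-byte unsigned int, 32s = 32 raw bytes.
HEADER_FORMAT = "<I32s32sIII"


def pack_header(version: int, previous_hash: bytes, merkle_root: bytes,
                timestamp: int, difficulty: int, nonce: int) -> bytes:
    """Serialise the six header fields into one fixed-size byte string."""
    return struct.pack(HEADER_FORMAT, version, previous_hash, merkle_root,
                       timestamp, difficulty, nonce)


# Placeholder values, not taken from any real block.
example_header = pack_header(
    version=1,
    previous_hash=bytes(32),
    merkle_root=bytes(32),
    timestamp=1_700_000_000,
    difficulty=0x1D00FFFF,
    nonce=0,
)
print(len(example_header))  # 4 + 32 + 32 + 4 + 4 + 4 bytes
```

A fixed-size header is convenient because it can be hashed repeatedly during mining without touching the variable-length transaction data.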
Merkle Trees
Merkle trees are a data structure designed to summarise a large number of transactions. A Merkle tree is a tree that uses hashing to build cumulative summaries over all the transactions in a block.
Let's suppose we have 5 transactions in a block: A, B, C, D, E
We will define a hash function for ourselves, let's say
hash = sha256(sha256(transaction))
This hash function always produces a 32-byte output, irrespective of the length of the transaction.
We calculate the hashes of these transactions: Ha, Hb, Hc, Hd, He
We pair up consecutive transactions and create a new hash for each pair by hashing the two hashes together, for example the hashes of A and B. If a transaction is left without a partner, as E is here, we pair it with a copy of itself.
Hab = hash(Ha + Hb)
Hee = hash(He + He)
This process continues until only one node is left, which is called the root of the Merkle tree.
The root node of the Merkle tree depends on all the transactions, so if any transaction is tampered with, the hash of the root changes, which can be detected.
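The construction above can be sketched as follows, using the double SHA-256 hash defined earlier and duplicating the last hash whenever a level has an odd number of nodes. The function names `double_sha256` and `merkle_root` and the sample transactions are chosen for this example.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """hash = sha256(sha256(data)); always returns 32 bytes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: list) -> bytes:
    """Hash pairs of nodes level by level until a single root remains."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pair the unpaired node with itself
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


transactions = [b"A", b"B", b"C", b"D", b"E"]
root = merkle_root(transactions)
tampered_root = merkle_root([b"A", b"B", b"C", b"D!", b"E"])
print(root.hex())
print(root != tampered_root)  # changing one transaction changes the root
```

With 5 transactions the levels hold 5, 3, 2 and finally 1 hash, with one duplicate added at each of the two odd levels.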
Merkle Trees are useful for non-full nodes in a blockchain:
- A full node in a blockchain is a node that holds every bit of information for every block of the blockchain.
- This includes the whole chain from the genesis block (the first block in the blockchain) to the most recently added block. For each block, it stores the complete transaction details as well as the complete Merkle tree.
- If a non-full node wishes to check a transaction’s validity, it can do so by contacting a full node and retrieving “only” a few hashes, which it can then verify locally!
If we wish to check whether transaction D is valid, we need only the hashes of the sibling nodes along the path from D to the root node.
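A minimal sketch of that check, assuming the same double SHA-256 hash and the duplicate-last-node rule described earlier: a full node builds the list of sibling hashes, and the non-full node replays them up to the root. The names `merkle_proof` and `verify_proof` and the (hash, is_left) pair format are hypothetical choices for this example.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_proof(transactions: list, index: int):
    """Full-node side: collect the sibling hash at every level, plus the root."""
    level = [double_sha256(tx) for tx in transactions]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])           # duplicate the unpaired node
        sibling = index ^ 1                   # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        index //= 2
    return proof, level[0]


def verify_proof(transaction: bytes, proof: list, root: bytes) -> bool:
    """Non-full-node side: rebuild the path to the root from the sibling hashes."""
    current = double_sha256(transaction)
    for sibling_hash, sibling_is_left in proof:
        if sibling_is_left:
            current = double_sha256(sibling_hash + current)
        else:
            current = double_sha256(current + sibling_hash)
    return current == root


transactions = [b"A", b"B", b"C", b"D", b"E"]
proof, root = merkle_proof(transactions, index=3)   # transaction D
print(len(proof))                                   # hashes the light node needs
print(verify_proof(b"D", proof, root))
print(verify_proof(b"X", proof, root))
```

The number of hashes in the proof grows with the height of the tree rather than with the number of transactions, which is why the check is cheap for a non-full node.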
Proof of Work
In a decentralised system, trust is difficult to establish. To tackle this problem, proof-of-work is used: nodes have to spend computational effort to add a block, and honest nodes are rewarded for that work.
The nonce field is a counter that is incremented by one after each iteration of a process called mining. The idea is to compute the hash of the block header with different nonce values until we get a “suitable hash”, for example a hash that starts with n zeroes.
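The mining loop can be sketched as below. It is a simplification: the "suitable hash" rule is taken to be a number of leading zero hex characters in a single SHA-256 digest, the nonce is appended to the header as 8 bytes, and the name `mine` and the sample header are assumptions made for this example.

```python
import hashlib


def mine(header: bytes, zeros: int):
    """Increment the nonce until the hash of header + nonce starts with enough zeroes."""
    prefix = "0" * zeros
    nonce = 0
    while True:
        digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


nonce, digest = mine(b"example block header", zeros=3)
print(nonce, digest)
```

Each extra zero required multiplies the expected number of attempts by 16, while checking a claimed nonce still takes a single hash computation.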
Applications of blockchain
Blockchains are used where data has to be protected against modification, such as:
- Storing land records
- Storing financial transactional data (cryptocurrencies)
- Digital art (NFTs)
- Voting systems (so that no central authority has control over the votes)
- Healthcare (storing health records in a decentralised way)
- Supply chain (using a permissioned blockchain so that an entity cannot change its reported state)