Data Base Storage Engine types and Data Structure used in it
There are two main kinds of databases. The first one makes sure the data is always right but might not be very fast, while the other is super fast but sometimes the data might not be exactly right.
Some databases prioritize data consistency above all else and follow the ACID principles (Atomicity, Consistency, Isolation, and Durability). Meanwhile, another group of databases focuses on being available and fast, and they follow the BASE principles (Basically available, Soft State, and Eventually Consistent).
The choice between ACID and BASE for a database depends on how the database is built.
The way a database engine is built determines whether it has weak or strong consistency. Now, let’s discuss the different types of storage engines and data structures used in various types of databases available today.
Storage engine falls into two broad categories:
- OLTP (Online Transaction Processing) systems are designed to handle a large number of queries efficiently.
- OLAP (Optimized for Analytics) systems are built to handle slow queries that involve retrieving a significant portion of the dataset. Column-oriented databases are becoming increasingly popular in this context.
In case of OLTP, there are two types of storage engines:
- In first category, they use a method called "Write-Ahead Log (WAL)" to store data. It's like writing in a diary – you can only add new information, not change or erase anything. When data becomes old or unnecessary, it's removed. Bitcask, SSTables, LSM Trees, LevelDB, HBase, Cassandra, Lucene, and many more use this approach.
- Other category is update-in-place: In this approach disk is treated as a notebook with pages that you can write on and erase. The B-Tree is the most common way to organize the information in this notebook, and it's used in all major relational databases and in many non-relational ones too.
In this part, we'll discuss three kinds of data structures found in OLTP databases that rely on the Write-Ahead Log (WAL):
SSTable(Sorted String Table):
An immutable data structure that stores a large number of key:value pairs sorted by key. You can think this as a log file where records only gets appended to the file not updated.
Now, let’s just say that each row contains a Key and let’s assume a separator as comma and then value.
- LSM Trees:
- Hash Index:
To be continued…