Benefits of Using Columnar Storage in Relational Database Management Systems
Relational database management systems (RDBMS) store data in rows and columns. Most relational databases store data row-wise by default, but a few RDBMS, including MariaDB and PostgreSQL, provide the option to store data column-wise, which is a useful feature.
The benefits of columnar storage are based on a couple of assumptions:
- A typical database table has very few columns compared to rows—a table could have 10 columns and a million rows of data. A large data set would typically need to be stored across multiple disks
- The database queries are for a subset of the data rather than complete rows of data
Let’s look at the benefits of being able to use columnar storage for data.
What’s the difference?
When RDBMS store data row-wise, in the following example, the ID, first name, and last name would be stored as follows, in which “;” demarcates data stored together:
1,John,Smith;2,Deepak,Vohra;3,Brad,Jones
Columnar storage for the same data would be:
1,2,3;John,Deepak,Brad;Smith,Vohra,Jones
Querying a subset of data
A query on a subset of data on row-oriented RDBMS has to scan several rows of data on the hard disks before finding the data that matches the query. Columnar databases are more efficient to query a subset of data because once the column containing the data has been found, the corresponding row values may be obtained. The query is optimized to scan only the necessary columns instead of scanning rows and rows of data. The query performance of a columnar database is much better than a row-oriented database if the query is over a subset of data.
Working with big data
If a table has only 10 rows of data and three columns, the query performance wouldn’t be much different. The main benefit of columnar data is realized if a table has many more rows than columns, and you really start to see this benefit when working with big data. The MariaDB ColumnStore engine could be used to process petabytes of data with linear scalability and high performance.
Indexes aren’t needed
If a row-oriented storage engine, such as InnoDB engine in MySQL and MariaDB, is used for processing large data sets, the big data would need to be indexed to improve query performance. Indexes also improve the performance of queries over a subset of data. But indexes add additional disk storage. In comparison, a columnar storage engine data set doesn’t need to be indexed, as it responds to queries over a subset of data much faster inherently.
More suitable for OLAP
Columnar storage is more suitable for analytical queries—such as online analytical processing, or OLAP—rather than transactional queries—such as online transaction processing, or OLTP. Analytic queries with columnar storage return a response in real time. MariaDB’s ColumnStore would be a good choice to use for big data analytic queries involving complex joins and aggregations.
Easier to compress
Another benefit of columnar storage is compression efficiency. As columnar data is of uniform type, compression efficiency is improved over compressing row-oriented data with possibly dissimilar data types.