A columnar database is a database management system (DBMS) that stores data in columns instead of rows.
In a columnar database, all the column 1 values are physically together, followed by all the column 2 values, etc. The data is stored in record order, so the 100th entry for column 1 and the 100th entry for column 2 belong to the same input record. This allows individual data elements, such as customer name for instance, to be accessed in columns as a group, rather than individually row-by-row.
As shown in the figure below, a database table is conceptually a two-dimensional structure composed of cells arranged in rows and columns. Because computer memory is structured linearly, there are two options for the sequences of cell values stored in contiguous memory locations:
- Row Storage - The data sequence consists of the data fields in one table row(Example Oracle RDBMS).
- Column Storage - The data sequence consists of the entries in one table column ( Example HANA)
One of the main benefits of a columnar database is that data can be highly compressed. The compression permits columnar operations — like MIN, MAX, SUM, COUNT and AVG— to be performed very rapidly.
The following example shows the different usage of column and row storage, and positions them relative to row and column queries. Column storage is most useful for OLAP queries because these queries get just a few attributes from every data entry. But for traditional OLTP queries, it is more advantageous to store all attributes side-by-side in row tables
As the use of in-memory analytics increases, however, the relative benefits of row-oriented vs. column oriented databases may become less important. In-memory analytics is not concerned with efficiently reading and writing data to a hard disk. Instead, it allows data to be queried in random access memory (RAM).