Row Store in HANA Database

Hello Consultants , In the last blog we have seen about different table types and impacts.

Lets deep dive in more and understand what is row store and how does it work.

Records that are inserted in table in same form in the main memory i.e. store the data in form resembling the logical table structure . Each record is saved as one concatenated chunk of values for every column in memory.

Source : HANA ADMIN BOOK

Properties in HANA :-

Advantages :-

1. Direct Mapping of logical table layout and operation performed it to actual data manipulation that happens in memory which makes easy to understand for developer and administration.

2. When records are most often accessed with all columns, mass data processing and analysis do not play any role , then row store tables can show better performance than column store table.

Disadvantage :-

1. DBMS cannot directly access a specific column of table whole data pages need to be transferred

2. Structuring the data representation by row is not very effective for many type of operation , Every values is stored again for occurrence of value within the table.

3. Even with normalized data models , the repetition of data, especially for very common values cannot be prevented because Foreign key references need to be stored. On top of it , this reference needs to be resolved during process by joins which need high computational power.

Note :- Row Storage is entirely stored in the main memory unlike column store

Limitations of HANA Row Store

1. Row store table cannot be partitioned , which limits the possible total size of all row store tables to the memory available on a single server that tables are located on.

What it means suppose you have 2 servers with memory of say 512 GB both and you have a table with 1024 GB . In row store you cannot store : You will need to have a server with 1024GB memory .

In Column store : You can partition and save among the two servers.

2. No Compression offered by HANA for row store table.

3. Columns in row store cannot be accessed independently and in parallel .

For example , we have a table with columns : Name, City, mobile no, Employee ID etc. . you cannot access only Name and City and expect fast processing in HANA . It does not work that way.

But it does not mean that the row store won't be processed in parallel . In fact many operations such as sorting, grouping , index creation and window function processing can be heavily parallelized.

4. Row store table cannot be displaced from memory . It should be in memory always when system is up and running . Therefore the table is automatically uploaded into memory during system startup. Obviously , This increases the startup time .

5. If row store table is not loaded fully in memory system cannot started.

6. In most SAP HANA informational model , ROW store tables cannot be used directly as data source.

For SAP NW system running on SAP HANA defines which tables shall be row store tables. Upon installation or migration of a SAP NW on SAP HANA Database the correct assignment is performed automatically.

If you want to check all the tables that are stored in row store :-

select * from M_RS_TABLES

select * from M_RS_TABLES where HOST='<worker node>’

(if checking on particular hosts)

Two important aspect of HANA :-

1. Multi version concurrency control : Lock for free data access and manipulation while maintaining transactional consistency , and indexes are a technique

2. Indexes : Technique of optimizing data access.

Diving Deep in both of these aspects

Multi version Concurrency control

MVCC is a well known technique to allow parallel access to same bits of information to multiple session, even when one or more session are actively changing this information .

This is achieved by keeping copies of original version of the record and presenting each session with the version appropriate to sequence of system change : COMMITS that the session has been exposed to

For the Developers and Administrator this happens automatically and no additional care or precaution is needed . However this changes are implemented in different ways in row store and column store but it brings different challenges for the Administrator .

In Row Store , Each Changed paged is copied first and placed into a chain of page version and with each version reflecting the state of data for a specific commit point . These page chains are stored in virtual container structure called undo cleanup files that can be monitored in M_UNDO_CLEANUP_FILES. But this is generally not a concern for Administrator and it is managed by Garbage Collector. The note worthy point is clearing this won't result in immediate free usable memory.

Garbage collector can only remove those old version for which transaction is completed (either committed or rolled back).

One know issue is :- If a transaction which is modifying tens of thousands of records without committing them we will end up in a situation in which large amount of redundant row store data need to be kept in main memory as there will be tens of thousands of record locks and new active record version kept in database.

Source : wiki.scn.sap.com

Indexes

As we have in other DBMS , HANA also offers the concept of Indexes

Indexing is used to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed.

For our understanding purpose indexing is something like this

To review information of indexes on row store tables , we can use monitoring views M_RS_INDEXES.

Indexes on row store table are not saved to the persistency and is rebuilt when table is loaded into the memory. this happens during index server startup and logs are written on the trace file of index server process.

For row store table we have two type of indexing that is present :-

1. Classic b- tree Index :- Used for all other data types other than string , binary string or decimal types.

E.g. based on whatisdbms.com

Let’s take an example as to explain how B-tree indexing is helpful. Imagine books are arranged in the college library based on the alphabetical manner, the library has books of all departments such as Automobile, Aeronautical, Bio-tech, Chemical, Civil, Electronics and so on. After entering the library, you see that ground-floor contains books by department name A-G, first-floor H-N, second-floor O-U and third-floor V-Z. So based on your requirement you can quickly find the required book. Consider equivalent database search now, just Imagine books database table, with a B-tree index on the dpt_name column. To find your book of civil, you can simply perform below query.

2. cpb+ tree Index :- Compressed prefix b- tree index , this is highly optimized to handle character based index keys in memory. It uses partial keys to store and navigate within the index structure.

To understand :-

This basically means, the B-tree index and leaf nodes do not contain the full strings for keys. Instead, the parts of the key-strings that are common among the keys (the prefixes) are stored separately. The leaf and index nodes then only contain

1.the pointer to the prefix
2. a kind of “delta” that contains the remaining key (this is where the partial key from the pkB-tree comes in)
3.and a pointer to the data record (row id)

This technique is rather common in many DBMS, usually attached to a feature called “index compression”

Hana uses this for columns that are string , binary string or decimal types

We will be dropping follow up blogs on this topic so stay tuned and let us know if anything's needs to be added up here.

References :-

cpb+tree

rsc1

btree

Comments

You might find these interesting

How to properly Start/Stop SAP system through command line ?

Starting/stopping an SAP system is not a critical task, but the method that most of us follow to achieve this is sometimes wrong. A common mistake that most of the SAP admins do is, making use of the 'startsap' and 'stopsap' commands for starting/stopping the system. These commands got deprecated in 2015 because the scripts were not being maintained anymore and SAP recommends not to use them as many people have faced errors while executing those scripts. For more info and the bugs in scripts, you can check the sap note 809477. These scripts are not available in kernel version 7.73 and later. So if these are not the correct commands, then how to start/stop the sap system? In this post, we will see how to do it in the correct way. SAP SYSTEM VS INSTANCE In SAP, an instance is a group of resources such as memory, work processes and so on, usually in support of a single application server or database server with...

Notes for Build Resilient Applications on SAP BTP with Amazon Web Services [ Week 1]

Welcome back to the next chapter in our ongoing series dedicated to unraveling the dynamic interplay between SAP Business Technology Platform (BTP) and Amazon Web Services (AWS). For those just joining us, this blog serves as an invaluable resource for individuals delving into the world of SAP BTP or seeking a comprehensive reference guide. SAP BTP, or SAP Business Technology Platform, is a comprehensive platform that brings together various essential capabilities for application development, automation, data management, analytics, planning, integration, and AI. These features are all integrated into a unified environment, making it user-friendly for both professional IT developers and citizen developers. Image Credit Key Features of SAP BTP: Application Development: SAP BTP offers a range of tools for development. For instance, SAP Build enables low-code development, while the SAP Business Application Studio caters to core developers, providing services like document management a...

8 Must-Know Questions About Object Store on SAP Business Technology Platform

What is the problem that Object Store solves ? Modern enterprise systems increasingly deal with massive volumes of unstructured data such as documents, logs, media files, and backups. Traditional relational databases are not optimized for such workloads. What is Object Store ? Object storage—commonly referred to as blob storage—addresses this gap by providing scalable, durable, and cost-efficient storage for unstructured data. Object storage is a storage architecture designed to manage unstructured data as discrete units called objects. Each object consists of: Binary data (file content) : Image , File etc Metadata (descriptive attributes) : File size, Content type, Last modified timestamp, Storage class (hot, cool, archive) Unique identifier (key or URL) : unique path-like string used to locate a blob inside a bucket Unlike file systems or relational databases, object storage does not rely on hierarchical file structures or schemas. The SAP BTP Object Store service is a managed, ...

Building the Foundation of the PO System: Architecture and some terminologies

1. Loosely Coupled and Tightly Coupled Services : Loosely Coupled Services : These services interact with each other with minimal dependencies. Changes in one service don't significantly impact others. Pros include flexibility, easier updates, and better scalability. A common example in PO is when a shipping service communicates with an inventory service. Changes in the inventory service won't necessarily disrupt shipping. Tightly Coupled Services : These services are interdependent, so changes in one service can affect others. While they might provide faster communication, they can be less flexible and harder to maintain. For example, tightly coupling an order processing service with a payment service means any change in payment could ripple to order processing. 2. SOA - Service-Oriented Architecture : SOA is an architectural approach where everything is treated as a service, encapsulating specific functionality. Service Orchestration Example (Banking Transaction) : Consider ...

Execute HANASitter for hang situation analysis

The SAP HANAsitter is configured to perform default checks once every hour to ascertain the online and primary status of SAP HANA. Upon confirmation, it initiates tracking procedures, which involve regular responsiveness assessments (typically every minute). If SAP HANA becomes unresponsive, the HANAsitter commences recording activities, potentially capturing call stacks of active threads, run-time dumps, index server gstacks, and/or kernel profiler traces, although, by default, no recording occurs. When SAP HANA is responsive, the script scrutinizes critical features, including a standard check for more than 30 active threads. If this threshold is exceeded, the script triggers recording. Upon completing the recording process, the script exits, with an option to be configured for restart using the command line. Setup Steps Overview: Begin by creating an SAP HANA user with the desired name (e.g., HANASITTER) and assign the CATALOG READ privilege to it. Establish a user key in the hdbuse...

Understanding SAP BTP Global Accounts, Directories, Subaccounts, and Entitlements

In SAP Business Technology Platform (BTP), organizing your resources effectively is crucial for efficient management and scalability. This blog provides a comprehensive overview of global accounts, directories, subaccounts, and entitlements within SAP BTP. What is a Global Account in SAP BTP? A global account in SAP BTP represents the contractual agreement you have with SAP. It serves as the top-level container for managing various resources, including directories, subaccounts, members, entitlements, and quotas. Within a global account, you receive entitlements and quotas for platform resources, which can be allocated to subaccounts for actual consumption. How Do Directories Function in SAP BTP? Directories in SAP BTP allow you to organize and manage your subaccounts based on your technical and business requirements. A directory can contain other directories and subaccounts, enabling you to create a hierarchical structure. This hierarchy can be up to 7 levels deep, with the global ac...

Huge Multiversion Concurrency Control (MVCC) Versions in HANA

What is MVCC? MVCC is a database concurrency control method that allows multiple transactions to occur concurrently without conflicting with each other. In a nutshell, it ensures that each transaction sees a snapshot of the database at a specific point in time, even if other transactions are making changes concurrently. MVCC in SAP HANA: SAP HANA uses MVCC to manage concurrent access to data. Each transaction in HANA sees a consistent snapshot of the data at the time the transaction began. This is achieved by maintaining multiple versions of a data row, each associated with a specific transaction or point in time. The Issue of Huge MVCC Versions: Now, the term "Huge MVCC Versions" indicates a situation where there is a significant number of these versions for a particular set of data. Here's why this might become a problem: Increased Memory Usage: Each version of a data row consumes memory. As the number of versions increases, the overall memory consumption by the databas...

Deploying SAP on Google Cloud : Part 1

Connect to Google Connection Method To be Used for Speed Explanation Example Uses Cloud VPN Proof of Concept Variable, up to 3 Gbps Connects on-premises network to Google Cloud securely over the internet using IPsec VPN tunnels. Creating a Cloud VPN tunnel between on-premises and Google Cloud. Encrypted IPsec tunnels Dedicated Interconnect For Enterprise level connect 10 Gbps to 100 Gbps Provides a dedicated, private connection between on-premises and Google Cloud through Google's network. Provisioning a dedicated interconnect connection. Direct physical connection between on-premises and Google Cloud network infrastructure Partner Connect If you have a data center which cannot be reached to Dedicated Google facility. Variable, up to 100 Gbps Allows connecting to Google Cloud through supported service providers. Establishing a connection with a supported service provider. Utilizes service provider's network infrastructure. Configure Tunnels with Google Cloud Platform IPsec

KPIs for Recovery in HANA Database Administration

Introduction: In the dynamic landscape of database administration, ensuring the robustness of a system is paramount. One crucial aspect that demands meticulous attention is the recovery process following a system failure. Two key performance indicators (KPIs) stand out in this realm – Recovery Point Objective (RPO) and Recovery Time Objective (RTO) . In this technical blog, we will delve into the significance of these KPIs for HANA database administrators and explore strategies to optimize them. Recovery Point Objective (RPO): RPO is a critical metric that defines the maximum acceptable data loss in the event of a system failure . For HANA database administrators, establishing an RPO involves a careful balance between data consistency and the overhead of continuous data replication. Continuous Data Backups: To meet stringent RPO requirements, implementing continuous data backups is imperative. Utilizing HANA's native backup capabilities and integrating them with a robust backup s...

HANA System Replication - Prerequisites & Setup

Hey Folks! Welcome back to Hana high availability blog series. In our last blog we checked out operation & replication modes in hana system replication. If you haven't gone though that blog, you can checkout this link In this blog we will be talking about the prerequisites of hana replication and it's setup. So let's get started. When we plan to setup hana system replication, we need to make sure that all prerequisite steps have been followed. Let's have a look at these prerequisites. HANA System Replication Prerequisites: Primary & secondary systems should be up & running HDB version of secondary should be greater than or equal to Primary database sever But, for Active/Active(read enabled config), HDB version should be same on both sites. System configuration/ini files should be identical on both sides Replication happe...

EIENTRA

Search This Blog