SAP HANA Lock Analysis: Problems and Solutions
1999998 – FAQ: SAP HANA Lock Analysis
Symptom
SAP HANA connections are hanging because they have to wait for locks.
Environment
SAP HANA
Cause
1. Where do I find information about SAP HANA lock analysis?
2. Which indications exist for SAP HANA locking issues?
3. What types of locks exist?
4. How can I check if my SAP HANA database suffers from lock waits?
5. How can transactional lock waits be analyzed and optimized?
6. How can internal lock waits be analyzed?
7. How can special lock waits be analyzed?
8. Do timeouts for lock waits exist?
9. What happens in case of deadlocks?
10. Is it possible to identify the record responsible for a record lock?
11. How can I determine how long certain types of locks are held?
12. What is the unit of column TOTAL_LOCK_WAIT_DURATION in monitoring view M_SQL_PLAN_CACHE?
13. How can I automatically capture additional information in case of unusual locking issues?
14. Why are there deadlocks although changes are done in a sorted manner?
15. How can the lock holder be determined?
Resolution
1. Where do I find information about SAP HANA lock analysis?
SAP Note 1858357 provides an overview how to analyze lock situations.
The SAP HANA Troubleshooting and Performance Analysis Guide at SAP HANA Troubleshooting and Performance Analysis Guide contains further information related to lock analysis.
2. Which indications exist for SAP HANA locking issues?
You suffer from terminations and short dumps due to SQL errors 131 or 133:
SQL error 131: transaction rolled back by lock wait timeout SQL error 133: transaction rolled back by detected deadlock SQL error 146: resource busy and NOWAIT specified
The trace files contain messages like:
There are too many lock items on this system. Deadlock detected: Deadlock detected while executing transaction (TRANSACTION_ID=<tid>, UPDATE_TRANSACTION_ID=<utid>): This is not an HDB error. User or application may cause a deadlock due to incorrect access sequences on shared objects.
The following SAP HANA alerts indicate problems in the locking area:
Alert | Name | Description |
16 | Lock wait timeout configuration | Determines whether the ‘lock_wait_timeout’ parameter in the ‘transaction’ section of the indexserver.ini file is between 100,000 and 7,200,000. |
49 | Long-running blocking situations | Identifies long-running blocking situations. |
59 | Percentage of transactions blocked | Determines the percentage of transactions that are blocked. |
SQL: “HANA_Configuration_MiniChecks” (SAP Notes 1969700, 1999993) returns a potentially critical issue (C = ‘X’) for one of the following individual checks:
Check ID | Details |
870 | SelfWatchDog activity time (%, last hour) |
1010 | Age of oldest active trans. lock wait (s) |
1011 | Trans. lock wait durations > 600 s (last day) |
1020 | Threads currently waiting for locks |
1021 | Maximum threads waiting for locks (last day) |
1030 | Concurrently blocked transactions (last hour) |
1031 | Concurrently blocked transactions (last day) |
1040 | Total current record locks |
1045 | Transactional locks older than 1 day |
1050 | Significant internal lock waits (last hour) |
1052 | Significant internal lock waits (last day) |
1720 | Supported nameserver lock file location |
3. What types of locks exist?
We can distinguish the following SAP HANA lock types:
Lock type | Lock wait thread state | Scope | Views | Details |
Record lock | ConditionalVariable Wait (RecordLockWaitCondStat / TransactionLockWaitCondStat) |
transactional | M_RECORD_LOCKS M_CONDITIONAL_VARIABLES M_BLOCKED_TRANSACTIONS M_OBJECT_LOCK_STATISTICS |
Exclusive locks on record level, typically caused by concurrent changes of the same records by different transactions |
Object lock | ConditionalVariable Wait (TableLockWaitCondStat / TransactionLockWaitCondStat) |
transactional | M_OBJECT_LOCKS M_CONDITIONAL_VARIABLES M_BLOCKED_TRANSACTIONS M_OBJECT_LOCK_STATISTICS |
Locks on object level, typically caused by DDL operations requiring an object lock:
|
Metadata lock | ConditionalVariable Wait (TransactionLockWaitCondStat) |
transactional | M_CONDITIONAL_VARIABLES M_BLOCKED_TRANSACTIONS |
Metadata locks |
Read / write lock | ExclusiveLock Enter IntentLock Enter SharedLock Enter |
internal | M_READWRITELOCKS | Read / write lock waits, e.g. waits during critical savepoint phase |
Barrier Semaphore | BarrierSemaphore Wait | internal | ||
Semaphore | Semaphore Wait | internal | M_SEMAPHORES | Low level locks based on semaphores, e.g. waits for a critical delta merge phase |
Mutex | Mutex Wait | internal | M_MUTEXES | Low level locks based on mutexes |
Barrier | Barrier Wait | internal | Low level locks similar to mutexes | |
Speculative locks | Speculative Lock Retry backoff Speculative Lock Wait for fallback |
internal | Low level transactional memory locks (SAP HANA >= Rev. 122) | |
liveCache lock | internal | M_LIVECACHE_LOCKS | Locks related to integrated liveCache (if used) | |
Nameserver topology | special | The nameserver topology is locked based on a file (/tmp/.hdb_<sid>_<inst_id>_lock) |
In addition to these explicit locks there can be some “hidden” lock waits of threads in status ‘Running’. Related call stacks (SAP Note 2313619) contain:
- __GI___sched_yield
- Synchronization::impl::SpinLock::lock
- ptime::Futex::lock
- Synchronization::SystemMutex::lock
- __lll_lock_wait / pthread_mutex_lock (Linux kernel lock)
Record and object locks are mainly linked to the application transactions while read / write locks, semaphores, mutexes and barriers are managed by SAP HANA internally.
Be aware that the table above concludes from a lock type to a thread state. The opposite way is not generally valid. For example, there can be “ConditionalVariable Wait” situation that are not linked to a record lock or object lock.
4. How can I check if my SAP HANA database suffers from lock waits?
On a very elementary level you can identify lock waits based on the thread states.
Current threads can be displayed via:
- Transaction DBACOCKPIT: Performance –> Threads
- SAP HANA Studio: Performance –> Threads
- SQL: “HANA_Threads_CurrentThreads” (SAP Note 1969700)
Historic thread activities can be determined via:
- SQL: “HANA_Threads_ThreadSamples_FilterAndAggregation” and “HANA_Threads_ThreadSamples_AggregationPerTimeSlice” (SAP Note 1969700)
The following thread states indicate lock wait situations:
- Barrier Wait
- ConditionalVariable Wait
- ExclusiveLock Enter
- Mutex Wait
- Semaphore Wait
- SharedLock Enter
- Sleeping
Not every occurrence of these states is critical. It happens frequently that a thread submits a request to another thread and waits for a related semaphore until the called thread has returned the result.
The comprehensive check command SQL: “HANA_Configuration_MiniChecks” (SAP Note 1969700) contains also lock related checks. An ‘X’ in column C indicates potentially critical situations. Example output:
The following SQL errors also indicate lock related problems:
- SQL error 131: transaction rolled back by lock wait timeout
- SQL error 133: transaction rolled back by detected deadlock
The following SAP HANA alerts are another indication for locking issues:
- Alert 49: Long-running blocking situations
- Alert 59: Percentage of transactions blocked
Below you can find more specific information how to analyze certain lock wait situations.
5. How can transactional lock waits be analyzed and optimized?
If a high number of transactional (record and object) lock waits exist, but at the same time there are also many semaphore and mutex waits, we can usually assume that the record and object lock waits are a consequence of the mutex and semaphore waits. In this case you should focus on the analysis of mutex and semaphore waits (see below). Otherwise you can use the following approaches to analyze transactional lock situations.
Current lock wait situations are visible as blocked transactions:
- View M_BLOCKED_TRANSACTIONS
- SQL: “HANA_Locks_Transactional_LockWaits[_Hierarchy]” (DATA_SOURCE = ‘CURRENT’) available via SAP Note 1969700
- SAP HANA Studio: Performance -> Blocked Transactions
- DBACOCKPIT: Diagnostics -> Locks -> Blocked Transactions
Historic lock wait situations can be found in the related history:
- View HOST_BLOCKED_TRANSACTIONS
- SQL: “HANA_Locks_Transactional_LockWaits”[_Hierarchy] (DATA_SOURCE = ‘HISTORY’) and SQL: “HANA_Locks_Transactional_LockWaits_PerObject” available via SAP Note 1969700
- SQL: “HANA_LoadHistory_Services” (SAP Note 1969700)
Among others this information provides information about the involved table and transactions. With some further research it is usually possible to identify the related business scenarios.
The following table lists typical approaches to optimize record and object locks:
Lock type | Optimization approaches |
Record lock | Check from an application and / or scheduling perspective if you can reduce the amount of concurrent changes of the same records by different transactions.
Check from application side if you can reduce the critical time frame between DML operation and next COMMIT (lock is always held until next COMMIT is executed). Manually terminate the lock holding transaction if it is responsible for a critical current problem. Lock wait related recommendations for specific tables like REPOSRC, FKKDIHDTMP, NRIV or /PLMB/FRW_BUFFER can be found in SAP Note 2000002 (“Are there standard recommendations for specific SQL statements available?”). |
Object lock | Make sure that critical DDL operations (e.g. offline repartitioning) are executed at times with minimum concurrent workload.
Check if you can reduce the amount of critical DDL operations (e.g. by using online operations). If another transaction holds an uncommitted change on a table, a DDL operation on the same table like CREATE INDEX will escalate this row level lock to a table level lock because it prevents any new DML operation to be started until it acquires the lock itself and finalizes its tasks. In order to minimize this risk, you can reduce the lock wait timeout on transaction level by executing the following command before starting the DDL operation: SET TRANSACTION LOCK WAIT TIMEOUT <timeout_in_ms> A value of 0 for <timeout_in_ms> would result in an immediate termination of the DDL operation with the following error: SQL error 131: transaction rolled back by lock wait timeout: Lock timeout occurs while waiting OBJECT_LOCK of mode EXCLUSIVE For index creations additional options exist to reduce the risk and duration of object locks. See SAP Note 2160391 (“Are accesses to the underlying table locked when an index is created?”) for more information. If an index is created with NOWAIT option and there is still an open change on the table, it is terminated with the following error: SQL error 146: resource busy and NOWAIT specified Manually terminate the lock holding transaction if it is responsible for a critical current problem. |
Metadata lock | This kind of locking situation in combination with thread method “Authentication” can happen if many sessions connect to the SAP HANA database at the same time. Typically this scenario is a victim of another, major problem. The risk of running into this scenario can be reduced by adjusting the SAP HANA parameter indexserver.ini -> [authentication] -> last_successful_connect_update_interval. See SAP Note 2460123 for more details. |
6. How can internal lock waits be analyzed?
Read / write lock, semaphore and mutex waits are typically related to internal lock situations. If you experience a high number of threads waiting for these locks you can run SQL: “HANA_Sessions” (SAP Note 1969700) in the first step. The column WAITING_FOR can provide some additional expert insight for threads with states ‘SharedLock Enter’, ‘ExclusiveLock Enter’, ‘Mutex Wait’ and ‘Semaphore Wait’.
Example:
In this example the lock names already indicate a contention related to the nameserver process.
Similar information can be found in column LOCK_WAIT_NAME of views M_SERVICE_THREADS and M_SERVICE_THREAD_SAMPLES.
The following table contains typical general reasons for contention on internal locks:
Check | Details |
Memory bottleneck | Check if there are indications for a memory bottleneck like paging or column store unloads at times of the internal lock waits. If yes, identify and eliminate the root cause of the memory bottleneck. See SAP Note 1999997 for more details. |
CPU bottleneck | Check if there are indications for a high CPU consumption at times of the internal lock waits. If yes, try to identify the root cause like expensive SQL statements (SAP Note 2000002) or CPU-intensive processes outside of SAP HANA. Be aware that sometimes the high CPU consumption is a consequence of the internal locks because of active waits. |
SAP HANA bug | Bugs like permanent deadlocks on internal resources can be responsible for internal lock waits. Therefore check SAP Notes for potential bugs. |
Furthermore the following typical root causes exist for certain scenarios:
Thread state | Lock name | Reason | ||||||||||||||||||||||||||||||||||||||||
Barrier Wait | Barrier waits don’t show a lock name, so it is sometimes hard to understand the root cause without further details, e.g. the call stack. The following typical scenarios exist:
|
|||||||||||||||||||||||||||||||||||||||||
ConditionalVariable Wait | ParallelIndexHash_cond | This lock wait is typically linked to thread method ‘HashDict worker’ and it impacts parallel job workers processing the same database request. See SAP Note 2114710 (Thread Method = ‘HashDict worker’) for more details.
Massive waits for this lock are typically a consequence of an expensive database request (SAP Note 2000002), involving intermediate results (dicts) in contexts like calculation scenarios or joins. With SAP HANA Rev. <= 121 it can happen that after an out-of-memory situation threads permanently remain in a state waiting for ParallelIndexHash_cond. This bug is fixed with SAP HANA Rev. 122. |
||||||||||||||||||||||||||||||||||||||||
ExclusiveLock Enter SharedLock Enter ExclusiveLock Enter Semaphore Wait Sleeping |
unnamed Mutex
BTree GuardContainer SleepSemaphore |
The BTree GuardContainer lock synchronizes read and write accesses to individual columns in delta storage. Typically these locks happen on a single table at a time. Typical lock types and call stacks are:
Known situations are:
|
||||||||||||||||||||||||||||||||||||||||
Job Exec Waiting | CSPlanExecutorWaitForResult | This lock indicates that a SQL executor waits for its job workers executing a column store statement (thread method: PlanExecutor calc). So it is basically an “idle” lock and itself not critical. If the lock is visible for a long time, you have to check why the execution of the SQL statement by the job worker(s) takes so long. | ||||||||||||||||||||||||||||||||||||||||
Job Exec Waiting | unnamed BinarySemaphore (SAP HANA 1.0)
JobBarrier Wait for Jobs (SAP HANA >= 2.00) |
These locks are visible when a thread has delegated the work to one or more parallel JobWorker threads and waits for the results. It can be considered as idle wait. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | AttributeStore Resource Load | This lock is used when a database query requires a column which currently isn’t in memory and needs to be loaded into the column store. The actual load is done by special threads (LoadField) and the thread preparing or executing the query has to wait for the “AttributeStore Resource Load” lock during this time. See SAP Note 2127458 for more information about loads and unloads. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | ceManager_GCLock | This lock indicates a wait for the calculation engine garbage collector (CalcEngineGarbageCollector, see SAP Note 2169283). | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | ConfigurationRegistry | This lock synchronizes concurrent changes of SAP HANA parameters (SAP Note 2186744). Related call stacks contain modules like:
The related thread detail is typically reconfigAll. If the lock isn’t available after a few minutes, the change may be terminated with the following error: 129: transaction rolled back by an internal error exception 70000030: error: general configuration error If you see a high amount of these wait situations you should check if an unnecessary high amount of parameter changes is done or if a transaction got stuck while holding the lock. A bug with SAP HANA Rev. <= 112.02 can be responsible for this locking issue. |
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | CreateDropSnapshotLock | This lock is required when a snapshot activity (e.g. triggered by system replication or a backup) can’t be processed immediately. For example, a running savepoint may prevent snapshot activities. See SAP Note 2100009 for more information regarding savepoints and snapshots. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | CSPlanExecutorLock | Job worker threads of a parallelized execution have to wait for other threads of the same connection. You can consider these waits as idle waits. In order to improve the performance and reduce the wait time, you have to check what the other threads of the same connection are doing and how their activity can be optimized (e.g. by eliminating lock scenarios, resource bottlenecks or expensive SQL statements). | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | DeltaDataObject | This lock is used to synchronize changes to the delta storage data page chain which includes adding new pages or rewrite of pages. The delta data page chain is used to persist rows inserted into delta. This lock is required during DML or DDL operations, but not when doing SELECTs. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | Federation | This lock is linked to Smart Data Access (SDA), i.e. accesses to remote databases. See SAP Note 2180119 and check if you can optimize the configuration and use of SDA. Make sure that you use supported client software (e.g. ODBC driver). | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | FixedDictFlush | This lock is used to synchronize page dirty operations for fixed delta columns. This shows up during DML operations when new values are added to the fixed data type columns. It also shows up for every INSERT as the internal delta row id is also a fixed dictionary column. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | fo-Stats | This lock is required to synchronize the collection of operations of a specific job graph (i.e. of a specific database request), so only job worker processes are impacted. It mainly protects adding new jobs. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | FunctionProfilerLock | This lock is related to the function profiler trace which can e.g. be activated in the context of a performance trace (SAP Note 1787489).
With SAP HANA <= Rev. 102.01 this lock is also acquired when PlanViz (SAP Note 2073964) is used or when the related use_profiler parameter was deleted from the ini file rather than explicitly being set to false. These issues are fixed with SAP HANA Rev. 102.02. |
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | HugeAlignmentPool | This lock can show up in case several threads execute specific memory allocations at the same time. A general memory bottleneck situation can increase the probability of these waits. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | Mutex[releasable_handle.hpp:202]: TransactionManager::WaitGraphImpl::WaitGraphImpl | This lock is linked to the deadlock wait graph creation. With SAP HANA Rev. 110 to 112.02 it can be critical in the context of the SAP HANA bug described in SAP Note 2400515. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | JE_SynchronizedNTuple_Lock | This lock synchronizes a parallelized creation of tuples during join operations. It only locks threads belonging to the same parallelized statement execution. If you repeatedly suffer from these lock waits, you may suffer from a bad SQL statement or an inefficiently processed join. See SAP Note 2000002 and check if you can optimize the underlying SQL statements. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | jx-pq00-nopref jx-pq01-nopref jx-pq02-nopref jx-pq03-nopref jx-pq04-nopref jx-pq05-nopref jx-pq06-nopref jx-pq07-nopref jx-pq00-nosteal jx-pq00-numa jx-pq00-system |
These locks synchronize operations on job queues (separated by NUMA node and priority), so only job worker processes are impacted.
If you observe a high number of waits for this lock you can check if there are many expensive SQL statements (SAP Note 2000002) being processed with a high number of jobs. The meanings of the suffixes are:
|
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | LastValuesHolder::AccessCoordinator | This mutex synchronizes the accesses to the monitoring view M_STATISTICS_LASTVALUES, where the standalone statistics server stores data historicized during the most recent snapshot. The most efficient way to bypass this bottleneck is an upgrade to the embedded statistics server that is possible as of Rev. 74 (see SAP Note 1917938). Alternatively you can reduce the collected data volume as described in SAP Note 2084747. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | LoadLock | This lock secures the load of delta storage information from disk. A significant amount of waits can indicate memory shortages (SAP Note 1999997) or disk I/O bottlenecks (SAP Note 1999930). | ||||||||||||||||||||||||||||||||||||||||
BarrierSemaphore Wait Mutex Wait |
LoggerBufferSwitch | This lock indicates wait situations for a log buffer switch. Typical reasons are:
The related call stacks typically contain DataRecovery::LoggerImpl::reserveLogWriter and DataRecovery::LoggerImpl::reserveLog. |
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | *_fox_FoxFinalView_cache *_lookup_lookup_cache *_result_snapshot |
These named locks are used while populating cache nodes in planning sessions. The complete lock names include also variables like schema, planning session or calc scenario, e.g.:
SAPSR3:_SYS_PLE:20161110140816_6401020:DATA.cv034_fox_cv032_fox_FoxFinalView_cache SAPSR3:_SYS_PLE:20161110140816_8775690:DATA.cv005_lookup_cv004_lookup_lookup_cache SAPSR3:_SYS_PLE:20161110140816_6401020:DATA.00011_result_snapshot |
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | merger_queue_mutex | This lock can show up when the result of an OLAP engine request, parallelized across several job workers, is merged. See SAP Note 1999997 (“Which general optimizations exist for reducing the SQL statement memory requirements?”) in order to make sure that SQL statements are executed as efficient as possible. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | Mutex[RowLockConflictManager.cpp:97]@0x<id>: TrexStore::SyncQueue::SyncQueue | This lock is usually the consequence of other, general issues (e.g. memory, CPU, I/O, network or system replication). Therefore you should check for other symptoms you see at the time when these locks appear.
See SAP Note 2000000 for a general SAP HANA performance check. |
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | NameserverLock_ReadLock NameserverLock_WriteLock |
See “How can special lock waits be analyzed?” -> “Nameserver topology” below for more information. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | PerformanceTracer_FileLock | This lock is used when creating a performance trace. See SAP Note 1787489 and make sure that the performance trace is only activated when really required. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | PersistenceSessionRegistry | This lock is required to open and close sessions on persistence level (SAP Note 2400005), e.g. during ptime::Connection::close or ptime::Connection::open). The related call stack modules are:
DataAccess::PersistenceSessionRegistry::registerSession DataAccess::PersistenceSessionRegistry::unregisterSession Long wait times can be caused by a thread that allocated this mutex and has to wait for “Barrier Wait” for the so called iteration barrier for quite some time. See the related section related to “Barrier Wait” above. |
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | PoolAllocator-MemoryPool | Starting with SAP HANA 1.00.122.12 and 2.00.012.01 the former “unnamed Mutex” in MemoryManager::MemoryPool::allocate is explicitly named with PoolAllocator-MemoryPool.
This lock secures accesses to heap memory. Typically this wait is observed during memory defragmentation (SAP Note 1999997). |
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | Queue.cpp: queueserver2::Queue::Queue | Idle wait of “Request” / “Queue Pull” threads from preprocessor to indexserver (see SAP Note 2114710), usually not critical | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | Sequence Manager | These waits are linked to bottlenecks when maintaining SAP HANA sequences. The related threads are typically working in modules ptime::SequenceManager::fillEnvBuffer / ptime::SequenceManager::nextval. Contention is possible if many tables with sequences are modified at the same time. In this case you can consider to activate or improve caching for frequently used sequences, e.g. with 100 elements:
ALTER SEQUENCE "<sequence_name>" CACHE 100 An overview of existing sequences can be retrieved via SQL: “HANA_Configuration_Sequences” (SAP Note 1969700). Due to a bug in SAP HANA <= 1.00.122.11 it can happen that remote accesses to the CurrvalLocationMap happen in scale-out environments although the information should be available locally (module ptime::RemoteSequence::updateCurrvalLocationMap). Moving tables with critical sequences to the nodes with the CurrvalLocationMap can improve the accesses and optimize performance. Starting with SAP HANA Rev. 1.00.122.12 relevant map information is generally retrieved locally and the remote calls are no longer required. |
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | SessionMutex | This mutex is used for different purposes in the areas of SqlExecutor thread handling and channel management. These mutex waits are typically symptoms of other underlying issues, so you should check for general problems at the time of the waits (e.g. memory reclaims, I/O or network issues) and have a look at the generated database traces (SAP Note 2380176). If required, open a SAP incident in order to request assistance from SAP support. Optimally you generate a runtime dump (SAP Note 2400007) at the time of the waits. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | system replication: SendQueueLock | This lock secures the send queue in system replication environments (SAP Note 1999880) that contains shipping information for log buffers. It is reset in scenarios like disconnect or log shipping timeout. Waits are typically a secondary effect of issues in areas like system replication (SAP Note 1999880) or I/O (SAP Note 1999930). | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | ThreadLockInfo | This lock secures the ring buffer for thread samples (SAP Note 2114710). In general this lock is only held for a short time, but in rare cases a thread can be stuck while holding this lock:
|
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | TRexAPI_RefCountedLock_Lock | This lock synchronizes trigger activities. Check if and why triggers or related features like sequences are implemented on the underlying tables and optimize or reduce them. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | TRexAPI_TableDeltaMerge_statisticLock | This lock is held while CPU and memory information is collected. This information is required to determine a proper number of merge tokens. See SAP Note 2057046 for more information. | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | TrexNet_Requestor_EndPointsLock | Among others, the critical path of this lock contains domain name service (DNS) accesses (e.g. “Communication::getHostAddress”) up to SAP HANA SPS 09. If the DNS lookups are slowing down, threads may queue up on the TrexNet_Requestor_EndPointsLock lock. Therefore you should check if there are issues with DNS if you see threads queueing on this lock. See SAP Note 2222200 for more information regarding network checks in SAP HANA environments.
Consider switching the listeninterface parameter from .global to .internal in scale-out environments as recommended in SAP Note 2183363, because with .internal setting will suppress DNS requests. |
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | UndoAnchor | This lock secures the materialization of undo data in parallelized update transactions. Contention is typically a side-effect of major underlying issues in areas like I/O (SAP Note 1999930) or system replication (SAP Note 1999880). | ||||||||||||||||||||||||||||||||||||||||
Mutex Wait | unnamed Mutex | Normally all important mutexes should be named properly, so “unnamed Mutex” should rarely pop up. If you face significant “unnamed Mutex” waits, consider the following known constellations:
|
||||||||||||||||||||||||||||||||||||||||
Mutex Wait | ZipResultInput_SectionLock | This lock can appear in the context of parallel sorting. Only threads of the same statement execution can block itself, so it is usually a local effect. If this lock appears frequently, you should analyze and optimize the related SQL statements (SAP Note 2000002). | ||||||||||||||||||||||||||||||||||||||||
Resource Load Wait | This thread state is used when a resource requested from the resource container is currently loaded. In this situation the requesting thread needs to wait until the read I/O completes. If you see significant waits, the following reasons are possible: | |||||||||||||||||||||||||||||||||||||||||
Mutex Wait
Semaphore Wait Semaphore Wait |
unnamed Mutex
AsyncRequest.cpp: TrexNet::RequestSet::go AsyncRequest.cpp: (anonymous namespace)::IOThread::run |
If you see these kinds of locks and only accesses to a certain table are blocked, you can check for the related thread types and thread methods (see SAP Note 2114710). In case you see significant amounts of
it is possible that the problem is caused by problems when modifying a partitioned table with more than one unique index. Check if the involved table has more than one unique index and is partitioned (across different nodes). If yes, check if it is possible to implement a workaround like disabling partitioning on this table. In particular you should avoid partitioning of BW SID tables (/BI0/S*). The underlying problem is fixed as of Rev. 97. |
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | BackupDestBackint_Executor::Connect | This lock happens when a SAP HANA backup waits for a 3rd party backup tool connecting to the Backint pipe. See SAP Note 1730932 for more information related to Backint. | ||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | BackupDest_DestCallbackSemaphore | This lock secures the backup output channel (e.g. target file or backint pipe). It is normal to see this lock in these contexts. In case of very long durations you can check for problems in the I/O area. See SAP Note 1999930 for more information. | ||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | BackupSuperblockRead | This lock secures the backup input channel. It is normal to see this lock in these contexts. In case of very long durations you can check for problems in the I/O area. See SAP Note 1999930 for more information. | ||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | capacityReached | This is a normal wait situation during backups. The backup executor thread (Backup::BackupExe_Job) waits until all data is written to the backup destination. | ||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | ChannelUtilsSynchronousCopyHandler | This lock is used when data is copied on disk level. This mainly happens during backups and system replication (transfer of replica data). It is normal to see this lock in these contexts. In case of very long durations you can check for problems in the I/O area. See SAP Note 1999930 for more information.
Related call stack modules are: Stream::SynchronousCopyHandler::doCopy Stream::ChannelUtils::copySynchronous DataAccess::ReplicationProtocolPrimaryHandler::writeBackupToSecondary |
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | CSV_ColumnExport_Semaphore_Stats | This semaphore is required for synchronizing CSV exports. Due to a bug it can happen that parallel CSV threads infinitely wait for this semaphore and cancelling isn’t possible. This problem is fixed with SAP HANA Rev. >= 122.07, >= 2.00.002 and >= 2.00.010. As a workaround you can disable export parallelism:
indexserver.ini -> [import_export] -> use_parallel_export = false |
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | DeltaIndexManager.cpp: TRexAPI::DeltaIndexManager::MergeAttributeThread::MergeAttributeThread | This lock can show up when the MergedogMonitor thread waits for merges performed by the MergeAttributeThread. See SAP Note 2057046 for more information related to delta merges. | ||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | Event …: CallStack Not Initialized | This is a generic, unspecific semaphore name used for various semaphores in IBM on Power environments. On Intel they follow the naming convention “<module>.cpp: <function>” instead, e.g. “DeltaIndexManager.cpp: TRexAPI::DeltaIndexManager::MergeAttributeThread::MergeAttributeThread”. | ||||||||||||||||||||||||||||||||||||||||
Semaphore Wait
Mutex Wait ExclusiveLock Enter |
IndexMgr.cpp: TRexConfig::IndexHandle::acquire Acquire Index Access unnamed Mutex TRexConfig_IndexMgrIndex_BucketLock |
These waits are linked to the IndexHandle, a column store table level synchronization lock that is mainly used for delta merge (SAP Note 2057046) synchronization.
Typical reasons for these wait situations are:
Up to SAP HANA SPS 11 it is often not easily possible to identify the IndexHandle lock holder. Starting with SAP HANA SPS 12 you can use the following hdbcons functionality to determine IndexHandle wait situations: hdbcons 'indexmanager waitgraph -w' The overall IndexHandle state can be determined with the following command: hdbcons 'indexmanager list [-f <table>]' See SAP Note 2222218 for more information related to hdbcons. Starting with SAP HANA SPS 12 also runtime dumps (SAP Note 2400007) contain a new section [INDEXMANAGER_WAITGRAPH] with lock dependency information related to the IndexHandle. |
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | IpmmTaskWait MemoryReclaim |
These inter process memory management (IPMM) locks can show up when memory reclaim activities like self compactions or shrinks happen. See SAP Note 1999997 and make sure that memory is used efficiently so that the amount of reclaim operations is reduced.
Threads waiting for this lock typically work in module MemoryManager::GlobalMemoryHandler::pmAcquireIPMMLock. In the [IPMM_MEMORY] of the runtime dumps (SAP Note 2400007) you are able to identify the process holding the lock: [6] PID=16290 (state=T), SId=1547305659, compactors active, alive, process name: hdbnsutil AB=1228591104b (1.14gb), UA=0b, U=109622220b (104.54mb), FSL=0b, SMAS=1191825408b (1.10gb), ... ! Executing shrink. Process with SlotID 5 requests 0b to shrink, 13146370048b (12.24gb) to reserve, task: DefragmentationTask. Up to SAP HANA SPS 10 also the hdbnsutil tool is part of IPMM. If hdbnsutil holds the lock, you should check if this process is stuck for some reason and terminate it manually. Starting with SAP HANA SPS 11 hdbnsutil is no longer part of IPMM and so it can no longer be responsible for the IPMM locks. Shrink activities are documented in the database trace (SAP Note 2380176) with entries like “Information about shrink”. Self compactions (defragmentations) aren’t displayed in the trace files. You can find details via |
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | LogBufferFreeWait | This lock indicates that no freespace is available in the log buffer to record redo information. This is typically caused by disk I/O issues. See SAP Note 1999930 for more information. | ||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | MailSenderCallback | Waits for this semaphore can be caused by the statistics server having problem to send out mails. In the related statistics server trace file you may find an entry related to the SMTP port 25 like:
(invalid)-><ip_address>/25_tcp ConnectWait The permanent hanging situation is caused by SAP HANA bugs which are fixed with Rev. 83 and Rev. 90. As a consequence of this problem the statistics server may no longer be replicated properly. In order to restart the system replication, you can restart the statistics server. |
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | PageIO::SyncCallbackSemaphore | Starting with SAP HANA 2.0 synchronous I/O requests are implicitly executed as asynchronous I/O requests. The PageIO::SyncCallbackSemaphore indicates that the thread is currently waiting for the asynchronous I/O request to finish properly. See SAP Note 1999930 for tuning SAP HANA I/O performance. | ||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | POSTCOMMIT_FINISH_SMP | This lock is related to COMMIT processing and can be caused by the following issues:
|
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | PrefetchCallback PrefetchIteratorCallback |
These waits are active when the thread waits for pages requested from disk. A high number of these waits can indicate that the I/O stack is not able to keep up with the number of I/O requests. This may be normal (e.g. when a large number of columns is loaded in parallel after startup), but it can also indicate bottlenecks in the I/O area. See SAP Note 1999930 for more information regarding I/O optimization.
Threads waiting for this lock typically have call stacks including modules like DataContainer::PageChainContainerPrefetchedIterator::PrefetchBuffer::getNextPage. |
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | PrivateLogBufferFlushWaitSemaphore | This semaphore is required while flushing session specific log buffer information into the global log buffer in memory. It can show up if there is contention on the global log buffer, e.g. due to LoggerBufferSwitch waits. Therefore you should check for other unusual lock situations happening at the same time and optimize these lock situations.
Private log buffers are used as of SAP HANA 1.00 SPS 10. |
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | ResourceFlushWaitA ResourceFlushWaitB |
These locks indicate that a SAP HANA thread (e.g. in the context of a savepoint) waits for asynchronous I/O write requests to be finished. Long lock durations can indicate a high write volume or problems in the I/O area. See SAP Note 1999930 for more information regarding I/O optimization. | ||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | SaveMergedAttributeJobSemaphore | This lock is used when results of a column thread should be merged to disk but the actual merge hasn’t finished, yet. See SAP Note 2057046 for more information related to merges.
You can normally consider this lock as a non-critical idle wait. |
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | TableDeltaMerge.cpp: TRexAPI::TableMergeRequest::TableMergeRequest | Threads waiting in modules like TRexAPI::MergeMonitor::acquireToken for this lock, the number of merge tokens is too small to handle the current merge load. Occasional merge peaks are acceptable, but if it happens frequently, you should consider the following optimizations:
See SAP Note 2057046 for more details about merges and merge tokens. |
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | TableLoad.cpp: TRexAPI::LoadFieldThread TableLoad.cpp: ltt::vector::vector |
This semaphore is used when TableReload threads trigger LoadField threads in order to load columns into memory, e.g. after a startup of SAP HANA. See SAP Note 2127458 for more information related to column loads. It is normal to see spikes of this wait at some times. Its runtime mainly depends on the I/O performance when columns are read from the data area on disk. If it takes too long, you should check the I/O performance based on SAP Note 1999930. | ||||||||||||||||||||||||||||||||||||||||
Semaphore Wait
SharedLock Enter |
TranslationTable.cpp: JoinEvaluator::TranslationTable::fill TranslationTable.cpp: JoinEvaluator::TranslationTable::completeFill RWLock[TranslationTable.cpp]: JoinEvaluator::TranslationTable::TranslationTable |
These locks happen in the context of populating translation tables which are used to map join columns (see SAP Note 1998599). Check the following suggestions to optimize the situation:
|
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | TRexApiSystem.cpp: TRexAPI::TRexApiSystem::preloadTables | This wait is linked to column preload as part of SAP HANA startup (SAP Note 2127458). The related thread is of type ‘Assign’ with thread method ‘post_assign’. In case of longer waits you have to check why preload takes longer than expected. Typical reasons are individual issues (e.g. long text column load as described in SAP Note 2461411 or I/O read bottlenecks (SAP Note 1999930). | ||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | unnamed Semaphore | Normally all important semaphores should be named properly, so “unnamed Semaphore” should rarely pop up. If you face significant “unnamed Semaphore” waits, consider the following known constellations:
|
||||||||||||||||||||||||||||||||||||||||
Semaphore Wait | WaitAndSwitchCounterResourceSemaphore | This lock is required when a savepoint or snapshot waits for the resource flush thread to finish flushing data pages to disk. It is normal that this wait appears from time to time. If you see a particularly high number of occurrences, you can check the savepoint behavior as described in SAP Note 2100009. | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLockEnter |
Acquire AttributeStore Access | This lock secures access to the column store for specific operations. For example, a compression optimization (SAP Note 2112604) has to access the column store without any garbage collection (SAP Note 2169283) going on in parallel (AttributeEngine::AttributeStoreHandle::setStoreHandleAndLockGC), and so it has to wait for ‘Acquire AttributeStore Access’ until a running persistence garbage collection is finished. | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter Mutex Wait |
AttributeValueContainer writeLock
AttributeValueContainer readLock |
These locks secure accesses to column store main areas:
Lock contention on “AttributeValueContainer readLock” can happen in the following cases:
|
||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
calcEngine_CalcEngineManager_ScenarioMapRWLock | These locks can be secondary symptoms when a thread is blocked by the ceManager_GCLock lock, i.e. by the calculation engine garbage collector (CalcEngineGarbageCollector, see SAP Note 2169283). | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
Connection List RWLock | This lock secures the session connection list. With SAP HANA Rev. 110 to 112.02 it can be critical in the context of the SAP HANA bug described in SAP Note 2400515. | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
ConsistentChangeLock | During the blocking phase of savepoints DML operations are blocked with “SharedLock Enter” related to the lock type ConsistentChangeLock. The call stacks contain modules like DataAccess::SavepointSPI::lockSavepoint. See SAP Note 2100009 and check if the blocking phase of savepoints can be optimized.
The actual reason of long ConsistentChangeLock waits can also originate from other activities. The section [SAVEPOINT_SHAREDLOCK_OWNERS] of runtime dumps (SAP Note 2400007) lists owners of shared ConsistentChangeLocks. If this information isn’t available, you can double-check the following list of potential lock holders:
|
||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
ExpensiveStatements_RingBufferLock | This lock secures the ring buffer linked to the expensive statement trace (SAP Note 2180165). Typical reasons for an increased amount of lock waits are:
|
||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
FileIDMapping | This lock secures persistence container accesses (e.g. for hybrid LOBs or virtual files). A high contention can happen when many persistence garbage collector threads (GCJob*) are concurrently active. Typical call stacks are DataContainer::FileIDMapping::addMapping (e.g. in the context of an INSERT involving a disk LOB), DataContainer::FileIDMapping::mapToEntryID and DataContainer::ContainerDirectoryImpl::getInternal. In this case you can reduce the maximum number of persistence garbage collectors by setting the following parameter to a lower value (SAP Note 2222250).
global.ini -> [persistence] -> max_gc_parallelity See SAP Note 2169283 for more information regarding SAP HANA garbage collection. |
||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
GeometryDeltaAttribute Lock | This lock is used to synchronize accesses to data structures on the delta of geometry columns (ST_POINT and ST_GEOMETRY) in the context of modification operations like INSERT or UPDATE. | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
GlobalContainerNameDirectoryTransLockStats | This lock protects the persistence container name directory. If concurrent garbage collector threads (GCJob*) are responsible for the lock waits you can reduce the maximum number of persistence garbage collectors by setting the following parameter to a lower value (SAP Note 2222250).
global.ini -> [persistence] -> max_gc_parallelity See SAP Note 2169283 for more information regarding SAP HANA garbage collection. SAP Note 2404252 describes a SAP HANA bug with Rev. 122.01 to 122.05 that can result in lock waits in combination of M_TABLE_PERSISTENCE_LOCATIONS accesses. |
||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
Global SqlExecutor thread lock | This lock is used when the number of SqlExecutor threads is changed for some reason, e.g.:
Waits for this lock are typically a consequence of another underlying root cause (e.g. very high load or infrastructure bottlenecks). |
||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
IndexManagerMapLock | This lock secures table accesses on a low level. Usually these locks are shared and so no waits happen. In specific scenarios locking can happen, usually in combination with DDL commands or background activities (e.g. merge, optimize compression). This lock is global and not table-specific, so accesses to all tables can be hampered. Check the recommendations related to the following locks to identify the root cause of IndexManagerMapLock issues:
|
||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
JoinEvaluator_JEPlanData_Lock | This lock situation happens when multiple threads of the same join engine job need to access the same plan data objects. This is a rather normal situation so that it is not necessarily an issue that this lock belongs to the most frequent ones in a system. The lock waits are always local to a single request / single connection. If the locks result in performance issues, you should check if the underlying SQL statement is already processed optimally or if it can be tuned (SAP Note 2000002). | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
LlangCompilingMap | This lock synchronizes concurrent compilation of L procedures. | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
MergeMonitor_Lock TRexAPI::TableMergeRequest::TableMergeRequest |
These lock wait situations indicate that merge operations have to wait for merge tokens, e.g. due to a high amount of concurrent merges or bottlenecks during merging. See SAP Note 2057046 for more details. In very special cases you can adjust parameters like the following in order to control the merge throughput:
indexserver.ini -> [mergedog] -> load_balancing_func indexserver.ini -> [mergedog] -> token_per_table |
||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
MergeMVCCLock | This lock can happen in case of concurrent merges and is typically a consequence of other issues (e.g. I/O problems or highly concurrent merges). See SAP Note 2057046 for more details regarding merges. | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
MVCCSyncLock | This lock is relevant for column store tables only and it is responsible for synchronizing low-level MVCC operations with DDL statements, delta merges and compression optimizations. It can e.g. show up if different internal activities like persistence garbage collection and compression optimization collide. See SAP Note 2169283 for more information related to garbage collection and MVCC. | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
pageAllocLatch | This latch secures shared memory page allocations in the context of the row store. Contention for this latch was observed in the context of MemoryReclaim waits (SAP Note 1999997). If you face a high amount of pageAllocLatch waits, you should check at first if there was a concurrent shrink operation and MemoryReclaim waits. If yes, the pageAllocLatch is most likely only a secondary symptom and you have to reduce the risk of automatic memory shrinks. | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
SavepointLock | This lock synchronizes savepoints and makes sure that a savepoint (or snapshot) isn’t started at a time when another savepoint (or snapshot) is already running. As an example, a system replication related snapshot will wait for SavepointLock as long as a normal savepoint is in progress on primary side. If you frequently see these locks, you should check and optimize the savepoint runtime and performance (SAP Note 2100009). | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
SessionContextCacheMapLock | This lock secures the access to session context information. If you see many waits, it is usually not a problem itself, but a symptom of another underlying issue (e.g. slow or hanging COMMIT operations). | ||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
transLock | This lock secures transaction management. Permanent transLock waits can be caused by a SAP HANA deadlock bug that is fixed with Rev. 112.04 and Rev. 121. If you already run into this bug situation, you have to restart SAP HANA to resolve the stuck situation.
If the waiting threads are in a call stack containing ptime::Transaction::find_trans_from_tid2trans and ptime::TransTokenConnector::checkTIDGloballyClosed, the problem is linked to NO LOGGING tables and you could prevent future issues by making sure that all of these tables are located on the master node. See SAP Note 2340583 for more information. |
||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
TRexConfig_IndexMgrIndex_ReplayLock | This lock is required when delta log information is loaded or replayed. If for example a large delta storage isn’t loaded into memory, yet, the first thread requiring this information will load it in a module like Delta::log::Replay::replayLog. Other processes have to wait for the TRexConfig_IndexMgrIndex_ReplayLock lock.
Typical reasons for this lock are:
|
||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter |
unnamed ReadWriteLock | In some cases read write locks aren’t named properly and so they are only shown as “unnamed ReadWriteLock”. The most typical scenarios are:
|
||||||||||||||||||||||||||||||||||||||||
SharedLock Enter ExclusiveLock Enter Semaphore Wait |
vfLOBStatsMapLock
unnamed Semaphore |
The vfLOBStatsMapLock is linked to virtual file LOB statistics. If these locks appear on secondary system replication sites (SAP Note 1999880) with operation mode logreplay in context of LogRecoveryQueue<id> threads and call stacks like DataContainer::VirtualFileStatsProxy::getVirtualFileLOBStatistics or DataContainer::VirtualFileStatsProxy::removeVirtualFileLOBStatistics, a reduction of the recovery queue size can help to improve the situation:
global.ini -> [persistence] -> recovery_queue_count See SAP Note 2222250 for more information. |
An overview of internal lock waits can be retrieved via:
- M_READWRITELOCKS, M_SEMAPHORES, M_MUTEXES
- SQL: “HANA_Locks_Internal_LockWaits_Overview” (SAP Note 1969700)
- liveCache related internal lock waits can be checked via SQL: “HANA_liveCache_LockWaits_Current” (SAP Note 1969700)
Be aware that many harmless internal lock waits are included in these tables (e.g. threads being idle, but waiting for internal locks at that time), so a top down analysis is not helpful.
If the SAP HANA database is no longer accessible due to the internal lock escalations, see SAP Note 1999020 for potential additional analysis steps.
If you are not able to determine the root cause of critical internal lock waits, open a SAP incident on component HAN-DB and provide a full system dump if possible (SAP Note 1732157).
7. How can special lock waits be analyzed?
Special lock waits are all lock waits that aren’t covered by normal thread states and monitoring views. Due to a lack of monitoring data it is particularly hard to analyze them. In the following table you can find pro-active suggestions and further details:
Lock type | Details | ||||||
Nameserver topology | The namesever topology tracks the current state of the database (e.g. existing tables and locations) and is implemented as shared memory. Concurrent accesses to the nameserver topology are synchronized via a file on operating system level:
If there is a bottleneck accessing the file (e.g. because the /tmp file system is mounted via NFS), the performance can be significantly impacted. In order to minimize the risk of synchronization delays you have to make sure that the file is located on fast, local devices. See SAP Note 2100296 for more information. Starting with SAP HANA Rev. 91, but only with SPS 09, a message is written to the daemon trace if the file system is potentially critical. Mini check 1720 (“Supported nameserver lock file location”) evaluates this information. The different file type used with SPS >= 10 avoids these issues. An increased number of nameserver topology accesses can also be linked to an inefficient SQL cache utilization, e.g. due to many evictions or a lack of bind variables. See SAP Note 2124112 and make sure that SQL cache and parsing activities are working fine. In rare cases it can be helpful to relocate this file to a different location. This can be done by configuring the following parameter in configuration file /usr/sap/<sid>/HDB<inst_id>/<hostname>/sapprofile.ini: HDB/NameServer/RWLockPath=<filesystem_path> Example: HDB/NameServer/RWLockPath=/var/hdb/ramdisk Typical call stacks of threads waiting for the lock are: TrexIpc::NameserverLock::acquireRead NameServer::FTreeContainer::getValue An indirect indication for problems accessing this file can be increased SelfWatchDog thread activities (see SAP Note 2114710). Mini Check 870 performs a related check (see SAP Note 1999993). If you assume a problem, you can run SQL: “HANA_Threads_Callstacks” (THREAD_TYPE = ‘SelfWatchDog’) available via SAP Note 1969700 in order to have a look at the call stacks of the SelfWatchDog threads. If threads typically work in the following call stack, a problem accessing the lock file is likely: checkLockFile __close_nocancel __fopen_internal __GI__IO_file_open _IO_new_fclose _IO_new_file_close_it _IO_new_file_fopen __open_nocancel TrexService::SelfWatchdog::checkLockFile TrexService::SelfWatchdog::run Be aware that the following warning in trace files can actually be ignored because all file system types can be used for the lock file (unlike the DATA and LOG area where only specific file system types are supported): w FileIO Filesystem.cpp: Unsupported file system "<file_system_type>" for "/var/lib/hdb/<sid>/.hdb_<sid>_<inst_id>_lock" |
||||||
Runtime dump lock | If several runtime dumps (including variations like OOM or crash dump) are triggered at the same time, only one dump is executed at a time and other have to wait in call stacks like:
1: __lll_lock_wait+0x20 2: _L_lock_1008+0xb 3: __pthread_mutex_lock_internal 4: Synchronization::SystemMutex::lock 5: Diagnose::CrashDump::executeRuntimeDump 6: Diagnose::RuntimeDump::createSystemDump 7: MemoryManager::reportMemoryLimitViolation For technical reason the lock is implemented as system mutex (Synchronization::SystemMutex::lock) that is not displayed as SAP HANA lock wait. Instead the thread state appears as ‘Running’. At the same time it is an idle wait and no CPU is consumed. A hanging runtime dump can – depending on the context of the dump – be responsible for contention on other internal SAP HANA locks. To avoid this kind of locking you should reduce the possibility that different types of runtime dumps are executed in parallel. |
8. Do timeouts for lock waits exist?
Transactional lock waits are terminated when the time limit defined with the following parameter is exceeded:
indexserver.ini -> [transaction] -> lock_wait_timeout
Its default value 1800000 which represents 1800000 ms / 1800 s / 30 minutes. This means that a lock wait is terminated after 30 minutes and the following error message is issued:
SQL error 131: transaction rolled back by lock wait timeout
This behavior is different from other databases like Oracle where no timeout for exclusive lock waits exist and transactions will wait for the lock until the lock is available or the transaction is manually terminated.
For internal lock waits no timeout is implemented.
9. What happens in case of deadlocks?
Deadlocks are situations where two or more transactions lock each other cross-wise so that no transaction will ever be able to proceed. Usually deadlocks are caused by the application design, but there can be also more technical deadlocks in the context of primary key constraints (SAP Note 2429521). The actual behavior in case of a deadlock depends on the lock type:
Lock type | Deadlock behavior |
Transactional lock | If SAP HANA recognizes a deadlock based on a transactional lock, one of the transaction will be terminated and the following error is issued:
SQL error 133: transaction rolled back by detected deadlock This behavior is different from other databases like Oracle where a deadlock will only terminate the current DML operation and not the whole transaction. |
Internal lock | A deadlock on top of internal locks (e.g. SAP Note 2304201) has to be considered as SAP HANA bug and can only be resolved with a restart.
You can check for certain (but not all) internal deadlocks using the hdbcons deadlock detector functionality as described in SAP Note 2222218: hdbcons 'deadlockdetector wg -w -o <file_name>.dot' |
10. Is it possible to identify the record responsible for a record lock?
It is currently not possible to identify the record responsible for a record lock. The view M_BLOCKED_TRANSACTIONS contains a reference to the record (WAITING_RECORD_ID), but this reference can’t be evaluated in a reasonable way.
11. How can I determine how long certain types of locks are held?
Normally it is most important to analyze lock wait situations, i.e. the concurrent access to the same resources. For transactional locks you can additionally see how long locks are held (even if no transaction is waiting) by using the following approaches:
- M_OBJECT_LOCKS, M_RECORD_LOCKS, M_OBJECT_LOCK_STATISTICS
- SQL: “HANA_Locks_Transactional_Current” / SQL: “HANA_Locks_Transactional_Total” (SAP Note 1969700)
12. What is the unit of column TOTAL_LOCK_WAIT_DURATION in monitoring view M_SQL_PLAN_CACHE?
TOTAL_LOCK_WAIT_DURATION contains the time a SQL statement had to wait for transactional locks. Like all other duration columns in monitoring view M_SQL_PLAN_CACHE the unit is usually microseconds. Due to a bug it changed to milliseconds for SAP HANA Revisions 110 to 112.06 an 120 to 122.03. As a consequence the displayed wait durations are factor 1000 too low on these Revision levels.
13. How can I automatically capture additional information in case of unusual locking issues?
In order to optimally understand specific – mainly internal – lock waits, additional information like runtime dumps (SAP Note 2400007) are helpful. You can schedule SAP HANASitter (SAP Note 2399979) in order to automatically trigger follow-up actions in case of unusual thread states.
Example:
Command | Details |
python hanasitter.py … -ct LOCK_WAIT_NAME,Resource-Load-Wait,10 | Trigger data collection if there are at least 10 threads waiting for the “Resource Load Wait” lock |
14. Why are there deadlocks although changes are done in a sorted manner?
Deadlocks – i.e. terminations with “SQL error 133: transaction rolled back by detected deadlock” – are a consequence of cross-wise locks. If all concurrent transactions perform the record changes in the same order, no cross-wise lock can happen, so changes with a special sort order typically avoid deadlocks. Instead you can only see transactional lock waits.
There is one important restriction to this rule: If the underlying table is partitioned, the modifications are no longer necessarily done in the defined sort order, because modifications can happen in parallel in different transactions. This is a general risk when you partition a table, even if the partition columns and the WHERE clause of the modification match.
If you want to make sure that concurrent sorted modifications on a partitioned table don’t run into a deadlock, you have to set the partitioning related minimum bulk load size to a sufficiently high value in order to make sure that loads won’t use the bulk load parallelism, e.g.:
indexserver.ini -> [partitioning] -> bulk_load_size = 2000000000
This global setting may cause increased runtime of all modification operations on all partitioned tables in the system. Therefore it has to be used with care and tested thoroughly.
15. How can the lock holder be determined?
In case of wait situations it is of central importance to identify the lock holder and analyze its behavior. The following options exist to identify the lock holder:
Scenario | Analysis commands (SAP Note 1969700) | Details |
transactional locks | SQL: “HANA_Locks_Transactional_LockWaits” SQL: “HANA_Locks_Transactional_LockWaits_History” |
In case of transactional locks the lock holder is available via M_BLOCKED_TRANSACTIONS / HOST_BLOCKED_TRANSACTIONS (LOCK_OWNER_* columns). |
(some) internal locks | SQL: “HANA_Threads_CurrentThreads” SQL: “HANA_Threads_ThreadSamples_FilterAndAggregation” |
A subset ofinternal locks is displayed in column LOCKS_OWNED of M_SERVICE_THREADS (Rev. 1.00.120 and higher) and M_SERVICE_THREAD_SAMPLES / HOST_SERVICE_THREAD_SAMPLES (Rev. 1.00.122.08 and higher). |
ConsistentChangeLock | The section [SAVEPOINT_SHAREDLOCK_OWNERS] of runtime dumps (SAP Note 2400007) provides an overview of threads holding a ConsistentChangeLock. | |
index handle waits | Starting with SAP HANA 1.00.120 it is possible to determine index handle lock dependencies via “hdbcons ‘indexmanager waitgraph -w'” and via runtime dump (SAP Note 2400007) section [INDEXMANAGER_WAITGRAPH]. See “Acquire Index Access” above for more information. | |
general | SQL: “HANA_Threads_CurrentThreads” SQL: “HANA_Threads_ThreadSamples_FilterAndAggregation” SQL: “HANA_Threads_Callstacks” |
If none of the above direct options exist to determine the lock holder, you have to have a close look at concurrent thread activities (SAP Note 2114710) and their call stacks (SAP Note 2313619) and watch out for usual suspects (as e.g. documented in “How can internal lock waits be analyzed?” above). |
Keywords
SAP HANA lock enqueue exclusive shared wait mutex semaphore barrier sleep conditional variable
Product or Product Version |
---|
SAP HANA, platform edition all versions |
Number | Title |
---|---|
2400007 | FAQ: SAP HANA Runtime Dumps |
2400005 | FAQ: SAP HANA Persistence |
2399979 | How-To: Configuring automatic SAP HANA Data Collection with SAP HANASitter |
2380176 | FAQ: SAP HANA Database Trace |
2313619 | How-To: Generating SAP HANA Call Stacks |
2222250 | FAQ: SAP HANA Workload Management |
2222218 | FAQ: SAP HANA Database Server Management Console (hdbcons) |
2222200 | FAQ: SAP HANA Network |
2215131 | Alert Log Switch Wait Count Ratio |
2186744 | FAQ: SAP HANA Parameters |
2180165 | FAQ: SAP HANA Expensive Statements Trace |
2180119 | FAQ: SAP HANA Smart Data Access |
2169283 | FAQ: SAP HANA Garbage Collection |
2160391 | FAQ: SAP HANA Indexes |
2127458 | FAQ: SAP HANA Loads and Unloads |
2124112 | FAQ: SAP HANA Parsing |
2114710 | FAQ: SAP HANA Threads and Thread Samples |
2112604 | FAQ: SAP HANA Compression |
2100040 | FAQ: SAP HANA CPU |
2100009 | FAQ: SAP HANA Savepoints |
-
More
[ 20 / 41 ]
Number | Title |
---|---|
2169283 | FAQ: SAP HANA Garbage Collection |
2000002 | FAQ: SAP HANA SQL Optimization |
2380176 | FAQ: SAP HANA Database Trace |
2160391 | FAQ: SAP HANA Indexes |
2114710 | FAQ: SAP HANA Threads and Thread Samples |
2399990 | How-To: Analyzing ABAP Short Dumps in SAP HANA Environments |
1999997 | FAQ: SAP HANA Memory |
1999993 | How-To: Interpreting SAP HANA Mini Check Results |
2399979 | How-To: Configuring automatic SAP HANA Data Collection with SAP HANASitter |
2180165 | FAQ: SAP HANA Expensive Statements Trace |
2044468 | FAQ: SAP HANA Partitioning |
2154870 | How-To: Understanding and defining SAP HANA Limitations |
2000003 | FAQ: SAP HANA |
2222217 | How-To: Troubleshooting SAP HANA Startup Times |
2222218 | FAQ: SAP HANA Database Server Management Console (hdbcons) |
2100040 | FAQ: SAP HANA CPU |
2220627 | FAQ: SAP HANA LOBs |
2119087 | How-To: Configuring SAP HANA Traces |
2222277 | FAQ: SAP HANA Column Store and Row Store |
2100010 | SAP HANA: Popular Misconceptions |