SAP System Health Checks
SAP system health checks are part of an SAP Basis consultant's daily activities. The checks differ for each system type, such as ABAP, Java, BI, and XI/PI systems, and the list of transactions used for them varies from customer to customer. Below are the transactions commonly used for SAP system health checks on R/3 systems.
1. Monitoring Application Servers in SM51
- From SM51, we can check whether all SAP application servers are running.
- When an application server is not available, it does not appear in the list, so we cannot find its host name there.
- We can also see the number of users logged on to each instance.
- We can estimate the current activities and the resulting load on each instance.
- SM51 also lists all application servers together with server information, queue information, and SNC status.
- We can also view the status of all active and inactive instances using RZ03.
- Please ensure that all active servers are running fine, especially in production systems. If you find a server that is not running during the health check, find the reason and adjust the load balancing.
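Checking that every expected server appears in SM51 is essentially a set difference. The Python sketch below assumes you already have the list of expected servers and the list SM51 shows as active (for example, typed in or copied from a saved list); the function name and server names are hypothetical and not part of any SAP API.

```python
def missing_servers(expected, active):
    """Return expected application servers that are missing from SM51.

    Both arguments are plain lists of server names; how they are exported
    from SM51 is left open (hypothetical input, not an SAP API).
    """
    return sorted(set(expected) - set(active))


# Hypothetical server names in the usual <host>_<SID>_<instance no.> form
expected = ["host01_PRD_00", "host02_PRD_00", "host03_PRD_00"]
active = ["host01_PRD_00", "host03_PRD_00"]
for name in missing_servers(expected, active):
    print(f"Not running: {name}")
```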
2. Monitoring Work Processes in SM50 and SM66
- We can check for long-running work processes.
- Check the status of each work process.
- We can cancel a work process by selecting it and choosing Cancel from the Process menu.
- When we cancel a work process, the transaction running in it is terminated and the dispatcher starts a new work process.
- At OS level, we can view the work processes of an instance with the dpmon tool (for example, dpmon pf=<instance profile>); when several instances run on the same host, specify the profile of the instance you want to check.
- As part of the health check, check the status of all work processes and look for long-running processes and processes in error status.
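The work process check can be expressed as a small filter. This is a minimal Python sketch, assuming the SM50 overview has been copied into simple records; the `WorkProcess` class, its field names, and the one-hour limit are illustrative assumptions, not SAP definitions.

```python
from dataclasses import dataclass

LONG_RUNNING_S = 3600  # hypothetical threshold: one hour


@dataclass
class WorkProcess:
    number: int      # "No" column in SM50
    wp_type: str     # e.g. "DIA", "BTC", "UPD"
    status: str      # e.g. "Running", "Waiting", "Stopped"
    runtime_s: int   # "Time" column, seconds spent on the current request
    errors: int = 0  # "Err" column


def flag_work_processes(processes, limit_s=LONG_RUNNING_S):
    """Return (number, reason) pairs for work processes that need a closer look."""
    findings = []
    for wp in processes:
        if wp.status == "Running" and wp.runtime_s > limit_s:
            findings.append((wp.number, f"running for {wp.runtime_s} s"))
        if wp.status == "Stopped" or wp.errors > 0:
            findings.append((wp.number, "stopped or has errors"))
    return findings


processes = [
    WorkProcess(0, "DIA", "Running", 7200),
    WorkProcess(1, "DIA", "Waiting", 0),
    WorkProcess(2, "BTC", "Stopped", 0, errors=1),
]
for number, reason in flag_work_processes(processes):
    print(f"WP {number}: {reason}")
```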
3. Active Users Overview in SM04 and AL08
- Display active users and their activities.
- Check for users who have lost their connections.
- We can also view the number of RFC users, the number of interactive users, terminals, and active instances.
4. Checking Logon Load Balancing in SMLG (Logon Groups)
- In SMLG we can check the load distribution, average response time, and quality ratio. Press F5 for the load distribution: it shows the logon groups, instances, status, response time, number of users per instance, quality of load balancing, and dialog steps of each instance.
- If an application server goes down for any reason, users can still log on through the logon group, which assigns them the best available application server (SPACE is the default logon group).
5. Checking for ABAP Dumps in ST22
- When a user hits a program error, the dump is displayed immediately.
- These dumps include information such as "What happened", "What can you do", "Which tables were being used at that time", and "Which other programs were affected".
- We can see how many short dumps were generated today and yesterday: in ST22, select the day and click the Display List icon.
- ABAP runtime errors include syntax errors, wrong logic, and exceptions such as an invalid request to the database interface when accessing a table (DBIF_RSQL_INVALID_REQUEST).
- Other common ones are application program errors (MESSAGE_TYPE_X), errors caused by database inconsistency (START_CALL_SICK), DBIF_NTAB_TABLE_NOT_FOUND, and UNCAUGHT_EXCEPTION.
- If there are more than 20 dumps, we should send a mail to the client.
- All these dumps are stored in the SNAP table. Old dumps should be deleted regularly, because an ever-growing SNAP table can itself lead to further errors.
- As part of the health check and reporting, please address all errors that affect business and functional users.
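The daily dump count and the 20-dump rule from this section can be sketched in Python as follows. The input format (a list of date and runtime-error pairs taken from the ST22 list) and the function names are hypothetical; only the threshold of 20 comes from the text above.

```python
from collections import Counter
from datetime import date

DUMP_MAIL_THRESHOLD = 20  # from the text: more than 20 dumps -> mail the client


def days_to_report(dumps, threshold=DUMP_MAIL_THRESHOLD):
    """Return the days on which more than `threshold` dumps occurred.

    `dumps` is a list of (date, runtime_error) tuples, for example copied
    from the ST22 list (hypothetical input format).
    """
    per_day = Counter(day for day, _error in dumps)
    return sorted(day for day, count in per_day.items() if count > threshold)


def top_errors(dumps, n=3):
    """Return the n most frequent runtime errors with their counts."""
    return Counter(error for _day, error in dumps).most_common(n)


dumps = [(date(2024, 1, 2), "MESSAGE_TYPE_X")] * 21 + [
    (date(2024, 1, 1), "DBIF_RSQL_INVALID_REQUEST")
] * 5
print(days_to_report(dumps))
print(top_errors(dumps, 1))
```

Listing the most frequent runtime errors alongside the count helps decide which dumps to raise with the functional team first.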
6. Checking Lock Entries in SM12
- Lock entries older than one day should be analyzed and corrected (Lock Entry menu > Delete).
- Normally, locks are released automatically when transactions are committed or when users finish working on the data.
- But we need to be aware of locks that stay unreleased for several hours. Possible causes are long-running background jobs that update the database, abnormal termination of the SAP GUI, and problems in update processing.
- To solve this problem, we can delete the lock entries or log the user off from SM04.
- To test whether the enqueue work process is working correctly, go to SM12 > Extras > Diagnosis.
- SM12 lock entries are held in the enqueue server's lock table in main memory rather than in a database table such as TLOCK.
While performing the health check, if you find any old locks, investigate them and take the necessary action. Old lock entries cause many problems and can lead to a lock table overflow; for more information, see SAP lock table overflow.
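Finding lock entries older than one day is a simple age filter. A minimal Python sketch, assuming the SM12 entries have been copied into dicts with a creation timestamp; the keys, function name, and sample data are hypothetical.

```python
from datetime import datetime, timedelta

MAX_LOCK_AGE = timedelta(days=1)  # from the text: analyze locks older than one day


def stale_locks(locks, now, max_age=MAX_LOCK_AGE):
    """Return lock entries older than max_age, oldest first.

    Each lock is a dict with hypothetical keys "user", "table" and
    "created" (a datetime), e.g. copied from the SM12 list.
    """
    old = [lock for lock in locks if now - lock["created"] > max_age]
    return sorted(old, key=lambda lock: lock["created"])


now = datetime(2024, 1, 3, 12, 0)
locks = [
    {"user": "USER_A", "table": "VBAK", "created": datetime(2024, 1, 1, 8, 0)},
    {"user": "USER_B", "table": "MARA", "created": datetime(2024, 1, 3, 11, 0)},
]
for lock in stale_locks(locks, now):
    print(lock["user"], lock["table"], lock["created"])
```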
Causes of malfunctions in lock management
- The enqueue server is not available
- The message server is not available
- Faulty network links between computers
- Incorrect system installation
- Incorrect setting of system parameters
7. Checking for Terminated Updates in SM13
- We can check for hung updates, updates pending for a long time, and updates in PRIV mode.
- We can check update records that have an error status or have not yet been processed.
- Depending on the type of error, ask the user to repeat the data entry that performs the update.
- All update requests are stored in the VBLOG table.
- We can monitor cancelled update requests and all other update requests in SM14.
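The SM13 review (records in error and records not yet processed) can be sketched as below. The status strings "Err" and "Init", the 30-minute limit for "pending for a long time", and the record layout are assumptions for illustration; check the status texts your SM13 actually shows.

```python
from datetime import datetime, timedelta

PENDING_LIMIT = timedelta(minutes=30)  # hypothetical: "pending for a long time"


def updates_to_check(records, now, pending_limit=PENDING_LIMIT):
    """Split update records into error records and long-pending records.

    Each record is a dict with hypothetical keys "user", "status"
    ("Err" for errors, "Init" for not yet processed) and "created" (datetime).
    """
    errors = [r for r in records if r["status"] == "Err"]
    pending = [
        r for r in records
        if r["status"] == "Init" and now - r["created"] > pending_limit
    ]
    return errors, pending


now = datetime(2024, 1, 3, 12, 0)
records = [
    {"user": "USER_A", "status": "Err", "created": datetime(2024, 1, 3, 9, 0)},
    {"user": "USER_B", "status": "Init", "created": datetime(2024, 1, 3, 10, 0)},
    {"user": "USER_C", "status": "Init", "created": datetime(2024, 1, 3, 11, 50)},
]
errors, pending = updates_to_check(records, now)
print([r["user"] for r in errors], [r["user"] for r in pending])
```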
8. Checking Whether the Production Client Is Open or Closed in SCC4
- The production client should always be closed unless opening it has been approved.
- Closed: no changes allowed and no changes to repository objects; we can also set protection level 2 (no overwriting, no external availability).
- Open: changes to repository objects allowed, changes recorded automatically, and no restrictions.
9. Monitoring the Database in DB02
Database monitoring is also part of the health check; please perform the checks below.
- Check for tablespaces that are more than 90% full.
- If a tablespace reaches 90%, we need to extend it by adding a data file using the SAPDBA tool.
- Check for missing indexes in DB02: the Missing Indexes tab lists missing primary and secondary indexes.
- If we find any missing indexes, we can report them to the ABAP developers, who will create them on the proper tables.
- Check for space-critical objects (large tables or indexes that may soon be unable to allocate more space in their tablespace; when this happens, the system may hang).
- If we find any space-critical objects, we should add a data file to the tablespace.
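The 90% tablespace rule is a plain percentage calculation, 100 * used / total. A minimal Python sketch, assuming used and total sizes in MB per tablespace taken from DB02; the function name, tablespace names, and sizes are made-up examples.

```python
USAGE_LIMIT_PCT = 90.0  # from the text: act when a tablespace reaches 90%


def tablespaces_to_extend(tablespaces, limit_pct=USAGE_LIMIT_PCT):
    """Return (name, percent_used) for tablespaces at or above the limit.

    `tablespaces` maps a tablespace name to (used_mb, total_mb), e.g. as
    read from the DB02 space overview (hypothetical input format).
    """
    result = []
    for name, (used_mb, total_mb) in tablespaces.items():
        pct_used = 100.0 * used_mb / total_mb
        if pct_used >= limit_pct:
            result.append((name, round(pct_used, 1)))
    # Fullest tablespaces first
    return sorted(result, key=lambda item: item[1], reverse=True)


tablespaces = {
    "PSAPSR3": (950, 1000),
    "PSAPUNDO": (500, 1000),
    "PSAPSR3USR": (900, 1000),
}
print(tablespaces_to_extend(tablespaces))
```

Sorting by usage puts the tablespace that needs a new data file most urgently at the top of the report.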
10. Checking for Terminated and Long-Running Background Jobs in SM37
- We can check the status of each job; if a job has been active for a long time, cancel it (Job > Cancel).
- If it is still active, check its status from the Job menu.
- If it remains in the same state, kill the corresponding work process at SAP level (SM50) or at OS level (dpmon).
- If a particular job is cancelled more than 20 times, a mail should be sent to the user.
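Both SM37 rules above, jobs active for too long and jobs cancelled more than 20 times, fit in two small functions. In this Python sketch the job records, the status strings, and the four-hour limit are hypothetical; only the 20-cancellation threshold comes from the text.

```python
from collections import Counter

MAX_ACTIVE_S = 4 * 3600      # hypothetical limit for "active for a long time"
CANCEL_MAIL_THRESHOLD = 20   # from the text: more than 20 cancellations -> mail


def long_active_jobs(jobs, limit_s=MAX_ACTIVE_S):
    """Return names of jobs that have been active for longer than limit_s."""
    return [j["name"] for j in jobs
            if j["status"] == "Active" and j["duration_s"] > limit_s]


def jobs_to_report(jobs, threshold=CANCEL_MAIL_THRESHOLD):
    """Return names of jobs cancelled more than `threshold` times."""
    cancelled = Counter(j["name"] for j in jobs if j["status"] == "Canceled")
    return sorted(name for name, count in cancelled.items() if count > threshold)


# Hypothetical job list: one dict per SM37 entry
jobs = [{"name": "Z_LOAD", "status": "Canceled", "duration_s": 10}] * 21 + [
    {"name": "Z_REPORT", "status": "Active", "duration_s": 20000},
    {"name": "Z_SHORT", "status": "Active", "duration_s": 60},
]
print(long_active_jobs(jobs), jobs_to_report(jobs))
```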
11. Checking Spool Requests in SP01
- Look for spool requests that have been "In process" for an hour or more.
- Check for problems with spool and output requests.
- Check the logs and possible causes of printing problems.
- Check the output request attributes, the log files, and the size of the print job.
- Check the device type and the access method of the output device.
- Delete old spool requests, or schedule a background job that deletes them automatically. If there are more than 50 spool errors, they have to be deleted.
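The two spool rules (requests in process for an hour, more than 50 errors) can be sketched in Python as below. The record layout, status strings, and request IDs are assumptions made for illustration.

```python
from datetime import datetime, timedelta

IN_PROCESS_LIMIT = timedelta(hours=1)  # from the text: "in process" for an hour
ERROR_DELETE_THRESHOLD = 50            # from the text: more than 50 errors


def spool_findings(requests, now):
    """Return (stuck_ids, error_ids, delete_errors) for a list of spool requests.

    Each request is a dict with hypothetical keys "id", "status"
    ("In process", "Error", ...) and "created" (datetime).
    """
    stuck = [r["id"] for r in requests
             if r["status"] == "In process" and now - r["created"] >= IN_PROCESS_LIMIT]
    errors = [r["id"] for r in requests if r["status"] == "Error"]
    return stuck, errors, len(errors) > ERROR_DELETE_THRESHOLD


now = datetime(2024, 1, 3, 12, 0)
requests = [
    {"id": 101, "status": "In process", "created": datetime(2024, 1, 3, 10, 30)},
    {"id": 102, "status": "In process", "created": datetime(2024, 1, 3, 11, 45)},
    {"id": 103, "status": "Error", "created": datetime(2024, 1, 2, 8, 0)},
]
print(spool_findings(requests, now))
```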
12. Monitoring Backup Logs in DB12
- Check whether the last backup was successful.
- If it was not, the problem might be that the disk holding the sapbackup directory was full.
- We can check for redo logs that have not yet been backed up.
- Check the backup status.
- Check the status of the archive log directory (/oracle/<SID>/oraarch).
- Verify the log of the last backup and the free space in the log directory.
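Checking whether the most recent backup succeeded means picking the run with the latest end time and looking at its status. A minimal Python sketch, assuming the DB12 backup overview has been copied into dicts; the keys, function name, and status strings are hypothetical.

```python
from datetime import datetime


def last_backup_ok(backups):
    """Return (ok, latest) for the most recent backup run.

    Each run is a dict with hypothetical keys "end" (datetime) and
    "status" ("successful", "failed", ...), e.g. copied from DB12.
    """
    if not backups:
        return False, None
    latest = max(backups, key=lambda run: run["end"])
    return latest["status"] == "successful", latest


backups = [
    {"end": datetime(2024, 1, 1, 23, 0), "status": "successful"},
    {"end": datetime(2024, 1, 2, 23, 0), "status": "failed"},
]
ok, latest = last_backup_ok(backups)
if not ok:
    print("Last backup not successful:", latest)
```

An empty list is treated as a failure, since no backup at all should be flagged just like a failed one.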
13. Monitoring OS Logs and CPU Utilization in ST06 or OS06
- We can monitor the entire operating system in OS06 or ST06.
- We can monitor OS logs and CPU utilization under Detailed Analysis.
- We can monitor CPU, memory, swap, disks, file systems, and network (OS06 > Detailed Analysis). From the Detailed Analysis menu we can see operating system information for any of the previous 24 hours.
14. Alert Monitoring in RZ20 and ST04
- We can monitor system alerts in RZ20: double-click the relevant monitor, then click Open Alerts.
- We can set thresholds and alerts and create our own monitors in RZ20: from the Extras menu, choose Activate Maintenance Function, where we can define thresholds with colors.
- We can also monitor the database alert log in ST04 > Detailed Analysis > Error Log, or at OS level in the /oracle/<SID>/saptrace/background directory.
- We can find all database errors and warnings in DB16.
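RZ20 lets you define thresholds that switch an alert between green, yellow, and red. The sketch below illustrates the idea with a hysteresis scheme: separate switch-on and reset values, so the color does not flap around a single limit. It is a simplified model written for illustration; the class, its field names, and the exact evaluation rules are assumptions and may differ from how CCMS evaluates its thresholds.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Four threshold values for a metric where higher is worse (hypothetical names)."""
    green_to_yellow: float
    yellow_to_red: float
    red_to_yellow: float    # reset value, normally below yellow_to_red
    yellow_to_green: float  # reset value, normally below green_to_yellow


def next_color(current, value, t):
    """Return the new alert color given the current color and a new measurement."""
    if current == "GREEN":
        if value >= t.yellow_to_red:
            return "RED"
        return "YELLOW" if value >= t.green_to_yellow else "GREEN"
    if current == "YELLOW":
        if value >= t.yellow_to_red:
            return "RED"
        return "GREEN" if value < t.yellow_to_green else "YELLOW"
    # current == "RED": only drop once the value falls below the reset value
    if value < t.red_to_yellow:
        return "GREEN" if value < t.yellow_to_green else "YELLOW"
    return "RED"


cpu = Thresholds(green_to_yellow=70, yellow_to_red=90, red_to_yellow=85, yellow_to_green=65)
color = "GREEN"
for value in [60, 75, 95, 88, 80, 60]:
    color = next_color(color, value, cpu)
    print(value, color)
```

With these example values, a reading of 88 keeps a red alert red even though it is below 90, because it has not yet dropped under the reset value of 85.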
15. System Log in SM21
- We can monitor the system log in SM21.
- The system log stores all messages, including problems, warnings, and information.
- In the system log we mainly look for database errors, listed as ORA-xxxxx errors, and update termination errors.
- We can also monitor the logs of the local system and all remote systems: in SM21, choose System Log > Choose, where there are Local System and Remote System options.
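The SM21 review above mostly means scanning messages for ORA- errors and terminated updates. A minimal Python sketch using regular expressions, assuming the log lines are available as plain strings; the "update terminated" wording and the sample lines are made up, so adjust the patterns to the message texts in your system.

```python
import re

# Oracle errors appear as "ORA-" followed by a five-digit code, e.g. ORA-01555
ORA_ERROR = re.compile(r"\bORA-\d{5}\b")
# Assumed wording of update-termination messages; adjust to your system's texts
UPDATE_TERMINATED = re.compile(r"update terminated", re.IGNORECASE)


def interesting_messages(lines):
    """Return the log lines that mention an ORA- error or a terminated update."""
    return [line for line in lines
            if ORA_ERROR.search(line) or UPDATE_TERMINATED.search(line)]


# Made-up sample lines for illustration
log_lines = [
    "Database error 1555 at SEL access to table VBAK: ORA-01555: snapshot too old",
    "Logon successful (type=A)",
    "Update terminated",
]
for line in interesting_messages(log_lines):
    print(line)
```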