===== Monitor job timeout =====

  * Each monitor has a configurable timeout parameter and must finish its execution within the allocated time.
  * A monitor in the ''TIMEOUT'' state was interrupted before it could terminate normally.
  * An interrupted monitor has the following consequences:
    * Alerts and metrics are not generated for the monitored component.
    * Short dumps may be generated in the target system.
    * The execution of other monitors may be delayed.

A timeout can have several causes, and the fix depends on the cause:

  * The monitor requires too much computing time in the system
  * The monitor fetches too much data
  * Slow SAP systems
  * Non-responding components in SAP (such as ''RFC destinations'')
  * Too much old data in the system
  * Too many monitors scheduled within the allocated batch time

==== How to investigate ====

You need to identify which monitors timed out, because they slow down the monitoring by blocking other monitors:

  * On the Pro.Monitor **Monitor errors** screen, look for the ''TIMEOUT'' and ''KILLED'' statuses.
  * Monitor timeouts can also be detected in the worker logs.
  * If configured, an alert is sent when a monitor did not run correctly.

==== How to fix ====

**First:**

  * In all cases, you can start by increasing the individual timeout of the monitor.
  * Run a test to see how long the monitor takes to complete. Monitors are designed to run from a few seconds to a minute. A monitor that takes longer than that can be problematic.

**Second:**

  * Check the allocated batch time: monitors are executed in batches in a dedicated OS process. This process has a configurable maximum run time in which all the monitors must fit.
  * If a batch reaches its time limit without having processed all the scheduled monitors, the running monitors are killed and immediately rescheduled with the remaining ones in a new process.
  * By giving more time to the process, you have a better chance of completing all tasks and avoiding killed monitors.

**Third:**\\
If the monitor execution exceeds one or two minutes, you have several options to reduce its execution time:

  * __Adjust its configuration:__
    * Reduce the period (if available), and run the monitor more often.
    * Do not monitor components that fail systematically.
  * __Clean up old data:__
    * Sometimes very old records are left lying in the DB. They are not useful, but they slow down the collection.
    * Removing them can drastically improve the response time, while freeing disk and memory resources on the system.
  * __Monitor fewer components when a system is slow:__
    * Sometimes the monitor times out simply because the system is slow. In that case, most monitors may experience slow response times.
    * There isn't much to do then: the SAP system does not have enough bandwidth to cope with all active monitors, and you may need to deactivate some.
  * __Examples of monitors known for timing out often:__
    * **ABAP instance response time**: Reduce the period.
    * **ABAP transactions**: Reduce the period.
    * **IDOC/TRFC/QRFC**: Remove old trailing records, reduce the period.
    * **RFC destinations**: Stop monitoring destinations which are constantly failing.
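
The three checks above can be sketched as a small script that reviews test-run durations. This is only an illustration, not part of Pro.Monitor: the ''MonitorRun'' class, the ''review_batch'' function, and all durations and limits are hypothetical. It also assumes that the monitors of a batch run one after another, so the batch needs at least the sum of their durations.

```python
from dataclasses import dataclass


@dataclass
class MonitorRun:
    name: str          # monitor name (illustrative)
    duration_s: float  # duration measured in a test run (made-up value)
    timeout_s: float   # individual timeout configured for the monitor


def review_batch(runs, batch_limit_s, slow_threshold_s=60.0):
    """Compare test-run durations with the individual and batch limits."""
    # First check: monitors that would exceed their individual timeout.
    would_time_out = [r.name for r in runs if r.duration_s > r.timeout_s]
    # Third check: monitors running longer than about a minute.
    slow = [r.name for r in runs if r.duration_s > slow_threshold_s]
    # Second check. Assumption: monitors in a batch run sequentially,
    # so the batch needs at least the sum of their durations.
    total_s = sum(r.duration_s for r in runs)
    return {
        "would_time_out": would_time_out,
        "slow": slow,
        "total_s": total_s,
        "fits_in_batch": total_s <= batch_limit_s,
    }


# Hypothetical measurements, for illustration only.
runs = [
    MonitorRun("ABAP instance response time", 12.0, 30.0),
    MonitorRun("RFC destinations", 95.0, 60.0),
    MonitorRun("IDOC/TRFC/QRFC", 70.0, 120.0),
]
report = review_batch(runs, batch_limit_s=150.0)
print(report)
```

In this made-up example, **RFC destinations** would exceed its individual timeout, two monitors run longer than a minute, and the batch does not fit into its limit. That points to raising the batch time or reducing the execution time of the slow monitors first.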