Monitor job timeout

  • Each monitor has a configurable timeout parameter and must finish its execution within the allocated time.
  • If a monitor shows a TIMEOUT state, it was interrupted before it could terminate normally.
  • An interrupted monitor has the following consequences:
    • Alerts and metrics are not generated for the monitored component
    • Short dumps may be generated in the target system
    • The execution of other monitors may be delayed

A timeout can have several causes, and the fix depends on the cause:

  • The monitor consumes too much computing time in the system
  • The monitor fetches too much data
  • Slow SAP systems
  • Non-responding components in SAP (such as RFC destinations)
  • Too much old data in the system
  • Too many monitors scheduled within the allocated batch time

How to investigate

Identify which monitors timed out, because they slow down monitoring by blocking other monitors:

  • On the Redpeaks Monitor errors screen, look for the TIMEOUT and KILLED statuses
  • Monitor timeouts can also be detected in the worker logs
  • If configured, an alert is sent when a monitor does not run correctly
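
When many monitors are involved, tallying the interrupted ones per monitor can help spot the repeat offenders. The following is a minimal sketch only: the log line layout (`monitor=<name> status=<STATUS>`), the monitor names, and the sample lines are all invented for illustration and are not the actual worker log format, so the pattern must be adapted to what your logs really contain.

```python
import re
from collections import Counter

# ASSUMPTION: hypothetical line layout, not the real worker log format.
STATUS_PATTERN = re.compile(
    r"monitor=(?P<name>\S+)\s+status=(?P<status>TIMEOUT|KILLED)"
)

def count_interrupted(lines):
    """Count TIMEOUT and KILLED occurrences per (monitor name, status)."""
    counts = Counter()
    for line in lines:
        match = STATUS_PATTERN.search(line)
        if match:
            counts[(match.group("name"), match.group("status"))] += 1
    return counts

# Invented sample lines, used only to exercise the function.
sample = [
    "monitor=rfc_destinations status=TIMEOUT",
    "monitor=abap_transactions status=OK",
    "monitor=rfc_destinations status=TIMEOUT",
    "monitor=idoc status=KILLED",
]

result = count_interrupted(sample)
for (name, status), occurrences in result.most_common():
    print(f"{name}: {status} x{occurrences}")
```

Monitors that appear repeatedly in this tally are the first candidates for the fixes described in the next section.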

How to fix

First:

  • In all cases, you can start by increasing the individual timeout of the monitors
  • Run a test to see how long the monitor takes to complete. Monitors are designed to run for a few seconds to a minute. A monitor that takes longer than that can be problematic.

Second:

  • Check the allocated batch time: Monitors are executed in batches in a dedicated OS process. This process has a configurable maximum run time in which all the monitors must fit.
  • If a batch reaches its time limit without having processed all the scheduled monitors, running monitors will be killed and immediately rescheduled with the remaining ones in a new process.
  • Giving the process more time improves the chances of completing all tasks and avoiding killed monitors.
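
The effect of the batch time limit can be illustrated with a small calculation. This is a simplified sketch, not the product's scheduler: it assumes monitors in a batch run one after another, and the function name, monitor names, and durations are hypothetical values chosen for the example.

```python
def plan_batch(durations, batch_limit):
    """Split monitors into those that fit in the batch and those that do not.

    durations: list of (monitor_name, seconds), in execution order.
    batch_limit: maximum run time of the batch process, in seconds.
    ASSUMPTION: monitors run sequentially within the batch.
    """
    elapsed = 0.0
    completed, rescheduled = [], []
    for name, seconds in durations:
        if elapsed + seconds <= batch_limit:
            elapsed += seconds
            completed.append(name)
        else:
            # The limit is hit here: this monitor is killed and every
            # later one is pushed to a new process.
            rescheduled.append(name)
            elapsed = batch_limit
    return completed, rescheduled

# Hypothetical measured durations, in seconds.
durations = [("a", 20), ("b", 45), ("c", 30), ("d", 10)]

short_batch = plan_batch(durations, batch_limit=90)
long_batch = plan_batch(durations, batch_limit=120)
print(short_batch)
print(long_batch)
```

With these example values, the 90-second limit leaves two monitors to be rescheduled, while the 120-second limit lets all four complete.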

Third:
If the monitor execution exceeds one or two minutes, there are several options to reduce its execution time:

  • Adjust its configuration:
    • Reduce the period (if available), and run the monitor more often.
    • Do not monitor components that fail systematically
  • Clean up old data:
    • Sometimes very old records that are no longer useful remain in the DB and slow down the collection.
    • Removing them can drastically improve the response time while freeing disk and memory resources on the system.
  • Monitor fewer components when a system is slow:
    • Sometimes the monitor times out simply because the system is slow. In that case, most monitors may experience slow response times.
    • There is not much to do here: the SAP system does not have enough capacity to cope with all active monitors, and you may need to deactivate some of them.
  • Examples of monitors known for timing out often:
    • ABAP instance response time: Reduce the period
    • ABAP transactions: Reduce the period
    • IDOC/TRFC/QRFC: Remove old trailing records, reduce the period.
    • RFC destinations: Stop monitoring destinations which are constantly failing.
Last modified: 2024/05/01 18:35 (external edit)