Alarms and Metrics

Purpose

  • Defines the global Alarms and Metrics configuration
  • Alarms and metrics generated by Redpeaks can be propagated, by email or to third-party applications, only through plugins.
  • To use different plugins depending on the origin of an alarm (SAP/internal), use Alarm rules
  • Internal alarms are typically sent by email to the Redpeaks admin

How to access the Alarms and Metrics feature

  • At the top right of the screen, click the settings icon
  • Select the admin configuration sub-menu
  • Click the Alarms/Metrics tab


System availability alerts

  • Max connection resp. time (sec) : An alert is generated if a system has not responded within the number of seconds set in this field. The severity of the alert can be set using the corresponding dropdown list.
  • Max system down time (sec) : An alert is generated once a system has remained unreachable for the number of seconds set in this field. The severity of the alert can be set using the corresponding dropdown list.
  • Time zone alarm : An alert is generated if the time zone of a system is not properly set or cannot be resolved. This option defines the severity used for this alert.
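
To illustrate how a down-time threshold of this kind can behave, here is a minimal Python sketch. It is not the product's implementation: the class name DownTimeTracker, its fields, and the choice to alert as soon as the elapsed time equals the limit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownTimeTracker:
    """Tracks how long a system has been unreachable (hypothetical helper)."""
    max_down_time_sec: float
    down_since: Optional[float] = None  # time of the first failed attempt

    def record(self, reachable: bool, now: float) -> bool:
        """Record one connection attempt; return True if an alert is due."""
        if reachable:
            self.down_since = None  # system is back, reset the window
            return False
        if self.down_since is None:
            self.down_since = now  # start of the current outage
        # Assumption: the alert fires once the outage reaches the limit.
        return (now - self.down_since) >= self.max_down_time_sec


tracker = DownTimeTracker(max_down_time_sec=120)
results = [
    tracker.record(reachable=False, now=0),    # first failure
    tracker.record(reachable=False, now=60),   # still within the limit
    tracker.record(reachable=False, now=120),  # limit reached
    tracker.record(reachable=True, now=180),   # system recovered
]
```

In this sketch only the third attempt reports an alert, and a successful connection resets the outage window.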


Internal alerts

  • Monitor job execution error : An alert is generated if a Monitor job encounters an error during its execution. The severity of the alert can be set using the corresponding dropdown list.
  • CCMS errors : An alert is generated if a CCMS-type job encounters an error during its execution. The severity of the alert can be set using the corresponding dropdown list.
  • Monitor Tree loading errors : An alert is generated if a Monitor Tree-type job encounters an error while loading data from SAP. The severity of the alert can be set using the corresponding dropdown list.

Agents

These alarm settings help you detect, and be notified of, problems on an agent:

  • Max agent down time (sec) :
    • Notifies you when an agent is not responding
    • Defines the maximum time, in seconds, that the agent may be unavailable before a notification is sent
  • Min schedule ratio (%) :
    • Detects when an agent does not have enough time to execute all its monitors
    • The server computes the ratio between executed monitors and rescheduled ones and compares it to the threshold
    • A ratio of 100% is expected on well-configured agents
  • Min successful exec. ratio (%) :
    • Detects when an agent returns many execution errors for its monitors
    • The server computes the ratio between successful executions and failed ones
    • Occasional monitor failures are normal, but many failures might indicate a problem on the agent (resources/network)
  • Max result send time (sec) :
    • Detects when sending results from the agent to the primary server takes too long
    • This can be caused by network problems, or by a resource problem on the agent or the primary server
    • A notification is sent if the send time is over the threshold
  • Max time without results (sec) :
    • Detects when an agent is not sending any results to the server
    • This can indicate a resource problem on the agent
    • A notification is sent if the time since the last received result is over the threshold
  • Max VM Heap usage (%) :
    • Detects when an agent is using all its allocated memory
    • If the agent memory usage reaches 100%, this may indicate memory starvation and instability
    • A notification is sent if VM memory usage reaches the threshold
  • Max OS RAM usage (%) :
    • Detects when the overall OS memory usage is too high
    • High OS memory usage may prevent the server from using its allocated memory and may cause paging, which degrades performance
    • A notification is sent if OS memory usage is over the threshold
  • Max OS disk usage (%) :
    • Detects when the application disk space is running low
    • A full disk must be avoided at all costs, as it may bring the service down
    • A notification is sent if the used disk space is over the threshold
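
The two ratio alarms in this list can be illustrated with a short Python sketch. The exact formula used by the server is not stated here; the sketch assumes ratio = good / (good + bad) × 100, which is consistent with 100% being expected when nothing is rescheduled. The function names and the sample figures are hypothetical.

```python
def ratio_percent(good: int, bad: int) -> float:
    """Return good / (good + bad) as a percentage."""
    total = good + bad
    if total == 0:
        # Assumption: a window with no activity is not treated as a failure.
        return 100.0
    return 100.0 * good / total


def below_minimum(ratio: float, min_ratio: float) -> bool:
    """True when the ratio is under the configured minimum (alarm condition)."""
    return ratio < min_ratio


# Hypothetical figures for one agent over one evaluation window.
schedule_ratio = ratio_percent(good=95, bad=5)    # executed vs. rescheduled
success_ratio = ratio_percent(good=180, bad=20)   # successful vs. failed

# Compare each ratio to a hypothetical minimum threshold.
schedule_alarm = below_minimum(schedule_ratio, min_ratio=98.0)
success_alarm = below_minimum(success_ratio, min_ratio=80.0)
```

With these figures the schedule ratio is 95%, which falls under the 98% minimum, while the 90% success ratio stays above its 80% minimum.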


Plugins

  • Max plugin down time (sec) :
    • Detects when a plugin is failing to send events
    • This is usually critical, because it means that monitoring data might not be visible in the corresponding third-party platform
    • A notification is sent if the plugin error lasts longer than the threshold


Licenses

  • Max expiration delay (days) :
    • Notifies you when a license is about to expire
  • Invalid license severity :
    • Notifies you when a license is not valid
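
As an illustration of the expiration check, the Python sketch below assumes that a notification is due when the number of days remaining before expiry is less than or equal to the configured delay. That interpretation, the function names, and the dates are assumptions, not documented product behavior.

```python
from datetime import date


def days_until_expiry(expires_on: date, today: date) -> int:
    """Number of whole days left before the license expires."""
    return (expires_on - today).days


def license_expiring(expires_on: date, today: date, max_delay_days: int) -> bool:
    """True when the license expires within the configured delay (assumed rule)."""
    return days_until_expiry(expires_on, today) <= max_delay_days


# Hypothetical license expiring on 30 June, checked on 1 June.
remaining = days_until_expiry(date(2024, 6, 30), date(2024, 6, 1))
notify_30 = license_expiring(date(2024, 6, 30), date(2024, 6, 1), max_delay_days=30)
notify_15 = license_expiring(date(2024, 6, 30), date(2024, 6, 1), max_delay_days=15)
```

Here 29 days remain, so a 30-day delay would trigger a notification and a 15-day delay would not.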


Internal alarms settings

  • Clear alarms :
    • If set, all clearable alarms are cleared (by sending an alarm with the toClear parameter set to true) once the problem is no longer detected.
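
The clearing behavior can be pictured with the following Python sketch. Only the toClear parameter name comes from this page; the Alarm and AlarmEmitter classes, the alarm key, and the choice to send one alarm per outage are hypothetical and are meant only to show the raise-then-clear sequence.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Alarm:
    key: str               # identifies the monitored problem (hypothetical)
    message: str
    toClear: bool = False  # mirrors the toClear parameter named on this page


class AlarmEmitter:
    """Raises an alarm when a problem appears and clears it when it is gone."""

    def __init__(self, clear_alarms: bool) -> None:
        self.clear_alarms = clear_alarms  # the "Clear alarms" setting
        self.active: Set[str] = set()
        self.sent: List[Alarm] = []

    def evaluate(self, key: str, problem_detected: bool, message: str = "") -> None:
        if problem_detected:
            if key not in self.active:  # assumption: one alarm per outage
                self.active.add(key)
                self.sent.append(Alarm(key, message))
        elif key in self.active:
            self.active.discard(key)
            if self.clear_alarms:
                # The problem is gone: send a clearing alarm.
                self.sent.append(Alarm(key, "cleared", toClear=True))


emitter = AlarmEmitter(clear_alarms=True)
emitter.evaluate("agent-1/down", True, "Agent is not responding")
emitter.evaluate("agent-1/down", True, "Agent is not responding")
emitter.evaluate("agent-1/down", False)
```

After these three evaluations the emitter has sent two alarms: the initial one and a clearing one with toClear set to true. With clear_alarms set to False, only the initial alarm would be sent.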


Metrics sources

  • Alarm source : SID, HOST, FQDN, TITLE, INSTANCE, IP
  • Metric source : SID, HOST, FQDN, TITLE, INSTANCE, IP

Last modified: 2024/05/01 18:35 (external edit)