====== Schedules ======

===== Purpose =====

Schedules are used to generate reports, and most run recurrently at a scheduled time. It is important to detect failures, because a failed schedule means that an important report might not be generated. Reports that run too often or fail constantly can also degrade the performance of the system.

This monitor will:
  * Check schedule recurrence and warn if a schedule runs too often
  * Check the global number of schedule executions in a given status
  * Check the execution status of each individual schedule

===== Configuration hints =====

Use the "Load schedules" button to load the scheduled reports from your system.

You can filter on a specific schedule name selected from the list, or set * to cover all schedules. Regular expressions work too!

Select the type of STATUS you want to monitor and set the threshold.

**Aggregates:**
  * If active, the total number of schedule executions will be compared to the threshold.
  * If not, the executions of each individual schedule will be used instead.

The aggregate setting lets you define either global monitoring for all schedules or specific monitoring for each schedule.

===== Atomic fields =====

^Minimum recurrence accepted (min)|The lowest recurrence a user can set|
^Minimum recurrence severity|The severity of the alarm generated if a user schedules a job to run too often|

===== Surveillance table =====

^Parameter^Description^
^Active|If checked, the rule will be active.|
^Schedule|The schedule(s) concerned by the rule. Use the character '*' to configure a rule for all the schedules.|
^Status|Define the schedule STATUS to look for (FAILED, PENDING, etc.).|
^Threshold|Define the maximum number of schedules having the selected status. You can use the multi-threshold syntax (Ex: G2W:1 W2M:10).|
^Aggregates|If active, an alarm will be sent if the total number of schedule executions is over the threshold. \\ If not, an alarm will be sent for each schedule having a number of executions over the threshold.|
^Severity|The severity of the alarm generated if a failure is detected.|
^Auto clear|If checked, the alarm will be cleared as soon as the alarm condition is no longer met.|
^Alarm tag|This field lets you add custom text to the alarm message. The %MSG% variable contains the actual generated message and can be used as follows: "my_prefix %MSG% my_suffix". By default, the tag is used as a prefix.|
^Alarm|If checked, this line of surveillance will be used for alarm generation.|
^Metric|If checked, this line of surveillance will be used for metric generation.|

===== Examples =====

^Active^Schedule^Status^Threshold^Aggregates^Severity^Auto clear^Alarm tag^Alarm^Metric^
|true|*|FAILURE|10|true|MAJOR|true| |true|true|

**Effect**: A MAJOR alarm will be sent if 10 or more schedules are in a FAILURE state. A metric will be sent stating the total number of schedules having a FAILURE status.

^Active^Schedule^Status^Threshold^Aggregates^Severity^Auto clear^Alarm tag^Alarm^Metric^
|true|Report_ABC|FAILURE|G2W:1 W2M:5|false|CRITICAL|true| |true|true|

**Effect**: Monitors the specific schedule Report_ABC. Sends a WARNING alarm if 1 or more FAILURE states are detected, and a MAJOR alarm if 5 or more. A metric will be sent stating the number of occurrences of the FAILURE status for this specific report.

===== Generated metrics =====

^metricId^metricUnit^metricTarget^metricDescription^
|BO_SCHEDULES|Status|[scheduleName]|A metric with the state of each scheduled execution|
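
The interaction between the Threshold and Aggregates parameters of the surveillance table can be illustrated with a short Python sketch. This is not the monitor's actual implementation: the function names, the rule dictionary layout and the ''(schedule, status)'' input format are hypothetical, the letter codes (G2W giving WARNING, W2M giving MAJOR) are inferred from the examples on this page only, and the "count reaches the limit" comparison follows the wording of the example effects.

```python
import re
from collections import Counter

# Assumption: letter codes inferred from the examples on this page
# (G2W -> WARNING, W2M -> MAJOR). Other codes are not covered here.
SEVERITY_BY_CODE = {"W": "WARNING", "M": "MAJOR"}


def parse_threshold(spec, default_severity):
    """Return a list of (limit, severity) pairs sorted by ascending limit."""
    spec = spec.strip()
    if ":" not in spec:
        # Plain number: a single level using the rule's Severity column.
        return [(int(spec), default_severity)]
    levels = []
    for token in spec.split():
        transition, limit = token.split(":")
        target_code = transition.split("2")[1]  # "G2W" -> "W"
        levels.append((int(limit), SEVERITY_BY_CODE[target_code]))
    return sorted(levels)


def severity_for(count, levels):
    """Return the severity of the highest limit reached, or None."""
    reached = None
    for limit, severity in levels:
        if count >= limit:  # assumption: "N or more" triggers the level
            reached = severity
    return reached


def schedule_matches(pattern, name):
    """'*' covers every schedule; anything else is treated as a regex."""
    return pattern == "*" or re.fullmatch(pattern, name) is not None


def evaluate_rule(rule, executions):
    """Map each alarm target to its severity.

    executions is an iterable of (schedule_name, status) pairs.
    """
    counts = Counter(
        name for name, status in executions
        if status == rule["status"] and schedule_matches(rule["schedule"], name)
    )
    levels = parse_threshold(rule["threshold"], rule["severity"])
    if rule["aggregates"]:
        # One comparison on the total across all matching schedules.
        targets = {rule["schedule"]: sum(counts.values())}
    else:
        # One comparison per individual schedule.
        targets = dict(counts)
    alarms = {}
    for target, count in targets.items():
        severity = severity_for(count, levels)
        if severity is not None:
            alarms[target] = severity
    return alarms


rule = {"schedule": "Report_ABC", "status": "FAILURE",
        "threshold": "G2W:1 W2M:5", "severity": "CRITICAL",
        "aggregates": False}
executions = [("Report_ABC", "FAILURE")] * 5
print(evaluate_rule(rule, executions))  # {'Report_ABC': 'MAJOR'}
```

With ''aggregates'' set to true and schedule ''*'', the same function compares the total count across all schedules to the threshold, which corresponds to the first example above.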
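
The Alarm tag behaviour described in the surveillance table (substitution of %MSG%, with the tag used as a prefix by default) can be sketched in the same way. The function name ''apply_alarm_tag'' is hypothetical, and the single space placed between prefix and message is an assumption, since this page does not specify the separator.

```python
def apply_alarm_tag(tag, message):
    """Combine an alarm tag with the generated alarm message."""
    if not tag:
        # Empty tag: the generated message is sent unchanged.
        return message
    if "%MSG%" in tag:
        # The placeholder marks where the generated message goes.
        return tag.replace("%MSG%", message)
    # No placeholder: the tag is used as a prefix (space separator assumed).
    return f"{tag} {message}"


print(apply_alarm_tag("my_prefix %MSG% my_suffix", "5 schedules FAILED"))
# my_prefix 5 schedules FAILED my_suffix
```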