====== Generic event server ======

===== Purpose =====

This plugin makes it possible for third-party software to collect generated metrics, alerts and the monitored landscape structure through a REST API.

When the Generic event server plugin is created and active, all generated alarms and metrics are stored in separate queues. These queues are polled via the REST API to collect alarms and metrics in chunks.

Once the plugin is active, the queues must be polled regularly.

**Warning:** Queues are not persistent and are stored in memory. They will be lost if the Pro.Monitor server is restarted.

===== Configuration =====

  * From the plugin menu of Pro.Monitor, select ''Generic event server'' in the plugin drop-down and press ''Add''
  * The plugin has the following parameters:

^ Parameter ^ Description ^ Mandatory ^
| Name | Give a name to the plugin | Yes |
| Allowed IPs | Restricts the IPs allowed to call the service; use ''*'' for all | Yes |
| Max alarm queue size | The size of the queue holding the alarms | Yes |
| Max metric queue size | The size of the queue holding the metrics | Yes |

**Note:** To preserve server memory, the alarm and metric queues have a maximum size. If a queue is full, any new element replaces the oldest one.

**Estimated memory consumption of the alarm and metric queues: 200 KB per chunk of 1000 alarms/metrics**

===== Authorizations =====

The API can only be used by an authenticated user, who must have the **monitoring** authorization. Every API call must include an ''Authorization'' header with a Basic authentication key.

===== Services =====

When this plugin is active, three services are available for collecting monitoring data:

  * **/monitoring/landscapes** : Collect monitored infrastructure metadata
  * **/monitoring/metrics** : Collect metrics
  * **/monitoring/alarms** : Collect alarms

==== Monitored landscapes service ====

This service returns a representation of the monitored landscape in JSON format.
It represents the monitored groups, systems and instances with their relationships and properties.

=== Description ===

**URL:**

  * **GET /monitoring/landscapes**
  * Parameters: None

=== Usage ===

  * The monitored landscape architecture is not expected to change often.
  * Therefore it is probably not necessary to poll it more than **once every hour**.

=== Response ===

  * The service returns a JSON table of monitored **groups**, **systems** and **instances**
  * Groups contain systems, which in turn contain instances

__Group structure format:__

^ Parameter ^ Description ^ Type ^
| name | Name of the group | String |
| uuid | Unique identifier of the group | String |
| systems | The systems belonging to the group | Table |

__System structure format:__

^ Parameter ^ Description ^ Type ^
| sid | SID of the system as defined in the configuration | String |
| realSid | SID of the system as discovered | String |
| type | Type of the system | String |
| uuid | Unique identifier of the system | String |
| description | Description of the system | String |
| instances | The instances belonging to the system | Table |
| properties | Properties of the system, depending on the context | Table |

__Instance structure format:__

^ Parameter ^ Description ^ Type ^
| name | Name of the instance | String |
| type | Type of the instance | String |
| host | Hostname | String |

==== Alarms service ====

This service can return either a chunk of generated alarms or the number of alarms waiting to be collected. Collecting alarms removes them from the queue.

=== Description ===

**URL:**

  * **POST /monitoring/alarms?action=poll** : Removes a chunk of the oldest alarms from the queue and returns it.
  * **GET /monitoring/alarms?action=size** : Returns the current size of the queue.
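As a rough illustration of the two calls listed above, the sketch below builds (but does not send) the corresponding HTTP requests with Python's standard library. The base URL and the credentials are placeholders invented for this example, and the helper names ''basic_auth_header'' and ''build_alarm_request'' are hypothetical; only the paths, the ''action'' values, the HTTP methods and the Basic ''Authorization'' header come from this page.

```python
import base64
import urllib.request

# Hypothetical base URL: replace with the address of your Pro.Monitor server.
BASE_URL = "https://promonitor.example.com"


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of a Basic 'Authorization' header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_alarm_request(action: str, user: str, password: str) -> urllib.request.Request:
    """Build the request for the alarms service without sending it.

    'poll' modifies the queue and must use POST; 'size' must use GET.
    """
    if action not in ("poll", "size"):
        raise ValueError("action must be 'poll' or 'size'")
    method = "POST" if action == "poll" else "GET"
    request = urllib.request.Request(
        f"{BASE_URL}/monitoring/alarms?action={action}", method=method
    )
    request.add_header("Authorization", basic_auth_header(user, password))
    return request


# Placeholder credentials, for illustration only.
size_request = build_alarm_request("size", "user", "pass")
poll_request = build_alarm_request("poll", "user", "pass")
```

Passing either request to ''urllib.request.urlopen'' would perform the call. The same pattern should apply to ''/monitoring/metrics'', which exposes the same two actions.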
^ Parameter ^ Description ^ Type ^ Mandatory ^ Default value ^
| action | Defines the operation performed on the queue **(1)** | ''poll'' or ''size'' | Yes | N/A |
| maxchunksize | The maximum number of alarms to return in the response | Number | No | 100 |

  * (1):
    * **poll** removes a chunk from the queue and returns it. It works only with the POST method, because it modifies the server state.
    * **size** returns the current size of the queue. It works only with the GET method.

=== Usage ===

  * It is advised to poll the queue often, with small chunks.
  * We recommend a poll period of **60 sec**.
  * If the number of returned alarms equals the maximum chunk size, more alarms can be fetched and another call can be triggered.

=== Response ===

  * The service returns a JSON table of **alarms**

__Alarm structure format:__

^ Parameter ^ Description ^ Type ^ Always set ^
| id | The identifier of a unique alarm/problem **(1)** | String | Yes |
| module | The monitored module | String | Yes |
| metric | The monitored metric | String | No |
| source | The source being monitored | String | Yes |
| sid | The SID of the system being monitored | String | Yes |
| groupName | The name of the group containing the system | String | Yes |
| groupUUID | The unique identifier of the group | String | Yes |
| connectorId | The id of the connector used to connect to the system | Number | Yes |
| message | The alarm message | String | Yes |
| severity | The severity of the alarm | String | Yes |
| severityId | The id of the severity | Number | Yes |
| toClear | Set to true if the alarm must be cleared **(2)** | Boolean | Yes |
| clearable | Set to true if the alarm can ever be cleared **(3)** | Boolean | Yes |
| instance | The instance for which the alarm occurred, if relevant | String | No |
| client | The ABAP client for which the alarm occurred, if relevant | String | No |
| user | The user for which the alarm occurred, if relevant | String | No |
| component | A component name for which the alarm occurred, if relevant | String | No |
| host | The host on which the alarm occurred, if relevant | String | No |

  * (1): The same problem on the same resource gets the same id.
  * (2): If set to true, the severity represents the last one generated before the alarm was cleared.
  * (3): Some problems are events that cannot be "undone", so the alert will always stay.

**Note:** Undocumented parameters are not to be used.

==== Metric service ====

This service can return either a chunk of generated metrics or the number of metrics waiting to be collected. Collecting metrics removes them from the queue.

=== Description ===

**URL:**

  * **POST /monitoring/metrics?action=poll** : Removes a chunk of the oldest metrics from the queue and returns it.
  * **GET /monitoring/metrics?action=size** : Returns the current size of the queue.

^ Parameter ^ Description ^ Type ^ Mandatory ^ Default value ^
| action | Defines the operation performed on the queue **(1)** | ''poll'' or ''size'' | Yes | poll |
| maxchunksize | The maximum number of metrics to return in the response | Number | No | 100 |

  * (1):
    * **poll** removes a chunk from the queue and returns it. It works only with the POST method.
    * **size** returns the current size of the queue. It works only with the GET method.

=== Usage ===

  * It is advised to poll the queue often, with small chunks.
  * We recommend a poll period of **60 sec**.
  * If the number of returned metrics equals the maximum chunk size, more metrics can be fetched and another call can be triggered.
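The "fetch again while the chunk is full" rule from the usage notes can be sketched as a small drain loop. This is only an illustration: ''drain_queue'' and ''make_fake_fetch'' are hypothetical helpers, the HTTP call is replaced by an in-memory stand-in so the example runs without a server, and the ''max_calls'' safety limit is an addition of this sketch, not something the API requires.

```python
from typing import Callable, List


def drain_queue(
    fetch_chunk: Callable[[int], List[dict]],
    max_chunk_size: int = 100,
    max_calls: int = 50,
) -> List[dict]:
    """Call fetch_chunk repeatedly until a chunk comes back less than full.

    fetch_chunk(n) stands for one 'action=poll' call with maxchunksize=n.
    max_calls bounds the loop in case the queue refills as fast as it is read.
    """
    collected: List[dict] = []
    for _ in range(max_calls):
        chunk = fetch_chunk(max_chunk_size)
        collected.extend(chunk)
        if len(chunk) < max_chunk_size:
            break  # a partial (or empty) chunk means the queue is drained
    return collected


def make_fake_fetch(items: List[dict]) -> Callable[[int], List[dict]]:
    """In-memory stand-in for the server queue, oldest elements first."""
    queue = list(items)

    def fetch(n: int) -> List[dict]:
        chunk = queue[:n]
        del queue[:n]  # polling removes the returned elements from the queue
        return chunk

    return fetch


fake_fetch = make_fake_fetch([{"metric": f"m{i}"} for i in range(250)])
metrics = drain_queue(fake_fetch, max_chunk_size=100)
```

In a real client, ''fetch_chunk'' would send the POST request and decode the JSON body, and the whole drain would be repeated at the recommended 60-second period.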
=== Response ===

  * The service returns a JSON table of **metrics**

__Metric structure format:__

^ Parameter ^ Description ^ Type ^ Always set ^
| module | The monitored module | String | Yes |
| metric | The monitored metric | String | No |
| source | The source being monitored | String | Yes |
| sid | The SID of the system being monitored | String | Yes |
| groupName | The name of the group containing the system | String | Yes |
| groupUUID | The unique identifier of the group | String | Yes |
| connectorId | The id of the connector used to connect to the system | Number | Yes |
| value | The value of the metric | Number/Boolean | Yes |
| unit | The unit of the metric | String | Yes |
| unitShort | The short representation of the unit | String | Yes |
| target | The target resource for this metric **(1)** | String | Yes |
| hasMax | If true, indicates that the metric cannot exceed the ''sampleMax'' value | Boolean | Yes |
| sampleMax | The maximum value reachable by the metric **(2)** | Number | No |
| instance | The instance for which the metric is generated | String | No |
| client | The ABAP client for which the metric is generated | String | No |
| user | The user for which the metric is generated | String | No |
| component | A component name for which the metric is generated | String | No |
| host | The host on which the metric is generated | String | No |

  * (1): Represents the resource being measured, for example ''Disk C:'' or ''User X''.
  * (2): For a percentage, ''sampleMax'' is 100. Use it only if **hasMax** is set.

**Note:** Undocumented parameters are not to be used.

===== How to use the API =====

==== API call ====

Once the plugin is configured and active, alerts, metrics and the monitored infrastructure are available through the API.

  - Start by discovering the monitored landscapes.
  - Poll the alarm and metric queues regularly. We recommend polling the queues every minute.
  - Refresh the landscape metadata once per hour.

==== Alarms and metrics correlation ====

  * Generated alarms and metrics correlate with the discovered groups, systems and instances.
  * For each alarm and metric, three parameters can be used to correlate it with a component of the landscape:
    * The ''groupUUID'' parameter matches the ''uuid'' of a discovered group.
    * The ''sid'' parameter matches the ''sid'' of a system.
    * The ''instance'' parameter matches the ''name'' of an instance.
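The correlation rules above can be sketched as a lookup table built from the landscape response. The sketch assumes the decoded JSON is a list of group dictionaries using the field names documented on this page; the helper names ''index_landscape'' and ''correlate'' and all sample values are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple


def index_landscape(groups: List[dict]) -> Dict[Tuple[str, str], tuple]:
    """Index systems by (group uuid, system sid) for fast correlation."""
    index = {}
    for group in groups:
        for system in group.get("systems", []):
            # Instances are looked up by their 'name' field.
            instances = {i["name"]: i for i in system.get("instances", [])}
            index[(group["uuid"], system["sid"])] = (group, system, instances)
    return index


def correlate(event: dict, index: Dict[Tuple[str, str], tuple]) -> Optional[dict]:
    """Resolve an alarm or metric to its group, system and optional instance."""
    entry = index.get((event["groupUUID"], event["sid"]))
    if entry is None:
        return None  # unknown component: the landscape may need a refresh
    group, system, instances = entry
    # 'instance' is not always set on alarms and metrics.
    instance = instances.get(event.get("instance"))
    return {
        "group": group["name"],
        "system": system["sid"],
        "host": instance["host"] if instance else None,
    }


# Made-up sample data shaped like the documented structures.
landscape = [{
    "name": "Production",
    "uuid": "g-1",
    "systems": [{
        "sid": "PRD",
        "uuid": "s-1",
        "instances": [{"name": "PRD_00", "type": "ABAP", "host": "host01"}],
    }],
}]
alarm = {"groupUUID": "g-1", "sid": "PRD", "instance": "PRD_00", "message": "sample"}

landscape_index = index_landscape(landscape)
match = correlate(alarm, landscape_index)
```

Rebuilding the index each time the landscape is refreshed keeps the lookups consistent with the latest discovered structure.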