Pro.Monitor V6.7
Troubleshooting
Monitors Guide
System replication is a key HANA component for resilience against hardware failures and data corruption. This monitor checks that the replication system is in the expected state during both its nominal and its transition phases (init, sync). It also checks for reconnects and failovers and sends a notification as soon as either is detected.
To start monitoring replication, create a new rule in the surveillance table. You can filter on the host and port number of the instance to customize the monitoring for specific nodes.
Expected mode: Defines the expected replication mode and sends an alarm if the actual mode differs.
Error severity: Checks the replication status reported by the system and sends an alarm if it is in error.
Fully recoverable: Checks the fully recoverable status reported by the system and sends an alarm if it is not as expected.
Max init/sync/unknown time: Checks the time spent in the corresponding state and sends an alarm if the replication system stays in that state for too long.
Reconnect/Failover count: Checks for reconnect or failover situations. If the count has increased by more than the threshold since the last check, an alarm is sent.
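The last two checks can be illustrated with a minimal Python sketch. It is not Pro.Monitor's implementation: the names `CounterCheck` and `state_exceeds_max_time` are hypothetical, and the sketch assumes that the reconnect and failover counts are cumulative counters and that "over threshold" means strictly greater than the configured maximum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CounterCheck:
    """Hypothetical helper: flags when a cumulative counter grows by more
    than max_delta between two consecutive checks."""
    max_delta: int
    last_value: Optional[int] = None

    def update(self, current_value: int) -> bool:
        """Record the current counter value; return True if an alarm is due."""
        if self.last_value is None:
            # First check: no baseline yet, so there is nothing to compare.
            self.last_value = current_value
            return False
        delta = current_value - self.last_value
        self.last_value = current_value
        # Assumption: the alarm fires only when the delta exceeds the maximum.
        return delta > self.max_delta


def state_exceeds_max_time(seconds_in_state: float, max_seconds: float) -> bool:
    """Return True if a state (init, sync or unknown) has lasted too long."""
    return seconds_in_state > max_seconds


reconnects = CounterCheck(max_delta=2)
print(reconnects.update(10))  # False: first check only sets the baseline
print(reconnects.update(11))  # False: delta of 1 is within the maximum
print(reconnects.update(15))  # True: delta of 4 exceeds the maximum of 2
```

Comparing against the previous value rather than the absolute counter means a long-running instance with many historical reconnects does not alarm continuously; only new events since the last check count.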
Parameter | Description |
---|---|
Active | To enable/disable a rule. |
Host | A filter on the host name of the instance |
Port | A filter on the port number of the instance |
Expected mode | Define the expected mode of the replication |
Error Severity | If not DISABLED, send an alarm if the replication status is in error. |
Fully recoverable Severity | If not DISABLED, send an alarm if the fully recoverable state is not true. |
Max init time | The maximum time the replication may spend in the init state |
Max sync time | The maximum time the replication may spend in the sync state |
Max UNKNOWN time | The maximum time the replication may spend in the unknown state |
Max reconnect count | The maximum number of reconnects allowed since the last check |
Max failover count | The maximum number of failovers allowed since the last check |
Auto clear | If checked, the alarm is cleared as soon as the alarm condition is no longer met. |
Alarm tag | Adds custom text to the alarm message. The %MSG% variable contains the generated message and can be used as in: “my_prefix %MSG% my_suffix”. If %MSG% is not used, the tag is added as a prefix. |
enable Alarm | Enables the sending of alarms |
enable QOS | Enables the sending of metrics |
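
The Alarm tag behaviour described in the table can be sketched as follows. This is an illustration only: `apply_alarm_tag` is a hypothetical name, and the single space placed between a prefix tag and the message is an assumption, since the guide does not state which separator is used.

```python
def apply_alarm_tag(tag: str, message: str) -> str:
    """Hypothetical sketch of combining an alarm tag with a generated message."""
    if not tag:
        # No tag configured: the message is sent unchanged.
        return message
    if "%MSG%" in tag:
        # The tag is a template: substitute the generated message.
        return tag.replace("%MSG%", message)
    # Default: the tag acts as a prefix.
    # Assumption: a single space separates the prefix from the message.
    return f"{tag} {message}"


print(apply_alarm_tag("my_prefix %MSG% my_suffix", "Replication in error"))
print(apply_alarm_tag("[PROD]", "Replication in error"))
```

The first call prints `my_prefix Replication in error my_suffix`; the second prints `[PROD] Replication in error`.
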
metricId | metricUnit | metricTarget | metricDescription |
---|---|---|---|
REPLICATION_STATUS | Boolean | SITE/HOST/PORT | Sends TRUE if the replication status is OK, FALSE otherwise |
REPLICATION_IS_SECONDARY_ACTIVE | Boolean | SITE/HOST/PORT | |
REPLICATION_IS_SECONDARY_FULLY_RECOVERABLE | Boolean | SITE/HOST/PORT | Sends TRUE if the fully recoverable status is TRUE, FALSE otherwise |
REPLICATION_RECONNECT_COUNT | Reconnects | SITE/HOST/PORT | Sends the number of reconnects since the last check |
REPLICATION_FAILOVER_COUNT | Failovers | SITE/HOST/PORT | Sends the number of failovers since the last check |