====== BTP Application Stats Monitor ======

===== Overview =====

The **BTP Application Stats Monitor** provides comprehensive monitoring of SAP Business Technology Platform (BTP) Cloud Foundry applications. It tracks application health, resource usage, and instance status in real-time, allowing you to proactively detect and respond to performance issues or failures.

===== Prerequisites =====

  * BTP Cloud Foundry API access
  * Valid BTP credentials with permissions to:
    * List applications (''/v3/apps'')
    * Read process statistics (''/v3/processes/{guid}/stats'')
  * Network connectivity from the monitoring system to BTP API endpoints

===== API Endpoints Used =====

^ Endpoint ^ Purpose ^
| POST /oauth/token | Authentication (OAuth2 token generation) |
| GET /v3/apps | List all applications (wildcard mode) |
| GET /v3/processes/{guid}/stats | Retrieve instance statistics for an application |

===== Key Features =====

==== Metrics Collection ====

The monitor collects the following metrics for each application:

  * **Running Instances** - Number of healthy, operational instances
  * **Crashed Instances** - Number of failed instances
  * **Memory Used (MB)** - Total memory consumption across all running instances
  * **Disk Used (MB)** - Total disk space consumption across all running instances
  * **CPU Usage (%)** - Average CPU utilization across all running instances

==== Alarm Capabilities ====

  * **Application State Monitoring** - Detect when apps are not in the expected state (e.g., STOPPED, CRASHED)
  * **Crashed Instance Detection** - Alert when the number of crashed instances exceeds a threshold
  * **Memory Usage Alarms** - Warning and Critical thresholds for memory consumption
  * **Disk Usage Alarms** - Warning and Critical thresholds for disk consumption
  * **CPU Usage Alarms** - Warning and Critical thresholds for CPU utilization

==== Wildcard Support ====

Monitor all applications in your BTP space automatically by using ''*'' as the app name.
The monitor will:

  * Automatically discover all applications in your BTP Cloud Foundry space
  * Monitor each application individually
  * Dynamically adapt to new or removed applications

===== How It Works =====

==== Data Collection Process ====

  - **Authentication**: The monitor authenticates to the BTP Cloud Foundry API using OAuth2 with cached token management
  - **Application Discovery (wildcard mode)**: Retrieves all applications from the ''/v3/apps'' endpoint
  - **Stats Collection**: For each application, calls ''/v3/processes/{guid}/stats'' to gather instance-level statistics
  - **Data Aggregation**: Combines data from all instances to calculate totals and averages
  - **Alarm Evaluation**: Compares collected metrics against configured thresholds
  - **Metrics Storage**: Stores collected metrics for historical tracking and reporting

==== Application State Determination ====

The overall application state is determined as follows:

  * **RUNNING** - At least one instance is running (even if some are crashed)
  * **CRASHED** - All instances are crashed
  * **STOPPED** - No instances are running (and none are crashed)

//Note//: If you have 2 instances where 1 is RUNNING and 1 is CRASHED, the application is considered RUNNING because it's still serving traffic.
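
The state rules above can be sketched as a small function. This is only an illustration, not the monitor's actual code: the helper name ''overall_state'' and the list-of-strings input are hypothetical. The rules do not say what happens when no instance is running and only some of the remaining instances are crashed; the sketch assumes CRASHED for that case.

```python
from collections import Counter


def overall_state(instance_states):
    """Derive one application state from per-instance states.

    Hypothetical helper: `instance_states` is assumed to be a list of
    strings such as "RUNNING", "CRASHED" or "DOWN".
    """
    counts = Counter(state.upper() for state in instance_states)
    running = counts["RUNNING"]
    crashed = counts["CRASHED"]

    # At least one running instance: the app still serves traffic.
    if running > 0:
        return "RUNNING"
    # Nothing running and nothing crashed (also covers an empty list).
    if crashed == 0:
        return "STOPPED"
    # No running instance and at least one crashed one. "All crashed" is
    # the documented case; a mix of crashed and other non-running
    # instances is NOT covered by the rules and is an assumption here.
    return "CRASHED"


if __name__ == "__main__":
    print(overall_state(["RUNNING", "CRASHED"]))
    print(overall_state(["CRASHED", "CRASHED"]))
    print(overall_state(["DOWN"]))
```

Checking for a running instance first mirrors the note above: a partially crashed application is still reported as RUNNING, which is why the crashed-instances alarm is configured separately.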
==== Resource Aggregation ====

  * **Memory & Disk**: Summed across all running instances only (crashed instances are excluded)
  * **CPU**: Averaged across all running instances only
  * **Quotas**: Summed across all instances (running + crashed) to calculate usage percentages

===== Configuration =====

==== Connection Settings ====

  * Create a Web Service Connector pointing to your BTP Cloud Foundry API endpoint
    * Example: ''https://api.cf.eu10-123.hana.ondemand.com''
  * Configure authentication credentials (user profile with BTP credentials)

==== Monitor Configuration ====

==== Adding Applications to Monitor ====

There are two ways to add applications to the monitor:

=== Method 1: Load BTP Apps (Recommended) ===

  * Open the BTP Application Stats monitor configuration
  * Click the "Load BTP Apps" button in the toolbar
  * The system will:
    * Connect to your BTP Cloud Foundry API
    * Retrieve all available applications
    * Populate the table with application names
  * Select which applications to monitor by checking the Active checkbox
  * Configure alarm thresholds for each application individually

Benefits:

  * No need to manually find application GUIDs
  * Shows all available applications in your BTP space
  * Pre-fills application names automatically
  * Allows selective monitoring of specific applications

=== Method 2: Wildcard Mode (Monitor All) ===

  * Set App Name to ''*''
  * The monitor will automatically:
    * Discover all applications at runtime
    * Monitor every application using the same configuration
    * Adapt to new or removed applications without reconfiguration

Benefits:

  * Zero configuration for comprehensive coverage
  * Automatically monitors new applications
  * Single configuration for all apps

=== Basic Settings ===

^ Field ^ Description ^ Default ^
| Active | Enable/disable monitoring for this application | ''true'' |
| Schedule | Collection frequency (minutes) | ''5'' |
| Timeout | Maximum execution time (seconds) | ''120'' |
| App Name | Application name, or ''*'' for all apps | ''*'' |

=== State Monitoring ===

^ Field ^ Description ^ Default ^
| State Alarm | Enable state monitoring | ''true'' |
| Expected State | Expected application state | ''RUNNING'' |
| State Check Mode | ''EQUALS'' or ''NOT_EQUALS'' | ''NOT_EQUALS'' |
| State Severity | Alarm severity (1-5) | ''4'' (Critical) |

Check Mode examples:

  * ''NOT_EQUALS'' + ''RUNNING'' → Alarm if app is NOT running
  * ''EQUALS'' + ''CRASHED'' → Alarm if app IS crashed

=== Crashed Instances Monitoring ===

^ Field ^ Description ^ Default ^
| Crashed Instances Alarm | Enable crashed instance detection | ''true'' |
| Crashed Instances Threshold | Maximum acceptable crashed instances | ''0'' |
| Crashed Instances Severity | Alarm severity | ''4'' (Critical) |

=== Memory Monitoring ===

^ Field ^ Description ^ Default ^
| Memory Alarm | Enable memory monitoring | ''true'' |
| Memory Warning % | Warning threshold | ''80%'' |
| Memory Critical % | Critical threshold | ''90%'' |

=== Disk Monitoring ===

^ Field ^ Description ^ Default ^
| Disk Alarm | Enable disk monitoring | ''true'' |
| Disk Warning % | Warning threshold | ''80%'' |
| Disk Critical % | Critical threshold | ''90%'' |

=== CPU Monitoring ===

^ Field ^ Description ^ Default ^
| CPU Alarm | Enable CPU monitoring | ''true'' |
| CPU Warning % | Warning threshold | ''80%'' |
| CPU Critical % | Critical threshold | ''90%'' |

=== General Settings ===

^ Field ^ Description ^ Default ^
| Auto Clear | Automatically clear alarms when conditions normalize | ''true'' |
| Metric | Enable metrics collection | ''true'' |

===== Usage Examples =====

==== Example 1: Load and Monitor Specific Applications ====

  * Click "Load BTP Apps"
  * Select only production apps (e.g., prod-api, prod-web, prod-worker)
  * Configure stricter thresholds for production:
    * Memory Warning: 70%
    * Memory Critical: 85%
    * CPU Warning: 60%
    * CPU Critical: 80%
  * Leave development apps inactive

==== Example 2: Monitor All Applications with Wildcard ====

  * App Name: ''*''
  * State Alarm: Enabled
  * Expected State: RUNNING
  * State Check Mode: NOT_EQUALS

→ Monitors all apps in your BTP space and raises alarms if any app is not running.

==== Example 3: Mixed Approach ====

  * Use wildcard (''*'') with default thresholds for general coverage
  * Add specific critical apps via Load BTP Apps with custom thresholds
  * Both configurations can coexist in the same monitor

==== Example 4: Monitor Only Application State ====

  * App Name: background-worker
  * State Alarm: Enabled
  * Memory Alarm: Disabled
  * Disk Alarm: Disabled
  * CPU Alarm: Disabled
  * Crashed Instances Alarm: Disabled

→ Only monitors whether the application is running, ignoring resource usage.

===== Collected Metrics =====

The BTP Application Stats monitor collects 5 key performance metrics for each monitored application. These metrics are stored with dimensional tags for easy filtering and analysis.

=== Metric Types ===

^ Metric Name ^ Unit ^ Description ^ Example ^
| running_instances | count | Number of healthy, operational instances | 2 |
| crashed_instances | count | Number of failed or crashed instances | 0 |
| memory_used_mb | MB | Total memory consumption across all running instances | 256.5 |
| disk_used_mb | MB | Total disk space consumption across all running instances | 512.3 |
| cpu_usage_percent | % | Average CPU utilization across all running instances | 15.67 |

=== Metric Format ===

All metrics follow this naming convention: the prefix ''promonitor.btp_app_stats.'' followed by the metric name. Each metric carries the dimensional tags ''app_name'' and ''host'', written in the form ''%%app_name:<value>;host:<value>;%%''.

===== Alarm Examples =====

=== State Alarm ===

''BTP App my-app is in state STOPPED (expected state: RUNNING)''

=== Crashed Instances Alarm ===

''BTP App my-app has 2 crashed instance(s) (threshold: 0)''

=== Memory Alarm ===

''BTP App my-app memory usage is 92.5% [CRITICAL] (thresholds: warning=80%, critical=90%)''

=== CPU Alarm ===

''BTP App my-app CPU usage is 85.23% [WARNING] (thresholds: warning=80%, critical=90%)''

===== Best Practices =====

  * Use "**Load BTP Apps**" to easily discover and select specific applications to monitor
  * Use **Wildcard Mode** for comprehensive coverage when you want to monitor everything
  * **Combine both approaches** - Use wildcard for general monitoring and add specific apps with custom thresholds
  * **Adjust thresholds** based on your application's normal behavior patterns
  * **Enable Auto Clear** to automatically resolve alarms when conditions improve
  * **Set appropriate schedules** - 5 minutes is recommended for production monitoring
  * **Monitor crashed instances** separately from state to distinguish between partial and total failures
  * **Review metrics regularly** to identify trends and optimize resource allocation

===== Troubleshooting =====

==== "Load BTP Apps" Button Doesn't Work ====

  * Verify BTP credentials are correct in the Web Service Connector
  * Check network connectivity to BTP API
  * Ensure the connector is properly configured and saved
  * Review logs for authentication errors

==== No Data Collected ====

  * Verify BTP credentials are correct
  * Check network connectivity to BTP API
  * Ensure the application GUID is correct (for specific app monitoring)
  * Review logs for authentication errors

==== Metrics Show Zero ====

  * Application may have no running instances
  * All instances might be crashed or stopped
  * Check application status in BTP Cockpit

==== Alarms Not Clearing ====

  * Verify Auto Clear is enabled
  * Check if the condition has actually normalized
  * Review alarm threshold configurations

===== Related Documentation =====

  * [[https://v3-apidocs.cloudfoundry.org/version/3.205.0/#get-stats-for-a-process|Cloud Foundry V3 API Documentation - Get stats for a process]]
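
As a companion to the stats endpoint referenced above, the following minimal sketch shows one way the documented aggregation rules (sum memory and disk over running instances, average CPU over running instances, sum quotas over running and crashed instances) could be applied to a stats response. It is not the monitor's actual implementation. The function name ''aggregate_stats'' is hypothetical, and the sketch assumes the response carries a ''resources'' list whose entries expose ''state'', ''usage.mem'', ''usage.disk'', ''usage.cpu'', ''mem_quota'' and ''disk_quota'', with sizes in bytes and CPU as a fraction that is multiplied by 100.

```python
MB = 1024 * 1024


def aggregate_stats(stats):
    """Aggregate a process-stats payload into per-application metrics.

    Hypothetical helper; the field names and units are assumptions
    about the payload shape, not confirmed by this page.
    """
    resources = stats.get("resources", [])
    running = [r for r in resources if r.get("state") == "RUNNING"]
    crashed = [r for r in resources if r.get("state") == "CRASHED"]

    def usage(resource, key):
        # Missing or empty usage blocks count as zero.
        return (resource.get("usage") or {}).get(key) or 0

    # Memory and disk: summed over running instances only.
    mem_used = sum(usage(r, "mem") for r in running)
    disk_used = sum(usage(r, "disk") for r in running)

    # CPU: averaged over running instances only (assumed fraction -> %).
    cpu_values = [usage(r, "cpu") for r in running]
    cpu_percent = sum(cpu_values) / len(cpu_values) * 100 if cpu_values else 0.0

    # Quotas: summed over running + crashed instances.
    counted = running + crashed
    mem_quota = sum(r.get("mem_quota") or 0 for r in counted)
    disk_quota = sum(r.get("disk_quota") or 0 for r in counted)

    return {
        "running_instances": len(running),
        "crashed_instances": len(crashed),
        "memory_used_mb": mem_used / MB,
        "disk_used_mb": disk_used / MB,
        "cpu_usage_percent": cpu_percent,
        "memory_usage_percent": mem_used / mem_quota * 100 if mem_quota else 0.0,
        "disk_usage_percent": disk_used / disk_quota * 100 if disk_quota else 0.0,
    }


if __name__ == "__main__":
    sample = {
        "resources": [
            {"state": "RUNNING", "usage": {"cpu": 0.25, "mem": 100 * MB, "disk": 200 * MB},
             "mem_quota": 256 * MB, "disk_quota": 512 * MB},
            {"state": "CRASHED", "usage": {}, "mem_quota": 256 * MB, "disk_quota": 512 * MB},
        ]
    }
    print(aggregate_stats(sample))
```

Because quotas include crashed instances while usage does not, the usage percentages drop when instances crash; the sketch keeps that behaviour because it follows the aggregation rules as stated.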