Monitoring

The Monitoring part of webMethods Cloud Container enables you to monitor the health and availability of solutions and run-time instances, and to view alerts and their statuses. You receive an email whenever a condition arises that might affect a solution.

Monitoring Solutions

The monitoring of a new solution starts automatically 10 minutes after the creation of the solution. The data of the solution is collected and analyzed every 60 seconds.

You can access the monitoring pages from the left-side navigation menu of the Monitoring main page.

You can filter the information on most Monitoring pages based on time. To specify the time-range, select a value in the time-range selector.

The following table describes the options in the time-range selector.

Option Description
1h Displays the information for the last 1 hour.
6h Default. Displays the information for the last 6 hours.
12h Displays the information for the last 12 hours.
24h Displays the information for the last 1 day.
2d Displays the information for the last 2 days.
1w Displays the information for the last 1 week.
2w Displays the information for the last 2 weeks.
4w Displays the information for the last 4 weeks.
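
As a rough illustration of how the selector options relate to the 60-second collection interval described earlier, the following Python sketch converts an option label into a time span and estimates how many collection cycles it covers. The names TIME_RANGES, COLLECTION_INTERVAL, and expected_samples are hypothetical and are not part of webMethods Cloud Container; only the option labels and the 60-second interval come from the text.

```python
from datetime import timedelta

# Option labels from the time-range selector table, mapped to durations.
TIME_RANGES = {
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),    # listed as the default option
    "12h": timedelta(hours=12),
    "24h": timedelta(hours=24),
    "2d": timedelta(days=2),
    "1w": timedelta(weeks=1),
    "2w": timedelta(weeks=2),
    "4w": timedelta(weeks=4),
}

# Solution data is collected and analyzed every 60 seconds.
COLLECTION_INTERVAL = timedelta(seconds=60)


def expected_samples(option: str) -> int:
    """Estimate how many collection cycles fit into the selected range."""
    return TIME_RANGES[option] // COLLECTION_INTERVAL
```

Under these assumptions, the 6h option spans about 360 collection cycles per run-time instance.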

To navigate to the Monitoring main page, log in to webMethods Cloud Container, and select Monitoring in the webMethods Cloud Container navigation bar.

Dashboard

On the Dashboard page, you can view:

Solutions

On the Solutions page, you can check the health of the run-time instances from all the solutions. For each run-time instance, you can view the current data, and the data for the last 24 hours.

The health metrics are grouped into three categories:

Runtimes

On the Runtimes page, you can view the graphs for monitored KPIs for the selected run-time instances from all the solutions.

The example image shows the graph for the Used Memory KPI. The horizontal lines below the graph represent the severity and duration of the alerts that were raised for the KPI. The information alerts are displayed in blue, the warning alerts are in orange, and the critical alerts are in red.

The following table describes the meaning of the alert lines from the example graph for the Used Memory KPI.

Time Period Details
1 Until 2:05 h, an information alert was open.
2 At 2:05 h, the severity of the information alert was changed to warning.
3 An information alert existed during that period.
4 A warning alert existed during that period.

You can change the value in the Solutions drop-down field to load the information about the run-time instances from a specific solution.

You can use the INTEGRATION SERVER, UNIVERSAL MESSAGING, and BIGMEMORY MAX tabs to view the information related to the selected solution and run-time instance. In the KPI graphs, each color represents a different cluster instance.

By default, the page displays information for the last 24 hours. To view the information for a different time period, use the time-range selector. For more information about the time-range selector, see Monitoring Solutions.

The following table describes the monitored Integration Server KPIs.

Name Description
Used Memory The total used memory for the Java VM.
Service Threads The number of active service threads.
Sessions The number of active licensed sessions.
Stateful Sessions The number of the current stateful HTTP sessions.

The following table describes the monitored Universal Messaging KPIs.

Name Description
Free Memory The amount of free memory that the Realm Server has within the Java VM. This indicates the difference between what the Java VM has currently allocated and what the Realm Server has used.
Published Events Total number of events published on this realm from the time it started.
Subscribed Events Total number of events that this realm has sent to clients from the time it started.

The following table describes the monitored Terracotta KPIs.

Name Description
Off-Heap Used Memory Shows the amount of off-heap memory that is currently used.
Live Objects Shows the total number of live objects in the cluster, mirror group, server, or clients. If the trend for the total number of live objects goes up continuously, clients in the cluster will eventually run out of memory and applications might fail. Upward trends indicate a problem with application logic, garbage collection, or the tuning of one or more clients.

Viewing Adapter KPIs

On the Runtimes page, you can view the KPIs for the adapters that are installed on the Integration Server instances.

  1. Navigate to the Runtimes page.

  2. Select a solution.

  3. On the INTEGRATION SERVER tab, select an Integration Server instance.

  4. Click Connectivity KPIs.

  5. On the ADAPTERS tab, select an Adapter. The Adapter KPIs are displayed.

The following table describes the monitored Adapter KPIs.

Name Description
Connections The number of connection pools in the adapter and how many of them are currently enabled.
Notifications The number of adapter notifications (polling notifications) and how many of them are currently enabled.

Note: You can view Adapter KPIs only for the current time.

Viewing Connector KPIs

On the Runtimes page, you can view the KPIs for the connectors that are installed on the Integration Server instances.

  1. Navigate to the Runtimes page.

  2. Select a solution.

  3. On the INTEGRATION SERVER tab, select an Integration Server instance.

  4. Click Connectivity KPIs.

  5. Click the CONNECTORS tab.

  6. Select a provider.

  7. Select a connector. The Connector KPIs are displayed.

The following table describes the monitored Connector KPIs.

Name Description
Connections The number of connection pools in the connector and how many of them are currently enabled.
Listeners The number of connector listeners and how many of them are currently enabled.

Note: You can view Connector KPIs only for the current time.

Services

On the Services page, you can view the number of successful and failed service executions of the Integration Server instances from the solutions.

The Services page consists of the Service Executions pane and the History pane.

Pane Description
Service Executions Shows the following information about the service executions of the Integration Server instances for the selected time range:
  • Total number of service executions
  • The number of successful service executions
  • The number of failed service executions
  • The percentage of successful service executions, calculated by the formula (number of successful service executions / total number of service executions) * 100
History Shows a chart with the history of successful (green) and failed (red) service executions. Hovering over the green and red bars displays the total number of successful and failed service executions, respectively.
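
The percentage formula from the Service Executions pane can be sketched in Python as follows. The function name success_percentage is hypothetical, and returning 0.0 when there are no executions is an assumption, because the text does not say what the page shows in that case.

```python
def success_percentage(successful: int, total: int) -> float:
    """(successful executions / total executions) * 100, as described for the pane."""
    if total == 0:
        # Assumption: report 0.0 when nothing was executed in the time range.
        return 0.0
    # Multiplying first is arithmetically the same formula and avoids
    # small floating-point rounding in the intermediate ratio.
    return 100 * successful / total
```

For example, 3 successful executions out of 4 in total give 75.0.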

The numbers of service executions on the Services page include the public and internal services of the Integration Server instance and their child services.

You can change the value in the Solutions drop-down field to view the information about a specific solution, or the information for all solutions.

By default, the page displays information for the last 24 hours. To view the information for a different time period, use the time-range selector. For more information about the time-range selector, see Monitoring Solutions.

Uptime

On the Uptime page, you can view time lines that represent the availability of all run-time instances of the solutions.

The color of the time lines changes based on the status of the run-time instances.

The following table describes the meaning of the different colors.

Time line color Indicates that
green the run-time instance was available during the indicated time period.
red the run-time instance was unavailable during the indicated time period.
grey the run-time instance did not exist during the indicated time period.
blue at least one node from the cluster is unavailable.
yellow a solution update is in progress (the solution is under maintenance).

Note the following points:

By default, the time line displays the availability of the instances during the last 24 hours. To view the information for a different time period, use the time-range selector. For more information about the time-range selector, see Monitoring Solutions.

Alerts

An alert is a notification that a rule has been violated.

On the Alerts page you can:

By default, the Alerts page displays the number of alerts (critical, warning, and information) for all the solutions, and detailed information about the alerts in a tabular format.

Note: An alert does not appear immediately when the corresponding rule violation occurs. The system waits for a certain duration between the violation of the rule and the firing of the alert. During this period, the system checks at each evaluation whether the violation is still present, and fires the alert only if the violation persists.
The evaluation time for different alerts is as follows:

The system will not send an alert if the rule violation condition is resolved during the corresponding evaluation period. For more information about the interval, see Configuring the Alerts.
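
The waiting behavior described in the note can be illustrated with a minimal Python sketch. It assumes that each evaluation produces a simple True (rule violated) or False (rule satisfied) result and that an alert fires only after a fixed number of consecutive violated evaluations. The function name and the required_consecutive parameter are hypothetical; the actual evaluation times are configured in the product.

```python
from typing import List


def should_fire(evaluations: List[bool], required_consecutive: int) -> bool:
    """Return True only if the rule stayed violated for the whole waiting period.

    evaluations: results of the most recent evaluations, oldest first.
    required_consecutive: hypothetical number of evaluations in the waiting period.
    """
    if required_consecutive < 1:
        raise ValueError("required_consecutive must be at least 1")
    if len(evaluations) < required_consecutive:
        return False  # the waiting period has not elapsed yet
    # A single resolved evaluation inside the period suppresses the alert.
    return all(evaluations[-required_consecutive:])
```

In this model, a violation that clears during the waiting period never produces an alert, which matches the behavior described above.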

If you deactivate a solution, the Alerts page will not display the alerts for the solution.

If you activate a solution, the Alerts page will display both the historical alerts that were raised for the solution before its deactivation, and the alerts that were raised after its activation.

When a solution update starts, the existing active alerts for the solution are set to resolved. During the update period, no alerts are generated for the solution. You can disregard any email alerts that you receive during the update period.

The following table describes the information that is displayed in the table on the Alerts page.

Column Description
Solution Name of the solution.
Runtime Run-time type:
  • Integration Server
  • Universal Messaging
  • Terracotta
Instance Name of the run-time instance.
Start Date Date and time when the alert was raised.
Resolved On Date and time when the alert was resolved. The field is empty if the alert is still active.
Message Description of the alert.
Status Status of the alert:
  • The alert is inactive.
  • The alert is active.

    Note: The Alerts page might not display the alerts for all nodes from a cluster. For example, if you monitor an Integration Server cluster with two Integration Server instances, and both instances have alerts for the same property with different severity, the Alerts page will show the alert of lower severity only, as explained in the following table.

    Integration Server instance Alert type Visibility on the Alerts page
    Integration Server instance 1 Information. Free memory is low. Yes
    Integration Server instance 2 Warning. Free memory is low. No

    You can view all alerts for all the nodes from the cluster in the email alerts.
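
The visibility rule from the example above can be sketched in Python. The sketch only mirrors the documented example (for the same property, the alert of lower severity is shown on the Alerts page) and is not the product's actual selection logic. The names SEVERITY_RANK and visible_alert are hypothetical.

```python
from typing import List, Tuple

# Severity order as described in this section: information < warning < critical.
SEVERITY_RANK = {"information": 0, "warning": 1, "critical": 2}


def visible_alert(alerts: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Pick the alert shown on the Alerts page for one property of one cluster.

    alerts: (instance name, severity) pairs that refer to the same property.
    Assumption: the alert with the lowest severity is the one displayed.
    """
    return min(alerts, key=lambda alert: SEVERITY_RANK[alert[1]])


shown = visible_alert([
    ("Integration Server instance 1", "information"),
    ("Integration Server instance 2", "warning"),
])
```

Here shown refers to the information alert of instance 1, as in the table; the warning alert of instance 2 would be visible only in the email alerts.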

    By default, the page displays information for the last 1 hour. To view the information for a different time period, use the time-range selector. For more information about the time-range selector, see Monitoring Solutions.

    Alert Types

    The following table provides more information about the alert types.

    Note: Warning alerts and information alerts are not available for KPIs that monitor the availability of a run-time instance.

    Alert Severity Description Color Coding
    Critical A condition exists that is critical for the system performance. red
    Warning A condition exists that might deteriorate the system performance. orange
    Information A condition exists that might evolve into a warning or critical alert. blue

    Configuring the Alerts

    You can change the default threshold values, recipient emails, alert intervals, and message for the alert severity levels. Threshold values determine when a rule is violated and when the system raises an alert.
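
To make the role of threshold values concrete, the following Python sketch maps a KPI value in percent to a severity. The threshold numbers are purely hypothetical placeholders, not the product defaults, and the sketch assumes a KPI for which higher values are worse. The real boundary values are set per KPI on the CONFIGURATION tab.

```python
from typing import Optional

# Hypothetical lower bounds in percent, checked from most to least severe.
THRESHOLDS = [
    ("critical", 90.0),
    ("warning", 80.0),
    ("information", 70.0),
]


def classify(kpi_percent: float) -> Optional[str]:
    """Return the severity whose threshold the KPI value has reached, if any."""
    for severity, lower_bound in THRESHOLDS:
        if kpi_percent >= lower_bound:
            return severity
    return None  # no rule is violated, so no alert is raised
```

With these placeholder values, a KPI at 85% is classified as a warning, and a KPI at 10% raises no alert.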

    To configure the system alerts

    1. Navigate to the Alerts page.

    2. Select the CONFIGURATION tab.
      The Configuration tab shows information about the KPIs that monitor the availability of a runtime instance for all solutions.

    3. Click any KPI in the Name column to configure an alert.
      A form with the configuration details for the alert rule is displayed.

      The following table describes the fields in the form.

      Values Description
      Application Integration Server, Universal Messaging, or BigMemory Max.
      KPI severity thresholds (%) The KPI’s boundary values. When the value of the KPI is outside the range that is specified by these boundary values, the alert is raised. You can configure the threshold values of critical, warning, and information alerts by adjusting the ends of the red line, orange line, and blue line.
        Note: The Threshold field is read-only for KPIs that monitor the availability of a runtime instance.
      Severity alert configurations Supports the following severity types:
        - Critical: A condition exists that is critical for the system performance. Color code is RED.
        - Warning: A condition exists that might deteriorate the system performance. Color code is ORANGE.
        - Information: A condition exists that might evolve into a warning or critical alert. Color code is BLUE.
        Note: Warning alerts and information alerts are not available for KPIs that monitor the availability of a runtime instance.
      Alert recipient(s) Valid email addresses to which alert notifications are sent for the different severity types. To configure more than one email address, use comma-separated values.
      Alert Intervals Customize how you receive notifications:
        - Group Wait: How long to wait before sending the first alert for a group of alerts that have the same labels. The recommended group wait duration is 10 minutes.
        - Group Interval: How long to wait before sending a new alert that is added to a group. The recommended group interval duration is 10 minutes.
        - Repeat Interval: How long to wait before resending an alert that has already been sent. The recommended repeat interval duration is 12 hours.
        For better results, make sure that the repeat interval and group interval values are not very close, and that the group interval value is a divisor of the repeat interval value.
        Note: Because of internal Alert-Manager latency or network latency, the alert notification email might arrive slightly later than the configured alert interval values.
      Message Customize the message for each severity type.
    4. Click Apply.
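
The recommendations for the Group Interval and Repeat Interval values can be checked before you enter them in the form. The following Python sketch is a hedged illustration only: the function name check_intervals is hypothetical, and the rule that "not very close" means the repeat interval is at least twice the group interval is an assumption made for this example, not a documented limit.

```python
from datetime import timedelta
from typing import List


def check_intervals(group_interval: timedelta, repeat_interval: timedelta) -> List[str]:
    """Return a list of problems with the chosen alert intervals (empty if none)."""
    if group_interval <= timedelta(0):
        raise ValueError("group_interval must be positive")
    problems = []
    # Documented recommendation: group interval should divide the repeat interval.
    if repeat_interval % group_interval != timedelta(0):
        problems.append("group interval is not a divisor of the repeat interval")
    # Assumed interpretation of "not very close": repeat >= 2 * group.
    if repeat_interval < 2 * group_interval:
        problems.append("repeat interval is too close to the group interval")
    return problems


# The recommended values: 10 minutes and 12 hours.
recommended = check_intervals(timedelta(minutes=10), timedelta(hours=12))
```

The recommended combination of 10 minutes and 12 hours passes both checks, because 12 hours is exactly 72 group intervals.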

    Alert Actions

    You can take actions and resolve the problems with the solutions that caused the alerts. The following table relates the alert with the probable cause of the problem, and the recommended actions that you can take to resolve the problem.

    Alert Name Probable Cause Action to Resolve the Alert
    ISFreeMemoryLow
      Probable cause: The memory usage is reaching the configured thresholds. If the memory usage continuously reaches 95% or above, and you do not observe any flaw in your application, there is probably another memory-intensive application.
      Action: Allocate more memory to the solution.
    ISRuntimeSessionUsageHigh
      Probable cause: The solution uses too many sessions and there might not be free sessions for new requests.
      Action: Try one of the following:
      - Stop some unnecessary services, if any
      - Increase the maximum number of active licensed sessions
      - Move some of the workload to another solution
    ISRuntimeStatefulSessionUsageHigh
      Probable cause: The number of the current stateful HTTP sessions is high. There might not be enough bandwidth for new sessions.
      Action: Move some of the workload to another solution.
    ISRuntimeUnavailable
      Probable cause: Integration Server is down.
      Action: Try one of the following:
      - If the Integration Server went down because of a high workload, create an Integration Server cluster.
      - If the Integration Server went down because of insufficient memory and you also get a memory alert, allocate more memory for Integration Server.
    TCOffHeapMemoryLow
      Probable cause: The off-heap memory has reached the threshold because of too much stored data.
      Action: Stop adding data or delete some data from the off-heap memory.
    TCRuntimeUnavailable
      Probable cause: The Terracotta server went down, or there was a human error (for example, somebody shut down the Terracotta server).
      Action: Restart the Terracotta server. For greater safety and security, start with the server that was shut down last.
    UMRuntimeUnavailable
      Probable cause: The Universal Messaging server is down.
      Action: Restart the Universal Messaging server. If the problem persists, contact Software AG Global Support.
    UMFreeMemoryLow
      Probable cause: The memory usage is reaching the configured thresholds. If the memory usage continuously reaches 95% or above, and you do not observe any flaw in your application, there is probably another memory-intensive application.
      Action: Increase the memory for Universal Messaging.

    Logs

    You can access the Logs page from the left-side navigation menu of the Monitoring main page.

    For each of the logs, you can apply the following filters:

    Viewing and Downloading the logs for a specific run-time instance in a solution

    The following example shows how to view and download the logs for a specific Integration Server run-time instance in a solution.

    1. Select a time period from the time-range selector.

      By default, the page displays the logs for the last 1 hour. To view the logs for a different time period, use the time-range selector. For more information about the time-range selector, see Monitoring Solutions.

    2. Select a solution from the top drop-down list box.
      The list box displays all the active solutions available in the landscape model. In this example, the selected solution has two products: Integration Server and Universal Messaging.

    3. Select Integration Server.
      You will see all the available instances for this product in the drop-down list.

    4. Select a run-time instance and click .

      You can view log details, such as the log file name, the date when the log file was created, and the size of the log file. webMethods Cloud Container downloads one file at a time. The multi-select option is not available for downloading log files.

      You can narrow the list of logs by using filters. The filters are provided at the top of the log results and are based on the folders available in the product. Deselecting a filter removes the corresponding log results from the list.

      Note: By default, the retention period for old logs is 30 days or four weeks (4W), which means that you can view and download log files from the last 30 days only.
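
The effect of the retention period can be sketched in Python. The function name is_within_retention is hypothetical, and treating a file that is exactly 30 days old as still available is an assumption, because the text does not define the boundary.

```python
from datetime import date, timedelta

# Default retention period for old logs, as stated in the note above.
RETENTION = timedelta(days=30)


def is_within_retention(created: date, today: date) -> bool:
    """Return True if a log file created on `created` should still be listed."""
    # Assumption: the boundary day (exactly 30 days old) is still included.
    return today - created <= RETENTION
```

For example, a log file created on 1 January is treated as available on 31 January but not on 1 February.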

    Custom Logs

    The custom log provides important information that you need to monitor and correct problems that occurred during the service execution. webMethods Cloud Container lets you download these custom logs through the Monitoring > Logs page.

    In this case, the customer creates a custom log file for each service. Subsequent executions of the same service append custom log entries to the same log file. The customer also configures periodic rotation of the custom log file, either by day or by size. Upon rotation, the rolled-over logs are renamed and made available through the Monitoring > Logs page.

    Before You Begin

    1. Make sure that the custom log file resides in the following directory:
      /opt/softwareag/IntegrationServer/instances/default/logs/custom_pkg_logs/.log
      Note: You cannot change the location of the log file.

    2. Make sure there is file access permission to modify the parameters set for directories in the fileAccessControl.cnf configuration file.

    3. If you have an older version of Log4j installed, you must update to Log4j 2.17.1 or a higher version.

    4. As a best practice, the logs must be rotated periodically based on the size or on the date.
      Note: It is recommended to rotate the custom log when the custom log file size reaches 50 MB or at midnight, whichever occurs first.
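
The recommended rotation rule (50 MB or midnight, whichever occurs first) can be expressed as a small decision function. This Python sketch only illustrates the rule; in practice, rotation is normally configured in the logging framework rather than hand-coded. The function name should_rotate is hypothetical, and reading 50 MB as 50 * 1024 * 1024 bytes is an assumption.

```python
from datetime import datetime

# Assumption: 50 MB is interpreted as 50 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 50 * 1024 * 1024


def should_rotate(size_bytes: int, last_rotation: datetime, now: datetime) -> bool:
    """Rotate when the file reaches the size limit or a midnight has passed."""
    size_reached = size_bytes >= MAX_SIZE_BYTES
    # A later calendar date than the last rotation means midnight has passed.
    midnight_passed = now.date() > last_rotation.date()
    return size_reached or midnight_passed
```

Whichever condition becomes true first triggers the rotation, so a small file is still rotated once per day, and a busy service can rotate several times within one day.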

    How to Identify Custom Log Entries?

    Upon rotation, you can view the custom log file through the Monitoring > Logs page of webMethods Cloud Container. Rotated custom log files are named in the following format, which you can use to identify them:

    <file-name>.log..yyyymmdd.time_stamp.zip

    Downloading the Custom Log

    To download the custom log details, click    under Actions.