Beyond the ESM Administrator Guide

Pavan Raja
Apr 8, 2025
13 min read

Summary:

Logfu is a tool for analyzing log files from servers running various types of applications. Its primary purpose is to help identify patterns and issues within these logs, such as memory consumption, event throughput, system shutdowns, and database/disk performance related to events. Key points:

1. **Logfu Tool**:
- Developed by Hewlett-Packard Development Company, L.P.
- Used to analyze log files from servers running web servers, databases, etc.
- Aims to identify issues like memory consumption, event throughput, system shutdowns, and database/disk performance.

2. **Logfu Features**:
- Examines specific log files such as server.log, server.std.log, and server.status.log.
- Uses options like -m for "manager" and -noplot for skipping graph plotting.
- Output is saved in the directory logs/default/Logfu_.
- Provides several data points:
  - **Famous Last Words**: Explains why certain events or processes failed, providing insights into potential issues or errors.
  - **Exception Groups**: Helps quickly identify repeating exceptions or error messages within the log files.
  - **Memory**: Monitors memory consumption and flags a significant increase that might indicate performance problems or resource constraints.
  - **Event Insertion**: Checks whether the database or disk can keep up with incoming events without delays or failures.
- When used with connectors, Logfu can plot time per batch to detect network latency issues. More detail is available through the Advanced Management Interface (manage.jsp).

3. **Advanced Management Interface**:
- Accessible via URL https://:8443/arcsight/web/manage.jsp.
- Allows users to view interesting Mbeans, such as Agent State, providing real-time status updates and performance metrics of the system.

4. **Tracker, Groups, and Filters**:
- Features that allow users to track specific performance metrics, organize data using groups, and refine data using filters.

5. **Mbean: RulesEngine**:
- An MBean named "RulesEngine" which handles or interacts with rules-based decision making within the system.

6. **AgentStateTracker**:
- A tracker that monitors the state of agents in a distributed computing environment.

7. **Security for the new reality**:
- Implies an ongoing effort or consideration regarding security measures needed in response to changing circumstances or advancements in technology and threats.

The text ends with the copyright notice from Hewlett-Packard Development Company, suggesting these details pertain to products or services developed by this company, likely related to enterprise management or monitoring solutions.

Details:

This document is a guide or presentation created by Nathan Tisdale, an Advanced Support Engineer at Hewlett-Packard (HP), discussing various aspects of ArcSight Enterprise Security Manager (ESM) and its administration. The primary focus is on troubleshooting perspectives within the ESM environment, including data flow analysis, log analysis techniques, and live monitoring using the Advanced Management Console (AMC).

The presentation starts with a brief introduction to Nathan Tisdale's background in ArcSight Technical Support: training new support engineers, handling escalations, prioritizing bugs on behalf of customers, and empowering ArcSight administrators. The agenda includes:

A troubleshooting perspective including data flow analysis between Oracle vs CORR-Engine, basic log analysis using logs, warning messages, and memory information, and advanced log analysis with exceptions, thread dumps, and Logfu.
Live monitoring through the Advanced Management Console (AMC).

It also addresses the intended audience: ArcSight administrators responsible for ensuring continuous event flow within ESM. The presentation is structured to provide insights on how to identify bottlenecks in current management systems, similar to a previous training module called "Gain Rock Star Status: ArcSight ESM Manager Administrator."

The data flow section outlines a simple ESM deployment with components such as ArcSight SmartAgent, ArcSight Console, ArcSight Manager, and ArcSight Web. The concept is explained from event insertion to retrieval, which helps show how events are processed in an ESM environment.

Handling and analyzing security events involves several stages and components, including threads and database interactions, and performance issues can affect both event retrieval and event insertion. Performance issues may manifest as slow channel loading, failed reports or trends, or connector status and Manager log entries indicating potential database hangs or rejected threads. The processing stages pass events through various verifiers, adders, resolvers, initializers, handlers, and forwarders before they are persisted in the database. Symptoms of performance issues include slow data retrieval, delayed event insertion, and queues filling up due to inefficiencies in the rules engine or data monitors.

The next part covers logging configuration and management, particularly for a Manager running against an Oracle database. The log files are located in the directories outlined below:

1. **Log File Locations:**

Manager logs, under `/logs/default/*.log*`:

`SERVER.LOG`
`SERVER.STD.LOG`
`SERVER.STATUS.LOG`
`SERVER.REPORT.LOG`
`SERVER.SQL.LOG`
`SERVER.LICENSE.LOG`

Oracle logs:

`/admin/arcsight/bdump/ALERT_.LOG`
`/network/log/LISTENER.LOG`
`/network/log/SQLNET.LOG`

CORR-Engine logs:

`/opt/arcsight/logger/current/arcsight/logger/logs/*`
`/opt/arcsight/logger/data/mysql/*.log*`
`/opt/arcsight/logger/data/pgsql/serverlog*`

2. **Log Rotation:**

Log files are limited in size to 10MB by default.
Automatic log file rotation is enabled, keeping up to 10 rotated files plus the current file.
Configuration settings for extending logging can be modified in `/config/server.properties` using parameters:

```plaintext
log.channel.file.property.maxsize=10MB
log.channel.file.property.maxbackupindex=10
```

3. **Key Manager Logs:**

`SERVER.STD.LOG` and `SERVER.LOG` are crucial for:
Initialization messages
General progress messages
Exceptions with detailed traces
Event batch insert times
Garbage collector information
`SERVER.STATUS.LOG` includes critical warnings, information from Mbeans, and uncaught exceptions.
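As a rough check on the rotation settings shown earlier, the sketch below estimates the maximum disk space one rotated log can occupy: the current file plus its backups. It assumes the size value is written as a number followed by KB, MB, or GB, and that MB means 1024 × 1024 bytes; the helper names are hypothetical and not part of ArcSight.

```python
import re

# Assumption: binary units. The source only shows the value "10MB".
UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text):
    """Convert a value such as '10MB' into bytes."""
    match = re.fullmatch(r"(\d+)\s*(KB|MB|GB)", text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2).upper()]


def parse_properties(text):
    """Read key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def max_footprint_bytes(props):
    """Current file plus maxbackupindex rotated files, each at maxsize."""
    size = parse_size(props["log.channel.file.property.maxsize"])
    backups = int(props["log.channel.file.property.maxbackupindex"])
    return size * (backups + 1)


SAMPLE = """\
log.channel.file.property.maxsize=10MB
log.channel.file.property.maxbackupindex=10
"""

if __name__ == "__main__":
    total = max_footprint_bytes(parse_properties(SAMPLE))
    print(total // 1024 ** 2, "MB per log at most")  # prints "110 MB per log at most"
```

With the defaults, each log can therefore grow to about 110 MB before the oldest backup is discarded, which is worth knowing before raising either value.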

This setup ensures that the system maintains an organized approach to logging by specifying clear paths for log files and implementing a rotation policy to prevent excessive file growth.

The monitoring and logging mechanisms also cover management processes, resource consumption, and data collection for troubleshooting and performance analysis. Key aspects include:

1. **Watchdog Messages**: These are messages that provide status updates about active lists statistics and monitor resource usage. The wrapper manages the lifecycle of manager processes, including rules and data monitoring.

2. **Log Rotation**: Configuration is done via `/config/server.wrapper.conf`, copying settings from `server.defaults.wrapper.conf` (with edits discouraged).

3. **Specific Logs**:

**SERVER.SQL.LOG, SERVER.REPORT.LOG, SERVER.CHANNEL.LOG, SERVER.PULSE.LOG, SERVER.LICENSE.LOG** are detailed and should be enabled for specific purposes such as Oracle DBA tasks or license compliance monitoring.

4. **Manager Logs**: Typically found under `/logs`, including but not limited to:

**SERVER.SQL.LOG, SERVER.REPORT.LOG, SERVER.CHANNEL.LOG, PARTITION*.LOG** for active channel queries and Oracle partition management.
**SERVER.PULSE.LOG** updates every 10 seconds unless the CORR-Engine is present.
**SERVER.LICENSE.LOG** tracks license compliance status.

5. **Data Collection**:

**Thread Dumps** are generated for system tables and performance analysis, with specific conditions (e.g., if the manager is identified as a bottleneck).
**Logs** should be collected from both managers and agents to ensure comprehensive troubleshooting.

6. **Collecting Logs**: Using tools like ArcSight Sendlogs, users can collect logs through a wizard interface for easier management and data collection during performance issues or troubleshooting.

Together, these mechanisms provide operational insight into Oracle-based systems and their Managers, with a focus on efficiency and timely retrieval of information. The remaining points cover system monitoring, logging, and troubleshooting:

**Log Collection:**

Various logs are collected, including Manager logs, Agent logs, Web logs, Console logs, the Oracle Alert log, Thread Dumps, Session Waits, and output from SQL. These can be collected manually or via scripts (e.g., using `./arcsight sendlogs`, `./arcsight managerthreaddump`, etc.).

**Service Status:**

The status of services can be checked using the command `/sbin/service arcsight_services status`.

**Generating Reports:**

Thread Dumps and Session Waits can be generated with specific commands (e.g., `./arcsight arcdt session-waits -sp spool` for Session Waits).
HTML reports on thread dumps can also be created using the command `./arcsight threaddumps `.
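The commands above can be strung together in a small script. The following is a minimal sketch, not an ArcSight tool: it only uses command strings that appear in this post, `BIN_DIR` is a hypothetical placeholder for the Manager's bin directory, and it defaults to a dry run so nothing is executed unless requested.

```python
import shlex
import subprocess

# Hypothetical placeholder; point this at the Manager's bin directory.
BIN_DIR = "/path/to/manager/bin"

# (command, optional output file). The commands are quoted from the text;
# the HTML file name follows the redirect used later in this post.
STEPS = [
    ("/sbin/service arcsight_services status", None),
    ("./arcsight managerthreaddump", None),
    ("./arcsight threaddumps", "threaddumps.html"),
]


def plan_collection(steps=STEPS):
    """Split each command string into an argv list."""
    return [(shlex.split(command), output) for command, output in steps]


def run_collection(bin_dir=BIN_DIR, dry_run=True):
    """Print the plan, or execute it when dry_run is False."""
    plan = plan_collection()
    for argv, output in plan:
        if dry_run:
            suffix = f" > {output}" if output else ""
            print("would run:", " ".join(argv) + suffix)
        elif output:
            # The output file is written to the current working directory.
            with open(output, "w") as handle:
                subprocess.run(argv, cwd=bin_dir, stdout=handle, check=False)
        else:
            subprocess.run(argv, cwd=bin_dir, check=False)
    return plan


if __name__ == "__main__":
    run_collection(dry_run=True)
```

Keeping the dry run as the default makes it easy to review the exact commands before running them on a production Manager.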

**Database Logs:**

For Oracle and CORR-E databases, specific logs are collected.

**Data Collection and Storage:**

All logged data is placed in a tarball or zip file for easy upload and sharing.

**Alerting System (Whining Messages):**

The system alerts users via email when there are issues like subsystem failures, database connection problems, event insertion times, SSL certificate expiration, partition manager failures, memory utilization issues, etc. These alerts can also be triggered by specific log files or events recorded in `server.std.log` or `server.log`.

**Memory Utilization:**

The Manager is responsible for allocating memory within the Java heap and handles garbage collection automatically. Memory usage is monitored closely, and logs provide details on used and maximum memory allocated.

This material is a detailed guide to monitoring and troubleshooting the software through its logging methods and alerting systems.

It also gives an overview of garbage collection (GC) in Java applications: the differences between minor and major collections, their respective pauses, and how real memory usage is measured through "Full GC" messages. It introduces the working set, which is the memory an application actually uses once all garbage is excluded.

There are two main types of garbage collection. A minor GC focuses on the young generation; a major (full) GC covers both the young and tenured generations. Minor GCs are quick but may expand to cover the entire heap, while major GCs are more extensive and take longer to complete. Pause times can vary significantly depending on factors such as hardware capabilities.

Real memory usage is captured in the "Full GC" messages in a server log file such as server.std.log. The working set is the amount of memory the application actively uses, excluding garbage, and the heap should ideally be about twice that value. Frequent minor GCs or overly long major GCs can hurt performance, leading to OutOfMemoryErrors or excessive pause times.

To choose a heap size, set the heap to at least double the working set. If the heap is too small, full GCs become frequent and performance suffers; if it is too large, major GCs can take so long that the Wrapper times out and terminates the process.

The heap size can be adjusted either through a management console or by running a setup script such as 'managersetup'.

The material also covers memory management and error handling. The main points are:

1. **Memory Management Issues:**

Out of memory errors lead to server restarts.
Recommendations for troubleshooting include checking logs related to memory usage and specific modules like "CapsManager."
Monitoring tools such as Data Monitors, channels, and Active Lists should be used to assess overall memory utilization.
If there is a spike in memory usage during multiple memory-intensive tasks, increasing the heap size can help but does not address the root cause of potential memory leaks.
Memory leak issues are difficult to diagnose due to their gradual increase over time and complexity in tracking them down.
In such situations, contacting support services is advised for further assistance.
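The heap-sizing guideline described earlier (a heap of at least twice the working set, read from "Full GC" messages) can be sketched as a small log scan. The line format below is an assumption: it is the simple classic HotSpot form `[Full GC 812345K->402112K(1048576K), 2.10 secs]`, and lines with per-generation detail are not handled. Taking the largest post-collection value as the working set is a conservative choice made for this sketch, not something the source prescribes.

```python
import re

# Assumed line shape: [Full GC <before>K-><after>K(<heap>K), <secs> secs]
FULL_GC = re.compile(r"\[Full GC (\d+)K->(\d+)K\((\d+)K\)")


def working_set_kb(lines):
    """Largest heap usage observed right after a Full GC, in KB."""
    after_values = []
    for line in lines:
        match = FULL_GC.search(line)
        if match:
            after_values.append(int(match.group(2)))
    return max(after_values) if after_values else None


def suggested_heap_mb(lines, factor=2):
    """Working set times factor, rounded up to whole megabytes."""
    working_set = working_set_kb(lines)
    if working_set is None:
        return None
    return -(-working_set * factor // 1024)  # ceiling division


SAMPLE_LINES = [
    "[GC 5000K->3000K(10000K), 0.0100000 secs]",
    "[Full GC 812345K->402112K(1048576K), 2.1034560 secs]",
    "[Full GC 900000K->512000K(1048576K), 2.5000000 secs]",
]

if __name__ == "__main__":
    print("working set (KB):", working_set_kb(SAMPLE_LINES))
    print("suggested heap (MB):", suggested_heap_mb(SAMPLE_LINES))
```

If the value after each Full GC keeps climbing over days rather than settling, that pattern points toward the leak scenario described above, and a larger heap only delays the failure.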

2. **Log Analysis:**

The log file "SERVER.STATUS.LOG" provides detailed information about agent statuses including cache size estimations. For example, the archiver and Syslog agents are listed with their respective reported times, counts, and memory usage metrics.

3. **Error Handling:**

Exceptions in the application may be encapsulated within Java constructs, often due to coding errors or transient bugs.
Full stack traces of exceptions should be included in error reports to identify where and how specific errors occur in the code.
Not all exception details are equally significant; some might be misclassified or have minimal impact on functionality. The nature of these exceptions can sometimes be related to content-specific issues.

4. **Caching and Logs:**

The document references a cache size estimation from "SERVER.STATUS.LOG" which includes fields like Estimated Cache Size, Sent To Manager Count, and Failed Connection Attempts.
Additionally, the log file "default.com.ar" (likely another system or application-specific log) is mentioned without detailed content description.

5. **General System Logs:**

The document implies that logs from various sources like SERVER.STATUS.LOG and SERVER.LOG are crucial for understanding the performance and state of a system, particularly in troubleshooting memory issues and application errors.

Overall, the material underscores the importance of detailed logging and monitoring for proactively managing and troubleshooting complex software systems. For Java-based applications in particular, increasing the heap size can relieve out-of-memory problems but does not address deeper issues such as memory leaks.

The next part covers performance issues and operational details of the event insertion flow. Key points:

1. **Timed Ring Buffer**: The document mentions a "Timed Ring Buffer", which appears to manage data in memory for quick access and retrieval while events are handled. It also discusses discarding "increment X," where X is presumably a variable representing either an amount of data or a span of time.

2. **Database Connectivity**: Detailed logs from SERVER.STD.LOG show issues such as connectivity problems (with references to SUBSYSTEM STATUS CHANGED) and performance metrics such as persistence rate, with an ideal target under 100ms for processing events. Specific entries show different JVMs handling event persistence at varying speeds: one completing in less than 10 seconds and another taking over 3698 milliseconds (approximately 3.7 seconds).

3. **Manager Busy**: The document notes that the "Manager stops accepting events" when thread limits are exceeded, a common issue when managing concurrent processes or connections within an application.

4. **Thread Dumps and Insertion Issues**: When events are not being inserted into the database as expected (event insertion flow), generating a Thread Dump may be necessary to troubleshoot specific performance bottlenecks or errors in the data handling process. The thread types mentioned include Data Listener threads, Normalization threads, rules engine threads, Monitors, and others, all potentially involved in different stages of event processing.

5. **Active Channel**: This phrase suggests a mechanism to manage which events are currently being processed or actively used by the system, possibly indicating that not every incoming event is acted upon immediately and that some events are handled with priority over others.

6. **Resource Utilization and Retrieval**: The document mentions different resources reading event data from the database for reporting queries, trend queries, and potentially other analytical purposes. The data is therefore used not only for immediate operations but also for long-term analysis and decision making.

Overall, this part gives detailed insight into how the application manages incoming events, handles them through various threads and processes, and interacts with the database under conditions such as resource constraints or performance bottlenecks.

The following sections give a structured summary of the thread dump and event processing material.

### Theme: Managing Thread Dumps and Event Processing in a Java Application

#### 1. Introduction to Thread Dumps

**Purpose**: Collect stack traces for each thread within the Virtual Machine (VM) to identify performance bottlenecks or issues with concurrent processes.
**Relevant Data**: Typically includes session waits or database sessions to correlate with database activities.

#### 2. Generating Thread Dumps

**Methodology**:
Access via `Manage.jsp` under NGServer or directly invoke `/bin/arcsight managerthreaddump`.

#### 3. Formatting Thread Dumps

**Output**: HTML format, accessible through command `/bin/arcsight threaddumps > threaddumps.html`.
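Before or alongside the HTML report, a raw thread dump can be summarized by thread state to see at a glance whether many threads are blocked. The sketch below assumes the standard HotSpot text layout, where each thread starts with a quoted name and is followed by a `java.lang.Thread.State:` line; the sample thread names are made up, and this is not how the `threaddumps` command itself works.

```python
import re
from collections import Counter

# A thread section starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# The state line follows the header in HotSpot dumps.
THREAD_STATE = re.compile(r"java\.lang\.Thread\.State:\s+(?P<state>[A-Z_]+)")


def summarize_states(dump_text):
    """Count threads per state (RUNNABLE, BLOCKED, WAITING, ...)."""
    counts = Counter()
    in_thread = False
    for line in dump_text.splitlines():
        if THREAD_HEADER.match(line):
            in_thread = True
            continue
        state = THREAD_STATE.search(line)
        if state and in_thread:
            counts[state.group("state")] += 1
            in_thread = False  # count each thread only once
    return counts


# Made-up sample in the assumed layout.
SAMPLE_DUMP = '''"worker-1" #12 prio=5 tid=0x01 nid=0x10 runnable
   java.lang.Thread.State: RUNNABLE
\tat example.Worker.run(Worker.java:10)

"worker-2" #13 prio=5 tid=0x02 nid=0x11 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
\tat example.Worker.run(Worker.java:20)

"worker-3" #14 prio=5 tid=0x03 nid=0x12 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
'''

if __name__ == "__main__":
    for state, count in summarize_states(SAMPLE_DUMP).most_common():
        print(state, count)
```

A large share of BLOCKED threads in consecutive dumps is a reasonable cue to look at the corresponding session waits on the database side, as the introduction above suggests.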

### Part #2: Servlet Engine and Event Processing

#### 1. Servlet Engine Components

**SeededJsseListener**: Responsible for reading bytes from network sockets and converting them into Java Objects, specifically a "Security Event Batch".
**Flow Process**: Events are processed in sequential stages (Flows) within the servlet engine:
**Flow 1**: Receives events directly from the network or scanner reports.
**Flow 2**: Pre-persistor stage where events are prepared, normalized, and written to a database for further processing.
**Flow 3**: Post-persistor stage where events undergo evaluation against rules, correlation generation, and preparation for dashboard display.

#### 2. Dashboards

**XCPUDMPC-Thread**: This component handles the final stages of event processing including data monitor evaluations and correlation event generation before preparing for garbage collection.

### Key Takeaways:

The text outlines a structured approach to managing thread dumps and processing events within a Java application, detailing how specific components handle different aspects of data flow and processing in a servlet engine environment. This setup is crucial for maintaining performance and security in IT systems.

Logfu is a tool developed by Hewlett-Packard Development Company, L.P.; it is not an officially supported product. It analyzes log files from servers running various types of applications, such as web servers and databases. Its primary purpose is to help identify patterns and issues within these logs, including memory consumption, event throughput, system shutdowns, and database/disk performance related to events. The tool examines specific log files like server.log, server.std.log, and server.status.log, using a syntax with options such as -m for "manager" and -noplot for skipping graph plotting. The output is saved in the directory logs/default/Logfu_, and it provides several interesting data points:

1. **Famous Last Words**: This refers to why certain events or processes failed, providing insights into potential issues or errors.

2. **Exception Groups**: It helps quickly identify repeating exceptions or error messages within the log files.

3. **Memory**: Logfu monitors memory consumption and can flag a significant increase that might indicate performance problems or resource constraints.

4. **Event Insertion**: This checks whether the database or disk can keep up with incoming events without delays or failures.

When used with connectors, Logfu also plots time per batch to detect network latency issues. More detail is available through the Advanced Management Interface (a.k.a. manage.jsp), accessible via URL https://:8443/arcsight/web/manage.jsp. This interface allows users to view interesting Mbeans, such as Agent State, providing real-time status updates and performance metrics of the system.

The remaining terms relate to the monitoring and management tools available in this interface:

1. **Tracker** - Refers to a tool that tracks specific performance metrics such as EPS (events per second) for connectors, user sessions managed by SessionManager, subsystem status with Whiner notifications, active list monitoring including memory consumption, and channel usage details such as how many channels are active or validating SQL queries.

2. **Groups and filters** - This suggests that there is a feature in the system that allows users to organize and refine data using groups and specific filters.

3. **Mbean: RulesEngine** - Indicates an MBean (Management Bean) named "RulesEngine" which likely handles or interacts with rules-based decision making within the system.

4. **AgentStateTracker** - Refers to a tracker that monitors the state of agents, possibly in a distributed computing environment where different parts of the system are managed by separate agents.

5. **Security for the new reality** - This phrase might imply that there is an ongoing effort or consideration regarding security measures needed in response to changing circumstances or advancements in technology and threats.

The text ends with the copyright notice from Hewlett-Packard Development Company, suggesting these details pertain to products or services developed by this company, likely related to enterprise management or monitoring solutions.

Disclaimer:

The content in this post is for informational and educational purposes only. It may reference technologies, configurations, or products that are outdated or no longer supported. If you have any comments or feedback, please leave a message and we will respond.
