Disaster Recovery Plan for ArcSight ESM 6.5c Backup and Restore
- Pavan Raja
- Apr 8, 2025
- 13 min read
Summary:
Based on the provided text, here is an overview of memory usage by data monitors (DMs) in ArcSight ESM and the related performance guidance:
1. **Memory Usage Overview**: The text does not provide detailed memory usage statistics for individual DMs, but it describes best practices and potential issues related to performance and resource management.
2. **Best Practices**:
- **Disable Unused Data Monitors**: Enabled DMs consume memory continuously, so disabling the ones that are not in use conserves memory.
- **Monitor Log Files**: Review the logs for performance indicators and potential bottlenecks, in particular messages about event persistence and database interactions, which can indicate resource pressure.
- **Consider Restarting the Manager**: If memory consumption keeps growing or performance degrades significantly, a restart can temporarily alleviate the problem by resetting internal state and freeing resources.
3. **Performance Issues and Root Causes**: The text discusses MySQL connection problems, poor disk speed affecting event flow, and bottlenecks in event processing caused by large events or misconfigured parameters.
4. **Workarounds and Solutions**:
- Disable yum repos and rebuild the system for MySQL connection issues.
- Upgrade hardware if disk speed is the bottleneck.
- Refine queries or reduce event batch sizes to lower the load of processing large datasets.
- Tune database settings in `/opt/arcsight/logger/data/mysql/my.cnf`.
5. **Memory Usage Comparison**: The text does not compare DMs against each other in detail. It recommends checking that no single DM consumes an unusually high amount of memory compared to the other DMs.
6. **Configuration Recommendations**: Tuning the MySQL configuration file `my.cnf` can affect how the system handles temporary sort space and lock table limits, which determines whether large trends and reports complete.
In conclusion, while the text does not provide per-DM memory figures, it offers practical guidance on managing resource consumption: disable unused data monitors, watch the performance indicators in the logs, and address hardware and database configuration limits.
Details:
This document outlines the procedures for disaster recovery and monitoring of ArcSight Enterprise Security Manager (ESM) version 6.5c using HP tools. Key points include:
1. **Disaster Recovery**: Instructions are provided on how to backup using ESM 6.5c, including details on how to perform a log file review and validate resources with the command `./arcsight resvalidate`. The output should be reviewed in the HTML report located at `/opt/arcsight/manager/validationReport.html`.
2. **Monitoring**: This section covers monitoring techniques such as generating thread dumps (two methods), checking session wait times, and reviewing log files which can be found at specific paths:
Thread Dumps: Way 1 is executed via the command `/opt/arcsight/manager/bin/arcsight managerthreaddump`, and Way 2 involves accessing the URL `https://hpesm65c:8443/arcsight/web/index.jsp` to request a thread dump from NGServer. Check the `server.std.log` for output.
Session Wait Times: Use the command `/opt/arcsight/manager/bin/arcsight arcdt session-waits -sp spool` to retrieve current running JDBC sessions and their wait times.
3. **Log Files**: Detailed paths to log files are provided, including those under `/opt/arcsight/manager/logs/default/*.log*`, `/opt/arcsight/logger/current/arcsight/logger/logs/*.log*`, `/opt/arcsight/logger/data/mysql/*.log*`, and `/opt/arcsight/logger/data/pgsql/serverlog*`.
4. **Storage Groups and Archives**: This section is not fully transcribed in the provided text but likely covers management of storage groups and archival procedures for log files and data within the ArcSight system.
Overall, this document serves as a comprehensive guide for maintaining and monitoring an ArcSight ESM 6.5c installation to ensure operational resilience and efficient logging practices.
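The log review step can be partly automated. The following is a minimal sketch, not part of the original guide, that expands the log path patterns listed above and returns the matching files with the most recently modified first. The function name `find_logs` and its `root` parameter are hypothetical helpers introduced for illustration.

```python
import glob
from pathlib import Path

# Glob patterns copied from the log locations listed in this section.
LOG_PATTERNS = [
    "/opt/arcsight/manager/logs/default/*.log*",
    "/opt/arcsight/logger/current/arcsight/logger/logs/*.log*",
    "/opt/arcsight/logger/data/mysql/*.log*",
    "/opt/arcsight/logger/data/pgsql/serverlog*",
]


def find_logs(patterns=LOG_PATTERNS, root="/"):
    """Return existing log files matching the patterns, newest first.

    `root` is a hypothetical hook (an assumption of this sketch) that lets
    the patterns be resolved under another directory, for example a copy
    of the logs taken from the server, instead of "/".
    """
    matches = []
    for pattern in patterns:
        # Strip the leading slash so the pattern can be joined onto `root`.
        full_pattern = str(Path(root) / pattern.lstrip("/"))
        matches.extend(Path(p) for p in glob.glob(full_pattern))
    files = [p for p in matches if p.is_file()]
    # Sort by modification time so the most recent activity comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling `find_logs()` on the ESM host lists the files to review; on a machine without those directories it simply returns an empty list.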
This document outlines the initial storage configuration and retention policies for the system, including event storage management. Key points include:
1. **System Storage Size**: The total storage capacity includes both trend and list data. Event storage has a maximum limit of 8TB.
2. **Retention Periods**: Events are retained based on a specific retention period until the space or time limits are reached, at which point they are deleted or archived.
3. **Storage Layout**: Shows how events are organized over time with free spaces indicating available storage for new data.
4. **Archival Process**: Daily events are moved from event storage to archive storage once retention periods expire, and remain listed in Archive Jobs until their specific retention period is over before being deactivated and moved to the Archives page.
5. **ESM 6.5C Command Center**: This section provides management tools for creating, editing, and controlling the location of both event and archive storage groups, as well as segmenting data within these groups based on retention policies specific to each storage group.
6. **Archives Page**: The final resting place for deactivated events after their archival period has expired, providing a segmented view of stored information according to the system's mapping rules.
7. **Recovery Plan (PLAN 'B')**: Outlines strategies and procedures for recovering from potential data loss scenarios, utilizing the system's capabilities as outlined in previous sections.
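To make the retention rule concrete, here is a small sketch of how a time-based retention check could be expressed. It is an illustration only: ESM applies retention internally, and the function name `expired_days` and the strict "older than the cutoff" comparison are assumptions of this example.

```python
from datetime import date, timedelta


def expired_days(event_days, today, retention_days):
    """Return the days whose events are older than the retention period.

    Assumption of this sketch: a day expires once it is strictly older
    than `today - retention_days`.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in event_days if d < cutoff)


# Example: twenty consecutive days of events checked against a
# 14-day retention period (an illustrative value).
today = date(2025, 4, 8)
days = [today - timedelta(days=n) for n in range(20)]
old = expired_days(days, today, retention_days=14)
```

In this example the cutoff is 2025-03-25, so the five days from 2025-03-20 through 2025-03-24 are the ones due for archiving or deletion.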
The document outlines procedures for maintaining an online event archive in HP software, specifically focusing on disaster recovery and config backups. Key points include:
1. **Event Storage Size**: Specifies retention periods for archived events, which is crucial for data management and compliance.
2. **Retention Period (Days)**: Defines how long the stored information will be retained before being purged or replaced with new data. Here, it's set at 14 days.
3. **Disaster Recovery Summary**: Outlines procedures for restoring configuration settings from an archive in case of a disaster:
The `import_system_tables` command is used to restore specific configurations.
Existing data is deleted when the restored configuration replaces the current one.
It's important to re-run trends after importing as trend data are not automatically regenerated from the archive.
4. **Config Backup**: Explains how to manually back up essential configuration information using the `configbackup` command:
This command exports certain critical configuration details such as search settings and archive configurations into a file named `configs.tar.gz`.
Users are advised to familiarize themselves with specific guidelines before executing this backup, especially regarding database table exports like rules, reports, and dashboards.
5. **Backup Location**: The exported config files should be copied to long-term storage for preservation. This ensures that backups can be accessed and restored in case of data loss or system failure.
Overall, the document provides detailed steps for preserving critical configuration settings and managing event archives through regular backup procedures designed to facilitate quick recovery from potential disasters.
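The "copy to long-term storage" step could be scripted along these lines. This is a hedged sketch: the function name `copy_backup`, the timestamped file naming, and the size check are choices made for this example and are not prescribed by the original document.

```python
import shutil
from datetime import datetime
from pathlib import Path


def copy_backup(source, dest_dir, now=None):
    """Copy a config backup into long-term storage under a timestamped name.

    The `configs-YYYYMMDD-HHMMSS.tar.gz` naming is an assumption of this
    sketch, chosen so that successive backups do not overwrite each other.
    """
    now = now or datetime.now()
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    target = dest_dir / f"configs-{now.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    shutil.copy2(source, target)  # copy2 also preserves the modification time

    # Cheap sanity check that the copy is complete.
    if target.stat().st_size != source.stat().st_size:
        raise OSError(f"size mismatch after copying {source} to {target}")
    return target
```

A call such as `copy_backup("configs.tar.gz", "/mnt/longterm/esm")` would run after each `configbackup`; the destination path here is hypothetical.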
This text is about managing and restoring a system using HP ArcSight Enterprise Security Manager (ESM). The steps include preparing for restore, restoring data from a backup, ensuring archive storage is maintained, and performing disaster recovery operations. Key points are:
1. **Prepare for Restore**: Ensure you have no event data in the new installation to prepare it for restoration.
2. **Restore Configuration**: Restore the configuration from the backup file created with the `arcsight configbackup` command. The content should be restored to the same version of ESM and the same operating system used for the original backup.
3. **Ensure Archive Storage**: Maintain archive storage by regularly moving archives to long-term storage and backing up their configurations using specific commands.
4. **Disaster Recovery**: Use `arcsight export_system_tables` to back up critical system tables and restore them with `arcsight import_system_tables`.
The text also provides detailed command sequences for these operations, such as stopping logger services, running the configuration backup, and moving the archive configurations to long-term storage.
The provided text describes a process of exporting and then importing MySQL database system tables from an ArcSight installation using specific scripts and credentials. Here's a summary of the steps outlined in the text:
1. **Preparation**: The script is run with parameters specifying the MySQL username, password, database name, and whether to include SLD tables (set to no in this case). It generates a list of system tables for export into a file named `export_system_tables.param`.
2. **Exporting Tables**: Using the specified credentials, the script exports the defined system tables from the MySQL database (`arcsight`) into a SQL dump file located at `/opt/arcsight/manager/tmp/arcsight_dump_system_tables.sql`. This process is logged as successful.
3. **Importing Tables**: The next step involves importing these tables back into the same MySQL database using another script, with specified credentials and the path to the export file. Before starting the import, the system checks if the manager is running and requires it to be shut down for consistency before proceeding. It then imports the SQL file, resets certain sequences (`arc_resource_ref_id`, `arc_case_id`, and `arc_kb_display_id`), and logs this process as successful.
4. **Post-Import Check**: The system checks if the manager is running again after the import to ensure no inconsistencies occur post-import. If it's still running, a sleep period of 30 seconds is enforced before proceeding further.
5. **Completion**: Both export and import processes are marked as successful with logging messages, indicating that the backup and recovery operations have been carried out successfully for the ArcSight system tables.
This sequence of commands and checks ensures data integrity and operational consistency in maintaining the ArcSight database setup.
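The "check whether the manager is running, sleep 30 seconds, check again" behaviour described above can be sketched as a small polling loop. This is an illustration, not the actual script logic: `wait_until_stopped` is a hypothetical name, `max_checks` is an assumed limit, and the caller must supply `is_running`, since how the manager's state is detected is not specified in the text.

```python
import time


def wait_until_stopped(is_running, poll_seconds=30, max_checks=10, sleep=time.sleep):
    """Poll until the manager is reported as stopped.

    `is_running` is a caller-supplied function returning True while the
    manager is up. The 30-second pause mirrors the sleep described in the
    text; `max_checks` is an assumption of this sketch.
    Returns True if the manager stopped, False if the checks ran out.
    """
    for _ in range(max_checks):
        if not is_running():
            return True
        sleep(poll_seconds)  # manager still up: wait before checking again
    return False
```

An import wrapper would call this first and refuse to run `import_system_tables` when it returns False. Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.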
This document outlines the process of archiving and importing content from an ArcSight system using the ArcSight Archive Utility. The utility is configured with parameters such as JAVA_HOME and ARCSIGHT_HOME, and uses administrative credentials to access and manipulate data.
An export command such as `./arcsight archive -u admin -p password -m exp30 -f /home/arcsight/export.xml -uri "/All Active Channels" ...` exports the current system content, including channels, field sets, lists, agents, assets, zones, networks, locations, dashboards, data monitors, filters, profiles, reports, rules, and stages, into an XML file named `/home/arcsight/export.xml`.
After setting up the environment with JAVA_HOME pointing to /opt/arcsight/manager/jre and ARCSIGHT_HOME set to /opt/arcsight/manager, the ArcSight Archive Utility starts. It initializes configurations from files like config/server.defaults.properties and config/server.properties. The utility then performs an archive operation using default settings on the specified XML file.
To import the archived content back into the ArcSight system, the command `./arcsight archive -m hpesm65c -u admin -i -f /home/arcsight/export.xml` is used. This process updates trend data and query viewers, and completes with an Import Complete message and the total elapsed time for the archive operation.
The document concludes with additional details about the utility version (Version 6.5.0.1459.0) and copyright information.
This text seems to be a log or output from a system command related to ArcSight logger software, possibly during the process of backing up configurations. Here's a summary of what it says:
1. **Command Execution**: The command `/opt/arcsight/logger/current/arcsight/logger/bin/arcsight configbackup` is executed to back up the configuration files and directories for the ArcSight logger software.
2. **Environment Variables**: During the execution of this command, several environment variables are set or confirmed:
`ARCSIGHT_HOME` is `/opt/arcsight/logger/current/arcsight/logger`.
`ARCSIGHT_LOGGER_HOME` and `ARCSIGHT_LOGGER_DATA` both point to the same location as `ARCSIGHT_HOME`.
Other related variables like `CONAPP_HOME`, `TOMCAT_HOME`, and `JAVA_HOME` are also set or confirmed.
3. **Backup Operation**: The command is performing a "tarring up" operation, which means it is archiving the configuration files into a compressed file (`.tar.gz`). This backup is done to `/opt/arcsight/logger/current/arcsight/logger` as specified by `ARCSIGHT_HOME`.
4. **Timestamp**: The operation completes with a timestamp indicating when the backup was finished.
This log entry confirms that the system administrator or software is performing routine maintenance by backing up critical configuration files of the ArcSight logger to ensure data integrity and for disaster recovery purposes. The file `configs.tar.gz` in `/opt/arcsight/logger/current/backups/` would contain all the configurations backed up during this operation.
This text appears to be script output from an ArcSight system. The script is attempting to start the disaster recovery process, but it encounters an issue with resolving the `$CURRENT_HOME` variable.
The script lists several paths and configurations being used by the system, such as `ARCSIGHT_HOME`, `CONAPP_HOME`, and `TOMCAT_HOME`. It then proceeds to list a series of commands that are part of the disaster recovery process, including changing directories and running specific scripts. The output also includes warnings about destructive operations and prompts for user confirmation before proceeding with data deletion.
The script is attempting to run a MySQL dump command using `mysqldump -u` (the username option) followed by an empty placeholder where the password would be. However, there is a syntax error in the command as written (`$(/)`), which does not appear to be complete or correct based on standard usage.
The script ends with a copyright notice and another prompt asking if you are sure about proceeding with the destructive operation.
This text appears to be a document detailing procedures and commands related to system administration and data management for a software product named ArcSight. It provides instructions for tasks such as exporting database tables, importing specific trend data, configuring MySQL databases, connecting to CORR-Engine, running SQL queries, and restarting the manager. The document also includes paths to configuration files and specifies that direct manipulation of MySQL or PostgreSQL databases should be avoided unless authorized by Hewlett-Packard (HP).
1. **Data Export and Import**:
Commands are provided for importing previously exported trend data back into the MySQL database using `./mysql -u arcsight -p < /tmp/trendsdump.out`.
Alternatively, all resources, including data from the ArcSight database, can be exported using `./mysqldump -u arcsight -p --databases arcsight --ignore-table=arcsight.arc_event --ignore-table=arcsight.events > /tmp/dump.out`. The `arc_event` and `events` tables are ignored because their data is stored in the logger.
2. **Configuration Files**:
Lists paths to various configuration files located in different directories on the system, including language and notification configurations, Java security settings, and MySQL database locations.
3. **Database Connections**:
Provides commands for connecting to both MySQL (`./mysql -u arcsight -p`) and PostgreSQL databases (`/opt/arcsight/logger/current/arcsight/bin/psql rwdb web`).
4. **Running SQL Queries**:
Instructions on how to create a text file with a query and execute it by passing the file to the `arcsight arcdt runsql -f` command.
5. **Manager Restart**:
Explains that manager restarts can occur for various reasons, emphasizing the need to follow prescribed procedures when restarting the system.
6. **Legal Disclaimer**:
Includes a legal disclaimer stating that unauthorized manipulation of MySQL or PostgreSQL databases may lead to data loss and specifies that changes in configurations or data should be done with caution.
7. **Appendix**:
Indicates the document is part of an appendix, implying it contains additional information not covered elsewhere in the main body.
This text seems to be a technical guide for maintaining and troubleshooting ArcSight system, providing detailed commands and paths necessary for MySQL database administration and query execution.
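The `mysqldump` invocation quoted above can be assembled programmatically, which helps avoid the spacing mistakes that are easy to make when typing it by hand. This is a minimal sketch: `build_dump_command` is a hypothetical helper, and the relative `./mysqldump` path assumes the command is run from the directory that contains the binary, as in the text.

```python
def build_dump_command(user="arcsight", database="arcsight",
                       ignore_tables=("arc_event", "events"),
                       mysqldump="./mysqldump"):
    """Build the argument list for the resource dump described above.

    `-p` without a value makes mysqldump prompt for the password, so no
    password is embedded in the command line.
    """
    cmd = [mysqldump, "-u", user, "-p", "--databases", database]
    # The event tables are skipped because their data lives in the logger.
    cmd += [f"--ignore-table={database}.{table}" for table in ignore_tables]
    return cmd


# One way to run it (left as a comment because it needs a live database):
#   import subprocess
#   with open("/tmp/dump.out", "wb") as out:
#       subprocess.run(build_dump_command(), stdout=out, check=True)
```

Passing an argument list rather than a shell string means each option is a separate element, so a missing space cannot silently merge two options.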
The most common reason for the Manager's JVM running out of memory is holding large amounts of data in memory, for example when running reports or maintaining data monitors. Typical causes include poorly written data monitors, a huge active list loaded into memory, too many active channels, or rules with numerous partially matching conditions.
To address this issue:
1. Check all data monitors to ensure they do not have guard overflow issues by looking for related messages in server logs. Specifically, review TopValueCount data monitors that maintain ordered lists of event attribute values and their occurrences. These can require significant memory depending on the attribute used and the number of unique values involved.
2. Implement a threshold check within these data monitors to prevent memory exhaustion. For instance, if configured to show top N items like event IDs or source addresses, exceeding certain limits (e.g., 1000 distinct values) will cause the monitor to flush its internal structures and release memory, ensuring continued operation without running out of memory.
3. Ensure that data monitors are properly configured and written to avoid excessive memory usage or other performance issues.
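The flush-on-overflow behaviour described in step 2 can be illustrated with a toy counter. This is a conceptual sketch only and does not reflect ESM's internal implementation: the class name `TopValueCounter`, its parameters, and the choice to discard all counts on overflow are assumptions made for the example.

```python
from collections import Counter


class TopValueCounter:
    """Toy model of a top-N value counter with an overflow guard."""

    def __init__(self, top_n=10, max_distinct=1000):
        self.top_n = top_n
        self.max_distinct = max_distinct
        self.counts = Counter()
        self.flushes = 0  # how many times the guard has fired

    def add(self, value):
        """Count one occurrence of `value`, flushing on overflow."""
        self.counts[value] += 1
        if len(self.counts) > self.max_distinct:
            # Too many distinct values: release memory by discarding
            # everything collected so far (including this value).
            self.counts.clear()
            self.flushes += 1

    def top(self):
        """Return the current top-N (value, count) pairs."""
        return self.counts.most_common(self.top_n)
```

The sketch shows why a high-cardinality attribute (such as a source address on a busy network) is a poor choice: the guard fires repeatedly, and the monitor never accumulates a meaningful top-N list.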
When dealing with data monitors (DMs) that are causing memory issues or system instability due to excessive event accumulation, consider the following solutions and best practices:
1. **Adjust Event Limits**: Increase the limit on the number of distinct events for one or more DMs by setting it to 2000, but be aware of the trade-off between retaining events and memory usage. This can help prevent memory issues without significantly altering data retention.
2. **Monitor Memory Usage**: Use the manage.jsp page located at https://localhost:8443/arcsight/web/manage.jsp to check the memory utilization for resources like Datamonitor, Activelist, and Active Channels. Ensure that your DM is not consuming an unusually high amount of memory compared to other DMs by looking near the bottom of the manage.jsp page.
3. **Disable Unused Data Monitors**: As a best practice, disable any data monitors that are not currently in use since enabled DMs consume memory resources continuously.
4. **Monitor Log Files**: Keep an eye on the server.std.log file for messages indicating "Persisted xxx events in xxx ms". If this value consistently exceeds 1000 over a prolonged period, take manager thread dumps and DB session outputs at three-minute intervals with nine repetitions taken every 30 seconds.
5. **Consider Restarting the Manager**: If you observe worsening performance and frequent restarts of the manager itself, it may be necessary to restart the management service from a clean state (equivalent to disabling and re-enabling the data monitor) to free up memory and potentially resolve the issue.
By following these guidelines, you can optimize the performance and efficiency of your data monitors, ensuring that they do not consume excessive resources or hinder system stability.
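The log check in step 4 lends itself to a short script. The sketch below assumes that the message has the literal form "Persisted N events in M ms" and that the threshold of 1000 refers to the millisecond value; both are interpretations of the text rather than confirmed details. The names `persist_times` and `consistently_slow`, and the `min_run` parameter used to approximate "consistently", are hypothetical.

```python
import re

# Assumed message shape, taken from the wording quoted in the text.
PERSISTED = re.compile(r"Persisted (\d+) events in (\d+) ms")


def persist_times(lines):
    """Extract (event_count, milliseconds) pairs from log lines."""
    samples = []
    for line in lines:
        match = PERSISTED.search(line)
        if match:
            samples.append((int(match.group(1)), int(match.group(2))))
    return samples


def consistently_slow(samples, threshold_ms=1000, min_run=5):
    """Return True if `min_run` consecutive samples exceed the threshold.

    `min_run` is an assumption of this sketch: the text says
    "consistently" without giving a number.
    """
    run = 0
    for _, elapsed_ms in samples:
        run = run + 1 if elapsed_ms > threshold_ms else 0
        if run >= min_run:
            return True
    return False
```

Feeding the lines of `server.std.log` through `persist_times` and then `consistently_slow` gives a simple signal for when to start collecting thread dumps and session wait output.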
The provided text discusses various technical issues and their root causes, along with potential workarounds for a system experiencing performance problems. Here's a summarized breakdown of the key points mentioned in the text:
1. **MySQL Database Connection Issues:**
Symptom: The manager periodically loses connection to the database.
Root Cause: Libraries on a non-certified operating system are causing MySQL to crash.
CAUTION: Do not enable automatic updates.
Workaround: Disable yum repos and rebuild the system.
2. **Performance Issues Due to Disk Speed:**
Symptom: Event flow slows or stops entirely, with disk speed showing a delay bottleneck.
Root Cause: The CORR-Engine requires disk speeds of at least 15,000 RPM to keep up with write operations.
Workaround: Upgrade the hardware.
3. **Event Processing Bottlenecks:**
Symptom: Event flow slows or stops entirely.
Root Cause: Large event sizes require smaller batches of events to be written to disk, causing delays.
Specific issues mentioned in the text are related to configuration parameters for reducing the number of events per batch and adjusting query timing to limit overlap.
Workaround: Reduce the number of events per batch, which may impact read performance (CAUTION).
4. **Sort Space Limit Exceeded and Lock Table Size Issues:**
Symptom: Trend or report fails to run, with logs indicating "temporary sort space limit exceeded" or "total number of locks exceeds the lock table size."
Root Cause (sort space): The query is producing a large number of distinct rows, exceeding the allocated temporary space.
Root Cause (locks): The buffer size is insufficient to handle the result set, so the total number of locks exceeds the lock table size.
Workaround: Refine the query to reduce results or adjust timing of queries to limit overlap. For specific configurations related to MySQL, refer to configuration files like /opt/arcsight/logger/data/mysql/my.cnf.
In summary, these issues are primarily related to performance bottlenecks in data handling and processing within a system, with suggested fixes ranging from software adjustments (like disabling repositories or refining queries) to hardware upgrades when specific performance thresholds are not met.
The provided text discusses two main issues related to performance and resource usage in systems using the MySQL configuration file `my.cnf`: temporary sort space limits and gaps in time stamps.
1. **Temporary Sort Space Limits**: When a query groups data excessively (more than four levels), it may generate large temporary files or exceed the memory allocated for sorting, leading to errors like "temporary sort space limit exceeded". The root cause is the resource-intensive nature of such groupings. The suggested workaround is to minimize the use of Group By, truncate long fields using substring functions, and adjust settings in `my.cnf`.
2. **Handling Gaps in Time Stamps**: Issues arise when time stamps on devices are incorrect or events are processed in batches, leading to slow performance in channels, reports, or trends that depend on the end time of events. The main symptom is increased disk I/O and messages like "throwing out increment" from the correlation engine. Correcting timestamps might be a temporary solution but creates overhead. A better workaround includes using Manager Receipt Time for optimal performance and avoiding batch collection of events.
These issues are crucial to address for efficient operation in data processing systems, highlighting the importance of proper configuration and handling of time-stamped data and grouped information.
