HA Options 0.1
- Pavan Raja

- Apr 8, 2025
- 13 min read
Summary:
The following points summarize a balanced approach to database and Logger availability, and where each option applies:
### Balanced Database Approach
#### 1. **Oracle Database with Streams**
While Oracle Database with Streams is generally used for more general database resilience, it is not directly applicable to ArcSight ESM (Enterprise Security Manager) deployments. However, for broader database considerations, here’s a balanced approach:
- **High Availability Solutions**: Implementing high availability solutions like Oracle Real Application Clusters (RAC), which can provide automatic failover and minimal downtime.
- **Data Guard**: Using Oracle Data Guard for synchronous or asynchronous replication between two sites to ensure data protection against site failures.
- **Backup and Recovery**: Regularly scheduled backups coupled with fast recovery options to minimize data loss in case of a disaster.
- **Monitoring Tools**: Utilize monitoring tools like Oracle Enterprise Manager (OEM) to monitor the health and performance of the database, enabling proactive maintenance and troubleshooting.
#### 2. **SAN Systems**
For storage enhancement:
- **Backup Solutions**: SAN can be used for offsite backups, ensuring data is secure even if the primary site experiences a disaster.
- **Performance Enhancements**: High-speed storage can improve database performance by providing faster access to data.
### ArcSight Logger Solution with Oracle Database Correlation
#### 1. **Connector-Based High Availability**
- **Replicating Events**: Two logger appliances replicate events via smart connectors, ensuring no event loss and enabling investigation of failures.
- **Failover Mechanism**: Automatic redirection of traffic to the secondary appliance in case of primary failure.
- **Challenges**: Identifying Connector failures, managing storage due to duplication, and potential data loss during communications interruptions.
#### 2. **External Load Balancing System**
- **Traffic Management**: Directing event and log data through a shared IP address, hiding the real Logger address from devices.
- **Failover Capabilities**: Automatically redirecting traffic between primary and secondary appliances based on connectivity status.
- **Challenges**: Detecting failed communications, potential loss of events during failures, and time-consuming efforts to consolidate data post-failure.
### Centralized Storage Solutions
#### 1. **NAS or SAN for Log Archiving**
- **Cost-Effectiveness**: No additional hardware required for backup configurations and data storage.
- **Data Backup**: Regular archiving of logs ensures that configuration can be restored in case of a main Logger failure.
- **Challenges**: Event data loss during downtime, reliance on manual processes for restoring backups.
#### 2. **SAN Directly Connected to Logger Appliance**
- **Shared Configuration**: Both appliances share the same configuration storage, reducing hardware requirements.
- **Failover Capabilities**: Automatic takeover of the main Logger role upon failure through IP address restoration.
- **Challenges**: Same as above regarding data loss during downtime and reliance on SAN for storage.
### Conclusion
Balancing high availability with cost-effectiveness and ease of management is crucial in database and logging solutions. The use of connectors, external load balancers, and centralized storage solutions can provide a balanced approach to ensuring operational performance and availability without relying solely on ArcSight’s built-in high availability features.
Details:
This document outlines various high-availability (HA) options available for ArcSight products, designed to maintain operational continuity in case of system failures or outages. The article categorizes these options into three layers: Integration Layer, Core Engine Layer, and Logger High Availability Options.
**Integration Layer:**
This layer involves log and event collection mechanisms which are crucial for data flow to the ArcSight platform. To ensure high availability at this level, two main methods are suggested:
1. **Pull-based SmartConnector High Availability**: This method pulls logs from a remote source whenever an agent is available. It supports both on-premise and cloud deployments without requiring additional appliances or hardware.
2. **Push-based SmartConnector High Availability**: Here, data flows directly to the ArcSight system via a push mechanism where no agents are needed at the receiving end; it also supports both on-premise and cloud environments.
**Core Engine Layer:**
This layer includes key components such as ArcSight Express, ArcSight ESM (Enterprise Security Manager), the ArcSight ESM Manager, the Web interface, and the Database. HA options here include:
1. **ArcSight Express**: A lightweight version suitable for small deployments with minimal requirements.
2. **ArcSight ESM**: The full-featured version requiring more resources but offering enhanced capabilities. For high availability:
**ArcSight ESM Availability**: Ensures that the system can be accessed and managed even when some components are unavailable.
**ArcSight ESM Manager**: Manages all modules of ArcSight ESM, providing a central point for monitoring and controlling systems across multiple nodes to ensure minimal downtime.
**ArcSight ESM Web**: Provides access via web interface with failover mechanisms in place.
**ArcSight ESM Database**: Uses shared SAN or database replication to maintain data integrity and accessibility during failures.
**Logger High Availability Options:**
This section covers hardware-based solutions for logger components, including:
1. **Connector-based High Availability**: A SmartConnector forwards each event to two Logger appliances and caches events for an appliance that is temporarily unreachable.
2. **External Load Balancer**: Utilizes a third-party device or software that distributes incoming network traffic among multiple servers.
3. **Warm Standby**: Involves maintaining an identical replica of the primary system, which can take over automatically in case of failure.
4. **Shared SAN**: A storage area network where all ArcSight systems share disk arrays to ensure data is accessible across nodes.
**Recommended Solution:**
The document recommends a tailored approach based on specific requirements and constraints:
For small or medium deployments, simpler solutions like connectors with load balancing might suffice.
Larger environments benefit from more robust configurations such as warm standbys, shared SANs, or external hardware for better performance and availability.
This document provides a comprehensive guide to implementing high availability in an ArcSight environment, tailored to various deployment scenarios and sizes of organizations.
The integration layer in systems like ArcSight collects and processes logs and events. It uses various SmartConnectors to handle data collection and processing, and then sends the processed information to storage or correlation tools as needed. To ensure effective operation, several questions should be addressed regarding mechanisms, system availability, and regulations. These include:
1. What are the mechanisms for collecting logs and events? Native communications are often preferred but alternative methods like agents (like Snare) or syslog servers might be used depending on security and architecture considerations.
2. Are the systems generating logs and events highly available? The necessity of high availability (HA) depends on the importance of the logs and events, as well as their source system's criticality.
3. What mechanism is required for collecting logs and events - pull or push? This choice impacts what data can be retrieved and how to manage availability. For example, databases or Windows systems use a 'pull' method where SmartConnectors retrieve the information directly from the source.
4. What impact would there be if certain sources no longer send logs or events? This depends on any regulatory requirements for uninterrupted collection.
5. Are there regulations that stipulate uninterrupted collection of logs and events? If so, compliance with these regulations should guide the decision-making process regarding HA and other operational aspects.
Addressing these questions helps in determining what specific SmartConnectors are needed based on source devices, ensuring high availability where necessary, selecting appropriate data collection mechanisms, and considering regulatory impacts.
This passage discusses the considerations and approaches involved in adding high availability (HA) to syslog sources within the ArcSight solution, emphasizing the use of a reliable protocol such as TCP rather than an unreliable one such as UDP. The text highlights several features and mechanisms provided by ArcSight SmartConnectors to ensure uninterrupted log and event collection, including per-destination caching, a reliable protocol with encrypted channels, queue-based processing, and a state-based pull mechanism. It also discusses the impact if certain sources stop sending logs or events, suggesting that, depending on regulatory compliance requirements, HA may be more critical for specific systems, devices, or sources.
ArcSight SmartConnectors are designed to maintain a full state-based track of logs and events, ensuring that in case of failure or system interruption, processing resumes from where it left off without data loss or duplication. They support multiple platforms including Windows, Linux, and other virtualization platforms, offering flexibility in deployment locations and functionalities.
SmartConnectors can be deployed using either pull-based (active connection to a remote source like Microsoft Windows or databases) or push-based (e.g., syslog) methods. For pull-based systems, clustering is recommended for high availability, with an active/passive configuration where the state information is stored on a shared disk, allowing processing to continue seamlessly from the point of failure. Push-based systems do not require clustering as they operate without needing synchronized queues between nodes. Both types can implement load balancing across multiple nodes for efficient event processing and scalability.
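The state-based pull mechanism described above can be illustrated with a minimal Python sketch. It is not ArcSight code: the class `CheckpointedPuller`, the JSON state file, and the in-memory `source` list are hypothetical stand-ins for a connector, its state on a shared disk, and a remote log source. The sketch only shows the idea that a passive node reading the same state file can continue from the point where the active node stopped.

```python
import json
import os
import tempfile


class CheckpointedPuller:
    """Pulls records from a source and remembers the last position on disk."""

    def __init__(self, state_path):
        # In an active/passive pair this path would sit on the shared disk.
        self.state_path = state_path

    def _load_position(self):
        # A missing state file means nothing has been processed yet.
        if not os.path.exists(self.state_path):
            return 0
        with open(self.state_path) as fh:
            return json.load(fh)["position"]

    def _save_position(self, position):
        # Write to a temporary file and rename it, so a crash cannot
        # leave a half-written state file behind.
        tmp_path = self.state_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump({"position": position}, fh)
        os.replace(tmp_path, self.state_path)

    def pull(self, source, forward, max_records=None):
        """Forward unprocessed records one at a time; return how many were sent."""
        position = self._load_position()
        sent = 0
        while position < len(source):
            if max_records is not None and sent >= max_records:
                break
            forward(source[position])
            position += 1
            self._save_position(position)
            sent += 1
        return sent


# Hypothetical demonstration: the active node stops after two records,
# then a passive node takes over using the same state file.
source = ["e1", "e2", "e3", "e4", "e5"]
received = []
state_file = os.path.join(tempfile.mkdtemp(), "connector_state.json")

active = CheckpointedPuller(state_file)
sent_by_active = active.pull(source, received.append, max_records=2)

passive = CheckpointedPuller(state_file)
sent_by_passive = passive.pull(source, received.append)
```

Because the position is saved after each record is forwarded, a crash between forwarding and saving would resend one record. Avoiding that duplicate needs an acknowledgement or de-duplication step, which this sketch leaves out.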
The article discusses implementing network load balancing for ArcSight Logger, Express, and ESM to ensure high availability. Since clustering is not feasible with the current ArcSight Connector Appliance, alternative solutions are recommended such as using a hardware-based network load balancing solution or external options like Piranha, Ultramonkey, or Keepalived on Linux platforms. The article also highlights that there's currently no option for high availability in the SmartConnectors deployed on an ArcSight Connector Appliance and recommends considering external load balancing appliances to address this issue.
The core engine layer, which includes ArcSight Logger, Express, and ESM, is another critical area needing additional availability considerations. While these components are flexible, the Logger appliance does not support clustering or load balancing and offers no HA capabilities out of the box. Recommendations for this layer include reading the dedicated section on Logger's high availability options and considering an upgrade from Express to ESM if higher availability is necessary.
Lastly, it emphasizes the importance of understanding user interaction patterns when assessing system requirements, as well as considering backup and recovery systems to ensure minimal downtime impacts.
A Security Operations Centre (SOC) is a facility where operators and analysts monitor systems 24/7 to investigate threats and attacks in real-time. This can be achieved through visualizing and displaying data, providing a proactive monitoring solution. Some customers prefer an "escalation and prioritization system" that operates without constant human intervention ("lights out"), alerting only when necessary for investigation.
The criticality of the SOC depends on its purpose: a 24/7 operation typically requires high availability (HA), while "lights out" systems can function without it. The impact of a short downtime in a system is minimized by automatic caching and retransmission, with all correlation occurring correctly once the system is restored.
ArcSight Inc., based at 5 Results Way, Cupertino, CA 95014, USA, develops software for SOC management named ArcSight ESM (Enterprise Security Manager). This system uses an Oracle database that can be supported by external SAN storage, enhancing its availability without affecting the main ArcSight system.
Backup and recovery systems are crucial to ensure data integrity; industry-standard tools such as Legato or Veritas are often used for this purpose in conjunction with the resilient Oracle database of the ESM system. This setup provides backup and recovery processes that meet the requirements of SOC operations.
The article discusses the critical components and their availability in the ArcSight Enterprise Security Manager (ESM) system. It identifies three main components: the ESM Manager, the ESM Web, and the ESM Database.
The most crucial component is the ESM Manager, which handles real-time processing, correlation, and alerting for the entire system. Despite its critical role, the ESM Manager can tolerate some downtime due to its integration layer's natural buffer provided by SmartConnectors. However, options like clustering are not recommended as they do not align with the designed fail-over functionality of the ESM Manager. Several methods such as standard off-the-shelf solutions and hardware redundancy can be employed to increase availability.
The ESM Web is a web server component for browser access and its importance depends on user accessibility options. The ESM Database, which runs an Oracle database, has several HA (High Availability) options available, including clustering, depending on the deployment scenario between different servers or storage systems.
In summary, while the ArcSight ESM Manager is a critical component requiring high availability for optimal system performance and response times, other components such as the Web interface and Database can be designed with additional availability solutions in mind, though their effectiveness may vary based on the specific configuration (e.g., whether they are deployed separately or share hardware).
ArcSight Express, which includes the Enterprise Security Manager (ESM) Manager for correlation purposes, does not support High Availability (HA). If HA is required, an upgrade must be purchased to provide HA support. In a fail-over solution, if the primary Manager fails, another Manager will start up and take over its role, ensuring consistency in processing, no missed events, automatic handling of event flow, and stability of the database. However, this approach requires that ESM Manager and Database be on separate servers, and some appliance solutions do not include fail-over software by default.
The provided text discusses the use of multiple web servers and technologies to enhance the availability of the Web interface for an ArcSight ESM (Enterprise Security Manager) system. It describes three approaches: using DNS resolution to provide simple load balancing and fail-over between IP addresses, clustering web servers, and load balancing web servers.
1. **Multiple Web Servers with Simple Level of Availability**: This method involves assigning multiple IP addresses to the same DNS name, which allows random selection between them for connections. If one server fails, the browser automatically tries another address, ensuring some level of availability. However, if a web server goes down during an active connection, the session is not restored automatically: the user must log in again, and new connections may take several seconds to establish.
2. **Clustered Web Servers**: For higher resilience and availability, the system can be set up as a cluster with two or more nodes that communicate with an ESM Manager. This setup uses shared disks to maintain configuration and state information between servers. Connections are managed through a Virtual IP address, which directs traffic to one of the active nodes based on their status. The cluster mode operates in either Active/Active or Active/Passive configurations, but it's noted that session auto-transfer might not occur if nodes fail.
3. **Load Balanced Web Servers**: This approach uses hardware or software solutions for load balancing and availability checking, which automatically redirect traffic to working servers. Popular technologies for this include Microsoft Windows Clustering services and Linux load balancing solutions such as Piranha, Ultramonkey, or Keepalived. These methods are more complex but provide robust fault tolerance without requiring significant configuration changes from the user.
The effectiveness of these approaches is contingent on factors including server hardware, network infrastructure, and specific configurations for authentication and session management. It's also important to test such setups in a live environment before deployment due to potential variations in how different web browsers handle DNS resolution and caching.
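The DNS-based approach in item 1 can be sketched from the client side as follows. This is an illustration in Python, not how any particular browser or ArcSight component is implemented. The helper names `resolve_all` and `connect_first_available`, the port 8443, and the 192.0.2.x addresses (a documentation-only range) are assumptions made for the example.

```python
import socket


def resolve_all(hostname, port):
    """Return every distinct IP address the DNS name resolves to."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        ip = info[4][0]  # sockaddr tuple starts with the IP address
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_first_available(addresses, port, timeout=3.0,
                            connect=socket.create_connection):
    """Try each address in order; return (ip, connection) for the first that answers."""
    last_error = None
    for ip in addresses:
        try:
            return ip, connect((ip, port), timeout)
        except OSError as exc:
            # This server is unreachable; fall through to the next address.
            last_error = exc
    raise ConnectionError(f"no web server reachable: {last_error}")


# Hypothetical demonstration with a fake connect function, so the example
# runs without a network: the first address is "down", the second answers.
def fake_connect(address, timeout):
    if address[0] == "192.0.2.10":
        raise OSError("connection refused")
    return "connection to " + address[0]


chosen_ip, connection = connect_first_available(
    ["192.0.2.10", "192.0.2.11"], 8443, connect=fake_connect
)
```

Each failed attempt costs up to `timeout` seconds before the next address is tried, which matches the several-second delay for new connections noted above. An existing login session is not carried over to the second server.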
ArcSight, Inc. (phone (408) 864-2600, website www.arcsight.com) provides information on enhancing the availability and resiliency of the Oracle Enterprise Database used by its ArcSight ESM solution. The article outlines several methods to improve database availability:
1. **Oracle Database Standard System**: Includes in-built features like redo logs and automatic archiving, which help in improving availability and resiliency.
2. **Oracle Database with RAC (Real Application Cluster)**: This option involves a cluster of front-end servers cooperating for the Oracle database, enhancing availability under most conditions but potentially affecting performance.
3. **Oracle Database with Data Guard**: Utilizes Oracle Data Guard to replicate databases and maintain standby systems, focusing on fail-over or recovery without direct online resiliency.
4. **Oracle Database with RAC and Data Guard**: Recommended by Oracle for maximum availability, combining both RAC and Data Guard for location-independent, load-balanced databases.
5. **Oracle Database with Streams**: Applies to more general database resilience but is not needed for ArcSight ESM database deployments.
Additionally, SAN (Storage Area Network) systems can be used for storage of the database, assisting in backup and recovery but not directly improving database availability.
ArcSight recommends using standard technologies and tools for enhancing Oracle database availability without implying any direct support from ArcSight for performance issues related to these enhancements.
This text discusses various strategies for improving availability in an ArcSight Logger solution, which is part of the ArcSight Express (ESM) with Oracle Database correlation capabilities. The document highlights that while high availability is not included as a feature, there are alternative methods to ensure operational performance and availability through integration layers such as connectors or external load balancers.
One method discussed is connector-based high availability, where a SmartConnector sends each event to two Logger appliances. If one Logger fails, the events destined for it are cached and forwarded once communication resumes. This setup provides duplicate copies of all events for investigative purposes and is intended to avoid event loss. However, it requires a way to identify Connector failures and careful management of storage, because every event is stored twice.
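The connector-based approach above relies on per-destination caching. The Python sketch below shows that idea in miniature. The names `Destination`, `forward_to_all`, and `FakeLogger` are hypothetical and do not correspond to any ArcSight API, and the cache is an unbounded in-memory queue purely for illustration.

```python
from collections import deque


class Destination:
    """One Logger destination with its own cache of undelivered events."""

    def __init__(self, name, send):
        self.name = name
        self.send = send      # callable that raises OSError when unreachable
        self.cache = deque()

    def deliver(self, event):
        # Queue first, then drain in order, so cached events are sent
        # before newer ones once the destination is reachable again.
        self.cache.append(event)
        while self.cache:
            try:
                self.send(self.cache[0])
            except OSError:
                return False  # destination still down; events stay cached
            self.cache.popleft()
        return True


def forward_to_all(event, destinations):
    """Send one event to every destination; return per-destination status."""
    return {d.name: d.deliver(event) for d in destinations}


class FakeLogger:
    """Stand-in for a Logger appliance that can be switched off."""

    def __init__(self):
        self.up = True
        self.events = []

    def receive(self, event):
        if not self.up:
            raise OSError("logger unreachable")
        self.events.append(event)


logger_a, logger_b = FakeLogger(), FakeLogger()
destinations = [
    Destination("logger-a", logger_a.receive),
    Destination("logger-b", logger_b.receive),
]

forward_to_all("event-1", destinations)
logger_b.up = False                       # second Logger goes down
status_during_outage = forward_to_all("event-2", destinations)
logger_b.up = True                        # communication resumes
forward_to_all("event-3", destinations)
```

After the outage both fake Loggers hold the same three events in the same order. A real cache would need a size limit and persistence on disk, which is where the storage management concern mentioned above comes in.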
The document also mentions that ArcSight does not support or license high availability for this feature on its own, and suggests additional steps such as upgrading the ESM component if high availability is necessary. Additionally, no third-party Oracle software such as RAC (Real Application Clusters) is included with ArcSight's base license; the document recommends purchasing other Oracle solutions through authorized channels or partners if needed.
An external load balancing system is used in this setup to direct event and log data to a primary Logger appliance through a shared IP address, hiding the real Logger address from devices. Instead of using high availability options with Connectors, the load balancer serves the purpose for network communications between devices and appliances. When the main Logger appliance is operational and has good network communication, events and logs are sent to it; otherwise, traffic is directed to the secondary (DR site) Logger. This system ensures minimal data loss but requires identifying when Connector failures occur, which can lead to some event data being lost or cached on the Connector.
Advantages include an easily understood and configured fail-over system with automatic redirection of traffic, allowing for easy search and consolidation of log data from both appliances after a failure. Disadvantages involve difficulties in detecting failed communications from Connectors, potential loss of events during failures, and time-consuming efforts to consolidate data post-failure.
A warm standby option involves using a single storage location for archived daily event logs and configuration recovery upon main Logger failure, providing an instant access point to archived logs after configuration recovery with the only loss being events up to the failure date.
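The fail-over decision made by the external load balancer described above can be sketched as a small routing function. This is an illustrative Python sketch, not the configuration of any specific load balancer product. The function names, the addresses, and port 514 for the health probe are assumptions made for the example.

```python
import socket


def tcp_health_check(address, timeout=2.0):
    """Report whether a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection(address, timeout):
            return True
    except OSError:
        return False


def choose_target(primary, secondary, is_healthy=tcp_health_check):
    """Return the appliance that should receive traffic right now."""
    if is_healthy(primary):
        return primary
    if is_healthy(secondary):
        return secondary
    return None  # neither appliance is reachable; senders must cache or drop


# Hypothetical addresses for the main and DR-site Logger appliances.
PRIMARY = ("192.0.2.20", 514)
SECONDARY = ("192.0.2.21", 514)

# Simulated health states, so the example runs without a network.
health = {PRIMARY: False, SECONDARY: True}
target_during_outage = choose_target(PRIMARY, SECONDARY, is_healthy=health.get)

health[PRIMARY] = True
target_after_recovery = choose_target(PRIMARY, SECONDARY, is_healthy=health.get)
```

Events that arrive between the primary failing and the next health check still go to the failed appliance, which is one source of the event loss listed among the disadvantages. Traffic returns to the primary as soon as it passes the check again, so the two appliances end up holding different parts of the log and must be consolidated afterwards.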
The document discusses two methods for logging system data, focusing on a centralized storage solution using either a Network Attached Storage (NAS) or Storage Area Network (SAN).
The first method involves using a NAS or SAN to store archived daily logs. In case of failure in the main Logger appliance, the configuration can be restored to the secondary Logger appliance, which then automatically starts processing incoming events and log data. This setup is cost-effective as no additional hardware is required and ensures that configurations and data are backed up for future use. However, there is a disadvantage where event data may be lost during the period when a Logger Appliance is unavailable, with potential data loss depending on the last archived time.
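The data-loss exposure of this archive-based method can be made concrete with a little arithmetic. The Python sketch below uses hypothetical timestamps and helper names. It only illustrates that the loss depends on the time since the last archive plus the time the standby needs to start accepting events.

```python
from datetime import datetime, timedelta


def unarchived_window(last_archive, failure_time):
    """Period whose events were not yet archived when the main Logger failed."""
    if failure_time < last_archive:
        raise ValueError("failure_time precedes last_archive")
    return failure_time - last_archive


def outage_window(failure_time, standby_ready):
    """Period during which no Logger was accepting events."""
    if standby_ready < failure_time:
        raise ValueError("standby_ready precedes failure_time")
    return standby_ready - failure_time


# Hypothetical timeline: daily archive at midnight, failure mid-afternoon,
# standby restored and receiving events 45 minutes later.
last_archive = datetime(2025, 4, 7, 0, 0)
failure_time = datetime(2025, 4, 7, 15, 30)
standby_ready = datetime(2025, 4, 7, 16, 15)

not_archived = unarchived_window(last_archive, failure_time)
not_received = outage_window(failure_time, standby_ready)
total_exposure = not_archived + not_received
```

With these assumed times, up to 15 hours 30 minutes of stored events were not yet archived and a further 45 minutes of events were never received, unless the sending devices or Connectors cached them. Archiving more often shrinks the first window, while a faster restore shrinks the second.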
The second method uses SAN storage directly connected to the Logger appliance, allowing both appliances to share the same configuration and only one being active at any given time. If the main Logger fails, its configuration can be restored to the secondary Logger using the original IP address, enabling it to assume the role of the main logger. This setup also does not require additional hardware for a simple and cost-effective solution but shares some of the same disadvantages as noted above regarding data loss during downtime.
In both scenarios, devices send logs and events in their native formats via Syslog (typically UDP), with all data being stored by the Logger appliances or shared SAN storage.
This text discusses the use of UDP (User Datagram Protocol) in logging systems, specifically focusing on how to handle failures with a main logger appliance. It mentions that if the main appliance fails, it can be easily replaced by a recovery one, and configuration can be restored from SAN (Storage Area Network) storage, making data instantly available. The advantages include simplicity, cost effectiveness without additional hardware, and quick access to backed-up configurations and data. However, there are disadvantages such as needing a manual process for restoring backups and relying on a SAN solution for storage. An alternative method involves using the SAN storage option for event data, ensuring high availability of data stored in the SAN which is instantly available for searching and reporting. The text also touches upon the importance of device availability but does not address this issue directly.
