
High Availability Infrastructure Development in 2011

  • Writer: Pavan Raja
  • Apr 8, 2025
  • 11 min read

Summary:

This post summarizes a discussion of high availability (HA) options for ArcSight ESM and SmartConnectors, beginning with running the ArcSight Manager and its Oracle database on a Windows cluster, including failover handling and the registry cleanup needed when installers are re-run on the alternate node. The main point is that Oracle must be installed and configured on both Windows cluster hosts before the Manager is installed, with all Oracle data files, redo logs, undo logs, temporary files, and the system/event index data stored on shared SAN drives. Once both Oracle installations are complete, the database cluster is configured with the Windows Cluster Administrator GUI: first a TNS listener resource, which requires no SAN shared resources, then the Oracle resource with dependencies on the TNS resource and on the SAN shared drives that hold the ArcSight data files. Because of that dependency, the cluster manager brings the TNS listener online automatically once the Oracle resource is established, so both hosts reach the database through the listener while local logs and related configuration stay on each host's C: drive. Installing Oracle on both hosts first and keeping the database files on shared SAN storage simplifies configuration and reduces administrative overhead.

The discussion also covers moving from a more complex method of making the Oracle services aware of TNS and the shared disks to using Oracle Fail Safe (OFS) or the built-in Windows Clustering features, the ease of managing resources across nodes with the Windows Cluster GUI (readily verified with SQL queries and test failovers), and the IP/hostname planning a Windows cluster requires. Once the IP/hostname references and SAN shared drives are in place, the setup runs without issues. SmartConnectors are self-contained, so they can be deployed on a cluster-enabled host using its SAN shared drive; syslog and Windows Unified Connector (WUC) deployments work the same way, handling unsolicited feeds such as syslog or polling operations and failing over with the cluster without generating duplicate data. For Loggers, the only practical option is a load balancer, either DNS-based or appliance-based.

These configurations provide high availability as the term is generally understood, and Windows Clustering fits well in enterprise environments where domain controllers are usually already deployed. Doug's position is that customers should provide high availability themselves by installing the software on top of their existing clusters, without needing product-specific clustering features; it is a customer-driven choice, not a Windows versus Unix-like operating system issue. Allen Pomeroy acknowledges the desire for HA at every tier but argues for focusing on improving passive and active SmartConnector availability through automation. Many customers already use load balancers and similar strategies, but there is a market need for automated failover built into the product, which would minimize downtime and could give HP ESP (Enterprise Security Products) a competitive edge. Recent regulatory changes have increased the demand for automated recovery; customers can meet it today with add-on products such as Veritas Cluster Server, Microsoft Windows Clustering, or open source Pacemaker, while load balancers are only defensible for passive SmartConnectors, which from an audit perspective leaves sources such as Windows event logs outside the same service level. If these functions are not built into the product, offering a way to provide availability at a nominal cost, even if self-supported, is a viable answer. The whitepaper and Pacemaker configuration discussed here show how customers with HP ArcSight products and a need for HA connectors can manage a clustering solution themselves, and at least one customer has used this approach as a stopgap until similar features appear in SmartConnectors.

Details:

The article "Building a High Availability SmartConnector Cluster with open source" discusses the implementation of automated high availability for event collection via both passive and active SmartConnectors, which are used in HP ESP (Event Processing Suite). The author mentions concerns about potential deal losses due to system failures and acknowledges that there might be efforts to add appropriate features at the SmartConnector layer. In response, the author has prepared a paper and scripts using open source clustering and network disk filesystems to achieve high availability of both passive and active SmartConnectors. This method helps in quickly establishing a configuration with two separate nodes for each type of connector. The text discusses setting up a high availability (HA) cluster for syslog UDP traffic using Fedora Core 17 Linux and Windows Unified SmartConnector. It mentions that once two systems are running Fedora Core 17, automated quickstart scripts can complete the rest of the cluster setup in about 15 minutes. This process is described as very easy and fast to implement. The author then invites other system engineers and possibly software professionals to review their paper and cluster configuration before releasing it to a wider audience. They specifically ask if Daniel Aguirre or Allen Pomeroy would be interested in reviewing the setup, with Allen mentioning that he will gladly test the solution on RHEL 6.2 x64 since they officially support this platform. Allen responds positively and agrees to help by providing access to the paper and cluster configuration for testing. He also acknowledges that while he hasn't tested it on any other OS than Fedora Core 17, there is a good chance it will work on RHEL 6.2 as well, with potential adjustments needed for repositories if issues arise during testing. The document located at the URL irock.arcsight.com/docs/DOC-5033 discusses a high availability (HA) solution for ArcSight SmartConnectors using open source Veritas Cluster Software as an alternative to expensive proprietary solutions. The primary purpose of this paper is to demonstrate how to implement a HA cluster configuration without incurring the significant costs associated with commercial clustering software like Veritas Clustering Services (VCS). This approach is particularly beneficial for customers in regulated industries such as those subject to NERC regulations, who need robust data handling capabilities and availability. The document argues that while some competitors offer high availability through their own proprietary solutions, ArcSight should also provide this functionality at a lower cost without the need for expensive third-party clustering software. The authors highlight the importance of offering HA functionality not only in terms of competitive positioning but also due to recent regulatory changes that have increased demand for such capabilities. In summary, the document aims to educate about an open source solution that can provide high availability for ArcSight SmartConnectors at a lower cost than traditional proprietary solutions, which is particularly relevant for customers in heavily regulated markets. The text provided seems to be discussing challenges encountered with HA (High Availability) in the context of using Veritias clustering for ArcSight support, similar to how they handle virtualization issues. 
A related thread discusses the difficulties of supporting HA for ArcSight with Veritas clustering, which today is handled much like virtualization: it works, but it is not fully supported, and the field gets "raked over the coals" when prospective clients ask for documentation or formal support for specific products and data sources. The complaints break down into several points:

1. Outdated documentation: the configuration references still show GUI screenshots and procedures from many versions ago that no longer match the current product.
2. Lagging SmartConnector support: parser problems make it hard to support the data sources clients require.
3. No documentation for installing SmartConnectors on the ConApp, which leads to poor demonstrations during proofs of concept (POCs).
4. HA setups built on Veritas clustering are not fully supported, and the available workarounds, treated much like virtualization, come with limitations.

The author's frustration is clear, and other participants respond with offers to work together on improving the documentation and SmartConnector support, an attempt to address the concerns raised.

The thread then walks through setting up a Windows cluster for high availability using server-based components only. Multiple servers share storage through a SAN (Storage Area Network), so failover is possible between them: each server acts as a primary active node, with at least one alternate node ready to take over if it fails. The cluster runs entirely on the shared SAN storage, which is independent of the Windows hosts. EMC, Hitachi, and NetApp arrays are known to support this setup; the participants were unsure about HP StorageWorks. Geographic distance between sites adds latency, and during a data center outage or partial disruption the cluster may "pause" while the SAN converges, during which applications on the active node can fail. Apart from the operating system and locally installed binaries, the cluster does not rely on the local storage of any particular host: everything is installed onto a shared cluster drive, so to most applications and services the result looks like a standard single-host installation. Windows Clustering then provides automatic failover between nodes. This is high availability rather than "non-stop" operation; there is a brief interruption while resources move, but it offers a level of fault tolerance well beyond basic manual failover and keeps applications and services running even if one node fails.
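To make the failover behavior concrete, a minimal sketch of checking node status and forcing a resource group to the alternate node with the classic cluster.exe command line might look like this; the group and node names are illustrative, not taken from the thread:

    REM Show the state of both cluster nodes
    cluster node /status

    REM Show the resource groups and which node currently owns each one
    cluster group /status

    REM Force a manual failover of the ArcSight group to the alternate node
    cluster group "ArcSight Manager Group" /moveto:NODE2

    REM Confirm the group came back online on NODE2
    cluster group "ArcSight Manager Group" /status

The same moves can be made from the Cluster Administrator GUI; the command-line form is shown only because it is easy to repeat during failover testing.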
The discussion then turns to running ESM with Oracle in this environment, with failover handling and SAN configuration summarized as follows:

1. Failover setup: fail the cluster over so that the alternate node becomes active, then run the 'arcsight serviceinstall' command there; no additional steps are required on the ArcSight side.
2. ESM with Oracle: Oracle is not part of the ArcSight installer, so the database is a traditional single-host Oracle install with the software on the C:\ drive and all ArcSight-specific files pointed at the SAN shared drives.
3. SAN failover handling: if multiple SAN drives are involved, they must fail over together, regardless of which one initiates the failover, so that the Oracle database always sees consistent storage.
4. Software installation: the Manager is installed on a SAN shared drive and the service is installed on both hosts; Oracle is then installed conventionally on a single host within the clustered environment.
5. Registry handling: although Oracle is more involved than the ArcSight components, the approach is to re-run the database installation steps on the alternate host exactly as they were performed originally, so that both the ArcSight database components (such as the partition archiver) and the Oracle C:\ installation exist there as well; the registry handling gets messy along the way but ends up consistent.
6. Cluster preparation: once the cluster has been prepared for the Manager, a database server that is part of the clustered setup fails over together with the SAN drives and hosts; the database installation actions are simply repeated as in the initial setup, without using Oracle's own clustering features.

The main point is to install and configure Oracle on both Windows cluster hosts before installing the Manager, with all Oracle data files, redo logs, undo logs, temporary files, and the system/event index data stored on shared SAN drives. Once both Oracle installations are complete, the database cluster is configured with the Windows Cluster Administrator GUI: first create a TNS listener resource, which needs no SAN shared resources, then create the Oracle resource with dependencies on the TNS resource and on the SAN shared drives that hold the ArcSight data files. Because the Oracle resource depends on it, the cluster manager brings the TNS listener online automatically whenever the database group is started, so both hosts can reach the database through the listener while local logs and related configuration remain on each host's C: drive. Keeping the database files on SAN storage shared across the hosts simplifies configuration and reduces administrative overhead, and it replaces an earlier, more complicated method of making the Oracle services aware of TNS and the shared disks; Oracle Fail Safe (OFS) or the modern Windows Clustering features accomplish the same thing more efficiently.
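The thread describes building this resource layout through the Cluster Administrator GUI. Purely for concreteness, a hedged sketch of the same layout on the cluster.exe command line might look like the following, assuming the Oracle services are registered as Generic Service resources; the group name, resource names, Windows service names, disk letters, and sqlplus credentials are all placeholders, not values from the thread:

    REM Create the TNS listener resource (no SAN dependency needed)
    cluster res "Oracle TNS Listener" /create /group:"ESM DB Group" /type:"Generic Service"
    cluster res "Oracle TNS Listener" /priv ServiceName="OracleOraDb10g_home1TNSListener"

    REM Create the Oracle database resource
    cluster res "Oracle Database" /create /group:"ESM DB Group" /type:"Generic Service"
    cluster res "Oracle Database" /priv ServiceName="OracleServiceARCSIGHT"

    REM Oracle depends on the listener and on the SAN disks holding its data files
    cluster res "Oracle Database" /adddep:"Oracle TNS Listener"
    cluster res "Oracle Database" /adddep:"Disk E:"
    cluster res "Oracle Database" /adddep:"Disk F:"

    REM Bring the database group online; the listener starts first
    cluster group "ESM DB Group" /online

    REM Check which node is serving the instance (repeat after a test failover)
    sqlplus system/password@ARCSIGHT
    SQL> select host_name, instance_name from v$instance;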
The participants highlight how simple the Windows Cluster GUI makes it to manage these resources across nodes without inconsistencies, something that is easy to verify with SQL queries and test failovers, and they briefly note the IP and hostname planning a Windows cluster requires. Once the IP/hostname references and the SAN shared drives are set up, the system runs without issues.

The same approach carries over to SmartConnectors. Because a SmartConnector is self-contained, it can be deployed onto a cluster-enabled host using that host's SAN shared drive. Connectors such as syslog and the Windows Unified Connector (WUC) then behave just like the Manager: they use the cluster's resources and SAN drives and fail over with the cluster, whether they receive unsolicited feeds such as syslog or poll their sources, and because only one node runs the connector at a time there is no flood of duplicate data to de-duplicate. For Loggers, the only practical solution is a load balancer, either DNS-based or appliance-based.
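For the DNS-based option just mentioned for Loggers, a minimal round-robin sketch in a BIND zone file might look like the fragment below; the hostnames and addresses are illustrative, and an appliance-based load balancer would achieve the same spread while adding health checks that plain DNS round robin lacks:

    $TTL 60   ; keep the TTL low so a failed receiver falls out of caches quickly
    logger    IN  A   192.168.10.21   ; Logger receiver 1
    logger    IN  A   192.168.10.22   ; Logger receiver 2

Connectors configured to send to logger.example.com are then spread across both receivers, and a dead receiver's record can be pulled without reconfiguring the connectors.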
These configurations provide high availability as the term is generally understood, and Windows Clustering fits naturally into enterprise environments where domain controllers are usually already deployed. Doug's message is that customers should provide high availability themselves by installing the software on top of their existing clusters, without needing any product-specific clustering knowledge; it is a customer-driven choice, not a Windows versus Unix-like operating system issue. The open question is how to provide automated failover for SmartConnectors, possibly using load balancers or other monitoring methods, so that collection continues even if some instances fail.

Allen Pomeroy's response acknowledges the desire for high availability at every tier of the system but argues for keeping the discussion focused on improving passive and active SmartConnector availability through automation. Many customers already use load balancers and similar strategies, but there is a market need for a solution integrated into the product that provides automated failover and minimizes downtime, which could give HP ESP a competitive edge in meeting customer requirements. Recent regulatory changes have increased the demand for automated recovery; customers can achieve it today with add-on products such as Veritas Cluster Server, Microsoft Windows Clustering, or open source Pacemaker, but load balancers are only defensible for passive SmartConnectors, which from an audit perspective leaves sources such as Windows event logs outside the same service level. If the functions are not built into the product, offering a method to provide availability at a nominal cost, even if self-supported, is a viable answer. The paper and Pacemaker configuration in the whitepaper demonstrate how customers with HP ArcSight products and a need for HA connectors can manage a clustering solution themselves, and at least one customer has used this method as a stopgap until similar features are available in SmartConnectors.

Disclaimer:
The content in this post is for informational and educational purposes only. It may reference technologies, configurations, or products that are outdated or no longer supported. If you have any comments or feedback, kindly leave a message and it will be responded to.
