
Event Super Highway from CORR to Hadoop

  • Writer: Pavan Raja
  • Apr 8, 2025
  • 5 min read

Summary:

The document discusses a project involving "Event Super Highway," focusing on integrating data from various sources, including ArcSight's CORR-Engine (Correlation Optimized Retention and Retrieval), into Hadoop through the ESM-Hadoop Event Transfer Tool. Its main points:

1. **Hadoop Overview**: Introduces the Hadoop Distributed File System (HDFS) and the MapReduce framework for handling large volumes of data efficiently. Typical use cases include data aggregation, web log analysis, and data mining tasks.

2. **HDFS Architecture**: A Namenode manages the metadata while Data Nodes store the actual data, with block replication for fault tolerance.

3. **Innovation in Big Data Integration**: Demonstrates transferring 300k events per second (EPS) from ArcSight products to a Hadoop cluster using three DL980 servers, handling 20+ billion events daily. In one deployment scenario, the ESM-Hadoop Event Transfer Tool forwards events at a rate of 5k EPS, completing the task in about 20 days.

4. **Event Super Highway**: Hints at integrating the various systems without migrating to NoSQL databases.

5. **ESM-Hadoop Event Transfer Tool**: A standalone migration tool for transferring events directly from storage files to Hadoop, converting data into CEF format for storage in Hadoop configurations. Pros include high performance, efficient data transfer, and scalability; the main con is the potential loss of compression during conversion from ROS to CEF. Installation involves downloading a specific build version for Linux, running an installer script, and providing the path to Hadoop's core-site.xml file. Usage is through the command-line interface, with parameters such as destination type, destination path, and number of threads.

6. **Performance Metrics**: The tool handles up to 20k EPS per thread, scaling to large volumes of events.

7. **Example Command**:

```bash
arcsight event_transfer -dtype Hadoop -dpath /usr/hadoop/events -threads 10
```

This document provides an overview of Hadoop's capabilities and focuses on the practical application of integrating legacy systems like ArcSight into Hadoop for big data processing using the ESM-Hadoop Event Transfer Tool.

Details:

The document discusses a project called "Event Super Highway," which integrates data from ArcSight's CORR-Engine (Correlation Optimized Retention and Retrieval) into Hadoop. Hong Yan, a Senior Software Engineer, presents an overview of Hadoop and of the innovation in big data integration achieved through the ESM-Hadoop Event Transfer Tool. The main components discussed are: 1. **Hadoop Overview**:

  • It introduces Hadoop Distributed File System (HDFS), designed for handling very large amounts of data across multiple machines, providing high-throughput access to data.

  • Hadoop MapReduce is presented as a software framework enabling processing of vast amounts of data in parallel.

  • Typical use cases include data aggregation (sorting, word count, phrase count), web log and traffic analysis, and data mining tasks such as clustering and classification; a minimal word-count sketch follows.
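As a concrete illustration of the word-count use case, here is a minimal sketch using Hadoop Streaming with shell one-liners as mapper and reducer. The input file, paths, and jar location are assumptions and vary by distribution; this is not from the original presentation.

```bash
# Stage some sample input in HDFS (paths are illustrative assumptions).
hadoop fs -mkdir -p /user/demo/input
hadoop fs -put weblogs.txt /user/demo/input/

# Mapper emits one word per line; the framework sorts by key, so the
# reducer can count consecutive duplicates with `uniq -c`.
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /user/demo/input \
  -output /user/demo/wordcount-output \
  -mapper 'tr -s " " "\n"' \
  -reducer 'uniq -c'

# Inspect the per-word counts produced by the reducers.
hadoop fs -cat /user/demo/wordcount-output/part-*
```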

2. **HDFS Architecture**: Explains the architecture: file-system metadata is handled by a single Namenode, while the actual data blocks are stored across Data Nodes and replicated for fault tolerance; the commands below show how to inspect this split.
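For instance, the stock HDFS tooling can report block placement across Data Nodes and adjust replication. The event path here is illustrative, carried over from the example command later in this post.

```bash
# Report files, blocks, and which Data Nodes hold each replica.
hdfs fsck /usr/hadoop/events -files -blocks -locations

# Raise the replication factor to 3 and wait for re-replication to finish.
hdfs dfs -setrep -w 3 /usr/hadoop/events
```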

3. **Innovation in Big Data Integration**:

  • A specific use case demonstrates transferring 300k events per second (EPS) from ArcSight products to a Hadoop cluster. This is achieved by deploying three DL980 servers, each sustaining 100k EPS; sustained around the clock, that rate works out to roughly 300,000 × 86,400 ≈ 26 billion events, consistent with the stated 20+ billion events daily. The ESM-Hadoop Event Transfer Tool forwards these events at a rate of 5k EPS in one deployment scenario, taking about 20 days to complete the task.

4. **Event Super Highway**: This section is not fully expanded, but it hints at integrating the various systems without migrating to NoSQL databases.

Overall, the document provides an overview of Hadoop's architecture and capabilities, with a focus on practical applications such as transferring events from legacy systems like ArcSight into Hadoop for big data processing.

The ESM-Hadoop Event Transfer Tool is a standalone migration tool designed to transfer events directly from storage files to Hadoop. It runs on machines hosting an ESM and CORR-engine, converting data into CEF format, which can then be stored in Hadoop configurations.

**Pros:**

1. **High Performance**: The tool sustains up to 20k EPS per thread, making it highly scalable for large-scale event processing across the "Super Highway" of parallel transfer lanes.

2. **Efficient Data Transfer**: It uses a sequential scan with minimal input/output operations, enabling efficient compression and faster transfer rates.

3. **Scalability**: It is designed to handle large volumes of events efficiently, making it suitable for environments requiring high throughput.

**Cons:**

1. **Loss of Compression**: The tool converts data from its native format (presumably ROS) to CEF, which may forfeit the highly efficient compression achieved in the ROS format.
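For reference, CEF (Common Event Format) is ArcSight's pipe-delimited text format, with a fixed header (version, vendor, product, device version, event class ID, name, severity) followed by key-value extensions. A representative, made-up event as it might land in Hadoop looks like this:

```
CEF:0|ArcSight|ArcSight ESM|6.0c|100|Login succeeded|3|src=10.0.0.5 dst=10.0.0.9 spt=51432 dpt=22 suser=admin
```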

**Installation and Usage:**

  • Download the specific build version of the tool for Linux.

  • Run the installer script provided in the downloaded package.

  • Provide the path to the Hadoop configuration file (core-site.xml) during installation.

  • Use the command-line interface `arcsight event_transfer`, with parameters including destination type, destination path, number of threads, and more, depending on the specific requirements.
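A hedged walk-through of those installation steps might look like the following. The archive and installer file names are assumptions for illustration only, not the actual package names.

```bash
# Hypothetical file names; substitute the actual Linux build you downloaded.
tar -xzf esm-hadoop-event-transfer-linux.tar.gz
cd esm-hadoop-event-transfer

# Run the provided installer script; when prompted, supply the path to
# Hadoop's core-site.xml (location varies by distribution), e.g.:
#   /etc/hadoop/conf/core-site.xml
./install.sh
```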

**Example Command:**

```bash
arcsight event_transfer -dtype Hadoop -dpath /usr/hadoop/events -threads 10
```

The tool is intended for sustained, high-rate event forwarding (the source cites transfer runs of up to 4 hours), building a "Super Highway" with multiple lanes for efficient data transfer. The performance graphs show the connector super highway reaching up to 600k EPS.

The closing passage appears to be a section of an internal Hewlett-Packard (HP) document describing the setup used for performance testing. Key points:

1. **Testing Methodology**: The truncated term "ance testing" is evidently "performance testing," the standard practice of evaluating how well a machine, software stack, or network handles heavy load.

2. **EPS (Events Per Second)**: Consistent with the rest of the document, EPS means events per second; the measured figure here is 491k EPS.

3. **Machine Type**: The test machine is an HP DL980, a high-end ProLiant rack server often used for high-performance workloads due to its robust hardware configuration and scalability.

4. **Hadoop Configuration**: The Hadoop cluster is configured with 7 nodes, enabling parallel processing of large data volumes across multiple servers or virtual machines.

5. **Threading and Core Utilization**: The test uses 60 threads, a high level of parallelism that exercises how efficiently multi-threading contributes to performance under load. Note that 491k EPS across 60 threads averages roughly 8k EPS per thread, below the 20k per-thread peak, suggesting per-thread throughput tapers as concurrency grows.

6. **Copyright Notice**: The document ends with Hewlett-Packard's standard copyright notice stating that the information is subject to change without notice.

In summary, this passage documents a performance-testing environment: a DL980 feeding a 7-node Hadoop cluster with 60 transfer threads, together with the boilerplate language typical of internal company documents.
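Using the documented flags, the 60-thread configuration from that performance test would presumably be invoked along these lines, after which the transferred files can be checked with standard HDFS commands. The destination path is an assumption carried over from the example above.

```bash
# 60 threads, mirroring the performance-test configuration described above.
arcsight event_transfer -dtype Hadoop -dpath /usr/hadoop/events -threads 60

# Verify that events arrived and check how much space they occupy.
hdfs dfs -ls /usr/hadoop/events
hdfs dfs -du -h /usr/hadoop/events
```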

Disclaimer:
The content in this post is for informational and educational purposes only. It may reference technologies, configurations, or products that are outdated or no longer supported. If you have any comments or feedback, kindly leave a message and we will respond.
