Hadoop Ingestion Tool

Hadoop Ingestion Tool, or Hadoop Job Runner (HIT), is a web application that lets you interact with a Hadoop cluster. It allows data of any volume, whether structured, semi-structured, or unstructured, to be stored and processed on the cluster, and presents the results to the end user.

HIT offers the following features:

  • Submit MapReduce jobs to the Hadoop cluster and present the processed results to the client.
  • Schedule MapReduce jobs.
  • Ingest structured data (RDBMS) into Big Data stores such as HDFS and Hive, and vice versa.
  • Ingest unstructured or semi-structured data from file servers, such as HTTP or FTP servers, into Big Data stores such as HDFS and Hive.
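
To make the file-ingestion feature more concrete, the sketch below builds the request URL that starts a file upload through WebHDFS, Hadoop's REST interface to HDFS. This is only an illustration: the text does not say whether HIT uses WebHDFS or the Java client API, and the host name, port, target path, user name, and the helper `webhdfs_create_url` are all assumptions.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(namenode, port, hdfs_path, user, overwrite=False):
    """Build the first-step WebHDFS URL for creating a file in HDFS.

    WebHDFS file creation is a two-step exchange: a PUT to this URL on the
    NameNode answers with a redirect to a DataNode, and the file contents
    are then sent with a second PUT to the redirected location.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": "true" if overwrite else "false",
    })
    return f"http://{namenode}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Hypothetical cluster address, target path, and user name.
url = webhdfs_create_url("namenode.example.com", 9870, "/data/raw/access.log", "hit")
print(url)
```

Keeping URL construction separate from the network call makes the ingestion step easy to check without a running cluster; the actual transfer would stream the bytes fetched from the HTTP or FTP source into the second PUT.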

Architecture

The HIT application runs in any web browser and requires no additional installation on the client. It acts as an interface between the web browser (client) and a Hadoop cluster running a distribution such as Cloudera or Hortonworks. The diagram below shows the architecture at a generic level, and the steps that follow describe how it works:

[Figure: HIT architecture]

  • The user (browser) submits login credentials on the HIT application login page.
  • The HIT server processes the request and checks the user's credentials against those stored in its backend database (HIT DB). Once authentication and authorization succeed, the user is redirected to the Home screen, with menu options based on the user's role.
  • The user (browser) sends a request to upload a MapReduce job or to upload data to a Hadoop cluster such as CDH (Cloudera).
  • The HIT server processes the request and uses the Hadoop API internally to communicate with the Hadoop cluster, which performs the requested action.
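
The login and role-based menu steps above can be sketched roughly as follows. This is a minimal illustration, not HIT's actual implementation: the in-memory dictionary stands in for HIT DB, and the role names, menu entries, hashing scheme, and function names are all hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical role-to-menu mapping; the real HIT roles and menus are not listed here.
ROLE_MENUS = {
    "admin": ["Submit Job", "Schedule Job", "Ingest Data", "Manage Users"],
    "analyst": ["Submit Job", "Ingest Data"],
}


def hash_password(password, salt):
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def make_user(password, role):
    """Create a user record as a stand-in for a row in HIT DB."""
    salt = os.urandom(16)
    return {"salt": salt, "hash": hash_password(password, salt), "role": role}


def login(users, username, password):
    """Return the menu options for the user's role, or None if login fails."""
    record = users.get(username)
    if record is None:
        return None
    candidate = hash_password(password, record["salt"])
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(candidate, record["hash"]):
        return None
    return ROLE_MENUS.get(record["role"], [])


users = {"alice": make_user("s3cret", "analyst")}
print(login(users, "alice", "s3cret"))
print(login(users, "alice", "wrong"))
```

In this sketch authentication (the password check) and authorization (the role lookup) happen in one call, mirroring the single redirect to the Home screen; a real server would also create a session before handling job or data upload requests.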