Hive provides a SQL-like query language on top of HDFS (the Hadoop Distributed File System).

Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. At the same time, the language allows programmers who are familiar with the MapReduce framework to plug in their custom mappers and reducers to perform more sophisticated analysis than the built-in capabilities of the language support. QL can also be extended with custom scalar functions (UDFs), aggregations (UDAFs), and table functions (UDTFs).
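The three extension points differ in their row cardinality. As a conceptual sketch in plain Python (not Hive's actual Java extension API), a UDF maps one value to one value, a UDAF folds many rows into one result, and a UDTF expands one row into many:

```python
# Illustrative Python analogues of Hive's extension points.
# These mimic the behavior of built-ins like upper(), sum(), and
# explode(); they are conceptual sketches, not the real Hive interfaces.

def upper_udf(value):
    """UDF: one input value -> one output value (like Hive's upper())."""
    return value.upper()

def sum_udaf(values):
    """UDAF: many input rows -> one aggregated result (like Hive's sum())."""
    total = 0
    for v in values:
        total += v
    return total

def explode_udtf(array_value):
    """UDTF: one input row -> many output rows (like Hive's explode())."""
    for element in array_value:
        yield element

print(upper_udf("hive"))               # one row in, one row out
print(sum_udaf([1, 2, 3]))             # many rows in, one row out
print(list(explode_udtf(["a", "b"])))  # one row in, many rows out
```

In real Hive, a UDF is a Java class registered with `CREATE FUNCTION` and then called inline from QL like any built-in.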

Apache HBase is a storage system with roots in Hadoop that uses HDFS for its underlying storage.

Apache HBase is a storage system with roots in Hadoop, from which it gets its "H". Though HBase uses HDFS for underlying storage, HBase is designed much more for fast and frequent access to blobs of binary data. It is an example of what most would call a NoSQL column-oriented store; it holds semi-structured values for keys. A reference architecture can be built on HDFS, MapReduce, and HBase, with MapReduce used for parallel processing over the stored data.
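The "semi-structured values for keys" point can be sketched with a toy in-memory model of HBase's data layout, where a row key maps to "family:qualifier" columns and rows need not share the same columns. The table and column names here are made up for illustration:

```python
from collections import defaultdict

# Toy in-memory model of HBase's column-oriented layout: each row key
# maps to a sparse set of "family:qualifier" columns. Rows in the same
# table can hold different columns (semi-structured values for keys).
# All names below are hypothetical.
table = defaultdict(dict)

def put(row_key, family, qualifier, value):
    """Store a value under one column of one row, like HBase's Put."""
    table[row_key][f"{family}:{qualifier}"] = value

def get(row_key, family, qualifier):
    """Read one column of one row; absent columns simply return None."""
    return table[row_key].get(f"{family}:{qualifier}")

put("user#1", "info", "name", "alice")
put("user#1", "info", "email", "alice@example.com")
put("user#2", "info", "name", "bob")   # no email column: rows are sparse

print(get("user#1", "info", "email"))
print(get("user#2", "info", "email"))  # None -> column absent, not NULL-filled
```

Unlike a relational row, an HBase row stores nothing at all for columns it never wrote, which is what makes wide, sparse schemas cheap.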

HDFS (the Hadoop Distributed File System) is designed to run on commodity, low-cost hardware.

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data.
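Fault tolerance on low-cost hardware comes from splitting each file into blocks and replicating every block across several datanodes. The sketch below simulates that placement in plain Python; the block size, replication factor, and node names are illustrative, not real cluster defaults (HDFS blocks are typically 128 MB with 3 replicas):

```python
# Toy simulation of HDFS block splitting and replica placement.
# Values are tiny for demonstration; real defaults are much larger.
BLOCK_SIZE = 4            # bytes (illustrative; HDFS default is 128 MB)
REPLICATION = 3           # copies of each block (HDFS default is 3)
DATANODES = ["dn1", "dn2", "dn3", "dn4"]

def split_into_blocks(data):
    """Cut a file's bytes into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def place_replicas(blocks):
    """Assign each block to REPLICATION distinct datanodes (round-robin
    here; the real namenode also considers racks and node load)."""
    placement = {}
    for i, _ in enumerate(blocks):
        placement[i] = [DATANODES[(i + r) % len(DATANODES)]
                        for r in range(REPLICATION)]
    return placement

blocks = split_into_blocks(b"hello hdfs!")
print(blocks)                  # file cut into block-sized chunks
print(place_replicas(blocks))  # each block lives on 3 different nodes
```

Losing any single datanode therefore loses no data: every block still has two surviving replicas elsewhere.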

Hadoop MapReduce is a software framework for processing vast amounts of data in parallel on large clusters.

Hadoop MapReduce is a programming model and software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. A MapReduce job usually splits the input data set into independent chunks which are processed by the map tasks in a completely parallel manner.
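The flow above can be sketched as a single-process word count: map over independent input chunks, shuffle the intermediate pairs by key, then reduce each group. In real Hadoop the map and reduce tasks run on many nodes; everything here runs locally to show only the data flow:

```python
from collections import defaultdict

def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in an input chunk."""
    for word in chunk.split():
        yield (word, 1)

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework does between
    the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all counts for one word into a total."""
    return (key, sum(values))

# Two "independent chunks", as produced by the input split step.
chunks = ["hadoop map reduce", "map reduce map"]
intermediate = [pair for c in chunks for pair in map_phase(c)]
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(result)   # -> {'hadoop': 1, 'map': 3, 'reduce': 2}
```

Because each mapper sees only its own chunk and each reducer sees only its own key group, both phases parallelize across nodes with no coordination beyond the shuffle.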