Hive WordCount HiveQL Execution

posted on Nov 20th, 2016

Apache Hive

Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis. Hive offers an SQL-like interface to query data stored in the various databases and file systems that integrate with Hadoop. Without Hive, SQL-style queries must be implemented in the low-level MapReduce Java API to run over distributed data. Hive provides the necessary abstraction by translating SQL-like queries (HiveQL) into jobs on the underlying Java API, so there is no need to program against that API directly. Since most data warehousing applications work with SQL-based query languages, Hive makes it easy to port SQL-based applications to Hadoop.

Prerequisites

1) A machine with the Ubuntu 14.04 LTS operating system

2) Apache Hadoop 2.6.4 pre-installed (How to install Hadoop on Ubuntu 14.04)

3) Apache Hive 2.1.0 pre-installed (How to Install Hive on Ubuntu 14.04)

Hive WordCount HiveQL Example

Step 1 - Change the directory to /usr/local/hadoop/sbin

$ cd /usr/local/hadoop/sbin

Step 2 - Start all Hadoop daemons.

$ start-all.sh

Step 3 - Create an employee.txt file (in this example, /home/hduser/Desktop/employee.txt).

employee.txt

Step 4 - Add the following lines to the employee.txt file. Save and close.

1201 Gopal 45000 TechnicalManager TP
1202 Manisha 45000 ProofReader PR
1203 Masthanvali 40000 TechnicalWriter TP
1204 Krian 40000 HrAdmin HR
1205 Kranthi 30000 OpAdmin Admin
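
Steps 3 and 4 can also be done in one shot from the shell. This sketch writes employee.txt into the current directory; move it to /home/hduser/Desktop (or adjust the path in Step 5) as needed:

```shell
# Create employee.txt with the five sample records from Step 4.
cat > employee.txt <<'EOF'
1201 Gopal 45000 TechnicalManager TP
1202 Manisha 45000 ProofReader PR
1203 Masthanvali 40000 TechnicalWriter TP
1204 Krian 40000 HrAdmin HR
1205 Kranthi 30000 OpAdmin Admin
EOF
```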

Step 5 - Copy employee.txt from the local file system into HDFS. Note that it is stored in HDFS under the name employee123.txt, which is the name the query in Step 8 expects.

$ hdfs dfs -copyFromLocal /home/hduser/Desktop/employee.txt /user/hduser/employee123.txt

Step 6 - Change the directory to $HIVE_HOME/bin (/usr/local/hive/bin in this setup).

$ cd $HIVE_HOME/bin

Step 7 - Create the wordcount Hive query file. The file should have the .hql extension.

wordcount.hql

Step 8 - Add the following lines to wordcount.hql. Save and close. Note that LOAD DATA INPATH moves the file from its HDFS location into the table's warehouse directory rather than copying it.

CREATE TABLE docs (line STRING);
LOAD DATA INPATH 'hdfs://localhost:9000/user/hduser/employee123.txt' OVERWRITE INTO TABLE docs;
CREATE TABLE word_counts AS
SELECT word, count(1) AS count FROM
(SELECT explode(split(line, '\\s')) AS word FROM docs) w
GROUP BY word
ORDER BY word;
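
To see what this query computes, its logic can be sketched with plain shell tools on the sample records from Step 4: split each line into words (the explode(split(...)) subquery), count occurrences of each distinct word (GROUP BY with count(1)), and sort by word (ORDER BY word). This only illustrates the result, not how Hive executes the query:

```shell
# Illustration only: the same wordcount logic as wordcount.hql,
# applied to the sample records from Step 4.
printf '%s\n' \
  '1201 Gopal 45000 TechnicalManager TP' \
  '1202 Manisha 45000 ProofReader PR' \
  '1203 Masthanvali 40000 TechnicalWriter TP' \
  '1204 Krian 40000 HrAdmin HR' \
  '1205 Kranthi 30000 OpAdmin Admin' |
tr -s ' ' '\n' |   # like explode(split(line, '\\s')): one word per line
sort | uniq -c |   # like GROUP BY word ... count(1) AS count
sort -k2           # like ORDER BY word
```

Repeated tokens such as 45000, 40000, and TP should each come out with a count of 2.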

Step 9 - Execute the wordcount.hql HiveQL script.

$ hive -f /home/hduser/Desktop/HIVE/wordcount.hql


Step 10 - Execute a SELECT HiveQL query to view the results.

$ hive -e 'select * from word_counts'


Set these Hive Execution Parameters in hive-site.xml

  <property>
     <name>mapred.reduce.tasks</name>
     <value>-1</value>
     <description>The default number of reduce tasks per job.</description>
  </property>

  <property>
     <name>hive.exec.scratchdir</name>
     <value>/tmp/mydir</value>
     <description>Scratch space for Hive jobs</description>
  </property>

  <property>
     <name>hive.metastore.warehouse.dir</name>
     <value>/user/hive/warehouse</value>
     <description>location of default database for the warehouse</description>
  </property>

  <property>
     <name>hive.enforce.bucketing</name>
     <value>true</value>
     <description>Whether bucketing is enforced. If true, while inserting into the table, bucketing is enforced. </description>
  </property>
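
As an alternative to editing hive-site.xml, most of these properties can also be set for a single session with the SET command inside the Hive shell. This is a sketch; a property like hive.metastore.warehouse.dir is typically only honored when configured in hive-site.xml before the warehouse is created:

```sql
-- Session-level overrides; these last only for the current Hive session.
SET mapred.reduce.tasks=-1;
SET hive.exec.scratchdir=/tmp/mydir;
SET hive.enforce.bucketing=true;
```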

