Hive hiveQL Examples

posted on Nov 20th, 2016

Apache Hive

Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. The traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over a distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like Queries (HiveQL) into the underlying Java API without the need to implement queries in the low-level Java API. Since most of the data warehousing application work with SQL based querying language, Hive supports easy portability of SQL-based application to Hadoop.

Pre Requirements

1) A machine with Ubuntu 14.04 LTS operating system

2) Apache Hadoop 2.6.4 pre installed (How to install Hadoop on Ubuntu 14.04)

3) Apache Hive 2.1.0 pre installed (How to Install Hive on Ubuntu 14.04)

Hive hiveQL examples

Hive is a database technology that can define databases and tables to analyze structured data. The theme for structured data analysis is to store the data in a tabular manner, and pass queries to analyze it. This post explains how to create Hive database and hiveQL queries. Hive contains a default database named default.

Step 1 - Change the directory to /usr/local/hive/bin

$ cd $HIVE_HOME/bin

Step 2 - Enter into hive shell

$ hive

Database

Create Database is a statement used to create a database in Hive. The usage of SCHEMA and DATABASE are same.

hive> CREATE DATABASE [IF NOT EXISTS] userdb;

OR

hive> CREATE SCHEMA userdb;

The following query is used to verify a databases list

hive> SHOW DATABASES;

Table

Create Table is a statement used to create a table in Hive.

CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
> salary String, destination String)
> COMMENT 'Employee details'
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ' '
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE;

Create sample.txt file

sample.txt

Add following lines to sample.txt file. Save and close.

1201 Gopal 45000 Technical manager
1202 Manisha 45000 Proof reader
1203 Masthanvali 40000 Technical writer
1204 Krian 40000 Hr Admin
1205 Kranthi 30000 Op Admin

Load

We can insert data using the LOAD DATA statement. While inserting data into Hive, it is better to use LOAD DATA to store bulk records. There are two ways to load data: one is from local file system and second is from Hadoop file system.

LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HIVE/sample.txt' OVERWRITE INTO TABLE employee;

Verify

select * from employee;

Alter

How to alter the attributes of a table such as changing its table name, changing column names, adding columns, and deleting or replacing columns.

hive> ALTER TABLE employee RENAME TO emp;
hive> ALTER TABLE employee CHANGE name ename String;
hive> ALTER TABLE employee CHANGE salary salary Double;

Add Columns

hive> ALTER TABLE employee ADD COLUMNS (
> dept STRING COMMENT 'Department name');

Replace Columns

hive> ALTER TABLE employee REPLACE COLUMNS (ename String name String);

Partitioning

Hive organizes tables into partitions. It is a way of dividing a table into related parts based on the values of partitioned columns such as date, city, and department. Using partition, it is easy to query a portion of the data. Tables or partitions are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying. Bucketing works based on the value of hash function of some column of a table.

hive> ALTER TABLE employee
> ADD PARTITION (year=’2013’)
> location '/2013/part2013';

hive> ALTER TABLE employee
> ADD PARTITION (year=’2012’)
> location '/2012/part2012';

hive> ALTER TABLE employee PARTITION (year=’2013’)
> RENAME TO PARTITION (Yoj=’2013’);

ALTER TABLE employee DROP [IF EXISTS]
> PARTITION (year=’2013’);

Built-in operators of Hive

There are four types of operators in Hive

1. Relational Operators

2. Arithmetic Operators

3. Logical Operators

4. Complex Operators

SELECT * FROM employee WHERE Id=1205;

SELECT * FROM employee WHERE Salary>=40000;

SELECT 20+30 ADD FROM temp;

SELECT * FROM employee WHERE Salary>40000 && Dept=TP;

View

How to create and manage views. Views are generated based on user requirements. You can save any result set data as a view. The usage of view in Hive is same as that of the view in SQL.

hive> CREATE VIEW emp_30000 AS SELECT * FROM employee WHERE salary>30000;

select * from emp_30000;

hive> DROP VIEW emp_30000;

Index

An Index is nothing but a pointer on a particular column of a table. Creating an index means creating a pointer on a particular column of a table.

hive> CREATE INDEX inedx_salary ON TABLE employee(salary) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';

hive> DROP INDEX index_salary ON employee;

HiveQL Syntax

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]
[LIMIT number];

Select and Where Query

Create a employee.txt file

employee.txt

Add following lines to employee.txt file. Save and close.

1201 Gopal 45000 TechnicalManager TP
1202 Manisha 45000 ProofReader PR
1203 Masthanvali 40000 TechnicalWriter TP
1204 Krian 40000 HrAdmin HR
1205 Kranthi 30000 OpAdmin Admin

Create a table called employee

hive> CREATE TABLE IF NOT EXISTS employee (id int, name String, salary String, destination String, dept String) COMMENT 'Employee details' ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\n' STORED AS TEXTFILE;

Load employee.txt data into table employee.

hive> LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HIVE/employee.txt' OVERWRITE INTO TABLE employee;

hive> SELECT * FROM employee WHERE salary>30000;

OrderBy

The ORDER BY clause is used to retrieve the details based on one column and sort the result set by ascending or descending order.

SELECT Id, Name, Dept FROM employee ORDER BY DEPT;

GroupBy

The GROUP BY clause is used to group all the records in a result set using a particular collection column. It is used to query a group of records.

SELECT Dept,count(*) FROM employee GROUP BY DEPT;

Drop a table

hive> DROP TABLE IF EXISTS employee;

View tables

hive> SHOW TABLES;

Drop a database

hive> DROP DATABASE IF EXISTS userdb;

hive> DROP DATABASE IF EXISTS userdb CASCADE;

OR

hive> DROP SCHEMA userdb;

View databases

hive> SHOW databases;

Joins

JOINS is a clause that is used for combining specific fields from two tables by using values common to each one. It is used to combine records from two or more tables in the database. It is more or less similar to SQL JOINS.

Create a table customers

hive> CREATE TABLE IF NOT EXISTS customers (id int, name String, age int, address String, salary String) COMMENT 'customer details' ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\n' STORED AS TEXTFILE;

Create a customers.txt file

customers.txt

Add following lines to customers.txt file. Save and close.

1 Ramesh 32 Ahmedabad 2000.00
2 Khilan 25 Delhi 1500.00
3 kaushik 23 Kota 2000.00
4 Chaitali 25 Mumbai 6500.00
5 Hardik 27 Bhopal 8500.00
6 Komal 22 MP 4500.00
7 Muffy 24 Indore 10000.00

Load customers.txt data into customers table.

LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HIVE/customers.txt' OVERWRITE INTO TABLE customers;

Create a table orders

CREATE TABLE IF NOT EXISTS orders (oid int, odate String, id int, amount String) COMMENT 'order details' ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\n' STORED AS TEXTFILE;

Create a orders.txt file

orders.txt

Add following lines to customers.txt file. Save and close.

102 2009-10-08-00:00:00 3 3000
100 2009-10-08-00:00:00 3 1500
101 2009-11-20-00:00:00 2 1560
103 2008-05-20-00:00:00 4 2060

Load orders.txt data into orders table.

LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HIVE/orders.txt' OVERWRITE INTO TABLE orders;

JOIN

SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (c.id = o.id);

LEFT OUTER JOIN

SELECT c.ID, c.NAME, o.AMOUNT, o.ODATE FROM CUSTOMERS c LEFT OUTER JOIN ORDERS o ON (c.ID = o.ID);

RIGHT OUTER JOIN

SELECT c.ID, c.NAME, o.AMOUNT, o.ODATE FROM CUSTOMERS c RIGHT OUTER JOIN ORDERS o ON (c.ID = o.ID);

FULL OUTER JOIN

SELECT c.ID, c.NAME, o.AMOUNT, o.ODATE FROM CUSTOMERS c FULL OUTER JOIN ORDERS o ON (c.ID = o.ID);

Please share this blog post and follow me for latest updates on

facebook             google+             twitter             feedburner

Previous Post                                                                                          Next Post

Labels : Hive Installation With Derby Database Metastore   Hive Installation With MySQL Database Metastore   Beeline Client Usage   hiveserver2 and Web UI usage   WordCount hiveQL Execution   Hive Metastore Configuration   Hive Command Line Interface   Hive Shell Commands usage   Hive Distributed Cache   HDFS and Linux Commands in hive shell   Customizing hive logs   Database Commnds Usage   Table Commands Usage   Hive Partitioning Configuration   Hive Bucketing Configuration   UDFs Java Example   UDAFs Java Example   UDTF Java Example   Hive JDBC client Java Example   Hive Web Interface (HWI)