Hive User Defined Functions (UDF) Java Example

posted on Nov 20th, 2016

Apache Hive

Apache Hive is a data warehouse infrastructure built on top of Hadoop for data summarization, query, and analysis. Hive provides an SQL-like interface for querying data stored in the various databases and file systems that integrate with Hadoop. Without Hive, SQL queries would have to be implemented by hand in the MapReduce Java API in order to run over distributed data. Hive provides the SQL abstraction needed to integrate SQL-like queries (HiveQL) into the underlying Java API, so queries do not have to be written in the low-level Java API. Since most data warehousing applications work with SQL-based query languages, Hive makes it easier to port SQL-based applications to Hadoop.

Prerequisites

1) A machine with Ubuntu 14.04 LTS operating system

2) Apache Hadoop 2.6.4 preinstalled (How to install Hadoop on Ubuntu 14.04)

3) Apache Hive 2.1.0 preinstalled (How to Install Hive on Ubuntu 14.04)

Hive User Defined Functions (UDF) Java Example

Hive ships with a number of built-in functions that you can use in your Hive queries without adding any extra code. When a requirement is not covered by the built-in functions, you can write your own custom function, called a UDF (user defined function).

There are three types of UDFs:

1) Regular UDFs

2) User Defined Aggregate Functions - UDAFs (see here)

3) User Defined Table Generating Functions - UDTFs (see here)
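
To make the difference between the three types concrete, here is a plain-Java analogy. It is not the Hive API, and the class and method names are made up for illustration: a regular UDF turns one input value into one output value, a UDAF folds many rows into a single value, and a UDTF expands one input into several output rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java analogy only; none of these names come from the Hive API.
class UdfKindsSketch {

    // Regular UDF shape: one input value in, one output value out.
    static String upper(String value) {
        return value.toUpperCase(Locale.ROOT);
    }

    // UDAF shape: many input rows in, one aggregated value out.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // UDTF shape: one input value in, zero or more output rows out.
    static List<String> explode(String csv) {
        return new ArrayList<>(Arrays.asList(csv.split(",")));
    }

    public static void main(String[] args) {
        System.out.println(upper("row1"));
        System.out.println(sum(Arrays.asList(1, 2, 3)));
        System.out.println(explode("a,b,c"));
    }
}
```

The example in this post is of the first kind: it returns one value per row.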

Here are the steps for writing a simple Hive UDF in Java.

Step 1 - Add these jar files to your Java project.

hive-exec*.jar

$HIVE_HOME/lib/*.jar
$HADOOP_HOME/share/hadoop/mapreduce/*.jar
$HADOOP_HOME/share/hadoop/common/*.jar

AutoIncrementUDF.java

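import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// stateful = true tells Hive that this function keeps state between calls.
@UDFType(stateful = true)
public class AutoIncrementUDF extends UDF {
	// Running counter; starts at 0 for each new instance of the UDF.
	private int ctr;

	// Called once per row; returns 1, 2, 3, ... for successive calls.
	public int evaluate() {
		ctr++;
		return ctr;
	}
}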
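
Because the counter is a field of the UDF object, the numbering depends on how many instances are created. The sketch below mimics the evaluate() logic in plain Java; CounterSketch is a hypothetical stand-in for AutoIncrementUDF that does not extend the Hive UDF class, so it compiles without the jars from Step 1. One instance counts 1, 2, ... while a second instance starts again at 1, which is the behaviour to expect if a query ends up being split across several tasks that each create their own instance.

```java
// Hypothetical stand-in for AutoIncrementUDF; it deliberately does not
// extend org.apache.hadoop.hive.ql.exec.UDF so it compiles without Hive.
class CounterSketch {
    private int ctr; // int fields start at 0

    // Same logic as AutoIncrementUDF.evaluate(): increment, then return.
    int evaluate() {
        ctr++;
        return ctr;
    }

    public static void main(String[] args) {
        CounterSketch first = new CounterSketch();
        System.out.println(first.evaluate());
        System.out.println(first.evaluate());

        // A fresh instance has its own counter and starts over.
        CounterSketch second = new CounterSketch();
        System.out.println(second.evaluate());
    }
}
```

So the values are sequential only within a single instance; treat them as row numbers for small, single-task jobs rather than as globally unique ids.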

Step 2 - Compile your Java project and package it as a jar file (AutoIncrementUDF.jar in this example). How you build the jar file is left to you.

Step 3 - Add the jar file to Hive in one of the following three ways

1) Using Hive Shell

Step 4 - Change the directory to /usr/local/hive/bin

$ cd $HIVE_HOME/bin

Step 5 - Start the Hive shell and add the jar

$ hive

hive> ADD JAR /home/hduser/Desktop/HIVE/AutoIncrementUDF.jar;

OR

2) hive-site.xml

hive-site.xml

<property>
    <name>hive.aux.jars.path</name>
    <value>file:///home/hduser/Desktop/HIVE/AutoIncrementUDF.jar</value>
</property>

OR

3) hive-env.sh

hive-env.sh

export HIVE_AUX_JARS_PATH="/home/hduser/Desktop/HIVE/AutoIncrementUDF.jar"

Step 6 - Create a function

hive> CREATE TEMPORARY FUNCTION incr AS 'AutoIncrementUDF';  

OR

Step 6 - Create a permanent function

hive> CREATE FUNCTION incr AS 'AutoIncrementUDF';

Step 7 - Create a data.csv file

data.csv

Step 8 - Add the following lines to the data.csv file, then save and close it.

row1,c1,c2
row2,c1,c2
row3,c1,c2
row4,c1,c2
row5,c1,c2
row6,c1,c2
row7,c1,c2
row8,c1,c2
row9,c1,c2
row10,c1,c2

Step 9 - Create a table t1, load data.csv data into the table and verify.

hive> CREATE TABLE t1 (id STRING, c1 STRING, c2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HIVE/data.csv' OVERWRITE INTO TABLE t1;
hive> SELECT * FROM t1;

Step 10 - Create a table increment_table1, execute the UDF and verify.

hive> CREATE TABLE increment_table1 (id INT, c1 STRING, c2 STRING, c3 STRING);
hive> INSERT OVERWRITE TABLE increment_table1 SELECT incr() AS inc, id, c1, c2 FROM t1;
hive> SELECT * FROM increment_table1;
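
As a rough picture of what Step 10 computes, the following plain-Java sketch applies a counter to each line of data.csv and puts it in front of the row's columns, the way SELECT incr() AS inc, id, c1, c2 does. It assumes that all rows are handled by a single UDF instance in file order. The class name IncrementTableSketch is made up, no Hive classes are used, and the tab-separated output is only an approximation of how the shell prints rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: imitates the Step 10 query in memory, without Hive.
class IncrementTableSketch {

    // Splits each comma-delimited line (like FIELDS TERMINATED BY ',')
    // and prepends a running counter, mirroring incr(), id, c1, c2.
    static List<String> withCounter(List<String> csvLines) {
        List<String> out = new ArrayList<>();
        int ctr = 0;
        for (String line : csvLines) {
            String[] cols = line.split(",");
            ctr++;
            out.add(ctr + "\t" + String.join("\t", cols));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1,c1,c2", "row2,c1,c2", "row3,c1,c2");
        for (String row : withCounter(rows)) {
            System.out.println(row);
        }
    }
}
```

Each output row has four fields, matching the four columns (id, c1, c2, c3) of increment_table1.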
