Categories
Configuration Hadoop Hive Scalability

Give Your MySQL Account Access to Hive

One area of the Apache Hive documentation that’s not entirely explicit is in regard to the database privileges needed for its metastore[1]. Developers often become accustomed to creating a database account that has all privileges granted. But in the Real World, end users of Hive must configure it to point to a metastore RDBMS account with very specific privileges granted by the DBA.

Are You Privileged?

By default, Hive is configured to use a (local-to-that-system) Apache Derby database instance as its metastore. However, once an initial evaluation phase of Hive is complete, the team will likely need to move to a shared database to use as their metastore so the whole team is in sync. MySQL is a suitable database to be used as the metastore, though Hive supports other relational databases as well.

So which database roles must be granted to the account used for the Hive metastore in MySQL?

  • CREATE
  • DROP
  • ALTER
  • INDEX
  • SELECT
  • INSERT
  • UPDATE
  • DELETE

You can refer to the official MySQL documentation for granting privileges regarding the specifics of the roles and the exact syntax.

Here’s an example MySQL script that creates a dedicated user (“hive”), a MySQL database (“hivemetastore”), and grants the user some DDL[2] and DML privileges:


CREATE DATABASE hivemetastore;
CREATE USER 'hive'@'%' IDENTIFIED BY 'mypassword';
GRANT CREATE, DROP, ALTER, INDEX ON hivemetastore.* TO 'hive'@'%';
GRANT SELECT, INSERT, UPDATE, DELETE ON hivemetastore.* TO 'hive'@'%';

Once the DBA has created the account and granted it the requisite privileges, the user running Hive can specify this MySQL database as his Hive metastore. This is done either by exporting the following settings via HIVE_OPTS or by specifying them on the command line when invoking the CLI:


javax.jdo.option.ConnectionURL=jdbc:mysql://your.mysql.server/hivemetastore
javax.jdo.option.ConnectionUserName=hive
javax.jdo.option.ConnectionPassword=mypassword

[1] It can be confusing when discussing Hive someone refers to the “Hive” database, tables, or columns. To which are they referring? To the database, tables, and columns that Hive presents (with data stored in HDFS, S3, etc.) or the metastore database, tables, and columns (with data stored in MySQL, Postgres, etc.) in which Hive stores its meta data? Unfortunately the two are often used interchangeably.

[2] The reason that the CREATE, DROP, ALTER, and INDEX privileges are needed is because Hive can dynamically create the needed schema in the metastore database in which to store information about the Hive database.