Apache Hive

Hive provides an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.[3][4]

By integrating an SQL-based query language with Hadoop, Hive is commonly used in data warehousing applications.

Apache Hive supports the analysis of large datasets stored in Hadoop's HDFS and in compatible file systems such as Amazon S3 and Alluxio.[8]

Major components of the Hive architecture are the metastore, which stores metadata for tables and partitions; the driver, which manages the lifecycle of a HiveQL statement; the compiler, which translates queries into an execution plan; the optimizer; the executor; and user interfaces such as the command-line interface and the Thrift server.[18][19]

While based on SQL, HiveQL does not strictly follow the full SQL-92 standard.

Internally, a compiler translates HiveQL statements into a directed acyclic graph of MapReduce, Tez, or Spark jobs, which are submitted to Hadoop for execution.[27]
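The generated plan can be inspected with the EXPLAIN statement, which prints the stages Hive will submit for execution. The query below is a hypothetical example; the table and column names are assumptions:

```sql
-- EXPLAIN prints the compiled plan for a query, showing the
-- stage graph (MapReduce/Tez/Spark jobs) that Hive would run.
EXPLAIN
SELECT word, count(1)
FROM docs
GROUP BY word;
```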

A word-count program can be written in HiveQL as follows.[5] A brief explanation of each of the statements: the first statement checks whether the table docs exists and drops it if it does.

The inner query splits the input lines into individual words, placing each word in its own row of a temporary table aliased as temp.
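The statements described above can be sketched as follows; the input path and table names are assumptions for illustration:

```sql
-- Drop the table if a previous run left it behind
DROP TABLE IF EXISTS docs;

-- One row per line of input text
CREATE TABLE docs (line STRING);

-- Load the input file into the docs table (path is an assumption)
LOAD DATA INPATH 'input_file' OVERWRITE INTO TABLE docs;

-- The inner query splits each line into words (one word per row,
-- aliased as temp); the outer query counts occurrences per word.
CREATE TABLE word_counts AS
SELECT word, count(1) AS count
FROM (SELECT explode(split(line, '\\s')) AS word FROM docs) temp
GROUP BY word
ORDER BY word;
```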

Like a typical RDBMS, Hive supports all four ACID properties of transactions: Atomicity, Consistency, Isolation, and Durability.

Hive 0.14 added the functions needed to support complete ACID properties.[29]

Enabling INSERT, UPDATE, and DELETE transactions requires setting appropriate values for configuration properties such as hive.support.concurrency, hive.enforce.bucketing, and hive.exec.dynamic.partition.mode.[30]
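A minimal sketch of such a configuration, assuming a recent Hive release; the exact values, the hive.txn.manager setting, and the table definition are assumptions, so consult the Hive documentation for your version:

```sql
-- Illustrative session settings for enabling ACID transactions
SET hive.support.concurrency = true;
SET hive.enforce.bucketing = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- ACID operations target tables stored as ORC and marked transactional
-- (table and column names here are hypothetical)
CREATE TABLE events (id INT, payload STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```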

The Hadoop distributed file system authorization model uses three entities (user, group, and others) with three permissions: read, write, and execute.

The default permissions for newly created files can be set by changing the umask value for the Hive configuration variable hive.files.umask.value.
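As an illustration of how a umask shapes default permissions, the sketch below applies a umask to a base file mode; the 0022 value is a common default used for the example, not one taken from Hive's documentation:

```python
# A umask clears bits from the base permission mode: every bit set in
# the umask is removed from the permissions a newly created file gets.
def apply_umask(base_mode: int, umask: int) -> int:
    return base_mode & ~umask

# With a base mode of 0o666 (rw-rw-rw-) and umask 0o022, group and
# others lose their write bits, leaving 0o644 (rw-r--r--).
print(oct(apply_umask(0o666, 0o022)))  # → 0o644
```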