How To: Manage Hive File Creation

File Format

Hive is frequently used for ETL work and data transformation. As a result, intermediate files are often written to the Hadoop Distributed File System (HDFS) to support the processing. This output is created by the Mappers and Reducers, which feed data to subsequent tasks.

Final Output


Default file format for tables, applied during CREATE TABLE.

HDFS Output


Default file format for the intermediate files written during a query.
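The setting names for these two options are missing from this copy of the article. As a sketch, assuming they correspond to the standard Hive properties `hive.default.fileformat` and `hive.query.result.fileformat`, they could be set per session as shown below. The table name `example_events` and the chosen formats are hypothetical, picked only for illustration.

```sql
-- Assumption: these are the standard Hive property names;
-- the article's own property names are not present in this copy.

-- Format used by CREATE TABLE when no STORED AS clause is given.
SET hive.default.fileformat=ORC;

-- Format used for a query's intermediate results.
SET hive.query.result.fileformat=SequenceFile;

-- An explicit STORED AS clause takes precedence over the default format.
-- (example_events is a hypothetical table.)
CREATE TABLE example_events (id BIGINT, payload STRING)
STORED AS TEXTFILE;
```

Accepted values differ between Hive versions, so it is worth checking the configuration reference for the release in use before relying on a particular format name.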


File Count

Developers may instruct Hive to merge files at the end of jobs, and may also configure either the size of the merged files or the average file size below which a merge is triggered. Hive spawns additional MapReduce jobs to meet the requirements set in the configuration, so these settings may increase the workload.
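The merge behavior described above is usually controlled by a small group of properties. The following is a minimal sketch, assuming the standard Hive property names `hive.merge.mapfiles`, `hive.merge.mapredfiles`, `hive.merge.size.per.task` and `hive.merge.smallfiles.avgsize`; the byte values are illustrative, not recommendations.

```sql
-- Assumption: standard Hive merge properties; values are examples only.

-- Merge small files at the end of a map-only job.
SET hive.merge.mapfiles=true;

-- Merge small files at the end of a map-reduce job.
SET hive.merge.mapredfiles=true;

-- Target size, in bytes, of the merged files.
SET hive.merge.size.per.task=256000000;

-- If the average output file size of a job falls below this many bytes,
-- Hive starts an additional job to merge the output files.
SET hive.merge.smallfiles.avgsize=16000000;
```

Because the merge runs as an extra job, it trades additional cluster work for fewer, larger files in HDFS.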

HDFS Output


Maximum number of HDFS files that the mappers and reducers of a single job may create.
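The property name for this limit is missing from this copy of the article. Assuming it refers to the standard Hive property `hive.exec.max.created.files`, a session-level override might look like this; the value shown is only an example.

```sql
-- Assumption: the limit corresponds to hive.exec.max.created.files.
-- Caps the number of HDFS files created by all mappers and reducers in one job.
SET hive.exec.max.created.files=100000;
```

A job that exceeds the limit is expected to fail rather than silently drop output, so raising the value is typically only needed for jobs that write to a very large number of partitions.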


Compression Policy

Administrators may choose to compress either the final or intermediate output files.

Final Output


Set to true to compress the files.

HDFS Output


Set to true to compress the files.
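As a sketch of the two compression switches, assuming they correspond to the standard Hive properties `hive.exec.compress.output` (final output) and `hive.exec.compress.intermediate` (files passed between stages), the session settings could be written as follows. The codec line uses a Hadoop property and the Gzip codec purely as an example.

```sql
-- Assumption: standard Hive property names; not confirmed by the article text.

-- Compress the final output of a query.
SET hive.exec.compress.output=true;

-- Compress the intermediate files produced between MapReduce stages.
SET hive.exec.compress.intermediate=true;

-- The codec itself comes from the Hadoop configuration; Gzip is shown as an example.
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
```

Older Hadoop releases use different names for the codec property, so the last line may need adjusting for the cluster in question.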


File Extension

Administrators may choose to set the file extension.

Final Output


File extension for the output files. Defaults to the extension of the compression codec.
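The property name is missing from this copy of the article. Assuming it is the standard Hive property `hive.output.file.extension`, an override could look like the following; the `.txt` value is a hypothetical choice.

```sql
-- Assumption: the setting corresponds to hive.output.file.extension.
-- When unset, Hive derives the extension from the compression codec (for example .gz).
SET hive.output.file.extension=.txt;
```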
