How To: Manage Hive File Creation

File Format

Hive is frequently used for ETL work and data transformation. As a result, it often writes intermediate files to the Hadoop Distributed File System (HDFS) to support the processing. This output is created by the Mappers and Reducers, which feed data to subsequent tasks.

Final Output | hive.default.fileformat | Default file format, applied at CREATE TABLE

HDFS Output | hive.query.result.fileformat | File format for the intermediate results a query writes
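
As a minimal sketch, both properties can be set for a single session with SET statements from the Hive CLI or Beeline. The ORC and SequenceFile values and the table names below are illustrative assumptions, not recommendations from this article, and the accepted values depend on the Hive version in use.

```sql
-- Format applied to tables created without an explicit STORED AS clause.
SET hive.default.fileformat=ORC;

-- Format used for the intermediate results a query writes to HDFS.
SET hive.query.result.fileformat=SequenceFile;

-- Hypothetical table: with no STORED AS clause it picks up the default above.
CREATE TABLE staging_events (id BIGINT, payload STRING);

-- An explicit STORED AS clause still overrides the session default.
CREATE TABLE staging_events_text (id BIGINT, payload STRING) STORED AS TEXTFILE;
```

Setting the values per session keeps the change local to one job; a cluster-wide default would normally go in hive-site.xml instead.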

 

File Count

Developers may instruct Hive to merge files at the end of jobs, and may configure either the target size of the merged files or the average file size below which a merge is triggered. Hive will spawn additional MapReduce jobs to meet the requirements set in the configuration, so these settings may increase the workload.

HDFS Output | hive.exec.max.created.files | Maximum number of HDFS files that all mappers and reducers in a MapReduce job may create
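
The merge behaviour described above is controlled by the hive.merge.* properties, which this article does not list. A sketch of a session that enables merging is shown here. The property names are standard Hive settings, but the byte values are illustrative assumptions and should be sized against the cluster's HDFS block size.

```sql
-- Merge small files at the end of a map-only job.
SET hive.merge.mapfiles=true;

-- Merge small files at the end of a map-reduce job.
SET hive.merge.mapredfiles=true;

-- Target size, in bytes, of each merged file (256 MB here).
SET hive.merge.size.per.task=268435456;

-- Launch the extra merge job only when the average output file size
-- is below this many bytes (16 MB here).
SET hive.merge.smallfiles.avgsize=16777216;

-- Cap on the HDFS files that all mappers and reducers in one job may create.
SET hive.exec.max.created.files=100000;
```

Because the merge runs as an additional job, it trades extra processing time for fewer, larger files on HDFS.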

 

Compression Policy

Administrators may choose to compress either the final or intermediate output files.

Final Output | hive.exec.compress.output | Set to true to compress the final output files

HDFS Output | hive.exec.compress.intermediate | Set to true to compress the intermediate files passed between jobs
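
A hedged example of turning on both kinds of compression for one session follows. The two codec properties are not named in this article: mapreduce.output.fileoutputformat.compress.codec is the Hadoop 2 property for job output, and hive.intermediate.compression.codec is the Hive property for intermediate data. The Gzip and Snappy choices are assumptions for illustration, and Snappy in particular requires native libraries to be installed on the cluster.

```sql
-- Compress the final output files of a query.
SET hive.exec.compress.output=true;

-- Codec for the final output (assumed choice: Gzip).
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Compress the intermediate files passed between the jobs of a query.
SET hive.exec.compress.intermediate=true;

-- Codec for the intermediate files (assumed choice: Snappy).
SET hive.intermediate.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
```

A fast codec is a common choice for intermediate data, since those files are read back immediately, while a codec with a higher compression ratio suits final output that is stored for longer.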

 

File Extension

Administrators may choose to set the file extension.

Final Output | hive.output.file.extension | File extension for output files; if unset, defaults to the extension of the compression codec
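
As a small sketch, the extension can be set before a query that writes files. The `.txt` value, the export directory, and the `staging_events` table are hypothetical names used only for illustration. The leading dot is included on the assumption that Hive appends the value to the file name as-is.

```sql
-- Give the output files of this session an explicit extension.
SET hive.output.file.extension=.txt;

-- Hypothetical export: the files written under this directory carry the extension above.
INSERT OVERWRITE DIRECTORY '/tmp/hive_export_example'
SELECT * FROM staging_events;
```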
