SQL Engine - Facts & Suggestions


When considering the various cloud engine and cluster pairings available, there are many factors to weigh before making an informed decision. The grids below not only highlight potential reasons to choose one engine over another but also label each line item as either a Fact or a Suggestion. Facts hold true in all cases. Suggestions are not absolute, because corner cases may lead you to choose a different engine. An x marks each engine to which a line item applies.

 

Architecture Breakdown

Type        Line item                                   MR   Tez  Presto  Spark
Fact        Writes to disk between consecutive jobs     x
Fact        Streams data in memory between tasks             x    x       x
Fact        ANSI SQL compliant                                    x       x
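The first two rows describe how intermediate data moves between processing steps. As a rough illustration only (a toy Python sketch, not how any of these engines is actually implemented), the snippet below contrasts a pipeline that writes each stage's full output to disk before the next stage reads it, in the spirit of MR, with one that chains stages in memory, in the spirit of the streaming engines. The stage functions and file names are hypothetical.

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for one job (MR) or one task (streaming engines).
Stage = Callable[[Iterable[int]], Iterator[int]]


def add_one(rows: Iterable[int]) -> Iterator[int]:
    for r in rows:
        yield r + 1


def square(rows: Iterable[int]) -> Iterator[int]:
    for r in rows:
        yield r * r


def run_materialized(stages: List[Stage], data: Iterable[int]) -> Tuple[List[int], int]:
    """Write each stage's complete output to disk; the next stage reads it back."""
    files_written = 0
    rows: Iterable[int] = data
    with tempfile.TemporaryDirectory() as tmp:
        for i, stage in enumerate(stages):
            path = os.path.join(tmp, f"stage_{i}.txt")
            with open(path, "w") as f:
                for r in stage(rows):
                    f.write(f"{r}\n")
            files_written += 1
            # The next stage cannot start until this file is fully written.
            with open(path) as f:
                rows = [int(line) for line in f]
        return list(rows), files_written


def run_streaming(stages: List[Stage], data: Iterable[int]) -> Tuple[List[int], int]:
    """Chain stages as lazy generators so rows flow through memory with no intermediate files."""
    rows: Iterable[int] = data
    for stage in stages:
        rows = stage(rows)
    return list(rows), 0


if __name__ == "__main__":
    pipeline = [add_one, square]
    print(run_materialized(pipeline, range(4)))  # ([1, 4, 9, 16], 2)
    print(run_streaming(pipeline, range(4)))     # ([1, 4, 9, 16], 0)
```

Both runs produce the same rows; the difference is that the materialized version touches disk once per stage, which is the cost the first row refers to.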

 

Usage Breakdown

Type        Line item                                   MR   Tez  Presto  Spark
Suggestion  Appropriate for data analysts               x    x    x
Suggestion  Appropriate for ad-hoc queries              x    x    x
Suggestion  Appropriate for data engineers              x    x
Fact        Handles data volumes above 100 TB           x    x            x
Fact        Dynamic partitioning support                x    x
Suggestion  Appropriate for batch / ETL workloads       x    x            x
Fact        Notebook support available                            x       x
Suggestion  Appropriate for data scientists                               x
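Dynamic partitioning means the partition a row lands in is decided at write time from the row's own column value, rather than being named up front in the statement (in HiveQL, a PARTITION (dt) clause that gives the column but no value). The toy Python sketch below, with hypothetical function and path names, routes rows into Hive-style column=value directories to show the idea; it does not reproduce any engine's actual writer.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def dynamic_partition(rows: List[Row], partition_col: str, table_root: str) -> Dict[str, List[Row]]:
    """Route each row to a Hive-style 'col=value' directory chosen from the row's own value."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        path = f"{table_root}/{partition_col}={row[partition_col]}"
        # The partition value lives in the path, so it is not repeated in the stored row.
        stored = {k: v for k, v in row.items() if k != partition_col}
        partitions[path].append(stored)
    return dict(partitions)


if __name__ == "__main__":
    events = [
        {"id": 1, "dt": "2020-01-01"},
        {"id": 2, "dt": "2020-01-02"},
        {"id": 3, "dt": "2020-01-01"},
    ]
    for path, stored in dynamic_partition(events, "dt", "/warehouse/events").items():
        print(path, stored)
```

The set of partitions is not known until the data is read, which is what distinguishes this from a static partition insert.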

 

SQL Breakdown

Type        Line item                                   MR   Tez  Presto  Spark
Fact        In-memory join option                            x    x*      x
Suggestion  Appropriate for simple joins                x    x    x       x
Suggestion  Appropriate for complex joins               x    x
Suggestion  Appropriate for joins of 4 or more tables   x    x            x

* In Presto, in-memory joins are the default behavior.
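An in-memory join is commonly implemented as a hash join: the smaller input (the build side) is loaded into an in-memory hash table, and the larger input (the probe side) is streamed past it. The Python sketch below is a generic textbook version with hypothetical names, not the join operator of Tez, Presto, or Spark. It also hints at why memory matters: the whole build side has to fit in memory, which is worth keeping in mind when a join involves several large tables.

```python
from typing import Dict, Hashable, List

Row = Dict[str, object]


def in_memory_hash_join(build: List[Row], probe: List[Row], key: str) -> List[Row]:
    """Inner equi-join on `key`: hash the build side in memory, then stream the probe side."""
    # Build phase: the entire build side must fit in memory.
    table: Dict[Hashable, List[Row]] = {}
    for row in build:
        table.setdefault(row[key], []).append(row)
    # Probe phase: each probe row is looked up once; unmatched rows are dropped (inner join).
    joined: List[Row] = []
    for row in probe:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


if __name__ == "__main__":
    users = [{"user_id": 1, "name": "ana"}, {"user_id": 2, "name": "bo"}]
    orders = [
        {"order_id": 10, "user_id": 1},
        {"order_id": 11, "user_id": 1},
        {"order_id": 12, "user_id": 3},
    ]
    print(in_memory_hash_join(users, orders, "user_id"))
```

Order 12 has no matching user, so it is dropped; user 2 has no orders, so it does not appear either.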

 

Format Breakdown

Type        Line item                                   MR   Tez  Presto  Spark
Suggestion  Prefers the Parquet format                                    x
Suggestion  Prefers the ORC format                      x    x    x
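If you want to apply the grids programmatically, one possible approach is to treat Facts as hard filters and Suggestions as soft preferences, matching how the two labels are described at the top of this article. The Python sketch below transcribes only a few rows as data; the LineItem type and the shortlist helper are hypothetical and not part of any product.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

ENGINES = ("MR", "Tez", "Presto", "Spark")


@dataclass(frozen=True)
class LineItem:
    kind: str  # "Fact" or "Suggestion"
    item: str
    engines: FrozenSet[str]


# A few rows transcribed from the grids above (hypothetical encoding).
GRID: List[LineItem] = [
    LineItem("Fact", "Streams data in memory between tasks", frozenset({"Tez", "Presto", "Spark"})),
    LineItem("Fact", "ANSI SQL compliant", frozenset({"Presto", "Spark"})),
    LineItem("Fact", "Notebook support available", frozenset({"Presto", "Spark"})),
    LineItem("Suggestion", "Prefers the Parquet format", frozenset({"Spark"})),
    LineItem("Suggestion", "Prefers the ORC format", frozenset({"MR", "Tez", "Presto"})),
]


def _in_order(engines: Iterable[str]) -> List[str]:
    chosen = set(engines)
    return [e for e in ENGINES if e in chosen]


def shortlist(required: List[str], grid: List[LineItem] = GRID) -> Tuple[List[str], List[str]]:
    """Return (feasible, suggested) engines for the required line items.

    Facts are hard filters; Suggestions only narrow the suggested subset,
    so a feasible engine is never ruled out by a Suggestion alone.
    Raises KeyError for a line item that is not in the grid.
    """
    by_item = {li.item: li for li in grid}
    feasible = set(ENGINES)
    suggested = set(ENGINES)
    for name in required:
        li = by_item[name]
        if li.kind == "Fact":
            feasible &= li.engines
        suggested &= li.engines
    return _in_order(feasible), _in_order(suggested)


if __name__ == "__main__":
    print(shortlist(["ANSI SQL compliant", "Prefers the ORC format"]))
    # (['Presto', 'Spark'], ['Presto'])
```

In this example Spark stays feasible because only a Suggestion (ORC) argues against it, which mirrors the point that Suggestions can be overridden by corner cases.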
