Reference: Cluster Facts & Suggestions

Facts & Suggestions

When comparing the available cloud engine and cluster pairings, there are many factors to weigh before we can make an informed decision. A significant amount of public documentation and benchmarking is available to draw from. The grid below highlights the potential reasons to choose one cluster over another and labels each line item as either a Fact or a Suggestion. Facts will hold true, but Suggestions are not absolute: corner cases may force individuals to select a different solution.

 

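Fact Breakdown

| Type | Criterion                               | MR | Tez | Presto | Spark   |
|------|-----------------------------------------|----|-----|--------|---------|
| Fact | ANSI SQL compliant                      |    |     | x      | x (2.0) |
| Fact | SQL-like language support               | x  | x   | x      | x       |
| Fact | Python support                          |    |     |        | x       |
| Fact | R support                               |    |     |        | x       |
| Fact | Notebook support                        |    |     | x      | x       |
| Fact | Data volume less than 100 TB            | x  | x   | x      | x       |
| Fact | Data volume above 100 TB                | x  | x   |        | x       |
| Fact | Potential for memory depletion          |    | x   | x      | x       |
| Fact | Streaming support out of the box        |    |     |        | x       |
| Fact | Machine learning support out of the box |    |     |        | x       |
| Fact | Memory-intensive processing             |    | x   | x      | x       |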

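Suggestion Breakdown

| Type       | Criterion                                   | MR | Tez | Presto | Spark |
|------------|---------------------------------------------|----|-----|--------|-------|
| Suggestion | SQL user friendly                           | x  | x   | x      | x     |
| Suggestion | Ad-hoc use case                             | x  | x   | x      |       |
| Suggestion | ETL workloads with large data volumes       | x  | x   |        |       |
| Suggestion | Most reliable / least likely to fail        | x  | x   |        |       |
| Suggestion | Appropriate for data scientists & engineers |    |     |        | x     |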
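
One way to use the Fact grid is as a simple lookup: list the criteria a workload requires, optionally list characteristics to avoid, and see which engines remain. The sketch below is illustrative only. The `FACTS` dictionary is a hand transcription of the Fact rows above, and the names `ENGINES`, `FACTS`, and `engines_matching` are hypothetical, not part of any product API.

```python
# Engines in the same column order as the grid.
ENGINES = ("MR", "Tez", "Presto", "Spark")
ALL = set(ENGINES)

# Each Fact row mapped to the engines marked with an "x" in the grid.
FACTS = {
    "ANSI SQL compliant": {"Presto", "Spark"},  # Spark is marked "x(2.0)" in the grid
    "SQL-like language support": ALL,
    "Python support": {"Spark"},
    "R support": {"Spark"},
    "Notebook support": {"Presto", "Spark"},
    "Data volume less than 100 TB": ALL,
    "Data volume above 100 TB": {"MR", "Tez", "Spark"},
    "Potential for memory depletion": {"Tez", "Presto", "Spark"},
    "Streaming support out of the box": {"Spark"},
    "Machine learning support out of the box": {"Spark"},
    "Memory-intensive processing": {"Tez", "Presto", "Spark"},
}


def engines_matching(required, avoided=()):
    """Return engines marked for every required row and for no avoided row."""
    candidates = set(ENGINES)
    for row in required:
        candidates &= FACTS[row]
    for row in avoided:
        candidates -= FACTS[row]
    # Preserve the grid's column order in the result.
    return [engine for engine in ENGINES if engine in candidates]


if __name__ == "__main__":
    print(engines_matching(["Data volume above 100 TB", "ANSI SQL compliant"]))
    print(engines_matching(["Data volume above 100 TB"],
                           avoided=["Potential for memory depletion"]))
```

The Suggestion rows are deliberately left out of this lookup, since they are not absolute and may be overridden by corner cases.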

 
