How To: Create a Spark Session in Qubole Analyze

To process data in Spark, the first thing we need is to create a SparkContext, or a SparkSession (available from Spark 2.0 onwards).

To know more about SparkSession vs. SparkContext, see: https://qubole.zendesk.com/hc/en-us/articles/115001297003-Reference-Spark-Session-vs-Spark-Context

Here are code snippets for creating a SparkContext and a SparkSession in Qubole:

-> If you choose to write a Spark Scala application in Analyze, the SparkContext (name: sc) and SparkSession (name: spark) are pre-created. Below is an example of creating a DataFrame from a CSV file.

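Code (a minimal sketch; the S3 path is a placeholder to replace with your own file):

// sc (SparkContext) and spark (SparkSession) are pre-created in a Scala Analyze query
val df = spark.read
  .option("header", "true")              // treat the first line of the file as column names
  .csv("s3://your-bucket/path/data.csv") // placeholder path

// Output: df.show() prints the first 20 rows of the DataFrame
df.show()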

-> In PySpark, the session and context have to be created explicitly. Below is an example of creating a SparkSession and using it to load data into a DataFrame.

 

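Code (a minimal sketch; the application name and S3 path are placeholders to replace with your own):

from pyspark.sql import SparkSession

# Explicitly create (or reuse) a Spark session
spark = SparkSession.builder \
    .appName("csv-example") \
    .getOrCreate()

# Load a CSV file into a DataFrame; header=True treats the first line as column names
df = spark.read.csv("s3://your-bucket/path/data.csv", header=True)

# Output: df.show() prints the first 20 rows of the DataFrame
df.show()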

-> Qubole creates the required Spark session automatically if you choose to compose a query using the SQL option.

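A sample query (a minimal illustration; default.my_table is a placeholder table name):

SELECT * FROM default.my_table LIMIT 10;

The rows returned by the query are displayed in the results pane of the Analyze page.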

-> In SparkR-Analyze, the Spark context and session can be created easily. Below is an example of loading the SparkR library and creating a Spark DataFrame from an R data frame.

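Code (a minimal sketch, using R's built-in faithful data set as the source):

library(SparkR)

# Create (or attach to) the Spark session
sparkR.session()

# Convert the built-in R data frame 'faithful' into a Spark DataFrame
df <- as.DataFrame(faithful)

# Output: head(df) returns the first rows of the DataFrame
head(df)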

-> As a final note, the SparkContext (name: sc) and SparkSession (name: spark) are also pre-created in Qubole notebooks, so you can work with your data directly.
