This project has educational goals: it explores how to use Scala in Databricks, consuming an API and handling the response with Spark and with native solutions.
link to the solution in Databricks --> here <--
The Reddit API was used. It returns the top 50 stocks discussed in the Wallstreetbets subreddit over the last 15 minutes, including a sentiment analysis of the discussions. Documentation is available here.
For this project, DBR 13.3 was used*, and the cluster environment was configured to use JDK 11:
JNAME=zulu11-ca-amd64
This JVM is necessary to enable `java.net.http.HttpRequest`** in Databricks and is required by the request libraries described below:
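As a quick sanity check that the cluster JVM really exposes `java.net.http`, the sketch below only *builds* an `HttpRequest` without sending it; on a JDK 8 cluster this is exactly where the `NoClassDefFoundError` mentioned below would surface. The URI is a placeholder, not the project's endpoint.

```scala
// Sanity check: java.net.http requires JDK 11+ (e.g. JNAME=zulu11-ca-amd64).
import java.net.URI
import java.net.http.HttpRequest

object JvmCheck {
  def main(args: Array[String]): Unit = {
    // Building the request does not open a connection, so this is safe to
    // run anywhere; it only fails if java.net.http is missing from the JVM.
    val request = HttpRequest.newBuilder()
      .uri(URI.create("https://example.com")) // placeholder endpoint
      .GET()
      .build()
    println(request.method())
  }
}
```

If this cell throws `BootstrapMethodError` or `NoClassDefFoundError`, revisit the `JNAME` setting above before installing the libraries.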
Scala package | Description | Maven coordinates | Reference |
---|---|---|---|
sttp.client3 | Scala library that provides HTTP request and response handlers. | com.softwaremill.sttp.model:core_2.12:1.7.10 | Documentation |
sttp.model | Provides HTTP models such as headers, URIs, methods, etc. Required for sttp.client. | com.softwaremill.sttp.tapir:tapir-sttp-client_2.12:1.10.6 | Documentation |
*DBR 11.3 through 14.3 have been tested; no incompatibility is expected.
**Error found: `BootstrapMethodError: java.lang.NoClassDefFoundError: java/net/http/HttpRequest`. Solution found here.
A class to handle the sttp.client `Response`, with the attributes:

- `client`: A `SimpleHttpClient` instance from sttp.client, used to execute the request.
- `requestEndpoint`: The informed endpoint.
- `successStatusCode`: The 200 status code.

And the methods:

- `getResponse`: Returns the GET response from the informed endpoint.
- `checkRequestStatusCode`: Raises an exception if the response status code is different from 200.
- `transformResponseToDataframe`: Returns a Spark DataFrame if the request was successful.
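A minimal sketch of the status-check logic described above. The names follow this README, but this is not the project's implementation: to stay dependency-free, the sttp `SimpleHttpClient` and the Spark DataFrame step are omitted, and `checkRequestStatusCode` is shown against a plain integer status code.

```scala
// Simplified sketch of the response-handling class (no sttp, no Spark),
// so the core check is runnable on its own.
class ResponseHandler(val requestEndpoint: String) {
  // The status code that marks a successful request.
  val successStatusCode: Int = 200

  // Raises an exception if the response status code is different from 200.
  def checkRequestStatusCode(statusCode: Int): Unit =
    if (statusCode != successStatusCode)
      throw new RuntimeException(
        s"Request to $requestEndpoint returned status $statusCode, expected $successStatusCode")
}

object ResponseHandlerDemo {
  def main(args: Array[String]): Unit = {
    val handler = new ResponseHandler("https://example.com/api") // placeholder endpoint
    handler.checkRequestStatusCode(200) // passes silently
    try handler.checkRequestStatusCode(404)
    catch { case e: RuntimeException => println(e.getMessage) }
  }
}
```

In the real class, `getResponse` would execute the request through `SimpleHttpClient` and `transformResponseToDataframe` would parse the successful body into a Spark DataFrame.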
- Upload `request_with_scala.dbc` or `request_with_scala.scala` to your Databricks Workspace;
- Install the packages listed in Cluster Configs;
- Open a PR with your improvements!
- Use the tables for a logistic regression model.
- Build a star schema with the current layers.