/scala_on_databricks

Consuming an API with Scala in Databricks.

Welcome to request_with_scala

This is an educational project that explores how to use Scala in Databricks to consume an API and process the response with Spark and native solutions.

link to the solution in Databricks --> here <--

This project uses a Reddit API that returns the top 50 stocks discussed in the Wallstreetbets subreddit over the last 15 minutes, including a sentiment analysis of the discussions. Documentation is available here.

This project was built on DBR 13.3*, with the following environment variable set on the cluster to use JDK 11:

JNAME=zulu11-ca-amd64

This JVM is needed to make java.net.http.HttpRequest** available in Databricks, which the request libraries listed below require:

Scala package | Description | Maven coordinates | Reference
sttp.client3 | Scala library that provides HTTP request and response handlers. | com.softwaremill.sttp.model:core_2.12:1.7.10 | Documentation
sttp.model | Provides HTTP models such as headers, URIs, methods, etc. Required for sttp.client. | com.softwaremill.sttp.tapir:tapir-sttp-client_2.12:1.10.6 | Documentation

*DBR versions 11.3 through 14.3 were tested, and no incompatibilities are expected.

**Error found when this JVM is not set: BootstrapMethodError: java.lang.NoClassDefFoundError: java/net/http/HttpRequest. The solution can be found here.
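
The JDK requirement above can be checked from a notebook cell before the libraries are installed. The following is a minimal sketch that relies only on the Java standard library; the object name JvmCheck and the helper classAvailable are hypothetical and not part of this project.

```scala
import scala.util.Try

// Hypothetical helper, not part of the project: reports whether the
// class named in the error above can be loaded by the running JVM.
object JvmCheck {
  val httpRequestClass = "java.net.http.HttpRequest"

  // True when the current JVM can load the class with the given name.
  // Class.forName throws ClassNotFoundException for a missing class,
  // which Try turns into a Failure.
  def classAvailable(name: String): Boolean =
    Try(Class.forName(name)).isSuccess

  def main(args: Array[String]): Unit = {
    println(s"java.version = ${System.getProperty("java.version")}")
    println(s"$httpRequestClass available: ${classAvailable(httpRequestClass)}")
  }
}
```

If the second line prints false, the cluster is most likely still running a JVM older than 11, and the JNAME setting should be reviewed.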

The project defines a class that handles the sttp.client Response, with the following attributes:

  • client: A SimpleHttpClient instance from sttp.client, used to execute the request.
  • requestEndpoint: The endpoint to send the request to.
  • successStatusCode: The status code that indicates success (200).

And the methods:

  • getResponse: Returns the response to a GET request sent to the given endpoint.
  • checkRequestStatusCode: Throws an exception if the response status code is not 200.
  • transformResponseToDataframe: Returns a Spark DataFrame if the request was successful.
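
The attributes and methods above could be laid out roughly as follows. This is a sketch, not the project's actual code: the class name ApiRequest is hypothetical, the SparkSession is passed in explicitly (in a Databricks notebook the predefined spark value would normally be used), and it assumes that sttp.client3 (a version that ships SimpleHttpClient) and Spark are available on the cluster.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import sttp.client3._

// Hypothetical class name; the attribute and method names follow the lists above.
class ApiRequest(val requestEndpoint: String) {
  // Synchronous sttp client used to execute the request.
  val client: SimpleHttpClient = SimpleHttpClient()
  val successStatusCode: Int = 200

  // Sends a GET request to the endpoint and returns the raw response.
  def getResponse: Response[Either[String, String]] =
    client.send(basicRequest.get(uri"$requestEndpoint"))

  // Throws if the response status code is not 200.
  def checkRequestStatusCode(response: Response[Either[String, String]]): Unit =
    if (response.code.code != successStatusCode)
      throw new RuntimeException(
        s"Request to $requestEndpoint failed with status ${response.code.code}")

  // Requests the endpoint, validates the status, and parses the JSON body
  // into a DataFrame (assumes the body is a JSON document).
  def transformResponseToDataframe(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val response = getResponse
    checkRequestStatusCode(response)
    val body = response.body match {
      case Right(json) => json
      case Left(error) => throw new RuntimeException(error)
    }
    spark.read.json(Seq(body).toDS())
  }
}
```

In a notebook, usage might look like new ApiRequest(endpoint).transformResponseToDataframe(spark).show(), where endpoint is the URL taken from the API documentation.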
  1. Upload request_with_scala.dbc or request_with_scala.scala to your Databricks workspace;
  2. Install the packages listed in Cluster Configs;
  3. Open a PR with your improvements!
  • Use the tables to train a logistic regression model.
  • Build a star schema from the current layers.