`polars.SQLContext`-like feature for `node-polars`
I propose adding a feature to the node-polars library that mirrors the `SQLContext` class in the Python Polars library. It would let users register DataFrames under table names and run SQL queries against them from Node.js, giving a familiar and versatile way to manipulate and query data.
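To make "registering DataFrames under table names" concrete, here is a minimal plain-JavaScript sketch of the bookkeeping half of such a context. `FrameRegistry`, `lookup`, and the method signatures are hypothetical illustrations, not part of node-polars; the method names `register`, `unregister`, and `tables` are borrowed from Python's `SQLContext`. Parsing and executing the SQL itself is deliberately left out, since that would presumably be delegated to Polars' existing SQL engine (an assumption, not something this issue specifies).

```javascript
// Hypothetical sketch: FrameRegistry and its methods are illustrative names,
// not part of node-polars. Method names are borrowed from Python's SQLContext.
class FrameRegistry {
  constructor(frames = {}) {
    // Map preserves insertion order and accepts any string as a table name
    this.frames = new Map(Object.entries(frames));
  }

  // Register (or replace) a frame under a table name usable in FROM clauses
  register(name, frame) {
    this.frames.set(name, frame);
    return this; // allow chaining
  }

  // Remove one or more table names; unknown names are silently ignored
  unregister(...names) {
    for (const name of names) this.frames.delete(name);
    return this;
  }

  // Alphabetically sorted list of registered table names
  tables() {
    return [...this.frames.keys()].sort();
  }

  // Resolve a table name referenced by a query; fail loudly if it is unknown
  lookup(name) {
    if (!this.frames.has(name)) {
      throw new Error(`relation '${name}' is not registered`);
    }
    return this.frames.get(name);
  }
}

const registry = new FrameRegistry({ films: { title: ["Pulp Fiction"] } });
registry.register("reviews", { score: [8.9] });
console.log(registry.tables()); // [ 'films', 'reviews' ]
registry.unregister("reviews");
console.log(registry.tables()); // [ 'films' ]
```

Passing an object literal to the constructor matches the `new SQLContext({ films: df })` call proposed in the example, while `register` covers frames added after construction.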
Motivation:
Introducing an SQLContext-like feature to node-polars would offer users a familiar interface for querying and manipulating DataFrame data using SQL syntax. This would improve usability and reduce the learning curve for users transitioning from other data manipulation libraries that support SQL-like querying. Additionally, it would enhance the versatility of node-polars by providing an alternative data querying method alongside the existing DataFrame API.
Example Usage:
```js
const { DataFrame, SQLContext } = require('node-polars');

const df = new DataFrame({
  title: ["The Godfather", "The Dark Knight", "Schindler's List", "Pulp Fiction", "The Shawshank Redemption"],
  release_year: [1972, 2008, 1993, 1994, 1994],
  budget: [6000000, 185000000, 22000000, 8000000, 25000000],
  gross: [134821952, 533316061, 96067179, 107930000, 28341469],
  imdb_score: [9.2, 9.0, 8.9, 8.9, 9.3]
});

// Register the frame under the table name "films"
const ctx = new SQLContext({ films: df });

// Execute a SQL query against the registered frame data.
// Proposed: `eager: true` returns a DataFrame instead of a LazyFrame, as in Python.
const result = ctx.execute(`
  SELECT title, release_year, imdb_score
  FROM films
  WHERE release_year > 1990
  ORDER BY imdb_score DESC
`, { eager: true });

console.log(result);
```
```
shape: (4, 3)
┌──────────────────────────┬──────────────┬────────────┐
│ title                    ┆ release_year ┆ imdb_score │
│ ---                      ┆ ---          ┆ ---        │
│ str                      ┆ i64          ┆ f64        │
╞══════════════════════════╪══════════════╪════════════╡
│ The Shawshank Redemption ┆ 1994         ┆ 9.3        │
│ The Dark Knight          ┆ 2008         ┆ 9.0        │
│ Schindler's List         ┆ 1993         ┆ 8.9        │
│ Pulp Fiction             ┆ 1994         ┆ 8.9        │
└──────────────────────────┴──────────────┴────────────┘
```
In this example, we create a DataFrame `df` with movie data and register it with a `SQLContext` named `ctx` under the table name `films`. We then execute a SQL query against it that keeps movies released after 1990 and orders them by IMDb score in descending order. The query result is stored in `result`, and the resulting DataFrame, including its shape, is printed to the console.
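As a cross-check on the expected output, the following plain-JavaScript sketch computes what the example query should return without any SQL engine. It models the frame as a column-oriented object (column name to array), the same shape passed to the `DataFrame` constructor above; `runExampleQuery` is a hypothetical helper written only for this illustration.

```javascript
// Column-oriented data, identical to the object passed to the DataFrame constructor
const films = {
  title: ["The Godfather", "The Dark Knight", "Schindler's List", "Pulp Fiction", "The Shawshank Redemption"],
  release_year: [1972, 2008, 1993, 1994, 1994],
  budget: [6000000, 185000000, 22000000, 8000000, 25000000],
  gross: [134821952, 533316061, 96067179, 107930000, 28341469],
  imdb_score: [9.2, 9.0, 8.9, 8.9, 9.3],
};

// Hypothetical helper: evaluates the example query by hand
function runExampleQuery(frame) {
  const n = frame.title.length;

  // WHERE release_year > 1990: collect the indices of matching rows
  const rows = [];
  for (let i = 0; i < n; i++) {
    if (frame.release_year[i] > 1990) rows.push(i);
  }

  // ORDER BY imdb_score DESC (Array.prototype.sort is stable since ES2019,
  // so rows with equal scores keep their input order)
  rows.sort((a, b) => frame.imdb_score[b] - frame.imdb_score[a]);

  // SELECT title, release_year, imdb_score: project the surviving rows
  const result = {};
  for (const col of ["title", "release_year", "imdb_score"]) {
    result[col] = rows.map((i) => frame[col][i]);
  }
  return result;
}

const result = runExampleQuery(films);
console.log(result);
```

This yields the four rows shown above in the same order. Standard SQL leaves the relative order of rows with equal sort keys unspecified, so a conforming engine could return Schindler's List and Pulp Fiction (both 8.9) in either order; the sketch keeps their input order only because JavaScript's sort is stable.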