ankane/ruby-polars

Add `read_database_uri(...)` function binding

DeflateAwning opened this issue · 6 comments

Would be awesome to be able to use Polars.read_database_uri(...) to read directly from a database URI.

Reasoning

  1. Compliance with the full API
  2. Good for quick hacking, testing against a temporary database, etc.
  3. Benchmarking the speed of the Rust connectorx driver against the Ruby driver, especially for very large queries.

Notes

Python docs: https://docs.pola.rs/py-polars/html/reference/api/polars.read_database_uri.html#polars.read_database_uri

I think this is a fairly straightforward binding. Would you be able to take a quick look at it? :)

You can achieve something similar with the following:

    @qdb = ActiveRecord::Base.establish_connection("postgres://#{qdb_url}")
    @req = @qdb.connection.select_all("select count from events;")
      (90.6ms)  select count from events;
    => 
    #<ActiveRecord::Result:0x000000010ce76c38
    
    polars = Polars.read_database(@req)
    => 
    shape: (1, 1)

Ruby is still in the hot path of that, though. I'm aiming to keep Ruby out of the picture.

You can also try this alternative; however, you'll need to use the PG gem.

    require "pg"
    require "polars-df"

    conn = PG.connect(
      host: ENV["DB_HOST"], port: ENV["DB_PORT"], dbname: ENV["DB_NAME"],
      user: ENV["DB_USER"], password: ENV["DB_PASS"]
    )
    query = "select timestamp,count() FROM events;"
    req = conn.exec(query)
    data = req.to_a # array of row hashes; pg returns values as strings by default
    return Polars::DataFrame.new([]) if data.empty?

    df = Polars::DataFrame.new(data)
    # with_columns must be called on the DataFrame, not on the PG result
    df = df.with_columns([
      Polars.col("timestamp"),
      Polars.col("count").cast(Polars::Int32),
    ])
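If you only have a single database URI rather than separate environment variables, a small helper can split it into the same keys that `PG.connect` takes above. This is a minimal sketch using only Ruby's standard `uri` library; `pg_params_from_uri`, the example URI, and the fallback port of 5432 are my own assumptions, not part of ruby-polars or the pg gem, and percent-encoded credentials are not decoded here.

```ruby
require "uri"

# Hypothetical helper (not provided by ruby-polars or pg): splits a database
# URI into the keyword arguments used with PG.connect in the snippet above.
def pg_params_from_uri(database_uri)
  uri = URI.parse(database_uri)
  {
    host: uri.host,
    port: uri.port || 5432, # assumption: fall back to the usual PostgreSQL port
    dbname: uri.path.to_s.delete_prefix("/"),
    user: uri.user,         # left as-is; percent-encoded values are not decoded
    password: uri.password
  }
end

# Example URI is made up for illustration.
params = pg_params_from_uri("postgres://reader:s3cret@db.example.com:5433/analytics")
# conn = PG.connect(**params)
```

This still goes through the Ruby driver, so it does not replace a real `read_database_uri` binding; it only makes the workaround accept a URI.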

Of course, connecting Polars directly to the database URI would be nice.
Until then, these are the alternatives I'm using on my projects.

If you want to keep Ruby out of the picture, why not use the official Python API?