delta-io/connectors

Power BI Connector: Delta Sharing - Unable to work with tables stored in SSE-C encrypted S3

ThachNgocTran opened this issue · 1 comments

Important note: I am not sure this is the right place to report bugs in Power BI's Delta Sharing connector. I mean the connector offered in the Get Data dialog of Power BI Desktop v2.110.805.0 (October 2022).

By contrast, the Power BI connector for Delta Lake described at https://github.com/delta-io/connectors/tree/master/powerbi appears to be a different one: it loads a Delta Lake table through the function fn_ReadDeltaTable().

Reproducibility

  1. Save a Delta Lake table named test on S3 with SSE-C encryption.
  2. Configure the Delta Sharing Server as follows (delta-sharing-server.yaml):
# The format version of this config file
version: 1
# Config shares/schemas/tables to share
shares:
- name: "deltalake"
  schemas:
  - name: "testschema"
    tables:
    - name: "test"
      location: "s3a://testfolder/testschema/test"
# Set the host name that the server will use
host: "[some_host]"
# Set the port that the server will listen on. Note: using ports below 1024 
# may require a privileged user in some operating systems.
port: 60000
# Set the url prefix for the REST APIs
endpoint: "/delta-sharing"
# Set the timeout of S3 presigned url in seconds
preSignedUrlTimeoutSeconds: 3600
# How many tables to cache in the server
deltaTableCacheSize: 10
# Whether we can accept working with a stale version of the table. This is useful when sharing
# static tables that will never be changed.
stalenessAcceptable: false
# Whether to evaluate user provided `predicateHints`
evaluatePredicateHints: false

# Authorization
authorization:
  bearerToken: [some_bearerToken]

The core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>[access.key]</value>
  </property>

  <property>
    <name>fs.s3a.secret.key</name>
    <value>[secret.key]</value>
  </property>

  <property>
    <name>fs.s3a.endpoint</name>
    <value>[some.endpoint]</value>
  </property>

  <property>
    <name>fs.s3a.server-side-encryption-algorithm</name>
    <value>SSE-C</value>
  </property>

  <property>
    <name>fs.s3a.server-side-encryption.key</name>
    <value>[some.key]</value>
  </property>

</configuration>
  3. Run the server as follows: ./bin/delta-sharing-server -J-Xmx2048m -- --config ./conf/delta-sharing-server.yaml
  4. In Power BI Desktop, start Get Data with a Blank Query, open the Advanced Editor, and enter:
let
    Source = DeltaSharing.Contents("[some_host]:60000/delta-sharing"),
    deltalake = Source{[Name="deltalake"]}[Data],
    testschema = deltalake{[Name="testschema"]}[Data],
    test = testschema{[Name="test"]}[Data]
in
    test
  5. Power BI Desktop raises an error: Expression.Error: Access to the resource is forbidden.

The Delta Sharing Server console shows no error or exception.

Debug

It looks like the Delta Sharing Server successfully authenticated the request from Power BI, read the Delta Lake table, and returned pre-signed URLs for its files. But Power BI cannot fetch the files from those URLs, because the files are encrypted with SSE-C.
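To check this hypothesis without Power BI, the same two steps can be replayed by hand. The following is a minimal Python sketch, not the connector's actual code. It assumes the server exposes the Delta Sharing "query table" REST endpoint under the configured prefix and answers with newline-delimited JSON in which each `file` entry carries a pre-signed `url`. The function names (`table_query_url`, `file_urls`, `probe`) are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def table_query_url(base_url: str, share: str, schema: str, table: str) -> str:
    """Build the URL of the 'query table' endpoint for one shared table."""
    return f"{base_url.rstrip('/')}/shares/{share}/schemas/{schema}/tables/{table}/query"


def file_urls(ndjson_text: str) -> list:
    """Collect the pre-signed file URLs from a newline-delimited JSON response."""
    urls = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between JSON objects
        entry = json.loads(line)
        if "file" in entry:  # other lines describe protocol and metadata
            urls.append(entry["file"]["url"])
    return urls


def probe(base_url: str, token: str, share: str, schema: str, table: str) -> None:
    """Query the table, then GET each returned URL and print the HTTP status."""
    request = urllib.request.Request(
        table_query_url(base_url, share, schema, table),
        data=b"{}",  # empty query: no predicate hints, no limit
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")
    for url in file_urls(body):
        # Print only the part before the query string so the signature is not logged.
        try:
            with urllib.request.urlopen(url) as file_response:
                print(file_response.status, url.split("?")[0])
        except urllib.error.HTTPError as err:
            print(err.code, url.split("?")[0])
```

With the configuration above, a call would look like `probe("http://[some_host]:60000/delta-sharing", "[some_bearerToken]", "deltalake", "testschema", "test")`; the `http://` scheme is an assumption. If the first request succeeds but every file request returns an error status, the failure is in fetching the files rather than in the sharing server's authentication.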

The Power BI connector probably does not send the HTTP headers that tell AWS the files are encrypted with a customer-provided key, so AWS cannot decrypt them before sending them back to the client. For more, see the AWS documentation section "Presigned URLs and SSE-C".
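For reference, S3 expects three request headers on a GET for an SSE-C object: the algorithm, the base64-encoded key, and the base64-encoded MD5 digest of the key. The sketch below builds them in Python. It assumes the key is available as the base64 string stored in fs.s3a.server-side-encryption.key; the helper names `sse_c_headers` and `fetch_with_sse_c` are hypothetical. Whether S3 accepts these headers on a given pre-signed URL also depends on how that URL was signed, which this sketch does not address.

```python
import base64
import hashlib
import urllib.request


def sse_c_headers(key_b64: str) -> dict:
    """Build the SSE-C request headers from a base64-encoded 256-bit key."""
    raw_key = base64.b64decode(key_b64)
    if len(raw_key) != 32:
        raise ValueError("SSE-C requires a 32-byte (256-bit) key")
    key_md5 = hashlib.md5(raw_key).digest()  # integrity check required by S3
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(raw_key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(key_md5).decode("ascii"),
    }


def fetch_with_sse_c(url: str, key_b64: str) -> bytes:
    """GET a URL while sending the SSE-C headers alongside the request."""
    request = urllib.request.Request(url, headers=sse_c_headers(key_b64))
    with urllib.request.urlopen(request) as response:
        return response.read()
```

A client that has no access to the customer key cannot produce these headers at all, which is one reason sharing SSE-C encrypted files through plain pre-signed URLs is awkward.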

Currently Delta Sharing doesn't support SSE-C. Feel free to open a feature request in https://github.com/delta-io/delta-sharing