aws-samples/spark-on-aws-lambda

Spark on AWS Lambda with Glue Catalog Integration

guischuwarten opened this issue · 13 comments

Hi, good morning!

How are you?

I tried to read a database from the Glue catalog, but I can't.

When I execute spark.sql("show databases").show(), it returns only default, even though I have many databases:

bash-4.2# pyspark --master local[*] --conf spark.hadoop.hive.metastore.client.factory.class="com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
Python 3.8.16 (default, Mar 31 2023, 16:49:43)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/04/12 12:57:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/

Using Python version 3.8.16 (default, Mar 31 2023 16:49:43)
Spark context Web UI available at http://
Spark context available as 'sc' (master = local[*], app id = local-).
SparkSession available as 'spark'.

spark.sql("show databases").show()
+---------+
|namespace|
+---------+
| default|
+---------+

Do you have any idea what might be happening?
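
One thing that may be worth checking with the command above: the factory class setting appears to matter only when the session uses the Hive catalog and a Glue-aware metastore client is on the classpath. Otherwise Spark can fall back to a local catalog that lists only default. The following is a minimal sketch, assuming PySpark is installed and the Glue Data Catalog client jars are already present in the image. The helper names glue_catalog_conf and build_session are hypothetical and not part of this repository.

```python
def glue_catalog_conf():
    """Spark settings that point the Hive catalog at the AWS Glue Data Catalog."""
    return {
        # Use the Hive catalog rather than Spark's in-memory catalog.
        "spark.sql.catalogImplementation": "hive",
        # Same factory class as in the pyspark command above.
        "spark.hadoop.hive.metastore.client.factory.class":
            "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
    }


def build_session(conf, app_name="glue-catalog-check"):
    """Create a local SparkSession with the given settings applied."""
    # Imported lazily so the settings can be inspected without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]").appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    # enableHiveSupport() needs Spark's Hive classes on the classpath.
    return builder.enableHiveSupport().getOrCreate()


# Usage (requires PySpark and the Glue client jars; assumption, not verified here):
#   spark = build_session(glue_catalog_conf())
#   spark.sql("show databases").show()
```

If the output still shows only default with these settings, the Glue client jars or the IAM permissions would be the next things to look at.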

Did you grant Glue access to the Lambda's current role?

@guischuwarten We are considering implementing Glue jars in the SoAL framework. We have a working prototype and are waiting to test it; @jassinancy is working on the prototype. The license will include Amazon 2.0

Check this branch out
https://github.com/aws-samples/spark-on-aws-lambda/tree/gluelib-catalog-integration

Hi,

Yes, my role has authorization.

Hi,

Thank you so much.

I will test this prototype and get back to you with feedback.

Thanks @guischuwarten for reaching out and for your offer to test the solution.
This is the first commit for the prototype; further improvements to image size and performance will follow.

Meanwhile, do let us know your initial feedback.

Permission is also required for the AWS Lambda function to access the Glue catalog.
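
As an illustration of what such a permission might look like, here is a hedged sketch of a read-only IAM policy document for the Glue Data Catalog, built as a Python dictionary. The action names are standard Glue IAM actions; the wildcard resource and the helper name glue_read_policy are assumptions for illustration, and a real policy would normally be scoped to specific catalog, database, and table ARNs.

```python
import json

# Read-only Glue Data Catalog actions used when listing databases and tables.
GLUE_READ_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetPartition",
    "glue:GetPartitions",
]


def glue_read_policy(resource="*"):
    """Build an IAM policy document allowing read access to the Glue catalog."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": GLUE_READ_ACTIONS,
                # "*" is an assumption for brevity; narrow this in practice.
                "Resource": resource,
            }
        ],
    }


# Print the JSON so it can be pasted into the Lambda execution role.
print(json.dumps(glue_read_policy(), indent=2))
```

The policy would be attached to the Lambda function's execution role, since that is the identity Spark runs under inside the function.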

Guys,

I took your code, made some changes, and it finally worked.

Can I share them with you?

Great, can you let me know what issues you were facing and what changes were made? This is a work in progress, so you might see some libs missing; I would be interested to know the issues.
Thanks,
Jaswinder Singh

Hi,

Sorry for the delay...

Can I push to this branch?

I'm getting this error:

java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/conf/MetastoreConf

Can you help me?
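
A NoClassDefFoundError generally means the named class is not on the runtime classpath. As far as I know, MetastoreConf ships with the Hive 3.x standalone metastore, while Spark 3.3.0 bundles Hive 2.3.x by default, so a version mismatch between the Glue client jars and Spark's Hive jars is one possible cause (an assumption, not confirmed in this thread). A small sketch for checking which jars in a directory actually contain the class; the helper name find_class_in_jars and the example path are hypothetical.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(jar_dir, class_name):
    """Return the names of jars under jar_dir that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    matches.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that have a .jar suffix but are not valid archives.
            continue
    return matches


# Usage (the directory is a placeholder for wherever the image keeps Spark's jars):
#   find_class_in_jars("/path/to/spark/jars",
#                      "org.apache.hadoop.hive.metastore.conf.MetastoreConf")
```

An empty result would suggest the jar providing the class is missing from the image rather than misconfigured.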

Please create a PR and our team can review it.

Could you please share the error screenshots?

@guischuwarten Can you please try the release-0.3.0 branch? Under spark-script, we show an alternate way to connect to the AWS Glue catalog.