Making unit tests independent of our server
Closed this issue · 6 comments
Currently, the tests for Presto, Impala, and Redshift depend on external services. We want them to run solely in Docker environments.
I took a look at the presto docker image: https://hub.docker.com/r/starburstdata/presto/
By default, the Docker image comes with read-only tpc-h (and also tpc-ds) catalogs at different scale factors, so for unit tests that use tpc-h queries, it looks like we can simply use the data inside the image without having to generate it.
A potential problem I see with this image, though, is that we need to use the memory catalog for writing (e.g., creating scrambles), and I am pretty sure it has a size limit. A doc I found says that the default is 128MB, and I could not find any convenient way to change it. So we should be careful that no unit test writes more than this limit.
Also, because of the data that the image contains by default, the image itself seems to be over 1GB, so it might add to the startup time of each CircleCI test run.
I do not have a good idea on Impala and Redshift at the moment.
Thanks for the investigation. I think things look good in general, except for the fact that the image size is > 1GB. Maybe we can create our own Docker image later (in the far future).
If Presto seems to be working, please add the docker command to the dev setup wiki page (in our private dev repo).
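For reference, the kind of docker command meant here could look roughly like the sketch below. It only builds and prints the command line. The container name `presto-test` and the helper `presto_docker_command` are made up for illustration, and mapping to container port 8080 assumes the image listens on Presto's usual HTTP port, which should be checked against the image's documentation.

```python
import shlex

# Image named earlier in this thread.
IMAGE = "starburstdata/presto"


def presto_docker_command(name="presto-test", host_port=8080, image=IMAGE):
    """Build the argv list for starting a Presto container in the background.

    Assumption: the container serves HTTP on port 8080 (Presto's usual default).
    The container name is a hypothetical choice, not a project convention.
    """
    return [
        "docker", "run",
        "--detach",                        # run in the background
        "--rm",                            # remove the container when it stops
        "--name", name,
        "--publish", f"{host_port}:8080",  # host port -> assumed container port
        image,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line for the wiki page.
    print(shlex.join(presto_docker_command()))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` in a local test script as well as to paste into the wiki.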
I have not tested running our Presto unit tests on the Docker image yet. I will work on it and post updates here.
Can you share any updates?
I am almost ready to open a pull request with a first draft for running the Presto unit tests locally. I will describe the changes and issues in that pull request.
As you may already know, this image uses Presto's built-in tpch connector, which generates the data on the fly. Reference: https://teradata.github.io/presto/docs/0.167-t/connector/tpch.html
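Because the connector generates rows on demand, a test can address a table by catalog, schema (the scale factor), and table name without loading anything first. A minimal sketch, assuming the catalog is registered as `tpch` in the image and that the `tiny` schema is available (worth verifying with `SHOW SCHEMAS FROM tpch`); the helper names are hypothetical:

```python
def tpch_table(table, schema="tiny", catalog="tpch"):
    """Return a fully qualified name such as tpch.tiny.lineitem.

    Assumption: the catalog is named "tpch" and exposes one schema per
    scale factor (e.g. "tiny", "sf1").
    """
    return f"{catalog}.{schema}.{table}"


def row_count_query(table, schema="tiny"):
    """Build a simple smoke-test query against a generated TPC-H table."""
    return f"SELECT count(*) FROM {tpch_table(table, schema)}"


if __name__ == "__main__":
    print(row_count_query("lineitem"))
    print(row_count_query("orders", schema="sf1"))
```

Using the smallest schema in unit tests keeps generation cheap; larger scale factors only change the schema part of the name.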