databricks/cli

clusters list -> endless scroll

marcin-sg opened this issue · 1 comments

Describe the issue

Databricks CLI v0.226.0 changed how the clusters list command works. It now returns information about every cluster that has run in the last 30 days, with no way to filter the results. In our relatively small setup this takes 6 minutes.

Our CI/CD pipeline is supposed to find a cluster with a given name, start it if it is stopped, and run some unit tests there. Adding 6 minutes is a serious drawback. Is there a faster way to get the cluster ID for a given name?

Please let me know if you need more information.

Steps to reproduce the behavior

  1. Run databricks clusters list
  2. Wait 6 minutes for the command to return 8347 results...

Expected Behavior

It seems the API cannot filter by name either, but a filter for pinned clusters would be enough.

For me, a parameter to filter by cluster name would be perfect. I could also live with filtering the results down to pinned clusters only.
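
Until a server-side filter exists, a client-side lookup can at least stop paging as soon as the name matches, so it does not have to walk all pages. The following is only a rough sketch of that idea, not how the CLI works: `find_cluster_id` and `fetch_page` are hypothetical names, `fetch_page` stands in for whatever call returns one page of clusters plus a next-page token, and the `cluster_name` / `cluster_id` keys are assumed to match the cluster objects in the list response. The fake pages exist only to make the example runnable.

```python
from typing import Callable, Optional

# One page of results: (clusters on this page, token for the next page or None).
Page = tuple[list[dict], Optional[str]]


def find_cluster_id(
    name: str,
    fetch_page: Callable[[Optional[str]], Page],
) -> Optional[str]:
    """Return the ID of the first cluster called `name`, or None if absent.

    `fetch_page` is a hypothetical callback: given a page token (None for
    the first page) it returns one page of cluster dicts and the next token.
    """
    token: Optional[str] = None
    while True:
        clusters, token = fetch_page(token)
        for cluster in clusters:
            if cluster.get("cluster_name") == name:
                # Stop early: the remaining pages are never requested.
                return cluster.get("cluster_id")
        if not token:
            return None  # last page reached without a match


# Stand-in for the real API (assumption): two fake pages keyed by page token.
FAKE_PAGES = {
    None: ([{"cluster_id": "a1", "cluster_name": "etl"}], "p2"),
    "p2": ([{"cluster_id": "b2", "cluster_name": "unit-tests"}], None),
}

print(find_cluster_id("unit-tests", lambda token: FAKE_PAGES[token]))  # b2
```

This only helps when the wanted cluster appears on an early page; in the worst case it still requests every page.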

Actual Behavior

Using --page-size apparently reduces the waiting time (100 is the maximum). However, we still have to make more than 80 calls to the API instead of one.

time databricks clusters list --page-size 100 | wc -l
8346
databricks clusters list --page-size 100 1.80s user 0.24s system 3% cpu 57.896 total
wc -l 0.01s user 0.06s system 0% cpu 57.895 total
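
The "more than 80 calls" figure follows from the numbers in this timing run. A quick check, assuming each output line corresponds to one cluster and each API call returns one full page:

```python
import math

lines = 8346          # line count reported by `wc -l` in the run above
page_size = 100       # maximum accepted by --page-size
total_seconds = 57.896  # wall-clock time reported by `time`

calls = math.ceil(lines / page_size)
per_call = round(total_seconds / calls, 2)

print(calls)     # 84
print(per_call)  # 0.69
```

So even at the maximum page size, the lookup costs roughly 84 round trips at about 0.7 seconds each.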

OS and CLI version

uname -a
Linux codespaces-0f8790 6.5.0-1022-azure #23~22.04.1-Ubuntu SMP Thu May 9 17:59:24 UTC 2024 x86_64 GNU/Linux

databricks -v
Databricks CLI v0.226.0

Is this a regression?

The command behaved differently in the previous version (v0.225.0).

Debug Logs

Thanks for raising this. We're looking into it, and I will keep this thread updated.