iamseth/oracledb_exporter

Tablespace calculation

neilschelly opened this issue · 10 comments

The Oracle DBAs at my org have requested a new query to use to track tablespace utilization. It has significantly different metrics for us, but they appear a lot more useful. This matches the numbers that are reported to them in the Oracle Enterprise Manager. The dba_tablespace_usage_metrics table should be available in Oracle 10g and up.

SELECT
    dt.tablespace_name as tablespace,
    dt.contents as type,
    dt.block_size * dtum.used_space as bytes,
    dt.block_size * dtum.tablespace_size as max_bytes,
    dt.block_size * (dtum.tablespace_size - dtum.used_space) as free
FROM  dba_tablespace_usage_metrics dtum, dba_tablespaces dt
WHERE dtum.tablespace_name = dt.tablespace_name
ORDER by tablespace

Here's what's different for us:

  • The existing query will show both an undotbs1 and undotbs2 tablespace, with the second one at 0%. I'm told that second one will begin to show up in dba_tablespace_usage_metrics as soon as it's used, like when activating a second node or assigning it as an active undo_tablespace.
  • The numbers are way different, but hopefully way more useful. The existing query is aggregating the size and max_size of all the files in each tablespace. This method of capturing them is measuring by the number of extents used by those file objects, but that is going to be much higher than the actual space utilization. Even when a file gets smaller, those extents are available for it to grow into without consuming more space.
  • The query above is actually reporting on blocks (hence the conversion to bytes) that are actually in use. It won't count unused extents that haven't been deallocated from a given file, even though they will be either deallocated or re-used if needed.

I'm running both the old and new queries in parallel in our environments now for a week to compare, and I expect I'll switch over to the new one entirely. I can provide a better glimpse of how different it is then, but I suspect it won't be close (especially for tmp/undo/system tablespaces). I'd be happy to provide this as a PR to update the query used in default-metrics.conf, but I was hesitant to do it without knowing how to handle such a "breaking" change, where a lot of user thresholds based on the old calculation may be inappropriate on the new calculation.

Thanks for your feedback!

In fact, it could be a good idea to give this feedback. If you agree with that, we can then add a deprecation warning on the old metrics and then switch to the new one.

Maybe we need to be able to select this by using tags or something similar.

After a couple days of running both queries in parallel, here are my observations:

  • undo_tbs1 looks like it shows a lot more "activity" in that tablespace with the new query (yellow, negative-y) as oppose to the old query (green,positive-y):
    image
  • Most tablespaces don't really change their utilization much over 3 days, but for those that do, both metrics register a change in the same direction. The amount of the change is lower with the new query:
    image
    image
    • that second one is not even a pixel of change, but I promise the values were stable before and after the bump, and it did go up.
  • The relative sizes of tablespaces are similarly comparable between the two methods, but I can't come up with a decent single picture to show that clearly. The actual utilization percentages tend to be lower with the new calculation.
  • The utilization percentages are much more closely aligned with our own tablespaces, but the "system" tablespaces are much smaller with the new calculation. The smaller numbers seem more accurate to me.
    • SYSAUX (old query) 47%, (new query) 12%
    • SYSTEM (old query) 50%, (new query) 5%
    • TEMP (old query) 68% (new query) 0.0015%

@Yannig as a more direct reply to your questions above, I'm okay with either approach (deprecation or tags). I can definitely prepare a PR according to whichever approach you think is best.

I think we can use the tag approach and use them to switch easily between the different queries and use reserved word for the deprecation messages: if a request have a tag deprecated, we send a deprecation warning in the output. We can also add an option to disable this kind of warning.

This way, we can handle tags easily and be able to disable deprecated request using tag filter.

Alright, well, we've got a PR now: #154
But that doesn't maintain two separate docker tags. More pending...

Okay, #154 should be ready for merge. The already-existing tags will create the image with the default-metrics file including the new tablespace calculation, while the _legacy-tablespace versions will use the old calculationns. The README has been updated to explain the different tags.

@Yannig how do you feel about the PR for this? Is there anything I can do to help push it along or make it easier to polish?

Hi @neilschelly and happy new year!

Sorry for the delay, I took to long to get you feedback about this PR ...

To be honest, I had thought of a different approach but I realize that I don't have the time to do it. After all, your approach is simple and the merit of not adding any code. So if you are ok with that, I will merge it

Regards,

That would be much appreciated. If you'd like help trying to build some alternative approach, I am game for taking an attempt on that too if I can help.
-Neil

Fine. So let's merge you PR. I will try to comment this issue to explain myself on what I was thinking about.