erikdarlingdata/DarlingData

sp_HumanEvents: logging the one-off runs to table

BrentOzar opened this issue · 5 comments

Is your feature request related to a problem? Please describe.
I have a server doing ~10k batch requests/sec and getting a good chunk of recompiles for a few different reasons. As the developers work through the problems over time, they want to be able to periodically check something to see, "What are still some of the biggest offenders today?"

They don't have permissions to run jobs manually, we'd rather not give them permissions to run XE directly, and because there's a huge number of recompiles, we don't really wanna leave the XE session open or deal with starting/stopping it programmatically.

Describe the solution you'd like
I wanna be able to run this in an Agent job periodically (like different times of day based on workload, say 10AM and 2PM):

EXEC sp_HumanEvents 
    @event_type = 'recompiles', 
    @seconds_sample = 60, 
    @output_database_name = N'PerformanceData',
    @output_schema_name = N'dbo',
    @output_table_name = N'HumanEvents_Recompiles_Samples'

And get just the summarized data. They're also a SaaS company with multiple servers, so I was thinking of the table having columns for ID, ServerName, and CheckDate like we do with sp_BlitzCache and friends.

For recompiles, I think I can code it around the output query immediately AFTER this line:

IF @debug = 1 BEGIN SELECT N'#recompiles_1' AS table_name, * FROM #recompiles_1 AS r; END;
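Roughly what I have in mind for that branch (just a sketch with hypothetical parameter names and a placeholder column list, not the actual sp_HumanEvents internals):

```sql
/* Sketch only: the @output_* parameter names and the column list are
   placeholders, not the real proc's variables. */
IF @output_table_name IS NOT NULL
BEGIN
    DECLARE @insert_sql nvarchar(max) =
        N'INSERT INTO ' + QUOTENAME(@output_database_name) + N'.'
                        + QUOTENAME(@output_schema_name) + N'.'
                        + QUOTENAME(@output_table_name) + N'
          (ServerName, CheckDate /* , event-specific columns */)
          SELECT @@SERVERNAME, SYSDATETIMEOFFSET() /* , event-specific columns */
          FROM #recompiles_1 AS r;';

    EXEC sys.sp_executesql @insert_sql;

    /* Only insert; don't select the data back out. */
    IF @debug = 1 RAISERROR(N'Sampled recompile data inserted.', 0, 1) WITH NOWAIT;
END;
```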

The work involved:

  • Add an @output_table_name parameter - this is a little tricky because I think people are gonna think it's related to the other existing outputs. They ONLY have to pass this in if they're thinking about writing the one-time analysis to disk.
  • In the output section, if all the output params are passed in, check for the existence of the table. If it exists, but has the wrong columns, fail with an error.
  • If it doesn't exist, create it starting with columns for ID bigint, ServerName, CheckDate (same style as sp_Blitz%.)
  • Use dynamic SQL for the output query. If @output_table_name is not null, use an INSERT INTO for the output. Only run the insert, don't select the data back out. Return a debug message saying the data was inserted.
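The existence check and create path could start out something like this (a sketch: names are hard-coded for readability, and the event-specific columns are placeholders; since the database name is a parameter, the real version would build all of this with dynamic SQL):

```sql
/* Sketch: check for the logging table and create it with the
   sp_Blitz%-style leading columns if it's missing. */
IF NOT EXISTS
(
    SELECT 1/0
    FROM PerformanceData.sys.tables AS t
    JOIN PerformanceData.sys.schemas AS s
      ON s.schema_id = t.schema_id
    WHERE s.name = N'dbo'
    AND   t.name = N'HumanEvents_Recompiles_Samples'
)
BEGIN
    CREATE TABLE PerformanceData.dbo.HumanEvents_Recompiles_Samples
    (
        ID bigint IDENTITY PRIMARY KEY CLUSTERED,
        ServerName nvarchar(128) NOT NULL,
        CheckDate datetimeoffset(7) NOT NULL
        /* plus the event-specific columns */
    );
END;
/* If the table exists but the columns don't match (compare against
   sys.columns), RAISERROR and bail instead of inserting. */
```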

For testing, I'll also need to make sure it still works with the existing long-term sampling too.

Describe alternatives you've considered

  • I tried using the OPENQUERY approach to dump a stored proc's result set to a table, but it fails due to the complexity of the proc.
  • (edit, added:) I tried setting up the permanent XE session and using the existing log-to-table functionality, thinking I'd start the session only periodically, but we got an avalanche of data in just 10 seconds. I wanted to avoid the risk of my stop-XE-session step failing for some reason and blowing up the logging database.
  • For temporary purposes for the time being, I've edited sp_HumanEvents on these servers to do a select-into into a global temp table for the recompiles branch of the output, using a new table name with the date/time each time it runs.
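That interim hack looks roughly like this (a sketch of the idea, not the exact edit; table and temp table names are illustrative):

```sql
/* Sketch: dump the recompiles output into a global temp table whose
   name includes the run date/time,
   e.g. ##HumanEvents_Recompiles_20240131_141500. */
DECLARE
    @table_name sysname =
        N'##HumanEvents_Recompiles_' +
        REPLACE(REPLACE(REPLACE(
            CONVERT(nvarchar(19), SYSDATETIME(), 120), /* yyyy-mm-dd hh:mi:ss */
        N'-', N''), N':', N''), N' ', N'_'),
    @sql nvarchar(max);

/* Global (##) temp tables survive the sp_executesql scope,
   so the results are still there after the dynamic SQL finishes. */
SET @sql = N'SELECT r.* INTO ' + @table_name + N' FROM #recompiles_1 AS r;';

EXEC sys.sp_executesql @sql;
```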

Are you ready to build the code for the feature?
Yep! I'm totally open to other approaches on the code too - I haven't written anything for it yet, wanted to talk through the design first. If you're cool with it being in sp_HumanEvents, once the design is nailed down, I can code the recompiles part, make sure everybody's happy with that, and then do it for the other event types.

(Edit: I don't need it for the other event types, but it's one of those things where, if I'm in the code already and in the zone, it's not much more work to do the other event types as well just to be done with it.)

Oh this would be really cool! I'm into it.

I think an easy way to differentiate the parameter intentions would be to call them @sampled_output_table_name and @sampled_output_schema_name. I guess probably a database one, too. Don't worry about adding those to the help section, I'll do that after it's in.

This might be a fair pain in the ass for some of the events that have multiple results in the output. The one you're starting with has the compilation events and the parameterization events tables in the output. Not that you don't know that, just saying it out loud.

An easier example is the waits output, which breaks stuff down by total, total by database, total by database and query, but they all work off the same temp table. For that there's no need to log all three results, since they're derivative of the main result.

Thanks @BrentOzar!

OK, cool.

  • About the @sampled_* params, makes sense, way less confusing.
  • About the multiple tables - great point there, and come to think of it, maybe no table name parameter at all. Maybe we use a similar set of table names as the existing HumanEvents output tables, but with _Sampled at the end of the name.

Oh, yeah, the only reason I can think of for a custom table name would be to add a date to the end, but with a run date in the table that's less valuable and way more typing.

@BrentOzar are you still planning on submitting a PR for this?

No, I'll go ahead and close it. Sorry about that!