spaghettidba/WorkloadTools

Unable to capture Baseline data

rreggie-rmd opened this issue · 28 comments

While replaying in our test environment, we are unable to capture the baseline data for comparison with our production load. Can you elaborate on where exactly the baseline data gets written? Even though we created the schema ahead of time, it is still not capturing the data.

https://spaghettidba.com/2019/02/15/benchmarking-with-workloadtools/
In the above link, I am trying to understand the difference between SqlWorkload A and SqlWorkload B. Following the example JSON config file, I am unable to write to the baseline schema. Any help is greatly appreciated.

Can you please share your (redacted) JSON files?
It would also be helpful if you could describe what you are trying to achieve. What I understand from your messages is that you want to replay a (previously captured?) workload to a test environment, to compare the workload analysis data with what you already have captured in production. Is this correct?
The workload analysis data gets written wherever you find appropriate: the production server (I would not do that), the test server (a possible good choice), your laptop (also a good choice). Where are you attempting to save the workload analysis data?

{
    "Controller": {

        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "ProdServer",
                "DatabaseName": "ProdDB",
                "UserName": "sa",
                "Password": "P4$$w0rd!"
            },
            "DatabaseFilter": "SourceDatabase"
        },

        "Consumers":
        [
            {
                "__type": "ReplayConsumer",
                "ConnectionInfo":
                {
                    "ServerName": "TestServer",
                    "DatabaseName": "ProdDB_Copy",
                    "UserName": "sa",
                    "Password": "Pa$$w0rd!"
                }
            },
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo":
                {
                    "ServerName": "TestServer",
                    "DatabaseName": "SQLWorkLoad",
                    "SchemaName": "baseline",
                    "UserName": "sa",
                    "Password": "P4$$w0rd!"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

If you look at the above config, which is SqlWorkload A: the Listener component is my production server, and in the ReplayConsumer I have a copy of the prod backup restored and am running the replay against it. In the AnalysisConsumer, if my understanding is correct, it is the stats of the production workload. Is my understanding correct?

[image: diagram of the desired capture/replay scenario]

Or can you give me sample config files for the scenario in the picture I pasted?

If my understanding is correct, it is the stats of the production workload. Is my understanding correct?

Yes, this is correct. If you're not seeing the data on the test server, in the SqlWorkload database and baseline schema, then something is wrong, and the logs might have some relevant info.

If you want to record the performance of the replay on the test server, then you need a second sqlworkload.exe running at the same time as the capture/replay (the one for which you already posted the JSON configuration).
The configuration of the second sqlworkload should look as described in this blog post: https://spaghettidba.com/2020/03/03/performing-a-real-time-replay-with-workloadtools/
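
In short: that second instance runs an ExtendedEventsWorkloadListener pointed at the test server (filtered to the replayed database) and a single AnalysisConsumer writing to a different schema of the same analysis database. A minimal sketch following that post, reusing the names from your config and a hypothetical "replay" schema (adjust to your environment):

{
    "Controller": {

        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "TestServer",
                "UserName": "sa",
                "Password": "P4$$w0rd!"
            },
            "DatabaseFilter": "ProdDB_Copy"
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo":
                {
                    "ServerName": "TestServer",
                    "DatabaseName": "SQLWorkLoad",
                    "SchemaName": "replay",
                    "UserName": "sa",
                    "Password": "P4$$w0rd!"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

That way you end up with a baseline schema and a replay schema side by side in the same SQLWorkLoad database, ready to compare in WorkloadViewer.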

Hope this makes sense!

I am able to capture the SqlWorkload B replay analysis via the second sqlworkload.exe command line. But I am not able to write the prod baseline with the first sqlworkload.exe command line, which is as below:

"Listener":
{
"__type": "ExtendedEventsWorkloadListener",
"ConnectionInfo":
{
"ServerName": "ProdServer",
"DatabaseName": "ProdDB",
"UserName": "sa",
"Password": "P4$$w0rd!"
},
"DatabaseFilter": "SourceDatabase"
},

"Consumers":
[
    {
        "__type": "ReplayConsumer",
        "ConnectionInfo":
        {
            "ServerName": "TestServer",
            "DatabaseName": "ProdDB_Copy",
            "UserName": "sa",
            "Password": "Pa$$w0rd!"
        }
    },
    {
        "__type": "AnalysisConsumer",
        "ConnectionInfo":
        {
            "ServerName": "TestServer",
            "DatabaseName": "SQLWorkLoad",
            "SchemaName": "baseline",
            "UserName": "sa",
            "Password": "P4$$w0rd!"
        },
        "UploadIntervalSeconds": 60
    }
]

}

What happens? Do you get an error? Do you have a log file?

Nothing happens. I even see a connection, but nothing gets written. I restarted the server where the workload tool is installed; it still does not work.

"__type": "AnalysisConsumer",
"ConnectionInfo":
{
"ServerName": "TestServer",
"DatabaseName": "SQLWorkLoad",
"SchemaName": "baseline",
"UserName": "sa",
"Password": "P4$$w0rd!"
},
"UploadIntervalSeconds": 60

This part does not work.

So sqlworkload.exe outputs absolutely nothing on stdout? That's weird.
The only thing I can think of is that it can't cope with the load. Is the process eating up a lot of RAM/CPU?

Nope. Memory utilization is only 30 GB tops and CPU is about 30-50% utilization. Do you want me to try running the Analysis component separately for the prod load as a separate command line?

Nope. Memory utilization is only 30 GB tops and CPU is about 30-50% utilization.

sqlworkload.exe uses 30 GB of RAM? That's a lot of memory! How many batches per second?

Do you want me to try running the Analysis component separately for the prod load as a separate command line?

This may help. I would rather focus on why you get so much load and maybe add some filters to the capture, in order to reduce it.

The load is pretty high on prod. The whole reason we are doing this is that we want to analyze the impact of moving from local NVMe storage to SAN.

OK, makes sense. However, capturing everything will likely choke sqlworkload to death. I would start with a subset of the workload (filter by host or application or whatever makes sense) and work from there. I'm pretty sure you will get the numbers you're looking for even with a filtered workload.
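
For example, something along these lines (a sketch; the filter properties here are the ones listed in the WorkloadTools README, and "MyApp" is just a placeholder for one of your application names):

    "Listener":
    {
        "__type": "ExtendedEventsWorkloadListener",
        "ConnectionInfo":
        {
            "ServerName": "ProdServer",
            "DatabaseName": "ProdDB",
            "UserName": "sa",
            "Password": "P4$$w0rd!"
        },
        "DatabaseFilter": "SourceDatabase",
        "ApplicationFilter": "MyApp"
    },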

Still the same, bud... no output. I even tried just capturing the baseline load directly and it does not work.

Have you tried to filter the workload? Or did you try to capture the whole thing?

I tried both, on a specific DB and on the whole thing... still the same.

Does it require CLR to be enabled?

Does it require CLR to be enabled?

Nope

I tried both, on a specific DB and on the whole thing... still the same.

OK, I would try to limit the workload as much as possible to understand what is going on. Try to limit it to one single client host, like your computer. It is not going to be useful, but it will at least confirm whether it is something related to the size of the workload.
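
A sketch of that single-host capture, again assuming the listener filter properties from the README ("MyWorkstation" being a placeholder for your own machine's host name):

    "Listener":
    {
        "__type": "ExtendedEventsWorkloadListener",
        "ConnectionInfo":
        {
            "ServerName": "ProdServer",
            "DatabaseName": "ProdDB",
            "UserName": "sa",
            "Password": "P4$$w0rd!"
        },
        "HostFilter": "MyWorkstation"
    },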

Hi @spaghettidba, I'm helping @rreggie-rmd with this problem.

It seems that the issue we're seeing is that, whenever the Analysis consumer is used, some of our queries are choking on a regex that was recently added to SqlTextNormalizer.cs, _paramNameValue. I discovered this by adding logging at points throughout the code - my C# isn't the best ;).

It appears that when we run sqlworkload, no records, or very few, are added to the workload database. When I take that regex definition and the Replace call out and recompile, the Analysis consumers process fine.

Here is an example of the SQL code that it was hanging on; possibly the large number of parameters is confusing the regex:
declare @p2 int set @p2=2 declare @p13 int set @p13=1234567 exec Invoice_Insert @ClientNo=123456,@SeqNo=@p2 output,@Desc=N'2022 XX',@Status=N'D',@InvoiceNo=0,@Disburse=12.34,@GST=1.32,@PST=3.00,@PSTProv=N'SK',@UpdateBy=1234,@HST=0,@IsReversal=0,@InvoiceKey=@p13 output,@DatePrinted=N'',@Interim=0,@Locked=0,@FromProfile=0,@Retainer=0,@Address=N'',@HSTBalance=0,@GSTBalance=0,@ReversedInvoiceKey=0,@Language=N'en',@TotalForMiscDisb=250.00,@ParentInvoiceKey=default,@eInvoice=0,@ClientProv=N'SK',@TaxOverrideReason=N'',@TaxOverrideFlag=0,@SensitiveFlag=0,@AcknowlegdeRealization=default,@AcknowlegdeRealizationReason=default,@AcknowlegdeRealizationDateTime=default,@AcknowlegdeRealizationCompletedBy=default select @p2, @p13

I still see a few discrepancies between a live capture (XE listener to workload file and analysis consumer) and another (workload file listener to analysis consumer), but one thing at a time...

Hey @ddubick, thank you very much for pointing this out! I'm working on a fix right now and will keep you posted.
Thanks a lot!

OK, could you please test this release and see if it works for you? https://github.com/spaghettidba/WorkloadTools/releases/tag/v1.6.2

Please keep me posted! :)

result.NormalizedText = _paramNameValueStr.Replace(result.NormalizedText, "@${paramname} = {NUM}");
Was that supposed to be using the _paramNameValueNum that you added?

Whoops...
Thanks for spotting it.
Release updated

Great collaboration, guys, thank you!!! @ddubick @spaghettidba

Hey! Any updates? Is this working for you now? If so, I could release the fix to the public.

Yes sorry, seems to be working fine. Thank you!