krakjoe/stat

[help] - connecting and retrieving data

nikola-b opened this issue · 18 comments

I'm not sure if this is a P.I.C.N.I.C. error or what, but I decided to check with you if I'm doing something wrong.

Problem

I receive the same data, infinitely.

Stat configuration

$ cat /etc/php/7.3/mods-available/stat.ini
zend_extension=stat.so
stat.stream=tcp://127.0.0.1:8010

I create a stream socket, it connects and then it receives the same data infinitely, over and over.

Sample PHP client

I used ZeroMQ, prior I used stream_socket_client but the result is the same - I receive repeating data.

<?php

$context = new ZMQContext();
$poll = new ZMQPoll();

$stat = $context->getSocket(ZMQ::SOCKET_STREAM);
$res = $stat->connect("tcp://127.0.0.1:8010");

$id = $poll->add($stat, ZMQ::POLL_IN);

$readable = [];
$writable = [];

while(true)
{
    $events = $poll->poll($readable, $writable, -1);

    if($events > 0)
    {
        foreach($readable as $read)
        {
            $msg = $read->recvMulti();

            file_put_contents('output', print_r($msg, true), FILE_APPEND);
        }
    }
}

Output

I cut this short because it starts repeating. After a while (20ish seconds) I start seeing this line being read from the socket:

{"request": {"pid": 17550, "elapsed": 69.4799096940, "method": "GET", "uri": "/index.php", "query": "

And its just that. As long as I'm connected, I keep seeing the same data.

Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => {"request": {
)
Array
(
    [0] =>  �j��
    [1] => "pid": 
)
Array
(
    [0] =>  �j��
    [1] => 17550, "elapsed": 
)
Array
(
    [0] =>  �j��
    [1] => 69.4799096940, "method": "
)
Array
(
    [0] =>  �j��
    [1] => GET
)
Array
(
    [0] =>  �j��
    [1] => "
)
Array
(
    [0] =>  �j��
    [1] => , "uri": "
)
Array
(
    [0] =>  �j��
    [1] => /index.php
)
Array
(
    [0] =>  �j��
    [1] => "
)
Array
(
    [0] =>  �j��
    [1] => , "query": "
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => {
)
Array
(
    [0] =>  �j��
    [1] => "request": {
)
Array
(
    [0] =>  �j��
    [1] => "pid": 
)
Array
(
    [0] =>  �j��
    [1] => 17550, "elapsed": 
)
Array
(
    [0] =>  �j��
    [1] => 69.4799096940
)
Array
(
    [0] =>  �j��
    [1] => , "method": "
)
Array
(
    [0] =>  �j��
    [1] => GET
)
Array
(
    [0] =>  �j��
    [1] => "
)
Array
(
    [0] =>  �j��
    [1] => , "uri": "
)
Array
(
    [0] =>  �j��
    [1] => /index.php
)
Array
(
    [0] =>  �j��
    [1] => "
)
Array
(
    [0] =>  �j��
    [1] => , "query": "
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => {"request": {
)
Array
(
    [0] =>  �j��
    [1] => "pid": 
)
Array
(
    [0] =>  �j��
    [1] => 17550, "elapsed": 69.4799096940, "method": "GET", "uri": "/index.php
)
Array
(
    [0] =>  �j��
    [1] => ", "query": "
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => {"request": {"pid": 17550, "elapsed": 69.4799096940, "method": "GET", "uri": "/index.php", "query": "
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => {"request": {"pid": 17550, "elapsed": 69.4799096940, "method": "GET
)
Array
(
    [0] =>  �j��
    [1] => ", "uri": "/index.php", "query": "
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => {"request": {"pid": 17550, "elapsed": 69.4799096940, "method": "GET", "uri": "/index.php", "query": "
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => 
)
Array
(
    [0] =>  �j��
    [1] => {"request": {"pid": 17550, "elapsed": 
)
Array
(
    [0] =>  �j��
    [1] => 69.4799096940, "method": "GET", "uri": "/index.php", "query": "
)
Array
(
    [0] =>  �j��
    [1] => 
)

Questions

Does stat ever stop streaming? Using tcpdump -i any port 8010 I can see that while I'm connected, I'll be getting data which appears to be repeating.

Is my approach wrong? I know I showed the example with ZeroMQ but I get the same result with PHP's socket functions.

Can you provide a minimal, complete, executable, reproducing example, please ?

Is the process that connects to stat also running stat (on the same address maybe) ?

stat never stops streaming while the master process is running.

PHP reproducible example without ZMQ

<?php

$socket = stream_socket_client("tcp://localhost:8010", $errno, $errstr, 30);

stream_set_blocking($socket, true);

while(true)
{
    $result = stream_get_contents($socket);

    var_dump($result);

    sleep(1);
}

I run this from CLI. I was under the impression I'm supposed to create a small server that reacts when stat emits the data and then I can feed that data further (to a db, file etc.)

Once this server is up, I just load a page handled by php-fpm which includes stat and that triggers the profiling, feeds my server with data.

Stat setup

Stat is loaded by PHP-FPM only, not by CLI. I used the CLI script to connect to the socket.

sudo lsof -i :8010

COMMAND     PID    USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
php-fpm7. 22171    root    4u  IPv4 309586      0t0  TCP localhost:8010 (LISTEN)
php-fpm7. 22191 vagrant    4u  IPv4 309586      0t0  TCP localhost:8010 (LISTEN)
php-fpm7. 22192 vagrant    4u  IPv4 309586      0t0  TCP localhost:8010 (LISTEN)

Once I started my server, I noticed CLOSE_WAIT so I went and connected to stat from http context (namely - Laravel's route).

Code used with Laravel

Route::get('/api/stat', function()
{
    $socket = stream_socket_client("tcp://localhost:8010", $errno, $errstr, 30);

    stream_set_blocking($socket, true);

    while(true)
    {
        $result = stream_get_contents($socket);

        if(false === $result || 0 === strlen($result))
        {
            //echo "\nstream_get_contents() === false";
            break;
        }
        else
        {
            $data[] = $result;
        }
    }

    print_r($data);
});

output:

Array ( [0] => {"request": {"pid": 22592, "elapsed": 10.4831088200, "method": "GET", "uri": "/index.php", "query": " )

That's all I get. It's cut off at the "query": " part. I stopped and started PHP-FPM, hoping to clear anything that might be cached. The result is always the same - instead of JSON with the schema from this repository, I get incomplete line that's cut off at the "query": " part.

Am I supposed to create a CLI server? Am I supposed to connect to stat from script ran by PHP-FPM?

It should look more like this:

<?php
$socket = stream_socket_client(
    "tcp://127.0.0.1:8010", $errno, $errstr, 30);

while(($line = fgets($socket)))
{
    $result = json_decode($line);

    var_dump($result);
}

You can't use stream get contents, as eof is never explicitly sent. I can't really explain the behaviour you saw, but it's down to incompatibility.

There's no need to create a cli server, and you can connect to stat from anywhere, but connecting from a script that stat is currently profiling is obviously not a good idea.

Using the example you posted, I'm still seeing the same result.

Route::get('/api/stat', function()
{
    $socket = stream_socket_client("tcp://127.0.0.1:8010", $errno, $errstr, 30);

    $result = [];

    while(($line = fgets($socket)))
    {
        $result[] = $line;
    }

    print_r($result);
});

Output:

Array ( [0] => {"request": {"pid": 22680, "elapsed": 3.4957535790, "method": "GET", "uri": "/index.php", "query": " )

Reloading the page I use to trigger the stat does make it show different phpfpm child pid:

Array ( [0] => {"request": {"pid": 22681, "elapsed": 116.3712057360, "method": "GET", "uri": "/index.php", "query": " )

If I run a simple script like http://localhost/test.php, I see the same exact line being sent by stat which is cut off at the "query": " part.

I can test on different versions of PHP. I do run this in a virtual machine, but if that were the problem then I can't see how stat would be useful if ran on cloud environment (I doubt it's the VM).

I'm very confused about what you are doing:

Stat is loaded by PHP-FPM only, not by CLI. I used the CLI script to connect to the socket.

Then you said:

so I went and connected to stat from http context (namely - Laravel's route).

If the laravel code you have posted is in the same application and being run by the same fpm pool as the code you are trying to profile, then that explains why the buffer never empties.

The code that is connecting to stat should not be run in the same pool that is running stat.

The sample being cut at the query part sounds like an actual bug, but I can't investigate it until you can reproduce it in a way that stat is supposed to support.

Oh, I think I see your confusion ...

You seem to be creating the laravel route because you're under the impression that opening the socket is what starts the profiling ?

Profiling starts automatically, all you need to do in the target pool is load stat. In some other process you can just connect to the socket, this other process must not be in the target pool.

I'm under impression that anything that PHP-FPM does gets profiled by stat and ready to transmit, which means I just need to connect and read what's sent. I did scout the code and read what you wrote a few times over before opening the issue, I don't want to waste anyone's time if I can figure it out on my own :)

Readme mentioned a real time stream socket so I immediately went and created a server, thinking it'd get what stat streams and once stat is done - no more data on the socket (wrong, you cleared that out).

After checking in with you, I connected to stat from http context using entirely different PHP version which doesn't have stat loaded at all, so I don't trigger profiling. I got the incomplete JSON and this is the bit where I posted Laravel related code.

Stat runs on PHP 7.3, I use PHP 7.2 to read the output. PHP 7.3 script reads the database and outputs JSON. My thinking was: Ok, 7.3 will get profiled, I'll connect with 7.2 and read the output - but I keep getting the incomplete line, so I thought I'd check in with you as you might have some insight about what goes wrong.

It seems that whatever I do, I'll get incomplete JSON, as if there's something that breaks the whole thing due to a special character.

Okay, we seem to understand each other ...

I'm investigating ...

If possible, and you have the time and patience, a docker setup that reproduces would be helpful ...

Time and patience I have, I can't promise the docker thingy right away but I'll get on it and provide it.

Out of curiosity, you have stat running completely fine on your setup?

Time and patience I have, I can't promise the docker thingy right away but I'll get on it and provide it.

No particular rush, and thanks for the effort ...

Out of curiosity, you have stat running completely fine on your setup?

I do yeah, and I've asked a few others to do a sanity check too, one of those already completed and reports nothing strange ...

I've not tested in fpm for a few days so I'm getting that setup now, are you able to reproduce the same behaviour with the built in cli server (if that is able to run your app) ?

I'm trying to find out what's going on, I'm sure I will, eventually :)

Sure, I can do it with CLI version, it's not a big deal, I'm literally running such simple stuff hoping to see the JSON with actual data so I can fiddle with UI for stat. Your project goes hand in hand with what I'm working on right now so I'll help as much as possible. I really hope it's down to me and not stat :)

Well, prepare yourself for it being down to stat ... it is alpha quality at the moment, we expect things to be broken ... I haven't even got a test harness setup yet, I'm not sure what testing non-deterministic output looks like and haven't had much thinking time for that yet ...

I hope you have the patience to persevere, people working on UI stuff is what I wanted ... I'm very terrible at things you look at ...

I'm ok with that, I've been using your stuff since pthreads, patience I've got because this one might be the holy grail I needed for completing certain projects.

Going off to check what's messed up, will report back as soon as I have something.

I've (read: someone else) just realised you are using an outdated version of the source code, could you pull in the latest version please ?

Now we're talking! 👍

I'm facepalming because I didn't do it immediately..

So, this is the result right now: I get entire JSON back, I can work with it - there are some weird characters I'm seeing in JSON but I can deal with it.

I've profiled a Laravel route that uses its ORM Eloquent to fetch data from the db and I got a whooping 5.7MB of JSON back. I've enough to fiddle with the UI right now, thank you for the help and thank the person who told you to tell me to pull the latest code, you guys/girls are great!

If you can isolate what generates those strange characters, I'd be interested to hear about that ...

False alarm, it was problem in chair, not in computer, I messed up. Stat generates proper JSON, it's my fiddling with sockets that caused it. :)

Excellent :)