shadow-1/yi-hack-v3

Streaming ideas

dvv opened this issue · 26 comments

dvv commented

On camera:

# tcpsvd -v 0.0.0.0 4444 tail -qF -c+4096 /tmp/sd/record/tmp.mp4.tmp

On client:

$ vlc -v tcp://CAMERA:4444

...
[00007fa664c02ad8] ps demux warning: this does not look like an MPEG PS stream, continuing anyway
[00007fa664c02ad8] ps demux warning: garbage at input, trying to resync...
[00007fa664c02ad8] ps demux warning: found sync code
[00007fa664c02ad8] ps demux warning: garbage at input, trying to resync...
[00007fa664c02ad8] ps demux warning: found sync code
[00007fa664c02ad8] ps demux warning: garbage at input, trying to resync...
...

This hack would effectively stream, but I do not know how to tell the VLC demuxer and codec parameters. Any VLC guru able to help me?

dvv commented

In fact there will be the problem of missing moov atoms, so it is not so easy.

@dvv
I wonder if the issue you are facing is due to the issue described in this thread: #129

dvv commented

An MP4 is roughly a bundle of a raw data heap plus a dictionary used to look up frames in that heap. The latter is (generally) written at the end of the MP4, so frames are not available until the MP4 is fully written. Our camera works this way, so this method is void.
In #129 the problem is probably a truncated file, which in effect means there is no frame dictionary.
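
To make that concrete, here is a rough sketch, assuming the standard ISO BMFF layout (a 32-bit big-endian box size followed by a 4-character type; 64-bit largesize boxes are not handled). It walks the top-level boxes of a file and reports whether the moov index is present yet; on a still-growing tmp.mp4.tmp you would typically see only ftyp/mdat.

// Sketch: list top-level MP4 boxes and check for 'moov'.
// Usage: ./boxscan /tmp/sd/record/tmp.mp4.tmp
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(int argc, char *argv[])
{
    if (argc < 2) { fprintf(stderr, "usage: boxscan FILE\n"); return 1; }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    unsigned char hdr[8];
    int have_moov = 0;
    // each top-level box starts with a 32-bit big-endian size and a 4-char type
    while (fread(hdr, 1, 8, fp) == 8) {
        uint32_t size = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
                        ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
        char type[5] = { hdr[4], hdr[5], hdr[6], hdr[7], 0 };
        printf("box '%s' size %u\n", type, (unsigned)size);
        if (strcmp(type, "moov") == 0) have_moov = 1;
        if (size < 8) break;                       // size 0 or 1 (largesize) not handled
        if (fseek(fp, (long)size - 8, SEEK_CUR) != 0) break;
    }
    puts(have_moov ? "moov present: frame index written"
                   : "no moov yet: frame index missing");
    fclose(fp);
    return 0;
}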

dvv commented

OK, I got a poor man's MJPEG streamer for this hack idea.

  • On camera with continuous recording enabled run:
# tcpsvd -v 0.0.0.0 4444 tail -qF -c+4096 /tmp/sd/record/tmp.mp4.tmp
  • On host install Lua

  • On host create decode.lua:

--[[
    Copyright 2018 Vladimir Dronnikov
    GPL
]]

local NAL_START         = "\x00\x00\x00\x01"
local SPS_PPS_640X360   = "\x00\x00\x00\x01\x67\x64\x00\x1E\xAD\x84\x01\x0C\x20\x08\x61\x00\x43\x08\x02\x18\x40\x10\xC2\x00\x84\x3B\x50\x50\x17\xFC\xB3\x70\x10\x10\x10\x20\x00\x00\x00\x01\x68\xEE\x3C\xB0"
local SPS_PPS_1280X720  = "\x00\x00\x00\x01\x67\x64\x00\x1F\xAD\x84\x01\x0C\x20\x08\x61\x00\x43\x08\x02\x18\x40\x10\xC2\x00\x84\x3B\x50\x28\x02\xDD\x37\x01\x01\x01\x02\x00\x00\x00\x01\x68\xEE\x3C\xB0"

local process = function()
    local buf = ""
    while true do
        -- buffer has grown too large: flush and resync
        if #buf > 2000000 then buf = "" end
        local dat = io.read(4096)
        if not dat then break end
        buf = buf .. dat
        -- find IDR signature
        local beg = buf:find("\x65\xb8", 1, true)
        if beg then
            if beg > 4 then
                -- io.stderr:write(("BEG @%d %02X%02X%02X%02X%02X%02X\n"):format(beg, buf:byte(beg-4), buf:byte(beg-3), buf:byte(beg-2), buf:byte(beg-1), buf:byte(beg), buf:byte(beg+1)))
                if buf:byte(beg - 4) == 0x00 and buf:byte(beg - 3) == 0x00 then
                    buf = buf:sub(beg - 4)
                    -- get NAL unit size
                    local size = ((buf:byte(1) * 256 + buf:byte(2)) * 256 + buf:byte(3)) * 256 + buf:byte(4)
                    -- io.stderr:write(("SIZE %d %d\n"):format(size, #buf))
                    -- wait for buffer to contain the whole NALU
                    if #buf >= size + 4 then
                        local frame = buf:sub(5, size + 4)
                        -- io.stderr:write(("FRAME %d %d\n"):format(size, #frame))
                        -- TODO: FIXME: daylight size of 1280x720 > 70000
                        if size < 60000 then
                            -- convert frame to JPEG and dump it to stdout
                            local fp = io.popen("ffmpeg -y -loglevel fatal -f h264 -i - -f image2pipe -", "w")
                            fp:write(SPS_PPS_640X360)
                            fp:write(NAL_START)
                            fp:write(frame)
                            fp:flush()
                            fp:close()
                        end
                        buf = buf:sub(size + 4)
                    end
                else
                    buf = buf:sub(beg + 2)
                end
            end
        end
    end
end

process()
  • On host run:
$ busybox nc CAMERA 4444 | lua decode.lua | streameye -p 9999
  • On host watch:
$ open http://127.0.0.1:9999
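
For the record, the script works because the recorder appears to store NAL units with a 4-byte big-endian length prefix rather than Annex-B start codes, so each extracted IDR slice gets re-wrapped with 00 00 00 01 and prefixed with hard-coded SPS/PPS before ffmpeg sees it. Below is a minimal C sketch of just that re-wrapping step; the demo main and the fixed buffer size are illustrative, and the real stream also needs the resync/IDR-hunting logic that decode.lua does.

// Sketch: convert one length-prefixed (4-byte big-endian) NAL unit to
// Annex-B (start code + payload). Illustrative only; see decode.lua for
// the resync logic needed on the real stream.
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

static const unsigned char START_CODE[4] = { 0x00, 0x00, 0x00, 0x01 };

// 'nalu' points at the 4-byte length prefix; returns bytes consumed
// (4 + payload length), or 0 if 'avail' is too short.
static size_t emit_annexb(const unsigned char *nalu, size_t avail, FILE *out)
{
    if (avail < 4) return 0;
    uint32_t len = ((uint32_t)nalu[0] << 24) | ((uint32_t)nalu[1] << 16) |
                   ((uint32_t)nalu[2] << 8)  |  (uint32_t)nalu[3];
    if ((size_t)len + 4 > avail) return 0;          // need more data
    fwrite(START_CODE, 1, 4, out);
    fwrite(nalu + 4, 1, len, out);
    return (size_t)len + 4;
}

// demo driver: treats stdin as back-to-back length-prefixed NAL units
// (the raw recorder stream is messier than this)
int main(void)
{
    static unsigned char buf[1 << 20];
    size_t n = fread(buf, 1, sizeof buf, stdin);
    size_t pos = 0, used;
    while (pos < n && (used = emit_annexb(buf + pos, n - pos, stdout)) > 0)
        pos += used;
    return 0;
}
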
dvv commented

Another approach is to stream the contents of the mmap-ed memory regions which contain video data.

For tests I made the grab tool: grab.tar.gz

To stream 320x192 YUV420P buffer

  • on camera:
$ tcpsvd -l 0 -v 0.0.0.0 4444 ./grab /dev/mem 0x81b30000 0x00 +92160 10
  • on host (for MJPEG at http://HOST:9999):
$ ffmpeg -y -loglevel error -f rawvideo -s 320x192 -pix_fmt yuv420p -i tcp://CAMERA:4444 -q:v 2 -f image2pipe - | streameye -p 9999

To stream h264

NB: sync broken

/tmp/view holds circa 6 seconds of dual-channel h264 video.

  • ??? Probably control info ???
  • High resolution channel 1280x720 occupies 0x00004B44 - 0x00081B43 region.
  • Low resolution channel 640x360 occupies 0x00081B44 - 0x000AEB43 region.
  • AAC audio ADTS frames

Experiment:

  • on camera:
$ tcpsvd -l 0 -v 0.0.0.0 4444 ./grab /tmp/view 0 19268 +531268 6000000 # for 27US
$ tcpsvd -l 0 -v 0.0.0.0 4444 ./grab /tmp/view 0 19268 +409600 6000000 # for 47US
  • on host (for MJPEG at http://HOST:9999):
$ ffmpeg -y -loglevel error -f h264 -i tcp://cam4:4444 -q:v 2 -f image2pipe - | streameye -p 9999

The main task is to determine the /tmp/view circular buffer head and tail -- i.e. which part has just been written by the driver.

@shadow-1 ^^^

@dvv just saw your excellent experiment!
How did you determine the circular buffer's regions, or the head and tail addresses? I have been experimenting on the Yi Home Camera 1080p (I know yours is based on the 720p cam) to get live streaming working, but I couldn't get the exact head & tail memory addresses.

Check out the previous update I posted in another thread.
xmflsct/yi-hack-1080p#5 (comment)

dvv commented

@andy2301 IIRC I initially strace-d mp4record and found the boundaries by analysing the addresses in its mmap calls.

To tell the truth, I got tired of that. My task was to detect motion (so no need for h264 at all); at first I moved to using the 320x192 YUV420P buffer and then decided to use an ad-hoc PIR sensor.

I recall my latest thought on that was:

cat /proc/umap/h264e

analyse the numbers under "-----STREAM BUFFER-------------------------------------------------------------"

They should point to base+head/tail.
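
For what it's worth, a rough sketch of that analysis; the column order is assumed from the outputs pasted in this thread, and /proc/umap/h264e is not a documented interface, so verify against your firmware.

// Sketch: extract Base/RdTail/RdHead/WrTail/WrHead per channel from the
// STREAM BUFFER section of /proc/umap/h264e (Hi3518E MPP). Column layout
// assumed from the output quoted in this thread.
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp = fopen("/proc/umap/h264e", "r");
    if (!fp) { perror("open /proc/umap/h264e"); return 1; }

    char line[512];
    int in_table = 0;
    while (fgets(line, sizeof line, fp)) {
        if (strstr(line, "-----STREAM BUFFER")) { in_table = 1; continue; }
        if (!in_table) continue;
        if (strstr(line, "ID")) continue;            // header row

        unsigned id;
        unsigned long base, rdtail, rdhead, wrtail, wrhead;
        if (sscanf(line, " %u 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx",
                   &id, &base, &rdtail, &rdhead, &wrtail, &wrhead) == 6) {
            printf("chn %u: base=0x%08lx rd=[0x%lx,0x%lx] wr=[0x%lx,0x%lx]\n",
                   id, base, rdtail, rdhead, wrtail, wrhead);
        } else if (line[0] == '-') {
            break;                                   // next section reached
        }
    }
    fclose(fp);
    return 0;
}

Note the offsets are relative to Base, so the absolute write position would be Base + WrHead.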

@dvv Got it. Thanks. I tried out your streaming ideas "To stream 320x192 YUV420P buffer" and "To stream h264", but only got pictures full of mosaic artifacts.

I suspect that I need to specify different parameters for the "grab" tool, as my stream buffer's base address seems to be different, as shown below by cat /proc/umap/h264e:

-----STREAM BUFFER-------------------------------------------------------------
     ID     Base        RdTail      RdHead      WrTail      WrHead      DataLen     BufFree     
     0      0xc3c00000  0x19c0      0x19c0      0x19c0      0x19c0      0           921536      
     1      0xc3d00000  0x31c0      0x31c0      0x31c0      0x31c0      0           233408    

I have two questions:

  1. Where to find the values for the parameters of the grab tool? Specifically, I'd like to know how you determined these 3 values "0x81b30000", "+92160", "+531268".
$ tcpsvd -l 0 -v 0.0.0.0 4444 ./grab /dev/mem 0x81b30000 0x00 +92160 10
$ tcpsvd -l 0 -v 0.0.0.0 4444 ./grab /tmp/view 0 19268 +531268 6000000
  2. Do you mind sharing the source code of the "grab" tool?

Thanks again!

Update:
After a few experiments providing different values to the grab tool, I got the h264 streaming to work. My guess at the usage: grab <file|dev> <baseAddr> <offset> <+chunkSizeToRead> <delayInUsec>

But I still have the two questions. Especially for question 1: how was the magic number 19268 determined? I understand that 0x00004B44 is 19268 (the starting offset of the 720p region), and [0, 19268] might be the block with control info. But how did you arrive at this 0x00004B44 number?

Update2:
I read your reply again and noted that you said "strace-d mp4record". That makes sense, because mp4record writes both the 720p video and 360p video to disk. Thanks.

I managed to compile strace and did a strace on mp4record, but no dice in finding the /tmp/view boundaries. When motion is detected, strace prints the following (the polling messages, such as clock_gettime and _newselect, are removed).

It looks like mp4record simply renames/moves the /tmp/tmp.mp4.tmp file to the record location.

rename("/tmp/sd/record/tmp.mp4.tmp", "/tmp/sd/record/2018Y10M05D06H/40M00S.mp4") = 0
open("/etc/TZ", O_RDONLY)               = -1 ENOENT (No such file or directory)
open("/etc/localtime", O_RDONLY)        = -1 ENOENT (No such file or directory)
mq_timedsend(3, "\1\0\0\0\20\0\0\0\350\0\350\0\0\0\0\0\355=\0\0", 20, 1, NULL) = 0
sendto(5, "\342\3\0\0msg snd success", 19, MSG_DONTWAIT, {sa_family=AF_UNIX, sun_path="/tmp/logsock"}, 110) = 19
open("/etc/TZ", O_RDONLY)               = -1 ENOENT (No such file or directory)
open("/etc/localtime", O_RDONLY)        = -1 ENOENT (No such file or directory)
mkdir("/tmp/sd/record", 0777)           = -1 EEXIST (File exists)
mkdir("/tmp/sd/record/2018Y10M05D06H", 0777) = -1 EEXIST (File exists)
access("/tmp/sd/record/tmp.mp4.tmp", F_OK) = -1 ENOENT (No such file or directory)
open("/tmp/sd/record/tmp.mp4.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 8
fcntl64(8, F_GETFL)                     = 0x1 (flags O_WRONLY)
fcntl64(8, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
mq_timedsend(3, "\1\0\0\0\20\0\0\0\347\0\347\0\0\0\0\0", 16, 1, NULL) = 0
sendto(5, "\342\3\0\0msg snd success", 19, MSG_DONTWAIT, {sa_family=AF_UNIX, sun_path="/tmp/logsock"}, 110) = 19
sendto(5, "\342\3\0\0main stream record finish", 29, MSG_DONTWAIT, {sa_family=AF_UNIX, sun_path="/tmp/logsock"}, 110) = 29
sendto(5, "\342\3\0\0sub stream record finish", 28, MSG_DONTWAIT, {sa_family=AF_UNIX, sun_path="/tmp/logsock"}, 110) = 28 
lseek(8, 32, SEEK_SET)                  = 32                                                                             
close(8)                                = 0 

dvv commented

@andy2301

On YUV

To find the regions for YUV, do:

$ cat /proc/umap/vb

You'll see something like:

------------------------------------------------------------------------------
PoolId    PhysAddr    VirtAddr    IsComm    Owner     BlkSz    BlkCnt      Free       MinFree
     0  0x8163b000  0x0                1       -1   1382400         3       0(0)         0
...
------------------------------------------------------------------------------
PoolId    PhysAddr    VirtAddr    IsComm    Owner     BlkSz    BlkCnt      Free       MinFree
     1  0x81a31000  0x0                1       -1    345600         3       0(0)         0
...
------------------------------------------------------------------------------
PoolId    PhysAddr    VirtAddr    IsComm    Owner     BlkSz    BlkCnt      Free       MinFree
     2  0x81b30000  0x0                1       -1     92160         3       1(1)         0
...

You'll then want to

$ tcpsvd -l 0 -v 0.0.0.0 4444 ./grab /dev/mem <PhysAddr> 0x00 +<BlkSz> 10

NB: for motion-detection purposes alone one may drop the chroma planes (keeping only the Y plane), so the command above becomes:

$ tcpsvd -l 0 -v 0.0.0.0 4444 ./grab /dev/mem <PhysAddr> 0x00 +<2*BlkSz/3> 10
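
The 2/3 factor follows from the YUV420P layout: the Y plane is width*height bytes and each chroma plane is a quarter of that, so luma alone is two thirds of the block (61440 of the 92160 bytes in the 320x192 case). Below is a hedged sketch of consuming that luma-only stream on the host for crude motion detection (pipe nc CAMERA 4444 into it); the frame geometry and threshold are illustrative choices and not part of grab.

// Sketch: read successive Y (luma) planes from stdin and report motion
// when the mean absolute difference between frames exceeds a threshold.
// Geometry matches the 320x192 example above; the threshold is arbitrary.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WIDTH   320
#define HEIGHT  192
#define Y_SIZE  (WIDTH * HEIGHT)   // 61440 = 2/3 of the 92160-byte block

int main(void)
{
    static unsigned char prev[Y_SIZE], cur[Y_SIZE];
    int have_prev = 0;
    unsigned long frame = 0;

    while (fread(cur, 1, Y_SIZE, stdin) == Y_SIZE) {
        if (have_prev) {
            unsigned long diff = 0;
            for (size_t i = 0; i < Y_SIZE; i++)
                diff += (unsigned)abs((int)cur[i] - (int)prev[i]);
            double mad = (double)diff / Y_SIZE;
            if (mad > 8.0)                          // arbitrary threshold
                fprintf(stderr, "frame %lu: motion (MAD %.1f)\n", frame, mad);
        }
        memcpy(prev, cur, Y_SIZE);
        have_prev = 1;
        frame++;
    }
    return 0;
}

Each fread consumes exactly one 61440-byte dump of the luma plane, matching the +<2*BlkSz/3> chunk the command above sends per iteration.
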
dvv commented

@andy2301

grab.c

//    Copyright 2018 Vladimir Dronnikov <dronnikov@gmail.com>
//    GPL

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>

#define HD_STREAM_START 0x8178c800
#define HD_STREAM_SIZE  1382400 // (1280 * 720 * 3)
#define SD_STREAM_START 0x81a31000
#define SD_STREAM_SIZE  345600 // (640 * 360 * 3)
#define JP_STREAM_START 0x81FC3000
#define JP_STREAM_SIZE  (0x81FD5000 - 0x81FC3000) // (61440)

#define STREAM_START    JP_STREAM_START
#define STREAM_SIZE     JP_STREAM_SIZE

static ssize_t full_write(int fd, const void *buf, size_t len)
{
    ssize_t cc;
    ssize_t total;
    total = 0;
    while (len) {
        for (;;) {
            cc = write(fd, buf, len);
            if (cc >= 0 || EINTR != errno) {
                break;
            }
            errno = 0;
        }
        if (cc < 0) {
            if (total) {
                return total;
            }
            return cc;
        }
        total += cc;
        buf = ((const char *)buf) + cc;
        len -= cc;
    }
    return total;
}

int main(int argc, char *argv[])
{
  if (argc < 6) {
    fprintf(stderr, "grab FILE START SKIP +SIZE|END DELAY-MICROSECONDS\n");
    exit(1);
  }

  // START: offset at which FILE is mmap-ed (a physical address for /dev/mem, 0 for a regular file)
  off_t start = strtoul(argv[2], NULL, 0);
  // SKIP: bytes to skip from the start of the mapping before dumping
  off_t skip = strtoul(argv[3], NULL, 0);
  // +SIZE: length of the mapping, or END: absolute end address (inclusive)
  size_t size;
  if (argv[4][0] == '+') {
    size = strtoul(argv[4], NULL, 0);
  } else {
    size = strtoul(argv[4], NULL, 0) - start + 1;
  }
  // DELAY-MICROSECONDS: sleep between dumps; 0 means dump once and exit
  unsigned long delay = strtoul(argv[5], NULL, 0);
  fprintf(stderr, "mmap %s from 0x%08lx+0x%08lx, %ld bytes, delay %ld microseconds\n", argv[1], start, skip, size, delay);

  int fd = open(argv[1], O_RDONLY);
  if (fd < 0) {
    perror("open");
    exit(2);
  }
  void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, start);
  if (MAP_FAILED == map) {
    perror("mmap");
    exit(4);
  }

  // dump the mapped window repeatedly; under tcpsvd, stdout is the TCP connection
  while (1) {
    if (-1 == full_write(1, map + skip, size - skip)) {
      perror("write");
      break;
    }
    write(2, ".", 1);           // progress marker on stderr
    if (delay <= 0) break;      // DELAY of 0: single shot
    usleep(delay);
  }

  munmap(map, size);
  close(fd);

  return 0;
}
dvv commented

@andy2301

If you send me (say, to dronnikov@gmail.com) a gzipped snapshot of /tmp/view (try not to disclose anything -- say, obscure the cam with your hand and issue a dd), I'll try to find the stream markers for you.

@dvv thank you. I sent a dump of /tmp/view to your email box.

My purpose is to reconstruct the 1920x1080 video (on the Yi Home Camera 1080p) from /tmp/view and RTSP-stream it while the official Yi Home processes are running. I had partial success doing that by loop-reading between offsets [19268, 1050000] (previously [0, 1050000]). So it would really help if you could determine the stream markers.

More than that, I'd like to get a better understanding of how the /tmp/sd/record/tmp.mp4.tmp file is written when motion is detected. With that, I would be able to grab the latest video frames and stream them out.

So, to sum up, the following info will be super helpful.

  1. The video stream boundaries within /tmp/view: the 1920x1080 video stream, the 640x360 video stream, and possibly the audio stream. With this info the streaming will be much smoother (but may still lag behind, because we don't know which offset of /tmp/view holds the latest video frame).
  2. The write head/tail of /tmp/view. With this, I can make the RTSP streaming always pick up the latest video frame. My understanding of the video frame path is the following:
    /dev/isp_dev (cam device) --> /dev/h264e (reflected by /proc/umap/h264e) ---> /tmp/view (memory-mapped file) ----> /tmp/sd/record/tmp.mp4.tmp.
    So it would really help to know which offset of /tmp/view holds the latest video frame data at the moment motion is detected. I assume that is how the video frame data gets written from /tmp/view to /tmp/sd/record/tmp.mp4.tmp.
/tmp # cat /proc/umap/h264e 

[H264E] Version: [Hi3518EV200_MPP_V1.0.3.0 B040 Release], Build Time[May 20 2016, 12:01:09]

-----MODULE PARAM--------------------------------------------------------------
      OnePack  H264eVBSource    H264eRcnEqualRef    H264eMiniBufMode
            0              1                   1                   1
-----CHN ATTR------------------------------------------------------------------
     ID    MaxWidth   MaxHeight   Width    Height   profile   C2GEn   BufSize   ByFrame   MaxStrCnt
      0      1920      1080       1920      1080        hp       N    200704         Y         200
      1       640       360        640       360        hp       N     65536         Y         200

-----PICTURE INFO--------------------------------------------------------------
     ID     EncdStart   EncdSucceed        Lost        Disc        Skip       Pskip      Recode      RlsStr     UnrdStr
      0     142173717     142125644          66           0          66           0       48073   142125644           0
      1     145304081     145303504           0           0           0           0         577   145303504           0

-----STREAM BUFFER-------------------------------------------------------------
     ID     Base        RdTail      RdHead      WrTail      WrHead      DataLen     BufFree     
     0      0xc2d80000  0x23e00     0x23e00     0x23e00     0x23e00     0           200640      
     1      0xc2d60000  0xe400      0xe400      0xe400      0xe400      0           65472    
dvv commented

@andy2301 It looks like your stream is in [38468, 1087044).

Try:

# hi stream
$ tcpsvd -l 0 -v 0.0.0.0 4444 ./grab /tmp/view 0 38468 +1048576 6000000
# lo stream
$ tcpsvd -l 0 -v 0.0.0.0 4444 ./grab /tmp/view 0 1087044 +368640 6000000

Success?

dvv commented

@andy2301 on your last (2)

In my empirical opinion, /tmp/sd/record/tmp.mp4.tmp is purely controlled by mp4record. It is essentially a dump of /tmp/view's streams once motion is detected. That is why the length of event videos is about 6 seconds -- the length of the h264 buffer in /tmp/view (hence the 6000000 in the commands above).

NB: One of my attempts to dump the stream was to strace mp4record, grep for write(...) syscalls and extract frames from their arguments. It made some progress, but I gave up on robustly distinguishing the high/low stream chunks due to my limited knowledge of h264 internals.

@dvv: Really a good idea, thank you for your work.

Based on your idea; it's dirty, but some parts of the H264 stream can be received through RTSP:
https://github.com/Necromix/yi-vencrtsp_v2

dvv commented

\o/

Hi everyone, after some work I managed to use the SDK to compile @dvv's grab to test it on my Yi Dome 720p camera.

I managed to get a 720p YUV stream at about 0.5 FPS (obviously too slow), so I tried to make the H264 stream work. After reading all of the issues regarding RTSP on these projects (and trying to make it work without much success: laggy or corrupted video), I noticed that a question that hasn't been answered yet is:

"How can we know where the head/tail of the circular buffer is?"

I might be wrong, but the answer could be in the /proc/umap/h264e file. In a tiny moment of rage I executed cat on the file multiple times and noticed that something was changing: the RdTail, RdHead, WrTail and WrHead values. I don't know if this is a known thing; anyway, I'll be happy to contribute to the project.

The changing values can be checked with this command:
while :; do cat /proc/umap/h264e | sed -n 17,20p; sleep 1; done

I tried @Necromix's vencrtsp_v2 too without much success (laggy and corrupted video).

Another thing I don't fully understand is how to calculate the <baseAddr> <offset> <+chunkSizeToRead> values, especially the chunkSizeToRead.

Here's my h264e file:

[H264E] Version: [Hi3518EV200_MPP_V1.0.4.1 B030 Release], Build Time[Jul  3 2017, 17:50:46]

-----MODULE PARAM--------------------------------------------------------------
      OnePack  H264eVBSource    H264eRcnEqualRef    H264eMiniBufMode
            0              1                   1                   1
-----CHN ATTR------------------------------------------------------------------
     ID    MaxWidth   MaxHeight   Width    Height   profile   C2GEn   BufSize   ByFrame   MaxStrCnt
      0      1280       720       1280       720        hp       N    131072         Y         200
      1       640       360        640       360        hp       N     65536         Y         200

-----PICTURE INFO--------------------------------------------------------------
     ID     EncdStart   EncdSucceed        Lost        Disc        Skip       Pskip      Recode      RlsStr     UnrdStr
      0         80624         80578          15           0          15           0          45       80578           0
      1         80643         80611           0           0           0           0          32       80611           0

-----STREAM BUFFER-------------------------------------------------------------
     ID     Base        RdTail      RdHead      WrTail      WrHead      DataLen     BufFree     
     0      0xc41c0000  0x6940      0x6940      0x6940      0x6940      0           131008      
     1      0xc4200000  0x55c0      0x55c0      0x55c0      0x55c0      0           65472       

-----RefParam INFO--------------------------------------------------------------
     ID    EnPred        Base     Enhance bVirtualIEnable  VirtualIInterval VirtualIQpDelta GetVbFail
      0         Y           1           0               N                30               0         0
      1         Y           1           0               N                30               0         0

-----ROI INFO------------------------------------------------------------------
     ID     Index    bAbsQp    Qp     Width    Height    StartX    StartY   BgSrcFr   BgTarFr

-----Syntax INFO1---------------------------------------------------------------
     ID SlcspltEn   Slcmode   Slcsize   IntraRefresh  enIslice    RefreshLine   QpOfIDR
      0         N       N/A       N/A              N         N             11        40
      1         N       N/A       N/A              N         N              5        40

-----Inter & Intra prediction INFO---------------------------------------------
     ID   profile  HWsize  VWsize  P16x16   P16x8   P8x16    P8x8   MvExt  I16x16    Inxn    Ipcm
      0        hp       7       3       Y       Y       Y       Y       Y       Y       Y       Y
      1        hp       5       2       Y       Y       Y       Y       Y       Y       Y       Y

-----Syntax INFO2--------------------------------------------------------------
     ID   Profile   EntrpyI   EntrpyP  Itrans  Ptrans QMatrix   POC   DblkIdc   Alpha    Beta
      0        hp     cabac     cabac     all     all       N     2         0       0       0
      1        hp     cabac     cabac     all     all       N     2         0       0       0

Edit: after some digging into /tmp/view it appears that the values <baseAddr> <offset> <+chunkSizeToRead> can be easily extracted from the file itself.

------------------------------------------------------------
|   First section (data) - usually less than 19268 bytes   |
------------------------------------------------------------
|                  Padding - variable size                 | 
------------------------------------------------------------
|              Hi-res H264 data - variable size            |
------------------------------------------------------------
|             Padding - at least 8 0x00 bytes              |
------------------------------------------------------------
|             Low-res H264 data - variable size            |
------------------------------------------------------------
|            Audio? - needs further investigation          |
------------------------------------------------------------
|                  Padding - variable size                 | 
------------------------------------------------------------
  • baseAddr is always 0.
  • offset is the address of the first H264 header 0x00 0x00 0x00 0x01 which can be found after the first padding section.
  • +chunkSizeToRead is the address (minus the offset) of the padding between the hi and low res sections.
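
Following that description, here is a rough sketch of locating the two values automatically in a dump of /tmp/view: find the first 00 00 00 01 start code after the (assumed) 19268-byte control region, then the first run of at least eight zero bytes after it. Both numbers come from the observations above and should be treated as heuristics.

// Sketch: locate <offset> (first H264 start code after the control region)
// and <+chunkSizeToRead> (distance to the zero-padding gap between the
// hi-res and low-res sections) in a dump of /tmp/view. Heuristic values.
#include <stdio.h>
#include <stdlib.h>

#define CONTROL_REGION  19268   // observed size of the leading data block
#define ZERO_RUN        8       // observed padding between hi/low res data

int main(int argc, char *argv[])
{
    if (argc < 2) { fprintf(stderr, "usage: viewscan VIEW_DUMP\n"); return 1; }
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    fseek(fp, 0, SEEK_END);
    long len = ftell(fp);
    rewind(fp);
    if (len <= 0) { fprintf(stderr, "empty file\n"); return 1; }
    unsigned char *buf = malloc(len);
    if (!buf || fread(buf, 1, len, fp) != (size_t)len) { fprintf(stderr, "read failed\n"); return 1; }
    fclose(fp);

    long offset = -1, gap = -1;
    // first 00 00 00 01 start code after the control region
    for (long i = CONTROL_REGION; i + 4 <= len; i++) {
        if (buf[i] == 0 && buf[i+1] == 0 && buf[i+2] == 0 && buf[i+3] == 1) { offset = i; break; }
    }
    // first run of ZERO_RUN zero bytes after that start code
    if (offset >= 0) {
        long run = 0;
        for (long i = offset + 4; i < len; i++) {
            run = (buf[i] == 0) ? run + 1 : 0;
            if (run >= ZERO_RUN) { gap = i - ZERO_RUN + 1; break; }
        }
    }
    if (offset < 0)
        fprintf(stderr, "no start code found after the control region\n");
    else
        printf("offset = %ld, chunkSizeToRead = %ld\n", offset, gap >= 0 ? gap - offset : -1L);
    free(buf);
    return 0;
}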

Btw, I'm now trying to create a stable RTSP stream using @andy2301's rtsp2303. I had to edit some values (offsets, adjustIdx, chunk size and FPS, IIRC) to make the stream work. It's not really stable; the program randomly hangs, but it's a step forward!

Edit 2: I just locked myself out of the camera by executing rtsp2303 in init.sh without adding the "&" character at the end to make it run in the background. Damn.

Edit 3: Added the audio to the /tmp/view file structure according to @dvv's post:

High resolution channel 1280x720 occupies 0x00004B44 - 0x00081B43 region.
Low resolution channel 640x360 occupies 0x00081B44 - 0x000AEB43 region.
AAC audio ADTS frames

I'm now working on a daemon which will populate a file in /tmp with the head/tail values, ready to be read by the RTSP server to sync the stream.

buffer_viewer

I spent an afternoon optimizing it. The CPU usage with a check rate of 200ms is about 0.5-1.5%, and the memory footprint is around 100 KB, though it can be reduced at the cost of accuracy. The first version needed at least 20-30% of CPU time and 1.5 MB of RAM.
The quick jumps of the offsets you can see in the gif could be keyframes; they take more space than the smaller packets.
The quick jumps of the offsets you can see in the gif could be the keyframes, they take more space than the smaller packets.

Hello everyone!

Here's my analysis of /tmp/view on my Yi 1080P Dome. I'm new to the device (Christmas gift), so maybe it's not news for you guys but I wanted to share in case it helps at all. ;)

[     0-F     ]: Data #1  (16 bytes)
[   B2D-3A9D  ]: Data #2  (12145 bytes)
[  3D1D-5BCB  ]: Data #3  (7855 bytes)
[  620D-70AB  ]: Data #4  (3743 bytes)
[  9644-1095DC]: HD 1080P (1048473 bytes)
[109647-16363B]: SD 360P  (368629 bytes)
[163644-17363B]: Audio?   (65528 bytes)

The Data #2,3,4 addresses slide around. They also appear to be H264 NAL units.

I was able to use vencrtsp_v2 to stream both the SD and HD streams to VLC 3.0.4. I adjusted START_VIEW and SIZE_VIEW as appropriate, although my C is pretty rusty, so I only use half of the HD data (and it still plays!), as setting SIZE_VIEW larger than 512K? gives a segmentation fault for some reason.

The Messages window in VLC is full of late frame warnings and the video is delayed and jumpy, but the stream ran for 30+ minutes on each before I ended it. I believe VLC is very forgiving.

I also tried streaming to MotionEye, but it fails to even start the stream.

Edit: After reading more about H264, Data #2/3/4 are not NAL units.

Hi @drlarsen77,
would you mind trying to run viewd? It should give you the offset values. (The binary is in the first issue.)

https://github.com/TheCrypt0/viewd

@drlarsen77 @TheCrypt0

Hey guys, I have a Yi 1080p Dome and tried to look into running vencrtsp_v2, but honestly I'm not sure what I should change in that program to match viewd's output. Just the START_VIEW and SIZE_VIEW variables? VLC is able to connect to the running stream as-is (fresh git clone), but there's no video or audio at all.

Also, is there a separate repository with the extra files needed for running make for vencrtsp_v2? I can't seem to find sample_comm.h, for example.

@bharris6 You need the Hi3518E SDK in order to cross compile a binary that will work on your camera. Personally, I just created a vencrtsp_v2 folder under Hi3518E_SDK_V1.0.4.0/mpp/sample/ so that all the references lined up.

If you set START_VIEW and SIZE_VIEW (be sure to convert from hex to dec), you'll be able to get a (glitchy) stream working in VLC. I find the SD stream works better, but they both work since VLC is very forgiving. The HD stream's SIZE_VIEW is larger than the maximum allowed by some of the program's functions (you'll get a segmentation fault), but you can reduce it to something smaller.

Anyway, that only gets you part of the way there. The vencrtsp_v2 program doesn't account for the cyclical video buffer, so all of that would still need to be written. Basically, for every read it needs to check the current offset and move accordingly.
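
For what it's worth, here is a sketch of the wrap-around copy such a reader would need: given the mapped region, its size and the writer's current head offset, copy everything written since the last read, splitting the copy when the head has wrapped. The function and parameter names are illustrative; the head value would come from /proc/umap/h264e or a helper like viewd.

// Sketch of reading from a circular buffer: copy everything written between
// our last read position and the writer's current head, handling the wrap
// at the end of the region. Names are illustrative.
#include <stddef.h>
#include <string.h>

// Returns the number of bytes copied into 'out' (caller guarantees 'out'
// can hold at least 'size' bytes). '*tail' is our read position and is
// advanced to 'head' on return.
size_t ring_read(const unsigned char *buf, size_t size,
                 size_t *tail, size_t head, unsigned char *out)
{
    size_t t = *tail, n;

    if (head >= t) {
        // no wrap: one contiguous chunk
        n = head - t;
        memcpy(out, buf + t, n);
    } else {
        // wrapped: tail..end of region, then start of region..head
        size_t first = size - t;
        memcpy(out, buf + t, first);
        memcpy(out + first, buf, head);
        n = first + head;
    }
    *tail = head;
    return n;
}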

@TheCrypt0 is working on yi-hack-v4 and hasn't released it yet (I don't have it yet either). I believe he plans to have something soon. Once that is released, it will incorporate his idea into a functioning RTSP server.

@drlarsen77 ok thanks! I'll keep my eyes peeled for V4, and in the meantime I will look into that SDK. I knew I was missing something!

@bharris6

I'm making a lot of progress with the RTSP server. Latest update:

------------------- RTSP UPDATE ------------------- 

I've just discovered that the new Xiaomi updates treat the H264 buffer differently,
the plan now is to create a Kernel module which will provide three devices in /dev
with the already parsed stream.

The devices will be:

- /dev/viewd_hires   : High resolution H264 20FPS stream
- /dev/viewd_lowres  : Low resolution H264 20FPS stream
- /dev/viewd_audio   : AAC Audio

(the names may change if I find better ones)

WARNING: 
    This implementation COULD corrupt ~0.5sec (out of 10) of the cloud recording, 
    this is because the end of the circular buffer needs to be zeroed before reading. 
    There's a small chance that this particular section will be used by the cloud 
    process. 
    
    However, tests on my Yi Dome 720p showed that the cloud recordings are not
    corrupted and work properly.

NOTE:
    I tested the above process with a small program I wrote, just to be sure that my
    assumptions were correct. Just redirecting the output of it to a file *.h264 worked
    like a charm.

Greetings,
Crypto

Updates are posted in the #rtsp-server channel of the Discord server. Invite link.