azhurb/stalker_portal

Inefficient tv archive recording

javiermatos opened this issue · 9 comments

Hi,

I started using stalker portal with a second server for storage. I have developed systems in the past that faced the "multiple write operations" problem, and that is exactly what you have on your storage server: many channels being recorded at the same time, causing hundreds of small writes to disk. The HDD head has to seek from one location to another, and seeking is the slowest, least efficient thing a disk can do. The consequences: HDDs with a shorter lifespan, slower servers, fewer customers per server and more money spent.

There is a tiny and simple solution for that: use RAM buffers that accumulate big chunks of the video recording and move them to the HDD in one efficient write.
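Just to illustrate the idea (a minimal sketch of the concept only; `CoalescingWriter` is a hypothetical name, not code from the portal):

```python
import os


class CoalescingWriter:
    """Accumulate small writes in RAM and flush them to disk in big sequential chunks."""

    def __init__(self, path, flush_at=8 * 1024 * 1024):  # flush threshold: 8 MB
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.buf = bytearray()
        self.flush_at = flush_at

    def write(self, data):
        self.buf += data
        if len(self.buf) >= self.flush_at:  # threshold reached: one big write
            self.flush()

    def flush(self):
        if self.buf:
            os.write(self.fd, bytes(self.buf))  # single sequential write to disk
            self.buf.clear()

    def close(self):
        self.flush()  # write out whatever is left
        os.close(self.fd)
```

With an 8 MB threshold, thousands of tiny packet-sized writes become one large sequential write, so the disk head barely moves.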

I have seen that you have implemented your logic in two places: tvarchiverecorder.class.php and dumpstream. You can simply use the "buffer" Linux utility in the following way:

Stream -> dumpstream ---(pipe)---> buffer -> HDD

You can redirect dumpstream's stdout to buffer and save the result to the HDD, as shown in the following example:
python dumpstream -a 239.255.0.1 -p 1234 -n 22 | buffer -s 256K -m 10M -p 80 -o output.mpg

Once you install buffer (apt-get install buffer; it is a 72 kB utility), you have these options:

Usage: buffer [-B] [-t] [-S size] [-m memsize] [-b blocks] [-p percent] [-s blocksize] [-u pause] [-i infile] [-o outfile] [-z size] [-Z] [-d]
-B = blocked device - pad out last block
-t = show total amount written at end
-S size = show amount written every size bytes
-m size = size of shared mem chunk to grab
-b num = number of blocks in queue
-p percent = don't start writing until percent blocks filled
-s size = size of a block
-u usecs = microseconds to sleep after each write
-i infile = file to read from
-o outfile = file to write to
-z size = combined -S/-s flag
-Z = seek to beginning of output after each 1GB (for some tape drives)
-d = print debug information to stderr
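If it helps, this is roughly how the pipeline above could be wired from Python (a sketch only; the multicast address, port, duration and output path are the placeholder values from the example above, and `record_pipeline_cmds`/`record` are hypothetical names):

```python
import subprocess


def record_pipeline_cmds(addr, port, duration, out_path,
                         block='256K', mem='10M', start_pct=80):
    """Build the argv lists for: dumpstream | buffer -o out_path."""
    dump = ['python', 'dumpstream', '-a', addr, '-p', str(port), '-n', str(duration)]
    buf = ['buffer', '-s', block, '-m', mem, '-p', str(start_pct), '-o', out_path]
    return dump, buf


def record(addr, port, duration, out_path):
    """Run the Stream -> dumpstream -> buffer -> HDD pipeline and wait for it."""
    dump_cmd, buf_cmd = record_pipeline_cmds(addr, port, duration, out_path)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    buf = subprocess.Popen(buf_cmd, stdin=dump.stdout)
    dump.stdout.close()  # so buffer sees EOF when dumpstream exits
    return buf.wait()


if __name__ == '__main__':
    record('239.255.0.1', 1234, 22, 'output.mpg')
```

With `-p 80`, buffer waits until 80% of the 10 MB shared memory is filled before it starts writing, so the disk sees a few large writes instead of a constant trickle.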

Now the question is... is it safe for me to just edit tvarchiverecorder.class.php and change the dumpstream call to use buffer? Is there any side effect, anything I could break by doing so?

Regards,
Javier

I have created a pull request with the changes needed to make the dumpstream Python program use buffering and reduce disk I/O operations. The default is an 8 MB buffer, so you improve your server performance at the cost of 8 MB of RAM for each channel you are recording. I think that is a good tradeoff.
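The change essentially boils down to opening the recording file with a large explicit buffer instead of the interpreter's default (a sketch; `open_recording` is a hypothetical name, not the actual function in the pull request):

```python
import io

BUFFER_SIZE = 8 * 1024 * 1024  # 8 MB of RAM per recorded channel


def open_recording(path):
    """Open the recording file so Python coalesces writes into 8 MB chunks."""
    return io.open(path, 'ab', buffering=BUFFER_SIZE)
```

Every `write()` on the returned file object lands in the 8 MB buffer first; the data only hits the disk when the buffer fills up, on an explicit `flush()`, or on `close()`.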

I'm actually using it already. I ran some preliminary tests (on independent code) and it works as expected. I'm also testing it in a testing environment with a dedicated storage server, and it is working fine.

Thank you for such a detailed description. We need a little time to investigate this problem.

Hello again,

I wrote a small Python program so you can check the behavior of buffering when writing content to disk. It simply creates a file using a custom buffer size. With a small buffer size you can see that it performs many disk writes, but the larger the buffer, the fewer the write operations.

Use -s or --size for the file size (the size of the file it creates, in MB) and -b or --buffering for the buffer size (in MB); you don't need to specify -c or --chunk-size. There you can see that with a 100 MB file size and a 10 MB buffer you only make 10 disk writes. Without buffering you make many more disk writes (more stress on the disks, less efficiency). I hope this helps your analysis.

import argparse
import io
from datetime import datetime


MEGABYTES = 1024*1024


def buffered_write(filename, size, buffering, chunk_size=1):
    """Append `size` bytes to `filename` in `chunk_size` pieces, letting
    io's buffer (of `buffering` bytes) coalesce them into fewer disk writes."""
    written = 0
    with io.open(filename, 'ab', buffering) as f:
        while written < size:
            f.write(b'a' * chunk_size)
            written += chunk_size


def main():
    parser = argparse.ArgumentParser(description='Python buffering test')
    parser.add_argument('-f', '--filename', type=str)
    parser.add_argument('-s', '--size', type=int, default=64)
    parser.add_argument('-b', '--buffering', type=int, default=1)
    parser.add_argument('-c', '--chunk-size', type=int, default=1)
    args = parser.parse_args()

    if not args.filename:
        args.filename = 'buffering_%s' % datetime.utcnow().strftime('%Y%m%d%H%M%S')
    args.size *= MEGABYTES
    args.buffering *= MEGABYTES

    buffered_write(args.filename, args.size, args.buffering, args.chunk_size)


if __name__ == '__main__':
    main()

Looks like by default python (on my test server) uses 8M buffering, therefore nothing has changed with this option for dumpstream.

But we can add the option to the config.php to configure buffering.

Sorry, my mistake. By default it is only 8K (or 4K), not 8M.
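For reference, the actual default on any given machine can be checked directly:

```python
import io

# The interpreter's default size for buffered I/O; on CPython this is
# typically 8 KB, not 8 MB, so an explicit large buffer still pays off.
print(io.DEFAULT_BUFFER_SIZE)
```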

I can make the changes to expose the buffer size in config.php. What name do you want me to use? DUMPSTREAM_BUFFER, for instance? I will create that and leave 8 MB as the default in case the value is not defined.

We have already added it:
a835ae2

Thanks again!