Docker version of orkaudio: possible memory leak
wangduanduan opened this issue · 23 comments
docker voiceip/orkaudio:master
At only 10 cps, with every call lasting 10 seconds, memory usage keeps growing and never goes down.
Your second graph flattens at around 2GB. Have you by any chance kept it running for a longer duration and observed the usage?
@kingster thanks for the reply.
The second graph is from a 50 cps stress test. It flattens at around 2GB because I stopped the stress test.
The first graph is from a 10 cps stress test.
This is my config.xml:
<config>
<!-- This is an example configuration file for the Oreka orkaudio capture service on Linux -->
<!-- Copy this to config.xml and modify according to taste -->
<AudioOutputPath>/var/log/orkaudio/audio</AudioOutputPath>
<!-- <TapeFileNaming>[trackingid],[localparty],[remoteparty],[nativecallid]</TapeFileNaming> -->
<!-- Uncomment the plugin you want to use: -->
<!-- Use libvoip.so for SIP, Cisco Skinny and pure RTP -->
<!-- Use libh323voip.so for Avaya, Nortel Unistim, H.323 and MGCP -->
<!-- See in <VoIpPlugin> below for more precise protocol tuning -->
<CapturePlugin>libvoip.so</CapturePlugin>
<!--<CapturePlugin>libh323voip.so</CapturePlugin>-->
<!--<CapturePlugin>liborksipua.so</CapturePlugin>-->
<CapturePluginPath>/usr/lib</CapturePluginPath>
<!--<PluginsDirectory>/oreka-src/orkaudio/plugins</PluginsDirectory>-->
<!-- Audio file storage format: choose from: native, gsm, ulaw, alaw, pcmwav -->
<StorageAudioFormat>pcmwav</StorageAudioFormat>
<StereoRecording>true</StereoRecording>
<TapeNumChannels>2</TapeNumChannels>
<AudioFileBitRate>8000</AudioFileBitRate>
<!-- If you want to keep native audio files as well as compressed, change this to "no" -->
<DeleteNativeFile>yes</DeleteNativeFile>
<TrackerHostname>192.168.40.186</TrackerHostname>
<TrackerTcpPort>8080</TrackerTcpPort>
<CapturePortFilters>LiveMonitoring</CapturePortFilters>
<TapeProcessors>BatchProcessing, Reporting</TapeProcessors>
<BatchProcessingEnhancePriority>true</BatchProcessingEnhancePriority>
<NumBatchThreads>4</NumBatchThreads>
<AudioFileOwner>tomcat</AudioFileOwner>
<AudioFileGroup>tomcat</AudioFileGroup>
<AudioFilePermissions>644</AudioFilePermissions>
<!--<TapeDurationMinimumSec>3</TapeDurationMinimumSec>-->
<!-- Uncomment the parameter below and fill in a comma-separated -->
<!-- list of TCP addresses which you wish to open a connection to. -->
<!-- For example 192.168.1.250:1721, 192.168.1.1:8091. A TCP -->
<!-- connection shall be opened and a read-loop shall be entered -->
<!-- into whereby any data read shall be discarded, and a record -->
<!-- maintained of the amount of data which has been read. -->
<!-- <SocketStreamerTargets></SocketStreamerTargets> -->
<VoIpPlugin>
<PcapSocketBufferSize>8388608</PcapSocketBufferSize>
<!--queuemetrics integration, uncomment the following line-->
<SipExtractFields>W_Call_ID</SipExtractFields>
<!-- Use this for Nortel proprietary VoIP protocol -->
<!--<UnistimDetect>yes</UnistimDetect>-->
<!-- Turn both these on this for Avaya H.323 extensions -->
<!--<AvayaDetect>yes</AvayaDetect>-->
<!--<RtcpDetect>yes</RtcpDetect>-->
<!-- Set the option below to "true" to enable IAX2 support -->
<!-- the default is that IAX2 support is disabled -->
<!--<Iax2Support>false</Iax2Support> -->
<!-- Use this if you want to force capture from a given list of devices. -->
<!-- All available devices are listed in orkaudio.log when the service is starting -->
<Devices>enp89s0</Devices>
<PcapFilter>host 192.168.2.221</PcapFilter>
<!--<SipOverTcpSupport>yes</SipOverTcpSupport>-->
<!--<SipReportFullAddress>yes</SipReportFullAddress>-->
<!-- <SipRequestUriAsLocalParty>yes</SipRequestUriAsLocalParty> -->
<!--<SipUse200OkMediaAddress>yes</SipUse200OkMediaAddress>-->
<!-- Those two parameters are only needed for call direction detection (one or the other) -->
<!--<SipDomains>company.com, 65.34.25.87</SipDomains>-->
<!--<SipDirectionRefenceIpAddresses>65.34.98.56, 65.34.98.57</SipDirectionRefenceIpAddresses>-->
<!-- Sangoma wanpipe RTP tap for TDM boards -->
<!--<SangomaRxTcpPortStart>9000</SangomaRxTcpPortStart>-->
<!--<SangomaTxTcpPortStart>11000</SangomaTxTcpPortStart>-->
<!-- Mitel Communications Platform -->
<!-- Turn on the parameter below to enable support for Mitel -->
<!-- <MitelDetect>yes</MitelDetect> -->
<!-- The parameter below sets the Mitel signalling port. The -->
<!-- default is 3999 -->
<!-- <MitelSignallingPort>3999</MitelSignallingPort> -->
<!-- The parameter below sets the amount of time in seconds -->
<!-- after which the cached Mitel metadata shall be discarded. -->
<!-- The default is 60 seconds. -->
<!-- <MitelMetadataTimeoutSec>60</MitelMetadataTimeoutSec> -->
<!-- Turn on the parameter below to enable extension Mitel -->
<!-- extension detection using ARP. Turning on this parameter -->
<!-- automatically turns on MitelDetect -->
<!-- <MitelArpExtensionDetect>yes</MitelArpExtensionDetect> -->
<!-- Set MitelSmdrPort to the port where Mitel SMDR records -->
<!-- may be accessed. The default is 1752. Note that you -->
<!-- shall need to configure SocketStreamerTargets with the -->
<!-- host and this port, in order for Oreka to access the -->
<!-- SMDR records. See SocketStreamerTargets above for more -->
<!-- information on how to configure it. -->
<!-- <MitelSmdrPort>1752</MitelSmdrPort> -->
<!-- End of Available Configurations for Mitel Communications Platform -->
</VoIpPlugin>
</config>
orkaudio gets OOM-killed once it reaches the memory limit of the Docker service, so it cannot run for long.
I think the rate of memory growth is not normal.
I use SIPp for the stress test.
ename=2022/06/24/05/20220624_051741_SYLT.wav nativeCallId=64699-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYLV localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYLV.wav nativeCallId=64700-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYLX localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYLX.wav nativeCallId=64701-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYLZ localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYLZ.wav nativeCallId=64702-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMB localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYMB.wav nativeCallId=64703-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMD localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYMD.wav nativeCallId=64704-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMF localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYMF.wav nativeCallId=64705-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMH localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYMH.wav nativeCallId=64706-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-52 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMJ localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051742_SYMJ.wav nativeCallId=64707-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-52 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYML localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051742_SYML.wav nativeCallId=64708-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-52 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMN localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051742_SYMN.wav nativeCallId=64709-3265469@192.168.2.221 ondemand=false
This is the startup log:
OrkAudio version : service starting
2022-06-24 05:21:23,425 WARN config:278 - It is not recommended to have more batch threads than CPUs
2022-06-24 05:21:23,426 INFO root:109 - Loaded plugin: /usr/lib/libvoip.so
2022-06-24 05:21:23,428 INFO packet:1847 - Initializing VoIP plugin
2022-06-24 05:21:23,428 INFO packet:1554 - Available pcap devices:
2022-06-24 05:21:23,428 INFO packet:1561 - * veth010dbad -
2022-06-24 05:21:23,428 INFO packet:1561 - * enp89s0 -
2022-06-24 05:21:23,428 INFO packet:1353 - Setting pcap socket buffer size:8388608 bytes successful
2022-06-24 05:21:23,480 INFO packet:1377 - Activating pcaphandle:fc065140 successfully
2022-06-24 05:21:23,480 INFO packet:1392 - Setting setsockopt with bufsize:8388608 successfully
2022-06-24 05:21:23,480 INFO packet:1484 - Successfully opened device. pcap handle:fc065140 message:
2022-06-24 05:21:23,480 INFO packet:1561 - * docker0 -
2022-06-24 05:21:23,480 INFO packet:1561 - * vethce1df71 -
2022-06-24 05:21:23,480 INFO packet:1561 - * vethdb66095 -
2022-06-24 05:21:23,480 INFO packet:1561 - * vethe581717 -
2022-06-24 05:21:23,480 INFO packet:1561 - * lo -
2022-06-24 05:21:23,480 INFO packet:1561 - * any - Pseudo-device that captures on all interfaces
2022-06-24 05:21:23,480 INFO packet:1561 - * wlo1 -
2022-06-24 05:21:23,480 INFO packet:1561 - * bluetooth-monitor - Bluetooth Linux Monitor
2022-06-24 05:21:23,480 INFO packet:1561 - * nflog - Linux netfilter log (NFLOG) interface
2022-06-24 05:21:23,480 INFO packet:1561 - * nfqueue - Linux netfilter queue (NFQUEUE) interface
2022-06-24 05:21:23,480 INFO packet:1561 - * bluetooth0 - Bluetooth adapter number 0
2022-06-24 05:21:23,481 INFO packet:1744 - No localpartymap.csv supplied, either locally or at /etc/orkaudio/localpartymap.csv
2022-06-24 05:21:23,481 INFO packet:1805 - LoadSkinnyGlobalNumbersList: Could not open file:skinnyglobalnumbers.csv -- trying:/etc/orkaudio/skinnyglobalnumbers.csv now
2022-06-24 05:21:23,481 INFO packet:1811 - LoadPartyMaps: Could not open file:/etc/orkaudio/skinnyglobalnumbers.csv either -- giving up
2022-06-24 05:21:23,482 INFO root:170 - Loaded plugin: /usr/lib/orkaudio/plugins/librtpmixer.so
2022-06-24 05:21:23,482 INFO root:170 - Loaded plugin: /usr/lib/orkaudio/plugins/libsilkcodec.so
2022-06-24 05:21:23,482 INFO silk:243 - SILK codec filter initialized.
2022-06-24 05:21:23,483 INFO root:170 - Loaded plugin: /usr/lib/orkaudio/plugins/libg729codec.so
2022-06-24 05:21:23,483 INFO g729:149 - G729 codec filter starting.
2022-06-24 05:21:23,483 INFO g729:152 - G729 codec filter initialized.
2022-06-24 05:21:23,483 INFO taperegistry:62 - Registered processor: BatchProcessing
2022-06-24 05:21:23,483 INFO taperegistry:62 - Registered processor: CommandProcessing
2022-06-24 05:21:23,483 INFO taperegistry:62 - Registered processor: Reporting
2022-06-24 05:21:23,483 INFO taperegistry:62 - Registered processor: TapeFileNaming
2022-06-24 05:21:23,483 INFO taperegistry:62 - Registered processor: DirectionSelector
2022-06-24 05:21:23,483 INFO reporting:283 - [192.168.40.186:8080/orktrack] reporting thread started.
2022-06-24 05:21:23,483 INFO immediateProcessing:90 - thread starting - queue size:10000
2022-06-24 05:21:23,483 INFO batchProcessing:233 - thread Th0 starting - queue size:20000
2022-06-24 05:21:23,483 INFO batchProcessing:233 - thread Th1 starting - queue size:20000
2022-06-24 05:21:23,483 INFO batchProcessing:233 - thread Th2 starting - queue size:20000
2022-06-24 05:21:23,484 INFO batchProcessing:233 - thread Th3 starting - queue size:20000
2022-06-24 05:21:23,484 INFO tapeFileNamingLog:86 - Started
2022-06-24 05:21:23,484 INFO batchProcessing:106 - Command Processing thread Th0 starting - queue size:10000
2022-06-24 05:21:23,484 INFO httpserver:247 - Started HttpServer on port:59140
2022-06-24 05:21:23,484 INFO directionSelector:184 - thread Th0 starting - queue size:20000
2022-06-24 05:21:23,484 INFO tlsserver:318 - HTTPS server disabled
2022-06-24 05:21:23,484 INFO directionSelector:129 - LoadAreaCodesMaps: Could not open file:area-codes-recorded-side.csv -- trying:/etc/orkaudio/area-codes-recorded-side.csv now
2022-06-24 05:21:23,484 INFO eventstreamingserver:736 - Started EventstreamingServer on port:59150
2022-06-24 05:21:23,484 INFO directionSelector:135 - LoadAreaCodesMaps: Could not open file:/etc/orkaudio/area-codes-recorded-side.csv either -- giving up
2022-06-24 05:21:23,484 INFO packet:980 - Start Capturing: pcap handle:fc065140
2022-06-24 05:21:23,487 INFO reporting:329 - [192.168.40.186:8080/orktrack] init success:true comment:
2022-06-24 05:21:29,046 INFO packet:1744 - No localpartymap.csv supplied, either locally or at /etc/orkaudio/localpartymap.csv
2022-06-24 05:21:33,100 INFO pcapstats:1906 - enp89s0: handle:fc065140 received:181 received10s:181 dropped:0 dropped10s:0 ifdropped:0 ifdropped10s:0
2022-06-24 05:21:33,100 INFO pcapstats:831 - numPackets:155 maxPPS:42 minPPS:3
Can you share your sipp stress scripts, so that I can reproduce this issue?
I am running a slightly older version on production (v0.2.5) and haven't observed any memory leak, so it's possible that the memory leak got introduced in the recent merges from upstream.
Thanks, I will try to reproduce the issue in our setup.
docker stats shows orkaudio using 724MiB:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
4f88849a9af3 capture 47.14% 772.4MiB / 1000MiB 77.24% 0B / 0B 1.95GB / 198GB 17
But when I go into the container and run top, orkaudio only uses 233MB:
32 root 20 0 1040824 233364 21040 S 41.7 0.7 74:21.79 orkaudio
Outside the container, htop shows orkaudio using only 227MB.
Inside the container, top shows that orkaudio's memory usage becomes stable after a while. But docker stats shows orkaudio's MEM USAGE growing continuously, and when it hits the memory limit, Docker restarts orkaudio.
The container_memory_working_set_bytes metric from cAdvisor also keeps growing.
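To track the process's own resident memory over time, independently of what the container runtime reports, one option is to read the VmRSS field from /proc/<pid>/status. The snippet below is only a minimal sketch: the function names are made up for illustration, and it assumes a Linux /proc filesystem.

```python
import re


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def read_vm_rss_kib(pid):
    """Read /proc/<pid>/status (Linux only) and return the resident set size in KiB."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kib(status_file.read())
```

Calling read_vm_rss_kib with the orkaudio PID once a minute and logging the result would give a series that can be compared directly against the MEM USAGE column of docker stats.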
I think this could be explainable behaviour, as you confirmed from the top output that the memory usage of orkaudio is stable. Given that the recorder is continuously writing recorded files, the file cache would also be reported as memory used.
Have a look at this issue moby/moby#40415 which exactly talks about this behaviour.
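To see how much of the container's reported memory is file cache, the cgroup's memory.stat file can be inspected. The sketch below parses the text of that file and subtracts the inactive file pages from the raw usage, which, as far as I know, is how cAdvisor derives container_memory_working_set_bytes. The function names are hypothetical. The key names are assumptions that depend on the cgroup version: total_inactive_file and total_cache on cgroup v1, inactive_file and file on cgroup v2.

```python
def parse_memory_stat(text):
    """Parse the 'key value' lines of a cgroup memory.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def page_cache_bytes(stats):
    """File cache charged to the cgroup (assumed keys: v1 total_cache/cache, v2 file)."""
    return stats.get("total_cache", stats.get("cache", stats.get("file", 0)))


def working_set_bytes(usage_bytes, stats):
    """Raw usage minus inactive file pages, clamped at zero."""
    inactive = stats.get("total_inactive_file", stats.get("inactive_file", 0))
    return max(usage_bytes - inactive, 0)
```

If page_cache_bytes accounts for most of the gap between docker stats and top, the growth comes from recorded files sitting in the page cache rather than from the orkaudio process itself.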
I set up a cron job in the orkaudio container:
* * * * * echo 3 > /proc/sys/vm/drop_caches
The memory usage is released every minute. But as long as I keep the stress test running, the memory usage keeps growing very slowly, about 1 MB every minute, which I think is OK.
But what I really don't understand is why, after the stress test, the memory usage stays at a stable level (300MB) and never goes down.
The behavior I expect is that after the stress test is over, the memory usage returns to a lower level, not 300MB.
@kingster do you run v0.2.5 in production as a Docker container, or as a native install?
I also tested docker orkaudio:0.2.5, and its memory keeps growing very slowly too.
My stress test is 500 concurrent calls: 25 new calls per second, each call lasting 20 seconds, media type G.711. How much memory should it use?
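As a rough sanity check on these numbers, the concurrency follows from arrival rate times call duration (Little's law), and the audio written to disk can be estimated from the WAV parameters. This is only a back-of-the-envelope sketch: the function names are hypothetical, the 2 channels come from the config above, and 16-bit samples for pcmwav are an assumption. It also ignores the intermediate native files written before transcoding.

```python
def concurrent_calls(calls_per_second, call_duration_s):
    """Little's law: average number of calls in progress = arrival rate * duration."""
    return calls_per_second * call_duration_s


def pcm_wav_bytes_per_second(sample_rate_hz=8000, channels=2, bytes_per_sample=2):
    """Audio payload rate of one PCM WAV recording (16-bit samples are an assumption)."""
    return sample_rate_hz * channels * bytes_per_sample


calls = concurrent_calls(25, 20)       # 500 concurrent calls
per_call = pcm_wav_bytes_per_second()  # 32000 bytes/s per call
total_write_rate = calls * per_call    # 16000000 bytes/s, i.e. about 16 MB/s
```

Under these assumptions, sustained writes of this size would fill the page cache quickly, which fits the file-cache explanation given earlier in the thread.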
The original docs mention 4 CPU cores and 4GB RAM per OrkAudio engine, with up to 400 calls per engine.
In the real world, Oreka is much more CPU intensive, unless you have unlimited disk and keep recording in pcmwav format. Transcoding to any compressed format is very resource-intensive.
In our production environment, we run the native version on bare metal, given the CPU utilisation it has (we transcode to compressed ogg format). Regarding memory utilisation, truly speaking, we never checked, since we have enough memory available. I will see if I can figure out some memory growth/leaks.
My stress test is 500 concurrent calls: 25 new calls per second, each call lasting 20 seconds, media type G.711. How much memory should it use?
I would suggest you start with the original recommendation of 4GB, and then tweak based on your utilisation.
In my experience (we use a very customized version of OrkAudio in production, on more than 500 servers), these specs are way too high. Of all the threads used in OrkAudio, only the "batch" threads make heavy use of CPU/IO, when they need to transcode from MCF to whatever format you need (stereo WAV in our case). That said, we will also investigate the memory leak in our version. Usually, we crank up the number of batch threads at sites where the traffic is high (more than 400 calls); the magic number is 4 threads on a server with 4 cores and 16 GB of RAM. I just checked this server: the RAM usage is below 1.3 GB, and the peaks in CPU usage are due to the batch threads.
Some metrics from one of our production recorders (post Opus memory fixes): our memory utilisation looks fairly constant; it doesn't increase and shows a very slight decrease.
Call rate: ~400 calls/min, avg call duration ~1 min, i.e. ~400 concurrent calls, being transcoded to the Opus codec.
System Info: 12 cores (24vcpu), 32GB memory
top - 17:05:09 up 771 days, 21:43.....
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30463 root 20 0 6.294g 4.493g 4.009g S 911.6 14.3 4186:12 orkaudio
After a few hours:
top - 21:36:41 up 772 days, 2:15...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30463 root 20 0 6.294g 4.483g 4.008g S 598.0 14.3 6606:22 orkaudio
Following are some of the metrics of the bare-metal server.
I have this leak as well. It seems to come from the AudioTape class, which holds all the audio chunks.
I tested the project using SIPp and valgrind, and it seems the leak is in AudioTape: audio chunks are stored but never freed.