Introduction The aim is to provide process specific data without interfering with the operation of the process. Description This script is written in python and intended to be run as a tcollector under Linux. All of metrics produced are calculated by parsing various files under /proc as such it should be real time, see the man page for proc(5) What this script collects depends on the configuration passed to it. Metrics The metrics collected follow the prefix proc.stat.ps and are as follow : proc.stat.ps.io rchar: characters read The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read(2) and similar system calls. It includes things such as terminal I/O and is unaffected by whether or not actual physical disk I/O was required (the read might have been satisfied from pagecache). wchar: characters written The number of bytes which this task has caused, or shall cause to be written to disk. Similar caveats apply here as with rchar. syscr: read syscalls Attempt to count the number of read I/O operations---that is, system calls such as read(2) and pread(2). syscw: write syscalls Attempt to count the number of write I/O operations---that is, system calls such as write(2) and pwrite(2). read_bytes: bytes read Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. This is accurate for block- backed filesystems. write_bytes: bytes written Attempt to count the number of bytes which this process caused to be sent to the storage layer. cancelled_write_bytes: The big inaccuracy here is truncate. If a process writes 1MB to a file and then deletes the file, it will in fact perform no writeout. But it will have been accounted as having caused 1MB of write. In other words: this field represents the number of bytes which this process caused to not happen, by truncating pagecache. A task can cause "negative" I/O too. If this task truncates some dirty pagecache, some I/O which another task has been accounted for (in its write_bytes) will not be happening. eg. cluster=dev42 image=httpd type=write_bytes proc.stat.ps.fd Open file descriptors proc.stat.ps.threads Number of active threads for the specified process. proc.stat.ps.net Number of ESTABLISHED connections of the following types : tcp udp unix (unix domain sockets) eg. cluster=ams2 class=nxa image=java type=tcp proc.stat.ps.pcpu Percentage CPU used. The calculation is based on a time sample over 1 second. proc.stat.ps.mem total - total program size - (Virtual memory size) resident - resident set size eg class=pxy cluster=ams3 image=java type=total Configuration By default the configuration file is located at /<class>/shared/conf/psstat.conf and should be owned by appsc The format is as follows : imagename<white space>regex The regex is passed verbatim to the python regex engine, so please be careful. It is matched to full command line of the running process. Test your regexs here http://regex101.com/ (select python from the drop down) eg. fi/lrh lrh.*fi/ In TSD you would then be able to do : image=fi/lrh The collector will notice changes to the configuration file and reload it automatically, a restart is not needed. If you get unexpected results look at the logs located here : /var/log/tcollector.log For more information set : DEBUG = True