live-profiler

Header-only library for real-time performance analysis; supports C, C++, Go, Java and .NET.

Primary language: C++, License: MIT



There are already many profiling tools on the market, but few of them integrate well with other services.
So I decided to write a profiler library that helps people build their own profiler or APM agent.
This library is intended to be high-performance and easy to understand.

Design

The design is built around the following concepts:

  • Model: contains the data that needs to be analyzed
  • Collector: collects the model data in real time
  • Analyzer: takes the model data and generates a report in real time
  • Interceptor: alters the model data in real time, before it reaches the analyzers
  • Profiler: the entry point class that coordinates the collector, analyzers and interceptors

+-------------------------------------------------------------------------------------+
| Profiler                                                                            |
|                                                                       +----------+  |
|                                                             +---------> Analyzer |  |
|                                                             |         +----------+  |
|                                                             |                       |
|  +-----------+   +------------+   +-------------+   +-------+-----+   +----------+  |
|  | Collector +---> Model Data +---> Interceptor +---> Model Data' +---> Analyzer |  |
|  +-----------+   +------------+   +-------------+   +-------+-----+   +----------+  |
|                                                             |                       |
|                                                             |         +----------+  |
|                                                             +---------> Analyzer |  |
|                                                                       +----------+  |
|                                                                                     |
+-------------------------------------------------------------------------------------+
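To make the data flow in the diagram concrete, here is a minimal, self-contained sketch of the same pipeline. All names (SampleModel, MiniProfiler, FakeSymbolInterceptor, CountingAnalyzer) are hypothetical and this is not the library's API; it only shows the order in which model data passes from the single collector through the interceptors and then to every analyzer.

```cpp
// Hypothetical sketch of the Collector -> Interceptor -> Analyzer flow.
// None of these types belong to LiveProfiler.
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Model: one unit of data to analyze (here, a sampled address).
struct SampleModel {
	std::size_t address = 0;
	std::string symbolName; // filled in by an interceptor
};

// Interceptor: alters model data before the analyzers see it.
struct Interceptor {
	virtual ~Interceptor() = default;
	virtual void alter(SampleModel& model) = 0;
};

// Analyzer: receives each model as soon as it has been collected.
struct Analyzer {
	virtual ~Analyzer() = default;
	virtual void feed(const SampleModel& model) = 0;
};

// Profiler: exactly one collector, any number of interceptors and analyzers.
class MiniProfiler {
public:
	using Collector = std::function<std::vector<SampleModel>()>;
	void useCollector(Collector collector) { collector_ = std::move(collector); }
	void addInterceptor(std::shared_ptr<Interceptor> interceptor) {
		interceptors_.push_back(std::move(interceptor));
	}
	void addAnalyzer(std::shared_ptr<Analyzer> analyzer) {
		analyzers_.push_back(std::move(analyzer));
	}
	// One round: Collector -> Model -> Interceptors -> Model' -> Analyzers.
	void collectOnce() {
		assert(collector_ && "useCollector must be called first");
		for (auto& model : collector_()) {
			for (auto& interceptor : interceptors_) {
				interceptor->alter(model);
			}
			for (auto& analyzer : analyzers_) {
				analyzer->feed(model);
			}
		}
	}
private:
	Collector collector_;
	std::vector<std::shared_ptr<Interceptor>> interceptors_;
	std::vector<std::shared_ptr<Analyzer>> analyzers_;
};

// Example interceptor: fakes symbol resolution from the address.
struct FakeSymbolInterceptor : Interceptor {
	void alter(SampleModel& model) override {
		model.symbolName = "func_" + std::to_string(model.address);
	}
};

// Example analyzer: counts models and remembers the last symbol name.
struct CountingAnalyzer : Analyzer {
	std::size_t count = 0;
	std::string lastName;
	void feed(const SampleModel& model) override {
		++count;
		lastName = model.symbolName;
	}
};
```

Because the analyzers are fed inside the collection loop, they never need to see the whole data set at once, which is what allows the incremental analysis described next.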

Unlike many profilers, this library hands model data to the analyzers in real time.
Each analyzer can choose whether to update its report incrementally or to analyze all the data at once;
incremental updates reduce memory footprint and allocations, which improves performance.
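The difference between analyzing at once and updating incrementally can be sketched with a symbol-frequency count. Both classes below are hypothetical illustrations, not library code: they produce the same counts, but the batch version stores every sample while the incremental version only keeps one counter per distinct symbol.

```cpp
// Hypothetical illustration (not library code): two ways an analyzer can
// count how often each symbol is sampled.
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Batch style: keep every sample and analyze at the end.
// Memory grows with the number of samples.
class BatchFrequencyAnalyzer {
public:
	void feed(const std::string& symbolName) { samples_.push_back(symbolName); }
	std::unordered_map<std::string, std::size_t> getResult() const {
		std::unordered_map<std::string, std::size_t> counts;
		for (const auto& name : samples_) {
			++counts[name];
		}
		return counts;
	}
private:
	std::vector<std::string> samples_;
};

// Incremental style: update the report on every sample.
// Memory grows only with the number of distinct symbols.
class IncrementalFrequencyAnalyzer {
public:
	void feed(const std::string& symbolName) { ++counts_[symbolName]; }
	const std::unordered_map<std::string, std::size_t>& getResult() const { return counts_; }
private:
	std::unordered_map<std::string, std::size_t> counts_;
};
```

For a long-running collection with millions of samples but only a few thousand distinct functions, the incremental version keeps memory roughly constant.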

A profiler must have exactly one collector and may have one or more analyzers and interceptors.
This is because some collectors use an I/O multiplexing mechanism to wait for events,
while others poll the data periodically, and mixing the two would cause many problems.
For now, if you want to use multiple collectors, create multiple profilers and run them in different threads.
Unlike collectors, analyzers and interceptors should be non-blocking, so more than one of each is allowed.
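Running several profilers side by side could look like the hypothetical sketch below, where each profiler owns exactly one collector and runs on its own thread. SingleCollectorProfiler and collectInParallel are illustrative names, not library types, and the collecting loop just sleeps where a real collector would wait on events or poll.

```cpp
// Hypothetical sketch: one profiler (and thus one collector) per thread.
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Stand-in for a profiler that owns exactly one collector.
class SingleCollectorProfiler {
public:
	explicit SingleCollectorProfiler(std::string collectorName)
		: collectorName_(std::move(collectorName)) {}
	// Collects until the deadline. A real collector would block on I/O
	// multiplexing or poll periodically here; this one just sleeps.
	void collectFor(std::chrono::milliseconds duration) {
		const auto deadline = std::chrono::steady_clock::now() + duration;
		while (std::chrono::steady_clock::now() < deadline) {
			++rounds_;
			std::this_thread::sleep_for(std::chrono::milliseconds(1));
		}
	}
	std::size_t getRounds() const { return rounds_; }
	const std::string& getCollectorName() const { return collectorName_; }
private:
	std::string collectorName_;
	std::size_t rounds_ = 0;
};

// Runs every profiler on its own thread and waits for all of them.
// Each thread touches only its own profiler, so no locking is needed.
inline void collectInParallel(std::vector<SingleCollectorProfiler>& profilers,
	std::chrono::milliseconds duration) {
	std::vector<std::thread> threads;
	threads.reserve(profilers.size());
	for (auto& profiler : profilers) {
		threads.emplace_back([&profiler, duration] { profiler.collectFor(duration); });
	}
	for (auto& thread : threads) {
		thread.join();
	}
}
```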

Each analyzer may return a different type of report.
You can dump it to the console, generate a graph, or send it to an APM service;
either way, you need to write your own code to handle the report.
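As an example of such report-handling code, the hypothetical helper below turns a list of (symbol name, sample count) pairs into CSV text that could be written to a file or sent to an APM service. The Report type is an assumption for illustration; fields are quoted because C++ symbol names such as "make(int, NodePool&)" contain commas.

```cpp
// Hypothetical report handler: format (symbol name, sample count) pairs as CSV.
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Report = std::vector<std::pair<std::string, std::size_t>>;

// Quotes a CSV field and doubles embedded quotes (RFC 4180 style).
inline std::string quoteCsvField(const std::string& field) {
	std::string quoted = "\"";
	for (char c : field) {
		if (c == '"') {
			quoted += '"';
		}
		quoted += c;
	}
	quoted += '"';
	return quoted;
}

// Converts a report to CSV text: a header line, then one symbol per line.
inline std::string formatReportAsCsv(const Report& report) {
	std::ostringstream out;
	out << "symbol,samples\n";
	for (const auto& entry : report) {
		out << quoteCsvField(entry.first) << ',' << entry.second << '\n';
	}
	return out.str();
}
```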

Requirement

A C++ compiler that supports at least C++14

How To Use

There are many combinations of collectors and analyzers;
here I use the example "CpuSampleFrequencyAnalyzer" to explain.
This example program finds the functions with the highest CPU usage.

First, install the required packages:

  • Ubuntu: sudo apt-get install g++ cmake binutils-dev
  • Fedora: su -c "dnf install gcc-c++ cmake binutils-devel"

Then, compile and run the example:

cd live-profiler/examples/CpuSampleFrequencyAnalyzer
sh run.sh a.out 20000

It collects the running status of all processes named "a.out" in real time and outputs the report after 20 seconds (the second argument is in milliseconds).
The report looks like this:

top 16 inclusive symbol names:
No. Overhead Samples SymbolName
  1     1.71   50964 make(int, NodePool&)
  2     0.50   14860 apr_palloc
  3     0.47   13905 main._omp_fn.0
  4     0.30    8969 GOMP_parallel
  5     0.26    7776 vmxarea
  6     0.22    6496 Node::check() const
  7     0.01     279 apr_pool_clear
  8     0.01     185 main
  9     0.01     185 __libc_start_main
 10     0.00      47 apr_allocator_destroy
 11     0.00      25 apr_pool_destroy
 12     0.00      11 __munmap
 13     0.00       2 __vsprintf_chk
 14     0.00       2 mmap
 15     0.00       1 _IO_default_xsputn
 16     0.00       1 vfprintf

top 11 exclusive symbol names:
No. Overhead Samples SymbolName
  1     0.50   14860 apr_palloc
  2     0.23    6793 make(int, NodePool&)
  3     0.19    5749 Node::check() const
  4     0.04    1122 main._omp_fn.0
  5     0.01     279 apr_pool_clear
  6     0.00      47 apr_allocator_destroy
  7     0.00      25 apr_pool_destroy
  8     0.00      11 __munmap
  9     0.00       2 mmap
 10     0.00       1 _IO_default_xsputn
 11     0.00       1 vfprintf
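The difference between the two tables can be sketched as follows: an exclusive count goes to the function that was executing when the sample was taken, while an inclusive count goes to every function on the call stack. This is a generic sketch with hypothetical names, not the analyzer's code. Since Overhead is printed as Samples divided by the total sample count (see printTopSymbolNames below), an inclusive value above 1.00, as for the recursive "make(int, NodePool&)", suggests that a symbol can be counted once per stack frame; the sketch counts per frame to show that effect, but this is an interpretation rather than documented behaviour.

```cpp
// Sketch of inclusive vs exclusive sample counting from call stacks.
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One sample's call stack: front() is the function that was executing,
// the rest are its callers (hypothetical representation).
using CallStack = std::vector<std::string>;

struct Counts {
	std::map<std::string, std::size_t> inclusive; // counted for every frame
	std::map<std::string, std::size_t> exclusive; // counted for the top frame only
};

inline Counts countSamples(const std::vector<CallStack>& samples) {
	Counts counts;
	for (const auto& stack : samples) {
		if (stack.empty()) {
			continue;
		}
		++counts.exclusive[stack.front()];
		for (const auto& frame : stack) {
			++counts.inclusive[frame]; // recursion adds one per frame
		}
	}
	return counts;
}
```

With three samples where "make" appears five times in total across the stacks, its inclusive count is 5 and its inclusive overhead would be 5 / 3, about 1.67.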

Because this project is a library, you are probably more interested in how this example program is written.
Let's look at the code:

#include <chrono>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
#include <LiveProfiler/Analyzers/CpuSampleFrequencyAnalyzer.hpp>
#include <LiveProfiler/Profiler/Profiler.hpp>
#include <LiveProfiler/Collectors/CpuSampleLinuxCollector.hpp>
#include <LiveProfiler/Interceptors/CpuSampleLinuxSymbolResolveInterceptor.hpp>

namespace {
	using namespace LiveProfiler;

	void printTopSymbolNames(
		const std::vector<CpuSampleFrequencyAnalyzer::SymbolNameAndCountType>& symbolNameAndCounts,
		std::size_t totalSampleCount) {
		std::cout << "No. Overhead Samples SymbolName" << std::endl;
		for (std::size_t i = 0; i < symbolNameAndCounts.size(); ++i) {
			auto& symbolNameAndCount = symbolNameAndCounts[i];
			std::cout << std::setw(3) << i+1 << " " <<
				std::setw(8) << std::fixed << std::setprecision(2) <<
				static_cast<double>(symbolNameAndCount.second) / totalSampleCount << " " <<
				std::setw(7) << symbolNameAndCount.second << " " <<
				symbolNameAndCount.first->getName() << std::endl;
		}
	}
}

int main(int argc, char** argv) {
	using namespace LiveProfiler;
	if (argc < 3) {
		std::cerr << "Usage: ./a.out ProcessName CollectTimeInMilliseconds" << std::endl;
		return -1;
	}
	auto processName = argv[1];
	auto collectTime = std::stoi(argv[2]);
	
	Profiler<CpuSampleModel> profiler;
	auto collector = profiler.useCollector<CpuSampleLinuxCollector>();
	auto analyzer = profiler.addAnalyzer<CpuSampleFrequencyAnalyzer>();
	auto interceptor = profiler.addInterceptor<CpuSampleLinuxSymbolResolveInterceptor>();
	collector->filterProcessByName(processName);
	std::cout << "collect for " << processName << " in " << collectTime << " ms" << std::endl;
	profiler.collectFor(std::chrono::milliseconds(collectTime));
	
	// Upper bounds on the number of symbols returned by the analyzer.
	const std::size_t topInclusive = 100;
	const std::size_t topExclusive = 100;
	auto result = analyzer->getResult(topInclusive, topExclusive);
	auto& topInclusiveSymbolNames = result.getTopInclusiveSymbolNames();
	auto& topExclusiveSymbolNames = result.getTopExclusiveSymbolNames();
	std::cout << "top " << topInclusiveSymbolNames.size() << " inclusive symbol names:" << std::endl;
	printTopSymbolNames(topInclusiveSymbolNames, result.getTotalSampleCount());
	std::cout << std::endl;
	std::cout << "top " << topExclusiveSymbolNames.size() << " exclusive symbol names:" << std::endl;
	printTopSymbolNames(topExclusiveSymbolNames, result.getTotalSampleCount());
	return 0;
}

The function "printTopSymbolNames" only prints the report and is not essential.
The second part of the main function is the important one; let's break it down step by step:

First, decide which model to use; in this case it's "CpuSampleModel", which represents a point of execution:

Profiler<CpuSampleModel> profiler;
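To picture what "a point of execution" means, here is a hypothetical record for one CPU sample. The field names are assumptions for illustration (they mirror what Linux perf_event_open can report: process and thread ids, the instruction pointer and an optional call chain) and are not the members of the real CpuSampleModel.

```cpp
// Hypothetical stand-in for a CPU sample model (NOT the real CpuSampleModel):
// one record describing where a thread was executing when it was sampled.
#include <cassert>
#include <cstdint>
#include <vector>

struct CpuSampleSketch {
	std::int32_t pid = 0;                 // process that was running
	std::int32_t tid = 0;                 // thread that was running
	std::uint64_t ip = 0;                 // instruction pointer (the sampled address)
	std::vector<std::uint64_t> callChain; // return addresses of the callers, innermost first

	// Clears the fields so a pooled instance can be reused for the next
	// sample; clear() keeps callChain's capacity, avoiding reallocation.
	void reset() {
		pid = 0;
		tid = 0;
		ip = 0;
		callChain.clear();
	}
};
```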

Next, decide what provides the model data; in this case it's "CpuSampleLinuxCollector":

auto collector = profiler.useCollector<CpuSampleLinuxCollector>();

Then, decide what analyzes the model data; in this case it's "CpuSampleFrequencyAnalyzer":

auto analyzer = profiler.addAnalyzer<CpuSampleFrequencyAnalyzer>();

Because "CpuSampleFrequencyAnalyzer" requires function symbol names,
while "CpuSampleLinuxCollector" only provides memory addresses,
an interceptor is needed to convert memory addresses to function symbol names:

auto interceptor = profiler.addInterceptor<CpuSampleLinuxSymbolResolveInterceptor>();
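A common way to map an address to a function name is a binary search over function start addresses sorted in ascending order. The sketch below assumes such a table has already been read from the binary (the compile command further down links libbfd, a library for reading object files, but how the real interceptor uses it is not shown here); SymbolEntry and SymbolTable are hypothetical names.

```cpp
// Sketch: resolve an address to the enclosing function via binary search
// over sorted start addresses. Not the library's interceptor.
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct SymbolEntry {
	std::uint64_t start; // first address of the function
	std::uint64_t size;  // length of the function in bytes
	std::string name;
};

class SymbolTable {
public:
	explicit SymbolTable(std::vector<SymbolEntry> entries) : entries_(std::move(entries)) {
		std::sort(entries_.begin(), entries_.end(),
			[](const SymbolEntry& a, const SymbolEntry& b) { return a.start < b.start; });
	}
	// Returns the name of the function containing address, or "" if none.
	std::string resolve(std::uint64_t address) const {
		// Find the first entry that starts after the address...
		auto it = std::upper_bound(entries_.begin(), entries_.end(), address,
			[](std::uint64_t value, const SymbolEntry& entry) { return value < entry.start; });
		if (it == entries_.begin()) {
			return std::string();
		}
		--it; // ...so the previous entry is the last one starting at or before it.
		if (address < it->start + it->size) {
			return it->name;
		}
		return std::string(); // address falls in a gap between functions
	}
private:
	std::vector<SymbolEntry> entries_;
};
```

Sorting once and searching in O(log n) per sample matters here because the interceptor runs on every collected sample.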

Before starting the collection, we need to tell "CpuSampleLinuxCollector" which processes we are interested in.
processName can be "a.out", "python3", "java" or anything else; here it is taken from the command line:

collector->filterProcessByName(processName);
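One way to find the processes matching a name on Linux is to scan /proc and compare each process's comm entry. The sketch below is a generic technique with a hypothetical function name; the collector's own filtering may work differently. Note that the kernel truncates comm to 15 bytes, so a real implementation might also consult /proc/<pid>/cmdline for longer names.

```cpp
// Linux-only sketch: find the PIDs of processes whose name matches.
#include <dirent.h>
#include <algorithm>
#include <cassert>
#include <cctype>
#include <fstream>
#include <string>
#include <vector>

inline std::vector<int> findPidsByName(const std::string& processName) {
	std::vector<int> pids;
	DIR* proc = opendir("/proc");
	if (proc == nullptr) {
		return pids;
	}
	while (dirent* entry = readdir(proc)) {
		const std::string dirName = entry->d_name;
		// Only all-digit directory names under /proc are processes.
		const bool isPid = !dirName.empty() && std::all_of(dirName.begin(), dirName.end(),
			[](unsigned char c) { return std::isdigit(c) != 0; });
		if (!isPid) {
			continue;
		}
		// comm holds the executable name, truncated by the kernel to 15 bytes.
		// The process may exit at any time, so a failed read is simply skipped.
		std::ifstream commFile("/proc/" + dirName + "/comm");
		std::string comm;
		if (std::getline(commFile, comm) && comm == processName) {
			pids.push_back(std::stoi(dirName));
		}
	}
	closedir(proc);
	return pids;
}
```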

Now everything is ready, so start collecting data for the specified time.
"collectFor" can be called multiple times, and the data will be accumulated.

profiler.collectFor(std::chrono::milliseconds(collectTime));
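The accumulation behaviour of "collectFor" can be illustrated with a hypothetical collector that polls until a deadline and never resets its results between calls. AccumulatingCollector and pollOnce are illustrative names; a real collector would block with a timeout instead of spinning.

```cpp
// Hypothetical sketch: a time-bounded collection loop whose results
// accumulate across calls.
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <utility>

class AccumulatingCollector {
public:
	// pollOnce returns true when it obtained a sample.
	explicit AccumulatingCollector(std::function<bool()> pollOnce)
		: pollOnce_(std::move(pollOnce)) {}
	// Polls until the deadline; collected_ is never reset between calls.
	// (A real collector would wait on events with a timeout here.)
	void collectFor(std::chrono::milliseconds duration) {
		const auto deadline = std::chrono::steady_clock::now() + duration;
		while (std::chrono::steady_clock::now() < deadline) {
			if (pollOnce_()) {
				++collected_;
			}
		}
	}
	std::size_t getCollectedCount() const { return collected_; }
private:
	std::function<bool()> pollOnce_;
	std::size_t collected_ = 0;
};
```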

Finally, once enough data has been collected, we can get the analysis result.
Different analyzers give different types of results;
"CpuSampleFrequencyAnalyzer" gives the top inclusive and exclusive symbol names:

auto result = analyzer->getResult(topInclusive, topExclusive);
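Selecting the top entries from accumulated counts is a classic partial-sort problem. The helper below is a hypothetical sketch, not the library's getResult; like the report above, which lists 16 inclusive symbols although 100 were requested, it returns at most topN entries.

```cpp
// Hypothetical helper (not the library's getResult): pick the N most
// frequent symbols from a frequency map.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using NameAndCount = std::pair<std::string, std::size_t>;

// Returns at most topN entries, highest count first (ties broken by name).
inline std::vector<NameAndCount> getTopN(
	const std::unordered_map<std::string, std::size_t>& counts, std::size_t topN) {
	std::vector<NameAndCount> items(counts.begin(), counts.end());
	const std::size_t keep = std::min(topN, items.size());
	// partial_sort orders only the first `keep` elements: O(N log keep)
	// instead of the O(N log N) of a full sort.
	std::partial_sort(items.begin(), items.begin() + static_cast<std::ptrdiff_t>(keep), items.end(),
		[](const NameAndCount& a, const NameAndCount& b) {
			if (a.second != b.second) {
				return a.second > b.second;
			}
			return a.first < b.first;
		});
	items.resize(keep);
	return items;
}
```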

To compile this code, use the following command (it is also in run.sh):

g++ -Wall -Wextra --std=c++14 -O3 -g -I../../include Main.cpp -lbfd

Now you should be able to write a minimal profiler.
You can find more detailed information in the following documents.

Documents

Environment Setup & Testing

Profiler

Models

Collectors

Analyzers

Interceptors

Coding Standards

You should follow the rules below if you want to contribute.

  • Use tabs instead of spaces
  • For class names, use camel case starting with an uppercase letter (e.g. SomeClass)
  • For function names, use camel case starting with a lowercase letter (e.g. someFunction)
  • For local variable names, use camel case starting with a lowercase letter (e.g. someInt)
  • For global variable names, use camel case starting with an uppercase letter (e.g. SomeGlobalValue)
  • For class member names, use camel case starting with a lowercase letter and ending with _ (e.g. someMember_)
  • Write comments for every public class and function, and keep the code simple
  • Exceptions should derive from ProfilerException, and their messages should contain the function name
  • Avoid memory allocation as much as possible; use FreeListAllocator to reuse instances
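The rules above might look like this in practice. ProfilerException below is a local stand-in derived from std::runtime_error so the snippet compiles on its own (in the project you would use the library's own class), and SimpleFreeList is a hypothetical, minimal free list that illustrates the idea of reusing instances; it is not FreeListAllocator.

```cpp
// Illustration of the coding rules; ProfilerException and SimpleFreeList
// are stand-ins, not the library's classes.
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the library's ProfilerException (assumption).
class ProfilerException : public std::runtime_error {
public:
	using std::runtime_error::runtime_error;
};

std::size_t SomeGlobalValue = 0; // global variable: starts with uppercase

// Keeps released instances and hands them out again instead of freeing them.
// Recycled instances keep their old state; callers must reset them.
template <class T>
class SimpleFreeList {
public:
	std::unique_ptr<T> allocate() {
		if (freeList_.empty()) {
			return std::make_unique<T>();
		}
		auto instance = std::move(freeList_.back());
		freeList_.pop_back();
		return instance;
	}
	void deallocate(std::unique_ptr<T> instance) {
		freeList_.push_back(std::move(instance));
	}
	std::size_t freeCount() const { return freeList_.size(); }
private:
	std::vector<std::unique_ptr<T>> freeList_; // member: ends with _
};

// Class names start with uppercase, functions and locals with lowercase.
class SomeClass {
public:
	// Adds someMember_ to a non-negative input.
	int someFunction(int someInt) const {
		if (someInt < 0) {
			// The message names the function that threw.
			throw ProfilerException("SomeClass::someFunction: someInt must be >= 0");
		}
		return someInt + someMember_;
	}
private:
	int someMember_ = 1;
};
```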

License

LICENSE: MIT LICENSE
Copyright © 2017 303248153@github
If you have any licensing issues, please contact 303248153@qq.com.