openstreetmap/mod_tile

CPU/IO not fully utilized when under heavy load

pnorman opened this issue · 8 comments

The OSMF got a new tile server, and in scaling it up, I've observed some odd behavior.

When I throw a heavy load onto the server, a queue builds up. This is semi-expected, but the average time per tile, as computed by TimeRenderedZoom/ZoomRendered becomes approximately 4.2 seconds. Normally it ranges from <1s for high zooms to 5-10s for zoom 13.

This can be seen in prometheus

image

With a bit of work I was able to see this with a partially filled queue
although in this case it's just the high zooms.

image

Looking at the code that produces the render time and number of tiles stats, they're set in

if (render_time > 0) {
switch (request->originatedQueue) {
case queueRequestPrio: {
queue->stats.timeReqPrioRender += render_time;
break;
}
case queueRequest: {
queue->stats.timeReqRender += render_time;
break;
}
case queueRequestLow: {
queue->stats.timeReqLowRender += render_time;
break;
}
case queueDirty: {
queue->stats.timeReqDirty += render_time;
break;
}
case queueRequestBulk: {
queue->stats.timeReqBulkRender += render_time;
break;
}
}
queue->stats.noZoomRender[request->req.z]++;
queue->stats.timeZoomRender[request->req.z] += render_time;
}
with render_time coming from
render_time = t2 - t1;
, which is essentially the call time of maps[i].store->tile_stat and render, in ms.

I'm somewhat stumped here, but a few thoughts are

  • this is limiting render capacity well below what it should be
  • the machine has idle capacity throughout, particularly CPU
  • why 4.2s?
  • how does the average zoom 13 time per tile decrease?

Tied in with this is the fact it's starting to queue tiles when CPU is under 50%, while num_threads is approximately equal to the number of hardware threads.

In an attempt to isolate what's going on, I left the load normal and used render_list to render z17 tiles. It started off behaving as above, with 4 seconds per tile rendered and CPU usage <50%. Then without changing anything, throughput spiked up, time/tile went down, and CPU usage went to 100%.

image

render_list was with -n 80 and num_threads is 120, and the machine has 80 hardware threads

In an effort to reduce the problem down, I shut off apache2/mod_tile and tried render_list with an assortment of -n. Increasing -n doesn't increase throughput or CPU usage, except that sometimes it goes from not running at capacity to generating more tiles and being CPU bottlenecked.

I'm wondering, could some kind of lock be limiting how many requests per second it can deal with? But that wouldn't explain the bursts of good performance.

image
image

I didn't change anything, but sometimes it goes from 28 tiles/second to 92 tiles/second

I wonder if this is related to

while (qLen == QMAX) {
int ret = pthread_cond_wait(&qCondNotFull, &qLock);
if (ret != 0) {
fprintf(stderr, "pthread_cond_wait(qCondNotFull): %s\n", strerror(ret));
}
}

This limits the queue size to 32, which might be too low modern hardware. But as far as I can tell, this file is only included by the render_* helper programs.

There are also a bunch of constants for mod_tile that might not make sense on modern hardware:

#define DELAY_HASHTABLE_SIZE 100057
#define DELAY_HASHTABLE_WHITELIST_SIZE 13
/*Number of tiles in the bucket */
#define AVAILABLE_TILE_BUCKET_SIZE 5000
/*Number of render request in the bucket */
#define AVAILABLE_RENDER_BUCKET_SIZE 65
/*Number of microseconds per render request. Currently set at no more than 1 render request per 5 seconds on average */
#define RENDER_TOPUP_RATE 5000000l
/*Number of microseconds per render request. Currently set at no more than 1 request per second on average */
#define TILE_TOPUP_RATE 1000000l
, or the queue size constants
#define QUEUE_MAX (64)
#define REQ_LIMIT (32)
#define DIRTY_LIMIT (1000)

We did some debugging on piasa today.

Increasing QMAX to 256 had no impact. Increasing REQ_LIMIT and DIRTY_LIMIT to 256 and 8000 had no impact. The following patchset had no impact
download.compton.nu_tmp_queue-concurrency.patch.txt.
The testing with the patchset and with the REQ_LIMIT and DIRTY_LIMIT changes was not identical to the previous testing, because it was built with march=native.

Running for longer established that the spikes of working at 100% happen every 8.4 minutes. In tests on the 18th, they were happening 7.25 minutes, and on the 15th, every 7.5 minutes.

image

I attached gdb and got some useful information with a backtrace of all the threads. Of the 81 threads, 61 of them were at futex_wait within mapnik::freetype_engine::create_face_impl, inside the rendering of an image. 9 threads were doing other Mapnik work, 4 were in __pthread_cond_wait_common with no Mapnik calls, 2 were within PQgetResult, and the last two were overall renderd threads.

Thread 80 (Thread 0x7f97cbfff640 (LWP 495657) "renderd"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55a35fb6c26c) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x55a35fb6c26c) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55a35fb6c26c, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007f98de693ac1 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55a35fb6c218, cond=0x55a35fb6c240) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x55a35fb6c240, mutex=0x55a35fb6c218) at ./nptl/pthread_cond_wait.c:627
#5  0x000055a35e730a1b in ?? ()
#6  0x000055a35e7343cc in ?? ()
#7  0x00007f98de694b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#8  0x00007f98de726a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81


Thread 78 (Thread 0x7f97dd7fa640 (LWP 495655) "renderd"):
#0  futex_wait (private=0, expected=2, futex_word=0x7f98df96b060 <mapnik::singleton<mapnik::freetype_engine, mapnik::CreateUsingNew>::mutex_>) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7f98df96b060 <mapnik::singleton<mapnik::freetype_engine, mapnik::CreateUsingNew>::mutex_>, private=0) at ./nptl/lowlevellock.c:49
#2  0x00007f98de698082 in lll_mutex_lock_optimized (mutex=0x7f98df96b060 <mapnik::singleton<mapnik::freetype_engine, mapnik::CreateUsingNew>::mutex_>) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7f98df96b060 <mapnik::singleton<mapnik::freetype_engine, mapnik::CreateUsingNew>::mutex_>) at ./nptl/pthread_mutex_lock.c:93
#4  0x00007f98def6cf13 in mapnik::freetype_engine::create_face_impl(...) () from target:/lib/libmapnik.so.3.1
#5  0x00007f98def6d3a1 in mapnik::freetype_engine::create_face(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mapnik::font_library&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::pair<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::unique_ptr<char [], std::default_delete<char []> >, unsigned long>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::pair<std::unique_ptr<char [], std::default_delete<char []> >, unsigned long> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::pair<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::unique_ptr<char [], std::default_delete<char []> >, unsigned long>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::pair<std::unique_ptr<char [], std::default_delete<char []> >, unsigned long> > > >&) () from target:/lib/libmapnik.so.3.1
#6  0x00007f98def6d817 in mapnik::face_manager::get_face(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from target:/lib/libmapnik.so.3.1
#7  0x00007f98def6dc63 in mapnik::face_manager::get_face_set(mapnik::font_set const&) () from target:/lib/libmapnik.so.3.1
#8  0x00007f98def6de9e in mapnik::face_manager::get_face_set(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::optional<mapnik::font_set>) () from target:/lib/libmapnik.so.3.1
#9  0x00007f98df333038 in ?? () from target:/lib/libmapnik.so.3.1
#10 0x00007f98df32dcc6 in ?? () from target:/lib/libmapnik.so.3.1
#11 0x00007f98df32ea31 in ?? () from target:/lib/libmapnik.so.3.1
#12 0x00007f98df32eceb in ?? () from target:/lib/libmapnik.so.3.1
#13 0x00007f98df338868 in ?? () from target:/lib/libmapnik.so.3.1
#14 0x00007f98df3faf45 in ?? () from target:/lib/libmapnik.so.3.1
#15 0x00007f98df74817a in mapnik::agg_renderer<mapnik::image<mapnik::rgba8_t>, mapnik::label_collision_detector4>::process(mapnik::text_symbolizer const&, mapnik::feature_impl&, mapnik::proj_transform const&) () from target:/lib/libmapnik.so.3.1
#16 0x00007f98def67885 in mapnik::feature_style_processor<mapnik::agg_renderer<mapnik::image<mapnik::rgba8_t>, mapnik::label_collision_detector4> >::render_style(mapnik::agg_renderer<mapnik::image<mapnik::rgba8_t>, mapnik::label_collision_detector4>&, mapnik::feature_type_style const*, mapnik::rule_cache const&, std::shared_ptr<mapnik::Featureset>, mapnik::proj_transform const&) () from target:/lib/libmapnik.so.3.1
#17 0x00007f98def68225 in mapnik::feature_style_processor<mapnik::agg_renderer<mapnik::image<mapnik::rgba8_t>, mapnik::label_collision_detector4> >::render_material(mapnik::layer_rendering_material const&, mapnik::agg_renderer<mapnik::image<mapnik::rgba8_t>, mapnik::label_collision_detector4>&) () from target:/lib/libmapnik.so.3.1
#18 0x00007f98def68eee in mapnik::feature_style_processor<mapnik::agg_renderer<mapnik::image<mapnik::rgba8_t>, mapnik::label_collision_detector4> >::apply(double) () from target:/lib/libmapnik.so.3.1
#19 0x000055a35e7350c7 in ?? ()
#20 0x00007f98de694b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#21 0x00007f98de726a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Hi @pnorman, I'm seeing a pull request which seems to address this issue, #334. As well as another related pull request, #335. Let me know if you have yet had a chance to try out those changes and if they have resolved this issue for you so that this issue can be closed out (or not.) I am currently planning to get a new release put together soon and so just wanted to gather some information and close some issues & pull requests out beforehand.

Ya, we're seeing much better CPU usage now. #334 was the big change, #335 helped minorly.