LogDevice client API reads slower than expected
zhengxiaochuan-3 opened this issue · 9 comments
I use BufferedWriter to write logs to LogDevice, one line (one record) per second, for 14 hours. Then I read the log back through the ldcat tool, and reading the total of 5MB of data takes 12 seconds.
Is this normal? What am I missing?
std::unique_ptr<facebook::logdevice::Reader> reader = client->createReader(1);
...
reader->startReading(log, facebook::logdevice::LSN_OLDEST, until_lsn);
...
ssize_t nread = reader->read(100, &data, &gap);
...
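(For reference, here is a filled-in sketch of the elided parts, with the buffer types per the public Reader.h; `client`, `log`, and `until_lsn` are assumed to be defined elsewhere in the program, and the loop-termination condition is my assumption:)

```cpp
#include "logdevice/include/Client.h"
#include "logdevice/include/Reader.h"

using facebook::logdevice::DataRecord;
using facebook::logdevice::GapRecord;

std::unique_ptr<facebook::logdevice::Reader> reader = client->createReader(1);
reader->startReading(log, facebook::logdevice::LSN_OLDEST, until_lsn);

std::vector<std::unique_ptr<DataRecord>> data;  // read() appends records here
GapRecord gap;                                  // filled when read() returns -1

while (reader->isReadingAny()) {  // stops once until_lsn has been delivered
  ssize_t nread = reader->read(100, &data, &gap);
  if (nread < 0) {
    continue;  // hit a gap in the LSN sequence; details are in `gap`
  }
  // process the `nread` records just appended to `data` ...
}
```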
Should I use multiple threads to read, e.g. one thread per hour-long range, in my client program?
Looks like the log has many small records (14*3600 = 50400), so it could be bottlenecked by moving the flow-control window between client and server (moving the window requires one RTT). One thing you can try is to increase the read buffer/window size, which can be done either by: 1) specifying buffer_size in Client::createReader(), or 2) setting client-read-buffer-size in the Client settings (the default is 512), e.g. ClientFactory().setSetting("client-read-buffer-size", "8192").create(...). Both are sketched below.
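Something like this (a sketch; config_path is a placeholder for your config, and the optional buffer_size argument to createReader() is per the public Client.h):

```cpp
// Option 2: raise the client-wide default at client creation time.
std::shared_ptr<facebook::logdevice::Client> client =
    facebook::logdevice::ClientFactory()
        .setSetting("client-read-buffer-size", "8192")
        .create(config_path);

// Option 1: override the buffer size for a single reader.
auto reader = client->createReader(/*max_logs=*/1, /*buffer_size=*/8192);
```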
Also, if you don't care about append/delivery latency, you can try batching more in BufferedWriter by setting a bigger time trigger; bigger batches may improve efficiency.
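For illustration, a bigger time trigger would look something like this (Options field names assumed from the public BufferedWriter.h; my_callback is a placeholder AppendCallback, and client/log/payload come from the surrounding program):

```cpp
#include "logdevice/include/BufferedWriter.h"
#include <chrono>

facebook::logdevice::BufferedWriter::Options opts;
opts.time_trigger = std::chrono::seconds(5);  // flush a batch at most every 5s
opts.size_trigger = 1024 * 1024;              // ...or sooner, once ~1MB is buffered

auto writer =
    facebook::logdevice::BufferedWriter::create(client, &my_callback, opts);
writer->append(log, std::move(payload), /*context=*/nullptr);
```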
My BufferedWriter time trigger is 1 second, and I do care about append/delivery latency.
I set facebook::logdevice::ClientFactory().setSetting("client-read-buffer-size", "8192").create(...), and the read still takes 12 seconds. @runemaster
With the time trigger set to 5 seconds, it takes 2 seconds to read all the records (50400).
I can only do this for now.
So it seems the issue is that you have a lot of small records, as @runemaster said earlier :)
But this is a compromise: with a worst-case latency of 5 seconds, users cannot get real-time logs.
@MohamedBassem
Ok, I need some debugging metrics from you in the old mode (1-second time trigger with client-read-buffer-size set to 8192) to be able to debug what's going on (a sketch for collecting these follows the list):
- How long does it take from the start of the binary until you reach the read loop?
- How long on average do you stay blocked in the read function call? (Basically, how long are you waiting for data from LogDevice?)
- How many gaps do you encounter?
- Would you mind also trying to call waitOnlyWhenNoData() on your reader before starting to read?
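A sketch of one way to gather those numbers, reusing the reader and loop from the original post (the loop-exit condition via isReadingAny() is my assumption):

```cpp
#include <chrono>
#include <cstdio>

// Instrumented read loop: measures time spent blocked in read() and counts
// gaps. waitOnlyWhenNoData() makes read() return as soon as any records are
// available, instead of waiting to fill the whole batch.
reader->waitOnlyWhenNoData();

size_t gap_count = 0;
std::chrono::microseconds blocked{0};
std::vector<std::unique_ptr<facebook::logdevice::DataRecord>> data;
facebook::logdevice::GapRecord gap;

while (reader->isReadingAny()) {  // false once until_lsn has been delivered
  auto t0 = std::chrono::steady_clock::now();
  ssize_t nread = reader->read(100, &data, &gap);
  blocked += std::chrono::duration_cast<std::chrono::microseconds>(
      std::chrono::steady_clock::now() - t0);
  if (nread < 0) {
    ++gap_count;  // gap covers LSN range [gap.lo, gap.hi]; kind is gap.type
  }
}
std::printf("blocked %lld us in read(), %zu gaps\n",
            static_cast<long long>(blocked.count()), gap_count);
```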
I added some timestamps and found that reader->read(100, &data, &gap) costs most of the time.
Changing the batch size from 100 to 1000 made no difference to the total read time.
Adding waitOnlyWhenNoData() made no difference either.
Hey, @zhengxiaochuan-3. Is this issue still affecting you?
If it is, it would be helpful to know your current settings, both for the specific log you're reading and for the cluster. It would also be great to know the types of gaps you are seeing (if any).
There is a chance that a slow node is increasing latency. In that case, you may see improvements if you call reader->forceNoSingleCopyDelivery(). Note, however, that this increases network utilization, since you will receive every replica of every record.
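A sketch of where that call goes, reusing the reader from the original post (calling it before startReading() is my assumption):

```cpp
// Ask every eligible storage node to ship its copy of each record, so a
// single slow node cannot stall the read stream; duplicate copies are
// filtered on the client, at the cost of extra network traffic.
reader->forceNoSingleCopyDelivery();
reader->startReading(log, facebook::logdevice::LSN_OLDEST, until_lsn);
```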