TumblrArchive/TMCache

Why do the synchronous methods use dispatch_async?

plivesey opened this issue · 5 comments

I am writing a library to combine different cached responses so it will need to do multiple cache requests per method. So, I wanted to go to another thread just once, run all my cache blocks and then return. It's pretty important that this library is fast.

But, the synchronous methods in TMCache all seem to actually run in a queue. Is there a good reason for this? Why don't they just run on the thread they were called on? I don't want to wait for other cache requests to be completed, I want the data ASAP.
It seems like this is maybe to run everything on the same thread, but what in this library is not thread safe?

Thanks.

They don't belong, as the library does indeed need to enforce its own thread safety via queue confinement.

Yeah, so I understand the motivation. But is there a specific reason why the code is not thread safe by default? It seems relatively trivial to make the in-memory cache operations atomic and the disk reading should be thread safe since NSFileManager is thread safe?

Is it just to make the code easier to write?

Thanks for your help btw.

jstn commented

The asynchronous methods are the "real" methods; the synchronous methods are just convenient sugar to make your code cleaner when you need to wait on a response. Everything has to happen on the queue to keep the cache consistent and prevent contention.

However, in practice you'll rarely need to wait on the queue. Reads are concurrent and extremely fast, and the queue is only blocked when there's a write.
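
For readers following along, here is a minimal sketch of the arrangement described above: reads go through dispatch_async on a concurrent queue, writes go through a barrier, and the synchronous getter is just the asynchronous one plus a semaphore. `SketchCache`, its queue label, and its method names are hypothetical and assume ARC; this is an illustration of the pattern, not TMCache's actual source.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for a queue-confined cache (not TMCache's real code).
@interface SketchCache : NSObject
- (void)objectForKey:(NSString *)key block:(void (^)(id object))block;
- (void)setObject:(id)object forKey:(NSString *)key block:(void (^)(void))block;
- (id)objectForKey:(NSString *)key;
@end

@implementation SketchCache {
    dispatch_queue_t _queue;
    NSMutableDictionary *_storage;
}

- (instancetype)init {
    if ((self = [super init])) {
        // Concurrent queue: blocks submitted with dispatch_async may run in parallel.
        _queue = dispatch_queue_create("com.example.sketchcache", DISPATCH_QUEUE_CONCURRENT);
        _storage = [NSMutableDictionary dictionary];
    }
    return self;
}

// Asynchronous read: may overlap with other reads on the same queue.
- (void)objectForKey:(NSString *)key block:(void (^)(id object))block {
    dispatch_async(_queue, ^{
        id object = self->_storage[key];
        if (block) block(object);
    });
}

// Asynchronous write: the barrier waits for in-flight reads to finish and
// holds back later blocks until the mutation is done. `object` must be non-nil.
- (void)setObject:(id)object forKey:(NSString *)key block:(void (^)(void))block {
    dispatch_barrier_async(_queue, ^{
        self->_storage[key] = object;
        if (block) block();
    });
}

// Synchronous sugar: calls the asynchronous getter and blocks the caller
// on a semaphore until the queue has delivered the result.
- (id)objectForKey:(NSString *)key {
    __block id result = nil;
    dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
    [self objectForKey:key block:^(id object) {
        result = object;
        dispatch_semaphore_signal(semaphore);
    }];
    dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);
    return result;
}

@end
```

In this sketch the synchronous getter never touches `_storage` on the caller's thread, which is why it still has to go through the queue even though it looks like a plain method call.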

If you're not concerned with thread safety you might be better off with vanilla NSDictionary and the file manager. Otherwise, coalescing responses would be a great job for a dispatch group (dispatch_group_create and friends).
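
A rough sketch of what coalescing several asynchronous lookups with a dispatch group could look like. `SketchFetch` and `SketchFetchAll` are hypothetical names invented for this example, and ARC is assumed; the fetch block stands in for whatever block-based getter the cache exposes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical async lookup: calls `completion` with the object, or nil on a miss.
typedef void (^SketchFetch)(NSString *key, void (^completion)(id object));

// Requests every key, then calls `completion` once on `callbackQueue`
// with a dictionary of the objects that were found.
static void SketchFetchAll(NSArray<NSString *> *keys,
                           SketchFetch fetch,
                           dispatch_queue_t callbackQueue,
                           void (^completion)(NSDictionary<NSString *, id> *results)) {
    dispatch_group_t group = dispatch_group_create();
    NSMutableDictionary *results = [NSMutableDictionary dictionary];
    // Serial queue guarding `results`, since completions may arrive on any thread.
    dispatch_queue_t resultsQueue =
        dispatch_queue_create("com.example.sketch.results", DISPATCH_QUEUE_SERIAL);

    for (NSString *key in keys) {
        dispatch_group_enter(group);           // one enter per outstanding request
        fetch(key, ^(id object) {
            dispatch_async(resultsQueue, ^{
                if (object) results[key] = object;
                dispatch_group_leave(group);   // balanced leave after the result is stored
            });
        });
    }

    // Runs once every enter has been matched by a leave.
    dispatch_group_notify(group, callbackQueue, ^{
        completion([results copy]);
    });
}
```

Each lookup is still submitted as its own block, so this groups the responses rather than reducing the number of requests; the caller simply gets one callback instead of many.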

Haha. I'm definitely concerned with thread safety... I was just thinking that I usually expect synchronous methods to give control over the threading to the caller.
However, I think I was wrong to think this was a good idea for a few reasons:

  1. TMCache seems to be as parallel as possible while retaining consistency.
  2. I was worried that there would be a performance hit by lots of dispatch_async calls (spinning up threads is expensive). Turns out this was misguided as dispatch_async does not create a new thread every time you call it but runs off a shared thread pool (from my understanding).
  3. The threading is more complex than I imagined, and as a caller I shouldn't assume that I can do it correctly.

Anyway, most of this question was motivated by wanting to do batch requests quickly. I could see how the library might be able to handle a batch in just 'one request'. That is, it wouldn't need a dispatch group and would submit fewer blocks to GCD. The library could also expose an API method that does the grouping logic for the caller, for convenience.
However, from what I've read above, this doesn't seem necessary for now. I'll test it out with my use case, and if I find any performance problems that could be solved by smarter batching, I'll open another issue to discuss. I doubt that will be the case though.

Since my question was answered, I'll close the issue for now.
Thanks.