dwightwatson/rememberable

Proposal: New Major Version

ameliaikeda opened this issue · 4 comments

Sticking this one in an issue so it's a little easier to keep track of.

There's a few things that would be really nice for this project that would require a new major version, and one of my PRs requires that anyway, so I thought I'd bring up discussion here.

  • Switch from md5 to sha256 for cache keys (#27)
  • Use a (configurable) cache prefix (#28)
  • Add tests for the Query Builder
  • Configure travis-ci

Thanks - I'm still thinking over the MD5/SHA256 change.

I think I'm leaning towards staying with MD5 for a couple of reasons; it looks like the potential for a collision is impossibly small, and it also begs the question - why SHA256 as the replacement? Why not SHA-1, 2 or go to the other end of the spectrum and use Bcrypt (as an example)?

If we were to leave that change aside, the other changes could probably go into a minor release. Adding a cache prefix won't break anything, just orphan some cache data to expire.

As far as tests go, I've tried to tackle that a couple of times, I couldn't find a good way to test how I've wrapped methods on the Laravel builder (like, how can I call get() on my builder and then check that it was passed up to it's parent?). A PR for that would be great, and would happily setup CircleCI/Travis-CI for that if we had a test suite.

I picked SHA-256 because it's a fast hash and widespread; bcrypt would just be utterly silly. Also, your link is talking about SHA-256, not MD5. It has a footnote saying that MD5 is trivial to cause collisions with

For the point I was going for, you have a known prefix for every query you execute (the query itself, with ? in place of bindings). If you have a known prefix between two user-controlled strings you can deliberately cause a collision, because you know the boundary where each block of the MD5 is. (For the record, what I'm describing here is a Chosen Prefix Attack. It's not really that exploitable in the slightest, but it can still be used to deliberately screw with someone on a production service.

Also, going by the stale data not being a breaking change argument, wouldn't that mean the hash change is also non-breaking? Both changes only affected the cache key.

As for tests: if I figure out a way to add them, I'll PR them :)

Haha - good catch! I was looking up MD5 articles, don't know how I missed that. I understand we're after fast hashes with less potential for collision, but why does that rule out SHA-1? Are we drawing a line in the sand by saying this is the point we're happy to tradeoff speed for collision potential? Just trying to check that if we are going to change the hash we're picking the right one to change it to.

Cool, the details of that attack vector totally go over my head, but I suppose it's one more thing we avoid by switching to a different hash. I'm not sure why you would go about such a thing with a cache, maybe to trick it into serving the result of another users cache? So I can see how reducing the potential for collision will help there too.

You're probably right, it might not be great to just dump half a user's cache in an update. There could be people with smaller cache clusters or setups that can't handle having their cache size double instantly. By making it a major version bump there's a better chance they'll we aware of the update.

Closing this as some of this stuff was implemented and you ended up forking your own version, so all good now.