jvanasco/pyramid_session_redis

Make sure you don't have race conditions or at least document it if you do [session locking]

a3kov opened this issue · 7 comments

a3kov commented

Just found your fork.
Original version is naive about concurrency and doesn't implement any locking.
I haven't looked at your code but from what I understand you implemented batch updates.
Batch updates significantly increase race condition windows so I thought it would be good idea to mention it here.
ericrasmussen/pyramid_redis_sessions#80

Thanks for the tip. I'll update the documentation that race conditions are possible ASAP - though this is possible/likely in all but a few session library implementations across frameworks/languages. I personally handle race conditions for most "actions" in developer code, via timestamps and "tickets".

A few things worth noting:
• the 'change detection' feature in this library will minimize most rewrites (this library only rewrites the session if a change actually is made, eliminating rewrites).
• locking on sessions can be extremely tricky and cause odd situations. many people use tabbed browsing/multiple windows/async javascript/etc. locking a session for a request can cause pileups; locking a session for a unit-of-work can cause different browser "instances" to affect one another.

a3kov commented

this is possible/likely in all but a few session library implementations across frameworks/languages

This is not true. Most frameworks have open/close or similar interface methods where implementation can lock/unlock. Even Beaker in the Python world has pretty advanced locking implementation. Pyramid ISession interface is just a naive minimalistic version of what a proper session should be, that's why the original author of this library did this mistake I think.
I have impression that Pyramid in general favors minimalism over everything else.
I tried to change the situation but the core developers don't like the idea Pylons/pyramid#3041

Those are all server-side sessions. Pyramid's session implementation is minimalist, because the project leaders insist on supporting client-side and server-side equally (and they tend to personally favor client-side).

Thats why you don't see hooks for common standards in server-side sessions, like accessing the "session id". There was a long argument a few years back where those of us who use server-side sessions wanted ISession to have a standard session_id property, for implementors to write to. That would make it easier to swap backends/plugins. We lost, with the winning argument being "cookie-based sessions don't have session id".

Compared to libraries that handle server+client side (which is how I know think of sessions), it is very rare to see locking. Many newer server-side libraries and plugins are also focused on eliminating locking, because it often creates unnecessary bottlenecks (esp in PHP) and most people don't want to lock.

Anyways, in terms of Python projects...

Beaker's hooks for lock/unlock have been marked for deprecation for quite some time (7+years, at least) and I'm not sure they will work to handle race conditions. Looking quickly at their tests, I don't see that being explicitly tested (and they historically have had insanely good test coverage for use-cases). The automatic locking on the session portion of code is limited to the read/write units of work -- not the duration of a session. The advanced bits of their locking code are focused on dogpile effects for caching (much of which was rewritten in the dogpile library).

Django's sessions didn't lock for a long time; looking at their code, that's still not supported in the default shipment. The various flask server-side session implementations i've seen don't lock either.

a3kov commented

Well if someone ignores locking then they just ignore the problem, it will not go away. If a user will open several tabs and they load one by one the world will not end, but if he opens several tabs and his session is losing data is much worse in my opinion. Developer can adapt by storing non-important data but that just defeats the whole idea a bit imho..you can always use cookie directly for non-important data and it's perfect for that

It's really about having the right tool for the right job. Choosing a session strategy and implementation is not a one-size-fits-all concept.

Many developers would rather have the performance improvements of non-locking sessions in a read-heavy environment. If you're in a write-heavy environment or dealing with transactions, advanced locking is definitely something you want to look into.

Encrypted cookies are good for small payloads of data. However large payloads of read-only data can be costly to decrypt and add to traffic concerns. Some sensitive data you may not want to share client-side either.

The aggregate size of cookies sent to your domain can be a problem too. I work a lot with large online publishers. The effect of large cookie payloads is often a problem with performance and scaling. A single 4k cookie can make ajax API optimizations meaningless. Multiple 4k cookies will still pound a load-balancer on every hit to an unchanged front page. (The amount of sites I've seen have to battle this problem is HIGH)/

a3kov commented

Btw, I suspect it might be easy to implement optimistic concurrency control for redis. Just need to make sure that session version when flushing the batch is same as it was when reading at the beginning, and to raise a retryable exception for pyramid_retry otherwise. This way there's no locking and reads will not wait or conflict. Redis doesn't have "compare and set" but recommends to implement optimistic locking using WATCH.

docs updated on #b8f25e402a81696c8127b1c9d7e1d1a8bbe12a26