given a list of users, their limits, see if the user can continue with that request.
keep a queue of requests and users. on every request, remove the elements that are not within the last hour. Update the users' current limit users' limits are kept in a dictionary.
Alternatively keep requests of users in separate queues, remove the elements that are not within the last hour
One optimization is to only update the current request count when we hit the limit. Another optimization is to use a index on the queue to search the time with extra O(n) space Investigate use the fast pointer algorithm: http://www.ijcsit.com/docs/Volume%205/vol5issue02/ijcsit20140502215.pdf OR skip list??
Once the hour limit is found, drop all the elements before it.
Brute force: checks each entry in the queue. Tracking this in a db will introduce a huge response delay. It makes sense to aim towards an in-memory centric application with scheduled backups for redundancy.
Losing cache locality with using a queue and a dictionary. Consider implementing an extension module hash table and std::vector
Findout about the metrics of user behavior: Does the limit ever change, if not how often does the number of users change.
How to introduce a new user
Make some back of the envolope calculations assuming the profile of the requests.
Memory footprint looks like; for 1M users with 10K max limits: around ~80GB. sounds reasonable but how could we distribute the load?
consistency: will all executions of reads and writes seen by all nodes be atomic or linearizably consistent.
availability: will a request made to the data store always eventually complete. System is said to be available iff all messages have a valid response but allows unbounded amount of delay.
partition tolerence: network(internal messaging between distributed data) is allowed to drop any messages. partiton ex: if the system is dependent on a single component and it fails, messages between some components will be lost/dropped.
CAP gaurantees that there is some circumstance(let's call it critical condition) in which a system must give up C or A. It doesn't say how likely the critical condition is. A single inconsistent read or unavailable write means that we reached it Until critical condition system can be available and consistent.
You cannot build or choose partition tolerence. if your system may experience partitions, you cannot always be C and A
YOU CANNOT CHOOSE IF YOUR SYSTEM WILL HAVE PARTITIONS. EITHER IT WILL OR IT WONT depending on the topology and dependence. If you have a single application with a single dtabase, you can say you have partition tolerence but then it wouldn't be a distributed system eh hence YOU CANNOT CA in distributed systems!! only cp or ap
Usually then, choose between consistency and availability.
Replicate data so that we have high availability, we now have the problem of synchronization.
Put some cache in front of it. Consider caching strategy. Keep a hot cache of more active users for example. Go nuts.
Put some smart routing(LB) depending on your users.
hw/vm elasticity / performance metrics? (auto instance invocation upon increased load)
Imagine wrapping this as a middleware. If you want to globally limit the rates, simply introduce a single user. for group limiting, add groups as users
How to deploy this middleware? Database migrations Start with a single application server instance, monitor and make sure the behavior is matching the expected calculations slowly let it in.
Specify time frames for every limit but this version excludes them for brevity.
Reconsider load distribution if time frames and max rates change.