formigarafa/zxcvbn-rb

Thread safety issue

zarqman opened this issue · 1 comments

Hi there, and thanks for building this gem. It's great to have something in native Ruby that's fully compatible with the JS version!

It seems that the gem isn't currently thread-safe. Specifically, mutating the constant Matching::RANKED_DICTIONARIES in

def self.user_input_dictionary=(ordered_list)
ranked_dict = build_ranked_dict(ordered_list.dup)
RANKED_DICTIONARIES["user_inputs"] = ranked_dict
RANKED_DICTIONARIES_MAX_WORD_SIZE["user_inputs"] = ranked_dict.keys.max_by(&:size)&.size || 0
end
which is called from https://github.com/formigarafa/zxcvbn-rb/blob/master/lib/zxcvbn.rb#L21 can result in RANKED_DICTIONARIES containing data for a different thread if 2 threads call zxcvbn in parallel.

This is only an issue when the user_inputs argument to Zxcvbn.zxcvbn() is used. When it's always the default value of [], then it won't matter.

I'm guessing this comes from the original JS code, which isn't concerned about multi-thread access while running in a browser.

My first thought would be to rework parts of Matching to no longer be a singleton, so that @user_inputs could be stored separately per caller. Alternatively, the call to Zxcvbn.zxcvbn() itself can be wrapped with a Mutex (which is also a viable workaround for the existing gem version).

Yes, you are correct.
This is a behaviour inherited from the original source code.

On server side this starts to become a concern on the very rare case you have many user registrations to validate in parallel.
I saw some valid reasons to have it also working on login but at this stage the password already exists and in general there is no much more than username to be used. In which case user_inputs just can't add much.
Other people use the tool to score generated passwords, again, no use for user_input.

I thought about tackling this from several angles:
Using a disposable tester instance like zxcvbn-ruby have available.
Or make it more functional style where every execution would have it's own immutable scope.
The mutex option I dismissed, because the lock would need to be held while until the execution of the function has finished and the function can be very slow depending on the password.

I also agree with you that only parts of Maching module is affected but I take this further only RANKED_DICTIONARIES['user_inputs'] is affected.

It should not be very difficult to pass the result of build_ranked_dict(user_inputs) around where it is needed.