realrolfje/anonimatron

Don't randomize the first characters

Closed this issue · 23 comments

Hi,

great tool, works fine!
Unfortunately, I'm missing one feature: I would like to keep the first 3 chars/digits of a string.

5550123456789 --> 5551743025698

Is there any way to achieve this with the current code? If not, are there plans to add this feature later?

Cheers,
T

Hi Balogh, thanks for trying out Anonimatron! The feature you are asking about can easily be realized by writing a simple Anonymizer and putting it on the classpath of Anonimatron. I haven't documented this very well; this may be a good trigger to do just that.

If you have a bit of Java knowledge, you should be able to build a class like this which can generate the strings you want.

Let me know if this helps.
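
For illustration, the core of such a class could look like the sketch below. This is not Anonimatron code: the class name `KeepPrefixSketch` and the method `keepPrefix` are hypothetical, the sketch assumes the replaced part should consist of digits only, and wiring it into an Anonimatron Anonymizer is left out.

```java
import java.util.Random;

// Hypothetical helper, not part of Anonimatron: keeps a fixed-length prefix
// and replaces every remaining character with a random digit.
final class KeepPrefixSketch {

    static String keepPrefix(String original, int prefixLength, Random random) {
        // Never keep more characters than the input actually has.
        int keep = Math.max(0, Math.min(prefixLength, original.length()));
        StringBuilder result = new StringBuilder(original.substring(0, keep));
        for (int i = keep; i < original.length(); i++) {
            result.append((char) ('0' + random.nextInt(10)));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Keeps "555" and randomizes the remaining ten digits.
        System.out.println(keepPrefix("5550123456789", 3, new Random()));
    }
}
```

The output has the same length as the input, so a 13-digit number such as 5550123456789 stays 13 digits long and still starts with 555.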

Hello. Can I take care of this task?

You are more than welcome to. If you need help just give a shout!

Hello. Can I take care of this task?

Thank you that would be great. Let me know if I can help in testing.

Hi @realrolfje
Please advise: should I create a new class for that function, or just add a new method to the CharacterStringAnonymizer class?

Personally I'd start with a separate class, maybe use the CharacterStringAnonymizer as a superclass to get you started. In our case, I see we also need digits.

Maybe @baloght can tell us what type of data this is, like "phone number" or "address" or "name", so we know if we need only digits, only upper case characters, or mixed case.

#77

@baloght Does it meet your needs?
@realrolfje Can you give me a code review?

Great work @BElluu , I've added my comments in a code review.

@realrolfje Where can I see your comments? In my pull request I do not see any conversation, and I do not see any comments under the changed files either :)

---EDIT---
nvm. I see your comments. I will check that tomorrow. Thanks! :)

Yes sorry my fault, I posted the comment before I saved the review. Take your time, I haven't got a release schedule to keep :-)

Hi @realrolfje Sorry but I only had time today to improve it. Could you verify if I did it correctly?

phone number

Hi, sorry for the late reply. The column contains phone numbers, so the data will always be digits.

Hi,

thanks for working on it.
Just a note:
Correct me if I'm wrong, but based on the pull request the digits are changed to totally random values.
In that case, if we randomize the same phone number twice in the same run, we get different results.
I would like to keep the original RANDOMDIGITS behaviour, where you always get the same result with the same salt string.
Wouldn't it be easier to extend the RANDOMDIGITS type with parameter options?

Hi Baloght, getting the same output for the same input is handled by Anonimatron, not the anonymizer. This is because it knows (loads) its synonyms for each run.
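
As a conceptual illustration of that division of labour (not Anonimatron's actual implementation), the sketch below wraps an arbitrary anonymizing function in a lookup table, so that an input seen before is mapped to the value generated the first time. The class name `SynonymCacheSketch` and its methods are hypothetical, and loading the table from a previous run is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration: remembers which replacement was generated for
// each original value, so repeated inputs get the same output.
final class SynonymCacheSketch {

    private final Map<String, String> synonyms = new HashMap<>();
    private final UnaryOperator<String> anonymizer;

    SynonymCacheSketch(UnaryOperator<String> anonymizer) {
        this.anonymizer = anonymizer;
    }

    String anonymize(String original) {
        // The (possibly random) anonymizer only runs for values not seen before.
        return synonyms.computeIfAbsent(original, anonymizer);
    }
}
```

With this arrangement the anonymizing function itself may be fully random; consistency comes from the stored mapping.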

Working on getting it working for you. It generates consecutive digit strings now, keeping the first x digits of the original string. I have two questions:

  1. What would you call this, e.g. what are you anonymizing? Is it a "Finland Phone Number", for instance?
  2. Is the format always consecutive digits, or are dashes and spaces possible, and do you want those in the output?

I would like to call the function for MSISDNs (phone numbers from different countries), where the first x digits come from the country code and provider ID. I want to keep the original country codes and provider IDs.
e.g.: 436701234567
36201234567

In my case the format is always digits only, but it could be worth adding other options for special characters if you think so. Maybe those will come in handy for others, and the tool would be even more customizable.

Maybe you don't like the idea, but I think it would be a nice feature if users could add a pattern as a parameter for a given anonymization type.
For example, this could be the pattern: XXXOOXXOOO
where X means keep the original character and O means replace it with an anonymized character.

Then the digits 5553433478 would become, for example, 5552133659 after anonymization. I think this method would be far more customizable than a 'keep first or last 5 characters' method.
But again, this highly customizable feature is not needed for me, just an idea.
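
The pattern idea above can be sketched as follows. This is only an illustration of the proposal, not the code that ended up in Anonimatron: the class `MaskSketch` and the method `applyMask` are hypothetical, replaced characters are assumed to be random digits, and positions beyond the end of the mask are assumed to be replaced.

```java
import java.util.Random;

// Hypothetical illustration of the proposed mask: 'X' keeps the original
// character, any other mask character replaces it with a random digit.
final class MaskSketch {

    static String applyMask(String original, String mask, Random random) {
        StringBuilder result = new StringBuilder(original.length());
        for (int i = 0; i < original.length(); i++) {
            // Assumption: positions beyond the end of the mask are replaced.
            boolean keep = i < mask.length() && mask.charAt(i) == 'X';
            result.append(keep ? original.charAt(i)
                               : (char) ('0' + random.nextInt(10)));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Output has the form 555??33???, where each ? is a random digit.
        System.out.println(applyMask("5553433478", "XXXOOXXOOO", new Random()));
    }
}
```

Because kept positions are copied verbatim, a dash or space can be preserved simply by putting an X at its position in the mask.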

That is actually a brilliant idea! It makes it more flexible and also fixes my original question: If there is a dash or space in there, you can just mask it out and it will not be replaced with a number. I'll see if I can change it to your suggestions. I need to rename the class and type too, it will be worth it I think.

I added it as a feature to the DigitStringAnonymizer we already have. Have a look at the javadoc of that method, does that look usable for you?

Yes, it seems pretty fine.
I will try to build and test it in the next day.

@realrolfje I saw you merged my branch. Sorry, but I was too busy to do anything ;/

Hi @BElluu no problem, I was just a bit impatient, sorry ;-) Did you see Balogh's ideas about extending the DigitStringAnonymizer? It is a more flexible solution to Balogh's request. I may re-implement the loop to be faster, but I think this will do the trick. If you check out the feature/partial-character-anonymizer-cleanup branch you can play with it and let me know what you think.

Re-implemented the loop: 3 times faster :-) Happy with that.

Released! Enjoy your new Anonymizer!