SurveyMan/SMPy

Find out if we can use Jeeves to help with security/information assurance

Opened this issue · 5 comments

@mmcmahon13 : if you have the bandwidth to attend @jeanqasaur's talk in early December, you should. If you have additional bandwidth and/or interest, look into using Jeeves for the Python lib. I'll be back in the office Dec. 15.

I see the Java implementation running as a service. However, the Python front-end has potential security and privacy issues, since users may be asking sensitive questions and/or getting sensitive/identifying responses. Having some kind of information flow analysis and a plan for handling sensitive data could help with adoption -- it would be a nice feature for potential users to report when getting IRB approval.

Using Jeeves might be able to address security issues on the researcher's end. On the other side of the spectrum (i.e., from the perspective of the user), a well-known problem with collecting sensitive data is the belief on the part of the respondent that it can be traced back to that respondent. Here is a paper @emeryberger and I heard about during OOPSLA that addresses the issue in crowdsourcing. Here's a summary.

What are you thinking about using Jeeves for? Happy to talk more.

@etosch Sorry I haven't been around for a while; I've been tied up with classes and finals. I should be done in about a week, and then I can start looking at this. I'll be back on campus in early January, so maybe I can come in at some point.

What talk is it that I should go to?

@mmcmahon13 : take a look at the Jeeves documentation. As we expand the scope of the SurveyMan work, we'll want to keep other aspects in mind -- so far, we've been concerned about the integrity of the survey instrument in the face of the data. Other aspects have an impact on usability, though. Surveys contain identifying information. Finding ways of protecting that information might increase adoption.

@jeanqasaur : how difficult would it be to extend Jeeves so that sensitive values are the product of a computation, rather than a value? Using the RAPPOR work as an example, I'm thinking of something that returns the true value of a variable in safe settings and a randomly chosen value in unsafe settings.

@etosch It should be straightforward to make sensitive values the product of a computation. For instance, sensitive values can be functions, so one way to do it is to just store functions and propagate them. There might be nicer ways of doing it. Let me know if you want me to think harder about this!
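
To make the "store functions and propagate them" idea concrete, here is a minimal sketch in plain Python. It does not use the Jeeves API: `Deferred`, `randomized`, and the boolean `safe` flag are hypothetical names made up for illustration. The uniform random choice is only a stand-in for the "randomly chosen value in unsafe settings" from the question, not an implementation of RAPPOR.

```python
import random
from typing import Callable, Generic, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Deferred(Generic[T]):
    """Hypothetical wrapper: a sensitive value stored as a function of the viewing context."""

    def __init__(self, compute: Callable[[bool], T]) -> None:
        # compute(safe) yields the value to show for that context.
        self._compute = compute

    def map(self, f: Callable[[T], U]) -> "Deferred[U]":
        # Propagate the stored function: the result is still deferred,
        # so downstream code never touches a concrete value.
        return Deferred(lambda safe: f(self._compute(safe)))

    def reveal(self, safe: bool) -> T:
        # Only here is the computation actually run.
        return self._compute(safe)


def randomized(true_value: T, choices: Sequence[T], rng: random.Random) -> Deferred[T]:
    """True value in safe settings, a uniformly random choice otherwise (an assumption)."""

    def compute(safe: bool) -> T:
        return true_value if safe else rng.choice(choices)

    return Deferred(compute)


# Usage: a yes/no survey response that is transformed before anyone views it.
answer = randomized("yes", ["yes", "no"], random.Random(0))
shouted = answer.map(str.upper)
trusted_view = shouted.reveal(safe=True)     # the true answer, upper-cased
untrusted_view = shouted.reveal(safe=False)  # upper-cased random choice
```

In this sketch, each call to `reveal(safe=False)` draws a fresh random value, so repeated unsafe views of the same response may disagree. Whether that is acceptable, or whether the draw should be memoized per viewer, is a design choice this sketch leaves open.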