semgrep/bento

Information for starters

rvanlaar opened this issue · 10 comments

I hope your product becomes a success

Is your feature request related to a problem? Please describe.

This project will need a lot of work for me to start using it. There are too many loose ends,
mainly in the documentation. I expect better from a security company.

Describe the solution you’d like

  • Doc: Add how it is installed, what it installs and where it installs
  • Add option to not use tracking feature
  • Doc: Explain when it is run, and how it is run
  • Doc: Explain that it does a `git stash`
  • Add option to work on non-staged files; why wait until check-in time?
  • Add option that lists all enabled checks
  • Use ~/.config/ directory instead of a new ~/.bento one

+1 on how to disable the tracking.
For me at least, it looks like the tracking server is down anyhow:

```
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='bento.r2c.dev', port=443): Read timed out. (read timeout=1)
```

Thank you for filing a ticket and giving us this feedback @rvanlaar (and @rotten)!

On the telemetry side, is it a "hard no" or is there something specific about what is tracked that makes you uncomfortable (details in PRIVACY.md)? Have you seen any other tools that you think have gotten this "right" that we might model ourselves after? For our early users we all agreed on the utility of this data to develop the tool, but we're happy to revisit the topic with more folks adopting Bento. These comments are timely!

Have you both seen semgrep? We've put most of our energy there these past few weeks and are currently getting Bento into a state where it uses semgrep as its underlying engine. Telemetry changes can happen concurrently.

@rvanlaar re other feedback on docs and misc. improvements, I'll open up separate tracking tickets.

The tracking makes me uncomfortable without being able to actually see what it is sending in spite of the pretty language in the privacy statement. I suppose I could read through the source code or catch the traffic on its way out, so it is merely obfuscated instead of secret (like some other products we use every day). Documentation that included a sample tracking report would still be helpful and go a long ways. Also could probably not use this on a "real" code base without being able to disable it for more sensitive employers and clients. Lastly, the whole thing seems to blow up if your internet connection is down when you run it.
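To illustrate the last point, here is a minimal sketch of how a telemetry call could fail quietly when the network is down or the server times out. It is not Bento's actual code: the function name `send_report_best_effort` and its parameters are hypothetical, and it uses only the Python standard library rather than whatever HTTP client Bento uses.

```python
import http.client
import json
import urllib.error
import urllib.request


def send_report_best_effort(url: str, payload: dict, timeout: float = 1.0) -> bool:
    """Try to POST a JSON report; return False instead of raising on failure.

    Hypothetical helper for illustration only, not part of Bento.
    """
    try:
        data = json.dumps(payload).encode("utf-8")
        request = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Offline, timed out, refused, or malformed URL: the report is dropped
        # and the caller carries on with its real work.
        return False
```

With something like this, a dead connection would cost at most the timeout and the tool itself would keep running.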

Thanks for addressing these points.

The repo hash and commit hash felt not good, especially since it's not shown how it's hashed. The IP address made it into a hard no.

The Privacy statement is also bad in that regard. It mentions general 'Usage data'.
Only after looking at the actual JSON does it tell you it exposes the client_ip.

In the end it is about trust. Do I trust a security company that is not upfront about collecting user-identifiable data? No.

@rotten I removed tracking in this branch: https://github.com/rvanlaar/bento

@rvanlaar I spoke with the other maintainers and we're all open to making telemetry (ideally) opt-in or removing it altogether. Up for opening a PR against Bento for us to discuss and merge? Happy to iterate async over a PR or jump on a call to talk through the changes. Looking through your fork, I think we'd want to keep the logic that checks for and alerts when a new version of Bento is available; it has proven really helpful for users.

> In the end it is about trust. Do I trust a security company that is not upfront about collecting user-identifiable data? No.

@rvanlaar I appreciate this feedback. It pains me because of how much we've discussed privacy amongst the maintainers and the work we did to broadcast to prospective users what, when, and where we collect information. Do you feel like you were led to use the tool without having an opportunity to review telemetry?

If you're open to making a PR that improves PRIVACY.md I'd review and incorporate your edits.

> I could read through the source code or catch the traffic on its way out, so it is merely obfuscated instead of secret (like some other products we use every day).

@rotten Our hope was that public source, a clear data policy, and an explicit agreement upon new installation didn't feel like obfuscation. It sounds like we can do better. On the secret telemetry side, the maintainer group mostly use Little Snitch and have found it helpful.

> Documentation that included a sample tracking report would still be helpful and go a long ways.

@rotten Agreed. Any thoughts to improve PRIVACY.md#examples?

> Also could probably not use this on a "real" code base without being able to disable it for more sensitive employers and clients.

+1, opt-in or no telemetry seems like the way to go.

Thank you both for the continued discussion.

Searching for examples of other OSS tools with telemetry, what do you think of Gatsby's Telemetry and Brew's Analytics write-ups?

I do agree that a version check is a good thing, just make it easy to silence.

Gatsby's seems clear. They don't require an e-mail and an IP address.
Same for Brew, they even include how to change your installation ID.

About merging, feel free to use my code.

I'm also interested in contributing code to semgrep (which uses bento) but this telemetry thing just put a hard brake on my intentions.

I'm willing to be a participating community member that shares ideas, bugs and time but I'm not willing to have this calling home no matter what the data is.

Opt-in or at least a very easy opt-out (like dotnet that can be disabled with a DOTNET_CLI_TELEMETRY_OPTOUT=1 env variable) would be great.
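As a rough sketch of what such an opt-out could look like on the Python side: the variable name `BENTO_TELEMETRY_OPTOUT` below is made up for illustration (modeled on the dotnet one) and is not something Bento defines, and `telemetry_enabled` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, modeled on DOTNET_CLI_TELEMETRY_OPTOUT.
# Bento does not define this; it is an assumption for the example.
OPTOUT_VAR = "BENTO_TELEMETRY_OPTOUT"
_TRUTHY = {"1", "true", "yes"}


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when the user has set the opt-out variable to a truthy value."""
    env = os.environ if env is None else env
    value = env.get(OPTOUT_VAR, "").strip().lower()
    return value not in _TRUTHY
```

A caller would check `telemetry_enabled()` once before building any report, so that opting out means nothing is collected or sent at all.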

I appreciate the transparency and I get that this is 100x better than all the tools that do it silently, but there has to be a way to disable it.

Thanks for the feedback @dee-see. For clarification, Semgrep itself doesn’t have any dependency on or use Bento.

We do use Bento in the Semgrep GitHub Action, but that’s being rewritten and won’t be true in the near future. Is that what you were referring to?

I agree with your points on opt-in or easy-to-opt out telemetry. Based on all of this feedback we're moving away from mandatory or opt-out telemetry.

I was referring to this page that makes it seem like a build requirement.

Glad to hear about the new direction!