psf/gh-migration

Notify bpo users once the migration is done

ezio-melotti opened this issue · 16 comments

After the migration, we should send users an email informing them that the migration happened and listing issues that have been created by them, assigned to them, and followed by them.

In order to do this, we should write a tool that goes through all the users, gathers the data, formats the messages, and sends them to the users. This summary could be made available on bpo too, and the email could contain a link in addition to or instead of the lists (these summaries are already available in the sidebar for logged-in users).
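
As a rough illustration of the "gathers the data" step, the sketch below groups issues into the three per-user lists (created, assigned, followed). The `Issue` record and its field names are hypothetical simplifications and are not taken from Roundup's real schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Issue:
    # Hypothetical, simplified view of a bpo issue; field names are assumptions.
    bpo_id: int
    title: str
    gh_number: int
    creator: str
    assignee: Optional[str] = None
    nosy: List[str] = field(default_factory=list)


def summarize(issues):
    """Map each username to its 'created', 'assigned' and 'followed' issue lists."""
    summary = defaultdict(lambda: {"created": [], "assigned": [], "followed": []})
    for issue in issues:
        summary[issue.creator]["created"].append(issue)
        if issue.assignee:
            summary[issue.assignee]["assigned"].append(issue)
        # Everyone on the nosy list counts as "following" the issue.
        for user in issue.nosy:
            summary[user]["followed"].append(issue)
    return dict(summary)
```

A single pass over all issues is enough, so the per-user summaries can be built without querying each user separately.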

This works well for occasional contributors who are involved in fewer than one or two dozen issues, since they can go through them, review them, and resubscribe manually (if they are still interested). However, it doesn't scale well for people who follow hundreds of issues, and having a way to resubscribe users to the issues they were following would be better (see #5 under "nosy list").

Even if we find a way to preserve the nosy list during the migration, the email would still be useful because:

  • it will inform every bpo member that the migration happened (unless we want to limit the recipients to people who have been active recently)
  • it will give them a chance to review and possibly update their old issues
  • it will provide a quick way to find their issues on GH, even if they lost access to bpo

The exact wording and format of the email still needs to be determined:

  • it should include a paragraph or two explaining that the issues have been migrated.
  • it should include 3 lists: created by you, assigned to you, followed by you. These match the 3 summaries available in the sidebar of bpo. Links to the summaries and the total number of issues in each list could also be included.
  • it should include the issue titles, links to the GH issues, and possibly links to the corresponding bpo issues too (not essential since they are already linked at the top of the GH issues).
  • it should encourage the users to review and possibly close old issues if they are no longer relevant.
  • it could list all issues, or be limited to a maximum number in case someone is following hundreds or thousands of issues.
  • it could be sorted by different fields depending on the list, or just by date, descending.
  • it could be limited to issues that are still open, or also include closed issues (possibly in separate lists).
  • it could include additional metadata (possibly similar to the weekly report).
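
One possible shape for the email body, covering a few of the options above (three lists, sorted by date descending, capped at a maximum with a count of the remainder), is sketched below. The `Entry` type, the `max_items` default, and the wording are placeholders, not the final text of the email.

```python
from datetime import date
from typing import NamedTuple


class Entry(NamedTuple):
    # Hypothetical per-issue data: only what the email needs.
    title: str
    gh_url: str
    created: date


SECTIONS = [
    ("created", "Created by you"),
    ("assigned", "Assigned to you"),
    ("followed", "Followed by you"),
]


def format_body(username, lists, max_items=50):
    """Build a plain-text body listing up to max_items issues per section."""
    lines = [
        f"Hello {username},",
        "",
        "The issues on bugs.python.org have been migrated to GitHub.",
        "Please review the issues below and close any that are no longer relevant.",
    ]
    for key, heading in SECTIONS:
        # Newest issues first; empty sections are omitted entirely.
        entries = sorted(lists.get(key, []), key=lambda e: e.created, reverse=True)
        if not entries:
            continue
        lines += ["", f"{heading} ({len(entries)} total):"]
        for e in entries[:max_items]:
            lines.append(f"  * {e.title} - {e.gh_url}")
        if len(entries) > max_items:
            lines.append(f"  ... and {len(entries) - max_items} more")
    return "\n".join(lines)
```

The "... and N more" line is where a link to the corresponding bpo summary could go for users who follow many issues.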

Given the number of users, we might have to take some care in sending out a large number of emails at once, since it might be seen as spam.

In order to do this, I can write a script that goes through all the users and prepares a custom list of issues for each of them. We might also exclude users who are not following any (open) issues, or users who haven't been active in the last few years (that might be tricky to determine, though). Then it could send the mails using an adapted version of https://github.com/psf/bpo-tracker-cpython/blob/6d29cfba9efa7410cc199b93de55f189811180b4/scripts/roundup-summary#L703-L728

Another option is to use mail or sendmail directly. I tried mail -s 'Test mail from bpo' -r no-reply@bugs.python.org <myemail>, typed a message, and I received it in my inbox.
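
For comparison, a rough Python standard-library equivalent of that `mail -s ... -r no-reply@bugs.python.org` command is sketched below. Sending through an MTA on `localhost` is an assumption (bpo may relay differently), and the RFC 3834 `Auto-Submitted` header is only a suggestion for marking the message as automated.

```python
import smtplib
from email.message import EmailMessage


def build_message(to_addr, subject, body, sender="no-reply@bugs.python.org"):
    """Build a plain-text message, roughly what `mail -s SUBJECT -r SENDER TO` sends."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to_addr
    msg["Subject"] = subject
    # RFC 3834 header marking the mail as automatically generated (optional).
    msg["Auto-Submitted"] = "auto-generated"
    msg.set_content(body)
    return msg


def send(msg, host="localhost"):
    """Hand the message to an SMTP server; a local MTA on `host` is assumed."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```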

@ewdurbin, considering that we have 35k users, is there anything I should take into account (e.g. limiting the number of messages per minute, setting specific addresses/headers, etc.)? FWIW, if we send one mail per second, it will take about 10h to send them all.
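
A minimal throttled send loop is sketched below, along with the arithmetic behind the "about 10h" estimate (35,000 mails at 60 per minute is about 9.7 hours). The `per_minute` value is a placeholder until the actual Mailgun and account limits are known, and `send`/`sleep` are injected so the loop can be tested without sending anything.

```python
import time


def send_throttled(messages, send, per_minute=60, sleep=time.sleep):
    """Call send(msg) for each message, pausing so at most per_minute go out per minute."""
    interval = 60.0 / per_minute
    sent = 0
    for i, msg in enumerate(messages):
        if i:
            # Fixed pause between sends; time spent inside send() is not subtracted.
            sleep(interval)
        send(msg)
        sent += 1
    return sent


def estimated_hours(n_messages, per_minute):
    """Total sending time in hours at a constant rate."""
    return n_messages / per_minute / 60
```

A fixed pause is the simplest option; it errs on the slow side, which is probably acceptable given the reputation concerns.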

I worry that spamming this many users (presumably many of whom couldn't care less about bpo, even if in the past they reported or contributed to an issue) would seriously damage the "reputation" of the host from which you send it.

Do you know how many emails bpo typically sends out during a day? (I don't, but I doubt it's more than a few hundred.)

I can pull stats on current sending volume tomorrow, early US/Eastern. BPO sends through the PSF-infra Mailgun account, which at least shields the wider python.org mail somewhat. I would also worry about damaging the reputation of that account, given that some critical things also rely on it for sending.

As far as the sending rate goes... I might need to check rate limits with Mailgun, but also our current account volume limits.

But, after I sleep 😴

We could also do this in batches over time. The main reason for doing this is that GitHub doesn't allow us to auto-subscribe users to issues, so this mail would give them a chance to resubscribe manually (even though it's not optimal for very active users and I'm exploring other solutions too). If they don't resubscribe to issues they won't be notified if anything happens on the issue, so it's important to notify active users first -- users that haven't been active in a while can wait.
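
Sending in batches with the most recently active users first could look like the sketch below. The `(email, last_activity)` input format is hypothetical, and users whose last activity is unknown are pushed to the end.

```python
from datetime import date


def plan_batches(users, batch_size):
    """Split users into batches, most recently active first.

    users: iterable of (email, last_activity) pairs, where last_activity is a
    date, or None if unknown (hypothetical input format).
    """
    # Unknown activity sorts as the oldest possible date, i.e. last.
    ordered = sorted(users, key=lambda u: u[1] or date.min, reverse=True)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

Each batch could then be sent on a different day, so active users are notified first and inactive ones can wait.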

I'm not planning to send out any email until next week. We still have to do the migration and prepare the script, and I'll wait for your confirmation before sending anyway.

GitHub doesn't allow us to auto-subscribe users to issues, so this mail would give them a chance to resubscribe manually

Can a bot post a message that mentions people?

I have a half-baked idea about writing a GitHub action that lazily updates migrated issues when someone interacts with them.

In the body of the message we already have @mentions and PRs, but since they aren't evaluated during the migration they don't trigger subscription or PR linking. I'd have to do some more testing, but ISTM that editing the original message is not enough to re-trigger evaluation, but editing the line that includes the @mentions and/or PRs might.

Of course this can't be done en masse, or it will send out thousands of notifications, but if there is a way for an action to detect activity on an issue (e.g. someone adds a comment, or changes some metadata), the action could then try to edit the message body to re-trigger @mentions/PRs and notify people in the nosy. Failing that, I guess it could post a new message with explicit @mentions, even though that causes some extra noise.
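
The "post a new message with explicit @mentions on first activity" fallback could be reduced to the logic below. Everything here is hypothetical: the bpo-to-GitHub username mapping, the in-memory `already_notified` set (a real action would need persistent state, e.g. a label on the issue), and the comment wording.

```python
def mention_comment(issue_number, nosy, gh_usernames, already_notified):
    """Return a comment body @mentioning the former nosy list, or None.

    Only the first activity on a migrated issue produces a comment.
    nosy: bpo usernames; gh_usernames: bpo username -> linked GitHub username
    (users without a link are skipped); already_notified: issue numbers that
    were already handled.
    """
    if issue_number in already_notified:
        return None
    already_notified.add(issue_number)
    handles = sorted({gh_usernames[u] for u in nosy if u in gh_usernames})
    if not handles:
        return None
    return "Notifying the bpo nosy list: " + " ".join(f"@{h}" for h in handles)
```

Because the comment is only produced once per issue, people get roughly the notification they would have received from bpo anyway, rather than one per migrated issue up front.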

The problem to avoid is sending one message per person per ticket, right?
I wonder if it would be easy to get some info (like the number of people subscribed to <20 tickets, people with between 20 and 50, etc.) to see what the situation is.

Maybe one email per person with a list of subscribed tickets (new link, title, maybe component) would be OK, and for people with hundreds of subscriptions, the list could be split over multiple messages.
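
Splitting a long list over several messages is straightforward; the sketch below yields `(part, total_parts, chunk)` so each subject could say e.g. "part 1 of 3". The `per_message` default of 100 is an arbitrary placeholder.

```python
def split_into_messages(items, per_message=100):
    """Yield (part, total_parts, chunk) with at most per_message items per chunk."""
    total = (len(items) + per_message - 1) // per_message  # ceiling division
    for part in range(total):
        start = part * per_message
        yield part + 1, total, items[start:start + per_message]
```

An empty list yields no messages at all, which matches the idea of skipping users who follow nothing.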

Or another idea: if the roundup tickets get updated with their github ID, then there could be a github column in lists of search results. People can then decide which to open to subscribe on the new site.

The problem to avoid is sending one message per person per ticket, right?

The main problem is that the nosy list/subscriptions can't be migrated, so we need a way to let people know which issues they were following so that they can resubscribe manually and/or @mention them on the migrated issue. The former can be done from Roundup with one mail per user, the latter requires one @mention/mail per issue. However, if we mention people only when some activity happens on a migrated issue for the first time, it won't be different from the notification they would have gotten already from bpo if we hadn't migrated, so I think both solutions can be adopted.

I wonder if it would be easy to get some info (like the number of people subscribed to <20 tickets, people with between 20 and 50, etc.) to see what the situation is.

This could be done. I expect a lot of people to be subscribed to only a few issues that they reported, but there are also core devs and active contributors subscribed to thousands of issues.
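
Given a count of followed issues per user (however it is extracted from the database, which is not shown here), the bucket breakdown could be computed as sketched below. The bucket edges are illustrative, based on the <20 and 20-50 ranges suggested above.

```python
from bisect import bisect_right
from collections import Counter


def bucket_counts(subscriptions, edges=(20, 50, 100, 500)):
    """Count users per subscription-count bucket.

    subscriptions maps username -> number of followed issues. With the default
    edges the buckets are 0-19, 20-49, 50-99, 100-499 and 500+.
    """
    labels = []
    lower = 0
    for upper in edges:
        labels.append(f"{lower}-{upper - 1}")
        lower = upper
    labels.append(f"{lower}+")
    # bisect_right returns the index of the first edge greater than n.
    counts = Counter(labels[bisect_right(edges, n)] for n in subscriptions.values())
    return {label: counts.get(label, 0) for label in labels}
```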

Maybe one email per person with a list of subscribed tickets (new link, title, maybe component) would be OK, and for people with hundreds of subscriptions, the list could be split over multiple messages.

I'm planning to list some issues for each category (created, followed, assigned), but if there are too many I could just include links with the search results on bpo, or point to the summaries in the sidebar. I'm not planning to send more than one mail per user from bpo.

Or another idea: if the roundup tickets get updated with their github ID, then there could be a github column in lists of search results.

This is already part of the plan :)
See the 4th checkbox in #15.

By doing all these things, users who don't follow bpo closely will know that the migration happened, and they will also be reminded of the issues they created/followed. This will give them a chance to subscribe to/update their issues if they still care. Active users can either open the GH links in the email and subscribe, or do it from the list of issues on bpo. If they linked their username, they will still be @mentioned when someone updates the issue, even if they didn't follow the issue manually.

In total, bpo sent 59,529 emails in the last 90 days. The highest-volume day was 1,373 on January 25, and the lowest was 136 on December 25.

Looks like the average is around 660 emails per day.

Currently we see ~7.36% suppression due to bouncing email addresses.
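
Putting these numbers next to the planned mailing gives a sense of scale. The sketch below assumes one send per user (35,000) and applies the current 7.36% suppression rate; since current traffic mostly goes to active users, the rate for a mailing to all historical accounts may well be higher.

```python
def sending_stats(total_sent=59_529, days=90, planned=35_000, bounce_rate=0.0736):
    """Compare a one-off mailing with bpo's current sending volume."""
    avg_per_day = total_sent / days
    return {
        "avg_per_day": avg_per_day,
        # How many days of normal traffic the one-off mailing represents.
        "days_of_normal_volume": planned / avg_per_day,
        # Naive estimate, assuming the current suppression rate still applies.
        "expected_bounces": planned * bounce_rate,
    }
```

With the defaults this comes to about 661 emails per day, a mailing worth roughly 53 days of normal volume, and about 2,576 expected bounces.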

Thanks for looking into this! A few questions:

  • Does a single mail sent to multiple people in cc/bcc still count as one?
  • If a mail bounces, does it mean the address is inactive?
  • If so, can we prepare a list of inactive addresses that can be skipped?
  • Does a single mail sent to multiple people in cc/bcc still count as one?

No, each recipient is considered a send.

  • If a mail bounces, does it mean the address is inactive?

I think that's the most common reason, though sometimes mail servers bounce to perform greylisting (valid mail servers will retry later; spammers tend to just move on).

  • If so, can we prepare a list of inactive addresses that can be skipped?

Yes. I can manually load those into the suppression list, but that may not be a good idea, since those addresses may be in use for other reasons. So I guess it depends on what you mean by inactive addresses.

If there is no way to know for sure whether an address is inactive (and by inactive I mean unreachable), then I guess we can just try to send and see what happens. bpo also supports alternate email addresses that we could try, but that's a bit more involved, since we would need a way to know which mails bounced, who the recipient was, check whether they have alternate addresses on their profile, and try again with those.
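
Once the bounced addresses are known, picking alternate addresses to retry is a small lookup. The sketch below assumes a set of bounced addresses (how it is exported from the mail provider is not specified here) and a hypothetical `username -> (primary, alternates)` mapping taken from bpo profiles.

```python
def retry_plan(bounced, profiles):
    """Map each user whose primary address bounced to the alternates to try next.

    bounced: set of addresses reported as bounced (export format is an assumption).
    profiles: mapping of username -> (primary, [alternate addresses]).
    Users with no usable alternate address are left out.
    """
    plan = {}
    for user, (primary, alternates) in profiles.items():
        if primary in bounced:
            candidates = [a for a in alternates if a not in bounced]
            if candidates:
                plan[user] = candidates
    return plan
```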

Sorry to wake up this 2-year-old issue long after things have already been migrated. But I was just revisiting some history in BPO and noticed that my account information isn't tied to my GH account at all. It's probably not too important, but I'd still like to consolidate all my stray accounts/contributions from the prior decade under the same point of contact.

Apparently my username on bpo is "Ali Rizvi-Santiago" (with the same email as listed on the account). However, the password reset forms and everything else on bpo no longer appear to send email, which means that I'm unable to authenticate in order to reference a GH id of some kind. I also checked that I hadn't missed any mail about the migration, but didn't spot anything.

Regardless, I'd assume this to be a manual process anyway, since the migration has already happened. Is there some way of resolving this?

Unfortunately it's only possible to link BPO accounts to GitHub accounts that are part of the https://github.com/python organisation.

Whilst we're here, is there any more to do here, or can we close this?

I think enough people should know about the migration by now, and all BPO issues have links to the corresponding GitHub one.

@hugovk, Awesome. That's fair. Appreciate the response.

Whilst we're here, is there any more to do here, or can we close this?

Yeah, I agree. Really, since the migration has been done it'd probably be better to archive the repository, or at least close all open issues that aren't scheduled for completion. I only found this issue while trying to figure out what had happened.