Things I tried as team lead: #1
Feedback form, learning sessions, sharing credit, rotation of Scrum Master, Rally updates, PR reminders
Video calls
Tech Lead Feedback
Learning Sessions
Sharing general updates, what's happening around us and the future roadmap with the team
Thank-yous
Achievements journal
Team Responsibilities
Work assignment, holiday plan
Assign work to each resource each day
Keep some documentation tasks in the backlog that can be picked up when blocked on other issues
Support the team through processes (e.g. docs) rather than as an individual
Discussion Points:
[1] Lack of motivation within the team
[2] Collaboration with external teams
[3] US delivery management without supervision/support
[4] Pre-mortem goals
Build architecture patterns for the cloud
More automation around Infrastructure Provisioning
Roll out the Well-Architected Framework for AWS and Kubernetes
Increased collaboration with other cloud teams
Evolving Platform Team concept - Increased collaboration with Platform Tech Leads
Building self-sufficient team
[5] External Comms
[6] Individual feedback
[7] Support tasks
[8] Exposure to team around - Support, Debugging, Prod Release
[9] Rotation of responsibilities
[10] PRs
- Teams to be trained on the technology they will be working on
- Teams to have at least one SME on key topics (e.g. Spark), or an SME group per topic/platform team to fill this need
- Dedicated team/guild/architect analysing and setting up standards/recommendations to be followed
- TPOs working with the business to get an adequate timeline for iterative delivery that meets production standards
- Tech Leads pushing back on TPOs to get an adequate timeline for iterative delivery that meets production standards
- Not just trying to deliver fast to have nice metrics
- Not forcing tight deadlines on teams and then moving on to another work item without considering that the previous item needs recurring maintenance by someone, or whether it’s fully productionized
Also, things to consider for cloud migration:
- Someone has to manage the infrastructure in the cloud: moving to the cloud doesn’t mean AWS will take care of everything
- Adequate funding/training/skill sets need to be provided so that resources can stabilize the platform and avoid major P1 incidents down the line, in terms of:
  - Fine-grained access controls
  - Auditing
  - Monitoring/alerting
  - Well-architected design, and much more
- The above shouldn’t require justification; it should be obvious
Aspect | Standalone Application | Capability on Platform |
---|---|---|
Architecture and Security Reviews | Get proposed arch and infrastructure patterns reviewed and approved by Security Team | Adopt the existing security patterns and go for security review in case of any major arch changes |
Infrastructure | Build new environments from the ground up | Leverage existing and provision additional resources as necessary |
Implementation Strategy (CI/CD, Code, Test, etc.) | Set it up as you see fit for the team. Can adopt patterns from existing teams as well. | Embrace the practices of the platform engineering team and add more if necessary |
Code Starter Kits | Build new ones and/or adopt and customize existing ones from other teams | Adopt existing ones from platform teams and customize components if necessary |
Community Support | Team will get up to speed on the UHG ecosystem, procedures and controls but can get additional support as needed | Help on offer from the Platform team. Can't expect the Platform team to support us at every step of the way though! |
UHC.OPEN/Code reusability | Isolated applications have limited opportunities for inner source. Right mindset can still get things done. | Platform Engineering opens up synergies for collaboration and reuse. |
Technology Upgrades | No dependency on other teams to pilot things. Free to execute disruptive things as long as business is not impacted | Collaborate with Platform team on new implementations. Need to take the entire platform ecosystem into consideration for impact! |
Cost | Depends on what we can reuse; should also account for the time the team invests in learning and building everything from the ground up | Reusability plays a big role in cost numbers |
Speed to Market | Depends on what we can reuse; should also account for the time the team invests in learning and building everything from the ground up | Reusability plays a big role in speed to market |
Enterprise Direction | Isolated applications are still in use, depending on the use case and timeline. | Platform Engineering is the new normal! |
https://threadreaderapp.com/thread/1542061516912037890.html
Questions to ask to determine whether something is really urgent
https://blog.pragmaticengineer.com/oncall-compensation/
A: Oncall for software engineers is additional.
- “Being oncall is your one and only job.”
- “It’s not part of the job outside business hours.”
- “It’s not part of the job outside business hours, but we might still try to reach you during those times.”
- “It’s part of the job for all software engineers and we operate in regions which regulate how it needs to be compensated with pay and time off.”
- “It’s part of the job, but we recognize the disruption with pay and additional time off.”
- “It’s voluntary for most people, and we encourage it with pay and time off.”
B: Oncall is part of the job:
- “It’s part of the job for all software engineers and not paid additional.”
Being oncall can be quite disruptive in two major ways:
It disrupts your personal plans, outside of work.
It disrupts your sleep.
Compensation approaches
Flat rate per week or per day of being oncall
Flat rate for standby, plus pay for hours worked outside core hours
Only pay for incidents worked on out-of-hours
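To make the trade-offs between the three approaches concrete, here is a minimal Python sketch that prices the same week under each model. The `OncallWeek` structure, the function names and the rates are hypothetical illustrations, not figures from the article; the flat rate is modelled per day (a weekly flat rate is the same thing with seven standby days).

```python
from dataclasses import dataclass


@dataclass
class OncallWeek:
    """One week on a rotation (hypothetical structure for illustration)."""
    standby_days: int           # days spent on standby in the rotation
    out_of_hours_worked: float  # hours spent on incidents outside core hours


def flat_rate(week: OncallWeek, day_rate: float) -> float:
    """Flat rate per day of being oncall, whether or not anything breaks."""
    return week.standby_days * day_rate


def standby_plus_hours(week: OncallWeek, day_rate: float, hourly_rate: float) -> float:
    """Flat standby rate plus pay for hours worked outside core hours."""
    return week.standby_days * day_rate + week.out_of_hours_worked * hourly_rate


def incidents_only(week: OncallWeek, hourly_rate: float) -> float:
    """Only out-of-hours incident time is paid; standby itself earns nothing."""
    return week.out_of_hours_worked * hourly_rate


def compare(week: OncallWeek, day_rate: float, hourly_rate: float) -> dict:
    """Pay for the same week under each of the three approaches."""
    return {
        "flat_rate": flat_rate(week, day_rate),
        "standby_plus_hours": standby_plus_hours(week, day_rate, hourly_rate),
        "incidents_only": incidents_only(week, hourly_rate),
    }


if __name__ == "__main__":
    # Hypothetical rates in arbitrary currency units.
    quiet = OncallWeek(standby_days=7, out_of_hours_worked=0)
    busy = OncallWeek(standby_days=7, out_of_hours_worked=10)
    print(compare(quiet, day_rate=50, hourly_rate=40))
    # {'flat_rate': 350, 'standby_plus_hours': 350, 'incidents_only': 0}
    print(compare(busy, day_rate=50, hourly_rate=40))
    # {'flat_rate': 350, 'standby_plus_hours': 750, 'incidents_only': 400}
```

The quiet week shows the weakness of incident-only pay: standby still disrupts personal plans and sleep, yet earns nothing. The flat rate ignores how busy the week was, while standby-plus-hours covers both kinds of disruption.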
Several engineers working at the company told me oncall operational load is high, teams are understaffed, oncall is not paid, and someone even used the term “oncall prison,” as quoted above.
What are the reasons for the high oncall load?
Growing too fast
Too many custom systems
Attrition of experienced people
No backfills
A barely acknowledged tech debt problem
Light at the end of the tunnel
Why are poor oncall practices painful?
They can directly impact software engineer attrition and wellbeing. Simply put, poor oncall practices will lead to more engineers quitting, more people getting burnt out and fewer people recommending a company.
Takeaways
Oncall for software engineers is part of the job. Many companies operate like this, most notably Big Tech (save for Google) and many high-growth startups. The more an employer compensates software engineers, the more likely it is to expect oncall to be a given.
Oncall for software engineers is additional. Companies that either care about healthy oncall practices or want to minimize software engineer attrition make it clear that oncall is additional and offer some form of compensation. Compensation may be cash or time off, or it could mean lightening the load with dedicated SREs or DevOps people, or making the rotations voluntary.
Aspect | Ideal State | Current State | Problem Root Cause | Potential Solutions |
---|---|---|---|---|
Guild Meetings | Occur as planned and occasionally get cancelled due to overlaps with other important meetings such as Town Hall | Sometimes guilds get cancelled in successive weeks without a reason | Guild Owners are busy with project delivery/other reasons/lack of agenda | Maybe Guilds should be run by a group of team members rather than one person, or have replacements |
Agenda | Agenda is either well planned in advance or topics are gathered and discussed on the fly (lean meetings) | Sometimes no agenda/lack of audience/no active engagement from everyone | Guild Owners are busy with project delivery/other reasons, so sometimes there is no agenda. Very little participation in the guilds due to lack of awareness/culture/tight delivery deadlines/time zones etc. | Suggestion from Kevin: the Guild cohort should reach out to teams and build an agenda based on their pain points/invite them to share their pain points/try different approaches and see what works |
Purpose | Stays on track with the Guild’s mission/goal statements | Sometimes goes off track discussing items that aren’t relevant to the purpose of that particular guild | Guild Owners are passionate about discussing what interests them/there is no other space to discuss things (everyone’s calendar is busy), so the Guild is used as a place to discuss anything | Rename Guilds to make them more generic/send the agenda in advance so only interested/relevant folks join/stay focused on the goals |
Outcomes | Outcomes are assigned to team members/teams; outcomes are tracked; achieved outcomes are demoed | Some items have no identifiable owner | Lack of interest/lack of clarity as to who should own/deliver them | Assign items to the Platform team’s backlog/interested teams/individuals, and give them the support they need (moral + funding + time) |
Participation | At least the teams/members relevant to the guild participate | Siloed conversations, e.g. a team actively developing an API on our platform doesn’t participate in the guild, or a team has already built a solution for a problem the guild is actively discussing | Lack of participation due to low awareness/culture/tight delivery deadlines/time zones etc.; lack of targeted communication | Project leads/TLs should encourage their teams to participate; the Guild should ensure relevant scrum teams are part of the conversation |
Meetings:
- Everyone spoke and gave their opinions and, just as importantly, was heard
- Everyone was interested and passionate about the subject
- Everyone had genuine positive intentions to make things better and solve problems
- Meeting notes (or a transcript) were captured
- Actionable items and clear takeaways were captured (with folks volunteering to take ownership of those takeaways)
- The meeting started on time and ended on time
- The phrase “that doesn’t make sense” was not used once (I am guilty of that one – have to work on my phrasing)
- At least two “dumb” questions were asked