Lead Dev 2016 London: How not to burn your monitoring/production team

Issues:

  • time spent inefficiently
  • repetitive tasks
  • working alone
  • yak shaving

These issues will lead people to leave the company.

Ideas:

  • people need sleep: make a half day off mandatory after a bad night
  • allocate time to resolve/automate small but recurring issues
  • make a knowledge matrix (people/competence) so that you wake up only the right people, not everyone
  • don’t alert on all errors (this causes “alert fatigue”). Alert on actionable issues and on business-breaking issues
  • alert just the right people (for instance, PagerDuty is good at this)
  • use AWS EC2 Auto Scaling (less human intervention)
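
The knowledge matrix and the "alert only on what matters" ideas can be combined in a few lines. The sketch below is not from the talk: the names (KNOWLEDGE_MATRIX, Alert, should_page, who_to_wake), the people, and the areas are all hypothetical, and a real setup would probably delegate the routing to a paging tool rather than to hand-written code.

```python
from dataclasses import dataclass

# Hypothetical knowledge matrix: person -> areas they are competent in.
KNOWLEDGE_MATRIX = {
    "alice": {"database", "payments"},
    "bob": {"network"},
    "carol": {"database", "network"},
}


@dataclass
class Alert:
    area: str                # component the alert concerns
    actionable: bool         # can a human do something about it right now?
    business_breaking: bool  # does it stop the business from operating?


def should_page(alert: Alert) -> bool:
    """Page only for actionable or business-breaking issues."""
    return alert.actionable or alert.business_breaking


def who_to_wake(alert: Alert, on_call: set[str]) -> list[str]:
    """Return the on-call people competent in the alert's area, sorted by name."""
    if not should_page(alert):
        return []  # not worth waking anyone: log it and look in the morning
    return sorted(
        person
        for person, areas in KNOWLEDGE_MATRIX.items()
        if person in on_call and alert.area in areas
    )
```

For example, an actionable "database" alert with alice, bob and carol on call would wake only alice and carol, while an alert that is neither actionable nor business-breaking would wake nobody.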