Smarsh
Cambridge, UK
As the Site Reliability Engineer Manager (SRE Manager) for the SaaS Operations team, you'll be part of a group that measures and improves production performance and reliability through sustainable engineering practices for our suite of applications. Toil will be your number one enemy, observability your closest friend, and your mission will be to drive operational burden as close to zero as you can.

What will you bring?

Experience leading Site Reliability Engineering teams in a SaaS and data streaming context
Experience working in cloud native platforms and PaaS
Experience working with containers and container orchestration platforms
Experience with declarative IaC frameworks: BOSH, Terraform, Puppet/Chef/Ansible/Salt
Experience working inside modern observability platforms:
  Centralized logging (ELK, Splunk)
  APM (AppDynamics, Dynatrace, New Relic)
  Platform telemetry (DataDog, Nagios)
Experience delivering with CI/CD pipelines (Concourse, Bamboo, Jenkins)
Strong Linux...