Staff Software Engineer, Platform

Related skills: Datadog, AWS, Kubernetes, EKS, observability

Description
- Own and drive SRE strategy across observability, incidents, reliability, and platform operations
- Serve as the go-to consultant for infrastructure and reliability across teams
- Lead architecture decisions for monitoring, alerting, and SLO frameworks, including RFCs
- Provide L2 on-call support for complex incidents and build incident response capability
- Define multi-quarter SRE initiatives with cross-team dependencies
- Define and maintain SLIs/SLOs for tier-1 flows: contributions, disbursements, and reporting
- Contribute to ActBlue's reliability roadmap and anticipate upstream decisions
- Prefer automation over manual processes; reduce toil through tooling

Requirements
- 8+ years in SRE, DevOps, or systems/infrastructure engineering
- Deep expertise in observability tooling (Datadog)
- Strong experience with Kubernetes and cloud-native infrastructure (AWS EKS)
- Experience defining and operating SLIs and SLOs in production
- Proven ability to lead cross-functional reliability initiatives
- Strong incident management skills: on-call, post-mortems, blameless culture

Benefits
- Flexible schedules and unlimited time off
- Comprehensive health, dental, and vision insurance for you and your family
- 401(k) with employer match
- Paid medical, family, and parental leave
- Home-office setup allowance for remote employees
- Snack and digital subscription perks