
Site Reliability Engineer, Consultant

Blue Shield of CA
United States, California, Oakland
601 12th Street
Nov 11, 2025

Your Role

We are seeking an experienced Site Reliability Engineer (SRE) to lead reliability, scalability, and performance initiatives across our production systems. In this role, you will blend software engineering, automation, and systems operations to keep our platforms resilient, efficient, and continuously improving.
You will be part of a cross-functional team responsible for designing, implementing, and maintaining reliable systems that support millions of requests daily. This position requires a deep understanding of distributed systems, cloud infrastructure, automation, and incident response.

Your Knowledge and Experience

Education & Experience

  • Requires a Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience); Master's degree a plus.
  • 7+ years of experience in building, supporting, and improving production systems and infrastructure.

Cloud Platforms

  • Minimum 5 years of hands-on experience with Azure, AWS, or GCP.
  • Demonstrated expertise in virtual machines (VMs), containers, cloud networking, identity and access management (IAM), monitoring, storage, and serverless functions.
  • Comfortable deploying and managing cloud-native services and infrastructure.

Programming & Scripting

  • Proficiency in one or more languages such as Python, Go, Java, Bash, PowerShell, or similar.
  • Ability to write clean, maintainable code for automation and tooling.

Containerization & Orchestration

  • Experience working with Kubernetes, Docker, and tools like Helm or Red Hat OpenShift.
  • Familiarity with managing containerized applications in production environments.

Monitoring & Observability

  • Working knowledge of tools such as Prometheus, Grafana, Datadog, New Relic, ELK Stack, Dynatrace, Splunk, BigPanda, and SolarWinds.
  • Ability to set up dashboards, alerts, and metrics to ensure system health and performance.

CI/CD & Configuration Management

  • Experience with CI/CD pipelines using tools such as Jenkins, GitHub Actions, GitLab CI, Argo CD, or Spinnaker.
  • Familiarity with configuration management tools such as Ansible, Chef, or Puppet.

Automation & Emerging Technologies

  • Understanding of agentic AI systems and automation frameworks for incident response and infrastructure optimization is a plus.
  • Interest in exploring intelligent automation to improve reliability and reduce manual toil.

Testing & Deployment Expertise

  • Experience with chaos engineering tools (e.g., Gremlin, Chaos Monkey) and methodologies.
  • Hands-on knowledge of blue/green and canary deployment strategies in cloud-native environments.
