Principal Site Reliability Engineer, Observability

Okta
vision insurance, flexible benefit account, parental leave, 401(k)
United States, Washington, Bellevue
Aug 02, 2025

Get to know Okta

Okta is The World's Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secure access, authentication, and automation, placing identity at the core of business security and growth.

At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box - we're looking for lifelong learners and people who can make us better with their unique experiences.

Join our team! We're building a world where Identity belongs to you.

We're searching for a Principal Site Reliability Engineer (SRE) with a profound passion for observability to join our team. This isn't just a hands-on role; you'll be a thought leader, shaping the strategy and execution of our observability services (logs, metrics, and tracing), both within the Observability team and across the broader organization. We're looking for someone who can help us see clearly when things get cloudy!

Your expertise in Kubernetes will be crucial as we undergo a significant replatforming initiative. You will guide the design, implementation, and operation of our advanced observability capabilities on the new platform.

A cornerstone of this role is your exceptional ability to manage and influence stakeholders, ensuring their needs are met, expectations are managed, and they're delighted with the insights our observability services provide. We believe that our important stakeholders deserve metric-ulous attention.


What You'll Be Doing

  • Becoming deeply familiar with all corners of a critical SaaS platform utilized by millions of customers daily, with an eye towards providing unparalleled observability insights into its behavior and performance.
  • Engaging with stakeholders across the group to not only understand their component boundaries and dependencies but also to drive the adoption of observability best practices as a guide and coach for your teammates and the wider engineering organization.
  • Championing the evolution of our SDLC: defining how we ideate, onboard, operate, and scale microservices and features in a secure, performant, always-on manner, with observability (logs, metrics, tracing) as a foundational element from inception.
  • Identifying, understanding, and automating away manual processes through clever code and smart architecture, particularly focusing on how automation can enhance the collection, analysis, and actionability of observability data.
  • Supporting a 24x7 online environment as part of a global on-call rotation, leveraging your deep observability expertise to rapidly identify, diagnose, and resolve the most complex incidents.
  • Advocating for and establishing best practices for scalable, reliable, and resilient systems and services across all of WIC engineering, with a strong emphasis on fostering an observability-driven culture.


What You'll Bring to the Role

  • 9+ years of experience as a site reliability or platform engineer, preferably in a fast-scaling environment, with a significant and demonstrable track record in leading observability initiatives.
  • 2+ years of experience designing, scaling, and operating observability solutions for applications within a Kubernetes environment. You'll be adept at leveraging Kubernetes capabilities to gain insights into workload performance and health.
  • Familiarity with large-scale containerized deployments, both microservice and monolithic, coupled with a deep understanding of their unique observability challenges and solutions.
  • A proactive and tenacious mindset: always willing to go the extra mile to identify a problem and drive its resolution, especially when it pertains to improving system visibility and reliability.
  • A strong passion for mentoring and encouraging the development of engineering peers, leading by example in adopting and promoting robust observability practices.
  • Deep knowledge of CI/CD principles, Linux fundamentals, OS hardening, networking concepts, and Internet protocols, applied strategically to build resilient and observable systems.
  • Strong skills in multiple operational tooling languages such as Python, Rust, or Go, for automating sophisticated observability tasks and integrations.
  • Proven ability to effectively manage and influence diverse stakeholders, translating complex technical observability concepts into clear, actionable insights, and ensuring high levels of satisfaction with observability services.
  • Expert proficiency with Splunk or a similar platform for large-scale log management and advanced analysis.
  • Extensive experience with Grafana for designing and implementing sophisticated dashboards and visualizations of critical metrics.


This role requires in-person onboarding and travel to our San Francisco, CA HQ office during the first week of employment.

Below is the annual base salary range for candidates located in California, Colorado, New York and Washington. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program please visit: https://rewards.okta.com/us.

The annual base salary range for this position for candidates located in California (excluding San Francisco Bay Area), Colorado, New York, and Washington is between:
$194,000 and $290,000 USD

What you can look forward to as a Full-Time Okta employee!



  • Amazing Benefits
  • Making Social Impact
  • Developing Talent and Fostering Connection + Community at Okta


Okta cultivates a dynamic work environment, providing the best tools, technology and benefits to empower our employees to work productively in a setting that best and uniquely suits their needs. Each organization is unique in the degree of flexibility and mobility in which they work so that all employees are enabled to be their most creative and successful versions of themselves, regardless of where they live. Find your place at Okta today! https://www.okta.com/company/careers/.

Some roles may require travel to one of our office locations for in-person onboarding.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Privacy Policy at https://www.okta.com/privacy-policy/.
