Systems Engineer (Observability)
United States, Washington, Redmond
4600 150th Avenue Northeast
Nintendo of America Inc.

About Nintendo of America:
From the launch of the Nintendo Entertainment System more than 30 years ago, Nintendo's mission has been to create smiles through unique entertainment experiences. Here at Nintendo of America Inc., we deliver on this mission by partnering closely with Nintendo Co., Ltd., to bring Nintendo's iconic and cherished franchises, including Mario, Donkey Kong, The Legend of Zelda, Metroid, Animal Crossing, Pikmin and Splatoon, across the Americas through our video games, hardware systems, and collaborations with partners on a range of other entertainment initiatives like feature films and theme parks. Based in Redmond, Washington, Nintendo of America (NOA) serves as headquarters for Nintendo's operations in the Americas. We are an equal opportunity employer offering a welcoming and inclusive environment in service to one another, our products, and the diverse consumers and communities we call home. For more information about Nintendo, please visit the company's website at https://www.nintendo.com/.

Team/Job Summary:
This role sits within NOA's IT: Infrastructure & Operations Department and owns, maintains, and administers our Observability platform and Atlassian tools (Confluence and Jira). It also works with support teams to help define and build out appropriate monitoring strategies that support our corporate and global service offerings. The role applies SaaS governance methodologies to the New Relic and Atlassian platforms, emphasizing self-service and autonomy for internal customers.

DESCRIPTION OF DUTIES:
* Builds, maintains, and improves the observability tools platform used across the company.
* Collaborates with cross-functional teams, including software engineers and infrastructure teams, to identify and define observability requirements.
* Gathers requirements and functional specifications from support teams to build monitors and dashboards that provide insight into the health and performance of systems.
* Collaborates with other engineering teams on integrating applications with observability tools by building out analytics (reporting), visualization (dashboards), and alerting (correlation).
* Develops and implements best practices for creating and maintaining alerting and telemetry systems.
* Analyzes collected data to understand the performance of infrastructure, applications, and services.
* Automates infrastructure that enables efficient builds, testing, deployments, and monitoring.
* Responds to outages with prompt and efficient resolution and communicates issues via the proper support channels.
* Tunes existing monitors by identifying trends, oddities, and potential bottlenecks, and by refining trigger thresholds and actionable alerts.
* Learns new technologies and how to monitor them.
* Advocates observability and monitoring concepts and capabilities to the business and explains how to use these systems.
* Provides on-call support for our toolsets.
* Researches, evaluates, and recommends new technologies, and provides a roadmap for future observability and monitoring capabilities.
* Assists with planning and executing disaster recovery solutions and business continuity planning for our observability/monitoring and collaboration tools.
* Contributes to the development of end-user documentation, runbooks, and practices for observability systems.
* Defines and tracks key performance indicators (KPIs) related to system availability, performance, and reliability.
* Provides technical guidance and training to NOC staff on monitoring solutions and reviews completed monitoring tasks.
* Administers Atlassian applications in a large, multi-methodology organization.

SUMMARY OF REQUIREMENTS:
* Bachelor of Science degree in Computer Science, Computer Engineering, Electrical Engineering, Information Technology, Information Systems, Industrial Engineering, or related field; or equivalent combination of education and experience.
* 3-5 years of experience with Linux systems administration/engineering
* 3-5 years of experience with observability and monitoring concepts, including experience with mainstream, centralized, enterprise-class monitoring systems such as New Relic
* Familiarity with Application Performance Monitoring (APM)
* 2-3 years of experience working with a configuration management system (Puppet, Chef, Ansible)
* 2-3 years of coding experience (Python or Golang)
* Experience supporting Docker containers and web applications running on Apache/Tomcat in a live production environment
* Proficiency using source control (Git) in a collaborative environment (GitHub)
* Strong analytical and problem-solving skills
* Knowledge of ITIL best practices
* Familiarity with Kubernetes, Prometheus, Splunk, Graphite, InfluxDB, and Grafana preferred
* Understanding of network fundamentals/core technologies such as DNS, SNMP, and the OSI model preferred
* RHEL or other related professional certifications in cloud platforms or observability and monitoring tools a plus
* Large-scale Atlassian tools administration experience a plus

Applicants must be legally eligible to work in the United States to be considered. Visa sponsorship is not available for this role.

This position is HYBRID in Redmond, WA. Hybrid positions require regular onsite work following the schedule and guidelines for their division. This position is not open to fully remote status at this time.

This position includes a base salary range of $102,500 - $164,000 annually, potential for a semi-annual discretionary performance bonus, and a comprehensive benefits package that includes medical, dental, vision, 401(k), and paid time off. Please see our Benefits & Perks page for more benefits information.