ExamWorks is looking for a Senior Director of Site Reliability Engineering to join the team. This position is onsite in the Woodbury, NY office, Monday through Friday, during standard business hours.
The Senior Director of Site Reliability Engineering is responsible for ensuring the stability, performance, and capability of all production environments that support business-critical operations. This role provides leadership in designing resilient infrastructure, overseeing incident response, implementing proactive monitoring and maintenance practices, and driving continuous improvement in system reliability. The position partners closely with IT, security, and operations leaders to safeguard system availability, support business growth, and ensure alignment with organizational service.
ESSENTIAL FUNCTIONS
- Ensure high availability and performance across all production environments supporting scheduling, reporting, transcription, communication, and related operational workflows.
- Architect and maintain redundancy, backup, failover, and disaster recovery strategies to ensure business continuity.
- Oversee hybrid infrastructure (on-prem and cloud), including capacity planning, scalability, configuration consistency, network reliability (LAN/WAN/VPN), firewalls, load balancers, and telephony systems.
- Implement proactive monitoring, health checks, alerting, and dashboards to identify and address performance issues before they impact users.
- Lead incident response and serve as a senior escalation point for system disruptions. Maintain clear communication with IT leadership and business stakeholders during events.
- Drive root cause analysis (RCA) and implement permanent corrective actions to prevent recurrence; track and improve mean time to detect (MTTD) and mean time to recover (MTTR).
- Develop and maintain operational runbooks, escalation paths, change logs, architectural documentation, and system diagrams to ensure transparency and operational readiness.
- Enforce disciplined change management processes to ensure production stability, quality control, and auditable deployment practices.
- Coordinate planned maintenance, system upgrades, patching, and downtime windows in partnership with operational teams.
- Manage vendor and carrier relationships to optimize service delivery and reliability.
- Partner with Information Security to ensure systems meet data privacy, compliance, and audit requirements.
- Report key reliability metrics and ensure alignment with business-defined SLAs.
ESSENTIAL MANAGERIAL RESPONSIBILITIES
- Carrying out all responsibilities in accordance with the company's standards, policies, and all applicable employment laws.
- Managing and monitoring workflow and providing support, training, and techniques to assist staff in achieving department daily/weekly/monthly goals and standards.
- Encouraging positive morale, maintaining harmony among staff, and resolving grievances when necessary.
- Overseeing the completion and approval of employee timecards and coordinating overtime needs with management and staff as needed.
- Actively participating in the department's staffing requirements including hiring, onboarding, and separating of employees.
- Creating and implementing plans to meet the department's goals and metrics based on workload and client needs.
- Communicating effectively and supporting those affected by change.
- Managing insubordinate staff when warranted and initiating coaching or corrective actions as required and/or directed by upper management.
- Evaluating staff needs and performance, providing periodic feedback to staff, and reporting any performance concerns and/or recommended growth opportunities to management.
- Actively participating in and successfully conducting annual performance evaluations.
QUALIFICATIONS
- Bachelor's degree in computer science, IT, or related field preferred
- Minimum 10 years of experience in site reliability, infrastructure operations, or systems engineering roles
- Must have proven success in leading site reliability or infrastructure operations in high-volume, mission-critical environments (healthcare, insurance, logistics, or financial services preferred)
- Must have deep understanding of system availability, operational continuity, and incident response best practices
- Must have demonstrated ability to diagnose and resolve complex technical issues and drive permanent improvements
- Must have strong focus on documentation, process discipline, and reproducibility to ensure control and consistency
- Must have exceptional leadership, coordination, and communication skills across IT, operations, and vendor teams
- Must be skilled at translating complex technical issues into clear business terms for stakeholders
- Must have ability to manage competing priorities, meet deadlines under pressure, and lead teams through change
- Must have commitment to professionalism, confidentiality, and a collaborative work culture
- Ability to follow all company policies and procedures in effect at time of hire and as they may change or be added from time to time
WHO WE ARE
ExamWorks is a leading provider of innovative healthcare services including independent medical examinations, peer reviews, bill reviews, Medicare compliance, case management, record retrieval, document management and related services. Our clients include property and casualty insurance carriers, law firms, third-party claim administrators and government agencies that use independent services to confirm the veracity of claims by sick or injured individuals under automotive, disability, liability and workers' compensation insurance coverages.
Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, pregnancy, genetic information, disability, status as a protected veteran, or any other protected category under applicable federal, state, and local laws.
Equal Opportunity Employer - Minorities/Females/Disabled/Veterans
ExamWorks offers a fast-paced team atmosphere with competitive benefits (medical, vision, dental), paid time off, and a 401(k) plan.