Sr. Observability Engineer

Job#: 2009933

Job Description:

Summary
The Senior Observability Engineer will design, implement, and maintain comprehensive observability solutions for complex systems and applications. This position requires a deep understanding of monitoring and observability practices, as well as expertise in using various tools and technologies to collect and analyze performance, logging, and metrics data. If you are interested in applying for this role, please email your resume to our recruiter, Hunter, at [email protected].

Experience & Qualifications: 

  • Overall background in Infrastructure, Employee Experience, Synthetic Monitoring, and Application Performance Monitoring.
  • Experience working in, or in close partnership with, Site Reliability Engineering (SRE) teams is strongly desired.
  • Tools Experience Preferred: SolarWinds, Grafana Cloud, ThousandEyes, BigPanda, Azure Observability Stack, and other observability tools.
  • Experience with an Enterprise Event Management System.
  • Monitoring Setup and Configuration: Set up and configure the monitoring tools to collect data from various systems, applications, and network components. This involves defining monitoring metrics, configuring data collection agents, and ensuring proper connectivity and access.
  • Alert Management: Monitor alerts generated by the tools and perform triage to identify critical issues. Analyze alert patterns, fine-tune alert thresholds, and configure alert escalation workflows to ensure timely response and resolution.
  • Performance Analysis and Troubleshooting: Use the tools' features and functionality to analyze performance metrics, logs, and traces. Conduct investigations and root cause analysis to troubleshoot and resolve performance issues, identifying bottlenecks and areas for optimization.
  • Incident Response: Collaborate with cross-functional teams to respond to and resolve incidents in a timely manner. Engage in incident management processes, including incident triage, communication, and coordination with relevant stakeholders, and participate in post-incident reviews to identify areas for improvement. 
  • Dashboard and Visualization: Create and maintain dashboards and visualizations using tools like Grafana, providing a consolidated view of system health, performance, and key metrics. Customize dashboards to meet specific business and operational requirements and share them with relevant teams and stakeholders. 
  • Capacity Planning and Scalability: Monitor resource utilization and performance trends to forecast capacity requirements. Collaborate with capacity planning teams to plan and provision resources based on anticipated growth and workload patterns, ensuring scalability and optimal performance. 
  • Tool Integration and Automation: Integrate observability tools with other systems and workflows, such as ticketing systems, incident management platforms, and automation frameworks. Automate monitoring configurations, data collection, and reporting processes to improve efficiency and reduce manual effort.
  • Continuous Improvement and Research: Stay updated with the latest developments in observability practices and technologies. Research and evaluate new tools and techniques that could enhance the monitoring and observability capabilities of the organization. Continuously improve existing monitoring setups, workflows, and processes to align with industry best practices. 

EEO Employer

Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at [email protected] or 844-463-6178.

Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing® in Talent Satisfaction in the United States and Great Place to Work® in the United Kingdom and Mexico.

Employee Type:
Contract

Location:
Johnston, IA, US

Job Type:

Date Posted:
January 5, 2024