
Site Reliability Engineering Specialist
3 days ago
About the Position
We are seeking a Site Reliability Engineering Lead with extensive experience in IT Service Management and Application Performance Monitoring.
The ideal candidate will have hands-on expertise with Datadog, a strong grasp of IT operations, and the ability to implement workflow automation and drive operational excellence through data and KPIs.
Key Responsibilities:
- ITIL Expertise: Apply deep knowledge of Information Technology Infrastructure Library (ITIL v4) and ITSM platforms. Certification is preferred.
- Performance Monitoring: Use Datadog to monitor performance, infrastructure, and digital experience (RUM, Synthetic Monitoring, etc.).
- Process Automation: Implement complex process workflows and track performance using metrics-driven reporting.
- IT Operations: Demonstrate a strong understanding of IT Operations and its impact on application reliability.
- Communication: Communicate technical concepts clearly and concisely to both technical teams and executive leadership.
- Strategic Relationships: Build strategic relationships across teams, departments, business stakeholders, and external partners.
- KPI Development: Translate business requirements into measurable KPIs that reflect application stability and provide business insights.
- Troubleshooting: Troubleshoot recurring issues with a focus on incident reduction and operational automation.
- Automation Opportunities: Identify toil (manual, repetitive work) and propose automation opportunities.
- Time-Sensitive Issues: React quickly to time-sensitive issues with strong problem-solving and decision-making skills.
Skill and Experience Requirements
- ITIL/ITSM Management: 7+ years of experience in ITIL/ITSM management.
- Datadog APM Tools: 3+ years working with Datadog APM tools, including infrastructure monitoring, logs, and digital experience components.
- Datadog Administration: Proven experience in administering the Datadog platform across its various features.
- Similar Roles: Prior experience in a similar application support or SRE leadership role.
- Monitoring Tools: Familiarity with additional monitoring tools and modern observability technologies.
- Analytical Skills: Excellent analytical, troubleshooting, and problem-solving skills.
- Communication Skills: Strong communication and organizational capabilities.
- Task Management: Ability to manage multiple tasks while prioritizing effectively.
-
Site Reliability Engineer
6 days ago
Hyderabad City Taluka, Pakistan GSPANN Technologies, Inc Full time
Site Reliability Engineering (SRE), Python, Django, FastAPI, Flask, SQL, RESTful, pytest
Description: GSPANN is hiring a Site Reliability Engineer to ensure high availability and performance of critical systems using tools like Prometheus and Nagios. The role involves developing reliable Python code, managing APIs, and optimizing system efficiency across...
-
Principal Site Reliability Lead Engineer
4 hours ago
Hyderabad City Taluka, Pakistan beBee Careers Full time
About the Role: We are seeking a skilled Principal Site Reliability Engineer to join our team. This role offers an exceptional opportunity for professionals with expertise in site reliability, software engineering, and leadership to elevate their careers and contribute significantly to our community.
-
Senior Lead Site Reliability Engineer
2 days ago
Hyderabad City Taluka, Pakistan JP Morgan Chase Full time
Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability. As a Principal Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking, you will work with your stakeholders to define non-functional requirements (NFRs)...
-
Site Reliability Engineer
2 weeks ago
Hyderabad City Taluka, Pakistan GSPANN Technologies, Inc Full time
Splunk, Information Technology Infrastructure Library (ITIL), IT Service Management (ITSM)
Description: GSPANN is hiring an experienced Site Reliability Engineer (SRE) with 8+ years in IT Service Management (ITSM) and hands-on expertise in Application Performance Monitoring (APM) tools like Datadog and Splunk. We're looking for a self-driven professional who...
-
Lead Site Reliability Engineer
2 days ago
Hyderabad City Taluka, Pakistan JP Morgan Chase Full time
Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking, you hold a leadership role in your team, demonstrate strong knowledge across multiple...
-
Site Reliability Engineering Lead
6 days ago
Hyderabad City Taluka, Pakistan GSPANN Technologies, Inc Full time
Workflows, Information Technology Infrastructure Library (ITIL), IT Service Management (ITSM), Splunk, IT Operations Management (ITOM)
Description: GSPANN is hiring a Site Reliability Engineering (SRE) Lead with 10+ years of experience in IT Service Management (ITSM) and Application Performance Monitoring (APM). The ideal candidate will have hands-on expertise...
-
Reliability Engineering Lead
4 hours ago
Hyderabad City Taluka, Pakistan beBee Careers Full time
About the Role: Take on a critical leadership position, defining the future of a global organization and driving significant impact in site reliability.
Key Responsibilities: Demonstrate and champion site reliability culture and practices, exerting technical influence across your team. Lead initiatives to improve application and platform reliability, leveraging...
-
Architect, Site Reliability Engineer
7 days ago
Hyderabad City Taluka, Pakistan Zscaler Full time
About Zscaler: Serving thousands of enterprise customers around the world including 40% of Fortune 500 companies, Zscaler (NASDAQ: ZS) was founded in 2007 with a mission to make the cloud a safe place to do business and a more enjoyable experience for enterprise users. As the operator of the world's largest security cloud, Zscaler accelerates digital...
-
Hyderabad City Taluka, Pakistan beBee Careers Full time
About Us: Behind a vast portfolio of iconic content and beloved brands are the storytellers bringing characters to life. We offer career-defining opportunities and thoughtfully curated benefits.
The DTC Global Tech Organization: The organization has many software engineering teams building applications for various devices.
SRE Roles and Responsibilities: Drive...
-
Software Reliability Specialist
10 hours ago
Hyderabad City Taluka, Pakistan beBee Careers Full time
Job Title: Software Reliability Specialist
We are seeking a highly skilled Software Reliability Specialist to ensure the high availability and performance of our critical systems. The role involves designing and implementing monitoring systems, analyzing system performance, and optimizing efficiency. The ideal candidate will have hands-on experience in...