Senior Customer Reliability Engineer, Infrastructure

1 week ago


Hyderabad City Taluka, Pakistan Astronomer Full time

Astronomer designed Astro, an industry-leading, orchestration-first DataOps platform for data teams. Powered by Airflow, Astro accelerates building reliable data products that unlock insights, unleash AI value, and drive data-driven applications.

We're a globally distributed, rapidly growing, venture-backed team of learners, innovators, and collaborators. Our mission is to empower data teams to bring mission-critical analytics, AI, and software to life. As a member of our team, you will be at the forefront of the industry as we strive to deliver the world's data.

Your background may be unconventional; as long as you have the essential qualifications, we encourage you to apply. While having "bonus" qualifications makes for a strong candidate, Astronomer values diverse experiences. Many of us at Astronomer haven't followed traditional career paths, and we welcome it if yours hasn't either.

About this role:

The Astronomer Customer Reliability Engineering (CRE) team is responsible for the success of our customers' usage of our managed Airflow service.

The CRE team is responsible for operating, monitoring, and maintaining the platform to ensure availability, predictability, and reliable operations.

As an infrastructure specialist within the team, you will focus on the reliability of the underlying cloud infrastructure and Kubernetes clusters. This entails responding to incidents, whether raised by a customer or by our monitoring system, and then taking further steps to ensure problems are permanently resolved or monitored. As owners of the observability platform, CRE has unlimited potential to improve the reliability of the product and deliver the best possible outcome for our customers.

This role is directly customer-facing and exposes you to a wide range of problems and requirements. CRE team members interface with customers from a variety of industries and across different cloud providers, each with different expectations. Your contributions will directly impact customers' success with Astronomer products, and you will help make meaningful improvements to the customer experience.

What you get to do:
  1. Provide solutions to customers to make them successful using our products.

  2. Troubleshoot customer environments and engage in active triaging with customers.

  3. Provide feedback to the product development teams on customer needs and pain points.

  4. Build out our monitoring and alerting systems.

  5. Build and maintain automation to ensure daily operational tasks are handled as efficiently as possible.

  6. Help direct the architecture of the products and contribute where possible.

  7. Own the customer experience, working directly with customers to prioritize and solve issues, meet SLAs, and provide "white glove" guidance on the path to production.

  8. Participate remotely within a fully distributed team.

  9. Enhance and enrich customer documentation.

  10. Work on a modern, sophisticated, cloud-native product that customers use to connect to dozens of other systems.

  11. Help maintain 24x7 coverage through a specified 6-hour pager period during your work day.

  12. Participate in a paid on-call rotation for weekend coverage.

What you bring to the role:
  1. 5+ years of experience, preferably with large, complex cloud infrastructures operating at scale.

  2. 3+ years of experience with Kubernetes.

  3. Experience managing a production distributed system with at least one major cloud provider (AWS, GCP, or Azure).

  4. Strong networking experience with one of the major cloud providers.

  5. Strong Linux experience.

  6. Knowledge of how to operate and monitor issues for distributed systems.

  7. Experience with Observability tools.

  8. Previous experience in handling customer issues (internal and external).

  9. Strong communication skills.

  10. DevOps or CI/CD experience.

  11. Python scripting.

  12. Good troubleshooting skills.

Bonus points if you have:
  1. Experience as a Site Reliability Engineer.

  2. Worked with Kubernetes Custom Resources.

  3. Depth of knowledge with Azure.

  4. Airflow/Big Data Orchestration experience.

  5. IaC experience.


At Astronomer, we value diversity. We are an equal opportunity employer: we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Astronomer is a remote-first company.

