Sr. Data Engineer Azure Databricks

4 weeks ago


Islamabad, Islamabad, Pakistan FuseMachines Full time

About Fusemachines

Fusemachines is a leading AI strategy, talent, and education services and products provider. Founded by Sameer Maskey Ph.D., Adjunct Associate Professor at Columbia University, Fusemachines has a core mission of democratizing AI. With a presence in 4 countries (Nepal, United States, Canada, and Dominican Republic) and more than 400 full-time employees, Fusemachines seeks to bring its global expertise in AI to transform companies around the world.

About the Role

This is a remote, contract position responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics (BI, visualization, and Advanced Analytics).

We are looking for a skilled Senior Data Engineer with a strong background in Python, SQL, PySpark, Azure, Databricks, Synapse, Azure Data Lake, DevOps, and cloud-based large-scale data applications, and with a passion for data quality, performance, and cost optimization. The ideal candidate will develop in an Agile environment, contributing to the architecture, design, and implementation of data products in the aviation industry, including migration from Synapse to Azure Data Lake. This role involves hands-on coding, mentoring junior staff, and collaborating with multidisciplinary teams to achieve project objectives.

Qualification & Experience

  • Must have a full-time Bachelor's degree in Computer Science or similar.
  • At least 5 years of experience as a data engineer with strong expertise in Databricks, Azure, DevOps, or other hyperscalers.
  • 5+ years of experience with Azure DevOps, GitHub.
  • Proven experience delivering large scale projects and products for Data and Analytics, as a data engineer, including migrations.
  • The following certifications:
    • Databricks Certified Associate Developer for Apache Spark
    • Databricks Certified Data Engineer Associate
    • Microsoft Certified: Azure Fundamentals
    • Microsoft Certified: Azure Data Engineer Associate
    • Microsoft Exam: Designing and Implementing Microsoft DevOps Solutions (nice to have)

Required Skills/Competencies

  • Strong programming skills in one or more languages such as Python (must have) or Scala, and proficiency in writing efficient, optimized code for data integration, migration, storage, processing, and manipulation.
  • Strong understanding and experience with SQL and writing advanced SQL queries.
  • Thorough understanding of big data principles, techniques, and best practices.
  • Strong experience with scalable and distributed Data Processing Technologies such as Spark/PySpark (must have: experience with Azure Databricks), DBT, and Kafka, to be able to handle large volumes of data.
  • Solid Databricks development experience with significant use of Python, PySpark, Spark SQL, Pandas, and NumPy in an Azure environment.
  • Strong experience in designing and implementing efficient ELT/ETL processes in Azure and Databricks, using open-source solutions and developing custom integration solutions as needed.
  • Skilled in Data Integration from different sources such as APIs, databases, flat files, event streaming.
  • Expertise in data cleansing, transformation, and validation.
  • Proficiency with relational databases (Oracle, SQL Server, MySQL, Postgres, or similar) and NoSQL databases (MongoDB or Table).
  • Good understanding of Data Modeling and Database Design Principles.
  • Strong experience in designing and implementing data warehouse, data lake, and data lakehouse solutions in Azure and Databricks.
  • Good experience with Delta Lake, Unity Catalog, Delta Sharing, Delta Live Tables (DLT).
  • Strong understanding of the software development lifecycle (SDLC), especially Agile methodologies.
  • Strong knowledge of SDLC tools and technologies such as Azure DevOps and GitHub.
  • Strong understanding of DevOps principles, including continuous integration, continuous delivery (CI/CD), infrastructure as code (IaC), configuration management, automated testing, performance tuning, and cost management and optimization.
  • Strong knowledge of cloud computing, specifically Microsoft Azure services related to data and analytics.
  • Experience in Orchestration using technologies like Databricks workflows and Apache Airflow.
  • Strong knowledge of data structures and algorithms and good software engineering practices.
  • Proven experience migrating from Azure Synapse to Azure Data Lake, or other technologies.
  • Strong analytical skills to identify and address technical issues, performance bottlenecks, and system failures.
  • Proficiency in debugging and troubleshooting issues in complex data and analytics environments and pipelines.
  • Good understanding of Data Quality and Governance.
  • Experience with BI solutions, including Power BI, is a plus.
  • Strong written and verbal communication skills.
  • Ability to document processes, procedures, and deployment configurations.
  • Understanding of security practices, including network security groups, Azure Active Directory, encryption, and compliance standards.
  • Ability to implement security controls and best practices within data and analytics solutions.
  • Self-motivated with the ability to work well in a team, and experienced in mentoring and coaching different members of the team.
  • A willingness to stay updated with the latest services, Data Engineering trends, and best practices in the field.
  • Comfortable with picking up new technologies independently and working in a rapidly changing environment with ambiguous requirements.
  • Care about architecture, observability, testing, and building reliable infrastructure and data pipelines.

Responsibilities

  • Architect, design, develop, test, and maintain high-performance, large-scale, complex data architectures.
  • Contribute to detailed design, architectural discussions, and customer requirements sessions.
  • Actively participate in the design, development, and testing of big data products.
  • Construct and fine-tune Apache Spark jobs and clusters within the Databricks platform.
  • Migrate out of Azure Synapse to Azure Data Lake or other technologies.
  • Assess best practices and design schemas that match business needs for delivering a modern analytics solution.
  • Design and implement data models and schemas that support efficient data processing and analytics.
  • Design and develop clear, maintainable code with automated testing.
  • Collaborate with cross-functional teams to understand data requirements and develop data solutions.
  • Evaluate and implement new technologies and tools to improve data integration, data processing, storage, and analysis.
  • Evaluate, design, implement, and maintain data governance solutions.
  • Continuously monitor and fine-tune workloads and clusters to achieve optimal performance.
  • Provide guidance and mentorship to junior team members.
  • Maintain clear and comprehensive documentation of the solutions.
  • Promote and enforce best practices in data engineering, data governance, and data quality.
  • Ensure data quality and accuracy.
  • Design, implement, and maintain data security and privacy measures.
  • Be an active member of an Agile team.

Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.

