Machine Learning Ops Engineer (6+ years)

Zorba AI

Bengaluru, Karnataka, India | Senior

Job Description

Senior Data & ML Operations Engineer responsible for managing end-to-end data and ML pipelines.

Responsibilities

  • Monitor end-to-end data pipeline execution and ensure successful daily operations.
  • Identify, troubleshoot, and resolve pipeline failures, performance bottlenecks, and production issues.
  • Execute reruns and recovery procedures to minimize downtime and maintain SLA compliance.
  • Collaborate with cross-functional teams to resolve dependencies, blockers, and integration issues.
  • Implement preventive health checks, monitoring frameworks, and robust logging mechanisms.
  • Design and maintain dashboards for data validation, reconciliation, and quality monitoring.
  • Perform data quality assessments and ensure integrity, consistency, and accuracy of pipeline outputs.
  • Develop automated validation frameworks and quality checks across data workflows.
  • Build alerts and notification systems for pipeline failures, data anomalies, and operational issues.
  • Monitor model performance using statistical and business metrics.
  • Detect and analyze data drift, feature drift, and concept drift across production models.
  • Support deployment, monitoring, maintenance, and lifecycle management of ML models.
  • Implement model explainability techniques and performance reporting frameworks.
  • Develop intelligent agent-based solutions for automated monitoring, troubleshooting, and debugging.
  • Leverage Generative AI technologies for operational insights, issue summarization, and root cause analysis.
  • Automate repetitive operational tasks to improve platform reliability and efficiency.
  • Design, enhance, and maintain CI/CD pipelines for data and ML workloads.
  • Implement secure authentication mechanisms, including data-based authentication workflows.
  • Build and optimize deployment pipelines, release processes, and infrastructure automation.
  • Support DevOps best practices for version control, testing, deployment, and monitoring.
  • Communicate project status, risks, incidents, and resolutions effectively to stakeholders.
  • Ensure timely delivery of operational and project commitments.
  • Participate in incident management, root cause analysis, and continuous improvement initiatives.

Qualifications

  • Python
  • Databricks
  • SQL
  • Data Engineering & Data Processing
  • Machine Learning Engineering
  • MLOps
  • CI/CD Pipeline Development
  • Monitoring & Production Support
  • Data Validation & Data Quality Management
  • Logging & Observability Tools
  • Dashboard Development & Reporting
  • Statistical Analysis & Model Monitoring
  • Model Explainability Techniques
  • Generative AI Applications
  • Automation & Agent-Based Systems
  • Azure DevOps, GitHub Actions, Jenkins, or similar CI/CD tools
  • MLflow
  • Apache Spark / PySpark
  • Cloud Platforms (Azure, AWS, or GCP)
  • Monitoring tools such as Datadog, Grafana, Prometheus, or equivalent
  • Experience with LLMs and GenAI frameworks
  • Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field.
  • 5+ years of experience in Data Engineering, MLOps, Production Support, or ML Platform Engineering.
  • Proven experience managing production-scale data and machine learning systems.
  • Strong analytical, troubleshooting, and communication skills.
