Job Description
Zendesk’s people have one goal in mind: to make Customer Experience better. Our products help more than 125,000 global brands (AirBnb, Uber, JetBrains, Slack, among others) make their billions of customers happy, every day.
Our team helps Customer Experience teams achieve their best by intelligently automating repetitive work, so they can shift their focus to solving more sophisticated problems. We use the latest trends in Machine Learning and AI algorithms to help us on that mission, and we're passionate about empowering our customers.
As a Senior ML Engineer, you’ll be responsible for developing products in collaboration with our Research Scientists and other Machine Learning Engineers, and delivering high-quality ML and AI products to our customers, at a scale that most companies only dream of.
What you get to do every day
- Write robust, maintainable, and production-grade code to deliver ML-powered features (e.g., intent detection, sentiment/language analysis, intelligent agent routing) that directly impact millions of end users.
- Design, build, and optimize scalable, reliable ML pipelines for processing large volumes of structured and unstructured text data (including real-time customer conversations).
- Collaborate with ML Scientists and Product teams to productionize new models, LLM-powered services, and experiment with emerging AI technologies in the context of intelligent triage.
- Lead and participate in technical design, code reviews, and architecture decisions for ML/AI systems.
- Develop and evolve MLOps processes (CI/CD, model versioning, monitoring, and observability), ensuring efficient model deployment and high system reliability.
- Mentor and support junior engineers; share knowledge of model development, deployment, and best practices.
- Investigate and resolve complex production issues, including model failures, data drift, and system performance bottlenecks.
Key challenges / use cases
- How do we enrich customer service conversations with accurate language detection, intent recognition, and real-time sentiment analysis, to enable proactive customer engagement and optimal routing?
- How can we automate as much of each customer service interaction as possible, from process automation to agent assistance and knowledge-base-backed chatbots?
- How do we optimize routing at scale—matching tickets or chats to the most appropriate agent/team in real time across multiple languages and regions?
- How do we automate large-scale A/B testing and model evaluation (online and offline) to continually iterate and improve ML-driven triage and agent-assist tools?
- How do we extend our retrieval and information extraction platforms to support new conversational AI use cases?
- How do we efficiently serve and monitor large ML/LLM models in a high-throughput, low-latency production environment?
- How do we combine signals from conversation context, customer history, and external data to improve prediction and decision accuracy across our ML services?
- How do we ensure fairness, explainability, and compliance in ML-driven customer interactions?
- And many more!
What You Bring To The Role
- Proven track record of solid software engineering, with a focus on Python-based development.
- Advanced proficiency with scalable data processing frameworks and tools (e.g., Spark, AWS Batch, Airflow).
- Solid grasp of SQL, distributed database technologies, and data modeling for large/heterogeneous datasets.
- Experience with MLOps: CI/CD for ML, monitoring, model registries, automated retraining, and rollback.
- Familiarity with cloud environments (AWS preferred, but GCP/Azure experience valued), and microservices architectures (Kubernetes, Docker).
- Experience with modern NLP/LLM libraries and platforms (HuggingFace, OpenAI, etc.) and integrating LLMs into production workflows is a significant plus.
- A self-managed and dedicated approach with the ability to work independently.
- Strong problem-solving capabilities and the flexibility (in working style) to deal with changing and conflicting priorities.
- Enough Machine Learning knowledge to keep up with the latest developments in the team and to take part in all stages of the ML model development cycle.
- Ability to mentor, review code, and drive technical excellence within a multi-disciplinary team.
What Our Tech Stack Looks Like
- Our code is written in Python and Ruby.
- Our servers live in AWS.
- Our machine learning models rely on PyTorch.
- Our ML pipelines use AWS Batch and Metaflow.
- Our data is stored in S3, RDS MySQL, Redis, Elasticsearch, Snowflake, and Aurora.
- Our services are deployed to Kubernetes using Docker, and use Kafka.