Metric Design & Analysis: Collaborate with product managers and engineering teams to define and measure metrics for user engagement, satisfaction, and chatbot effectiveness;
Data Preparation: Perform data cleansing, transformation, and quality analysis on product data;
Evaluation and Testing: Assist in designing robust evaluation metrics to assess agentic AI performance;
Prompt Engineering: Craft reusable prompt templates and clear instructions for measuring system behavior;
LLM Observability: Design monitoring systems to track LLM behavior in production;