Generative AI Research Scientist
We are seeking a talented Generative AI Research Scientist to join our team in Portugal.
About the Role
The successful candidate will play a crucial role in evaluating, measuring, and assessing agentic systems whose main goal is to improve the wellbeing of all users.
They will collaborate with a multidisciplinary team of product managers, engineers, and domain experts to create innovative AI applications that solve real-world problems.
Main Responsibilities
Metric Design & Analysis: Collaborate with product managers and engineering teams to define and measure metrics for user engagement, satisfaction, and chatbot effectiveness;
Data Preparation: Perform data cleansing, transformation, and quality analysis on product data;
Evaluation and Testing: Help design robust evaluation metrics to assess agentic AI performance, combining automated evaluation (LLM-as-a-judge), adversarial testing, human-in-the-loop evaluations, and custom behavioral metrics;
Prompt Engineering: Craft reusable templates and clear instructions to measure system behavior, optimizing for challenging edge cases;
LLM Observability: Design monitoring systems that track LLM behavior in production, capturing key metrics around information retrieval, hallucinations, and latency;
A/B Testing: Design, run, and analyze A/B tests to optimize chatbot interactions and user engagement;
Dashboard Development: Create and maintain interactive dashboards using BI tools (e.g., Tableau, Power BI, Looker, Metabase) for real-time visualization of performance metrics and insights.
Requirements
Bachelor's degree in Computer Science, Data Science, Machine Learning, Statistics, or a related field;
Proficiency in Python for data handling and analysis;
Advanced NoSQL and SQL skills for querying large datasets;
Strong problem-solving abilities, with a focus on experimental design and data analysis;
Ability to work collaboratively in a team environment and communicate effectively across different departments;
Excellent verbal and written communication skills, with the ability to explain complex technical concepts to non-technical stakeholders;
Fluent in English.
Preferred Qualifications
Familiarity with evaluation methodologies for generative AI, including human-in-the-loop assessments, LLM-as-a-judge approaches, and adversarial robustness testing;
Proven track record of tracking and evaluating generative AI models in production;
Experience with multi-agent systems and LLMs.
Benefits
We're a wellness company that is committed to the health and well-being of our employees;
Flexibility fosters a happier, healthier, and more productive work environment for everyone;
Paid time off, parental leave, and career growth opportunities;
An exciting and supportive environment filled with passionate individuals from all over the world.