Data Science Leader
We are seeking an experienced Data Science Leader to join our Generative AI team in Portugal. This is a remote position, allowing you to work from anywhere within the country.
Multimodal Data Extraction: Utilize cutting-edge tools (OCR, vision-language models, document understanding frameworks) to interpret diverse input types;
Prompt Engineering: Develop and refine strategies for using LLMs to extract, summarize, and transform unstructured content into structured formats;
Data Quality & Structuring: Clean, validate, and transform messy, unstructured data into well-defined schemas ready for use in training or analytics pipelines;
Content Filtering: Define standards and build systems for cleaning, validating, and filtering data to ensure accuracy, reduce bias, and align with ethical/safety guidelines;
Human-in-the-Loop Feedback: Design feedback loops where experts validate or enrich data, improving LLM-based extraction reliability;
Scalability & Optimization: Architect cost-efficient, high-throughput data pipelines that are robust to noisy or incomplete sources;
Research & Prototyping: Experiment with emerging tools and methods in the LLM + multimodal space, exploring new ways to enhance information coverage and extraction reliability;
Collaboration: Partner with data engineers and other data scientists to integrate collected data into larger AI and analytics systems.
Main Requirements:
Master's degree (or PhD) in Computer Science, Data Science, Machine Learning, Statistics, or a related field;
Proficiency in Python and experience with libraries for web scraping, OCR (e.g., Tesseract, EasyOCR), and NLP (e.g., Hugging Face Transformers);
Deep understanding of LLM capabilities in multimodal and extraction contexts, including prompt engineering and few-shot learning;
Strong background in unstructured data processing: APIs, web scraping, HTML parsing, OCR, image/document analysis;
Strong analytical problem-solving skills, with a track record of turning noisy data into high-quality datasets for ML;
Excellent communication and documentation skills, with the ability to influence across technical and product teams.