Senior Data Scientist Job Description
We are seeking an experienced Senior Data Scientist to join our team in Portugal. This is a remote position, so you can work from anywhere within the country.
About the Role:
* Multimodal Extraction: Apply state-of-the-art tools (OCR, vision-language models, document understanding frameworks) to interpret diverse input types;
* Prompt Engineering: Develop and refine strategies for using LLMs to extract, summarize, and transform unstructured content into structured formats;
* Data Quality & Structuring: Clean, validate, and transform messy, unstructured data into well-defined schemas ready for use in training or analytics pipelines;
* Content Filtering: Define standards and build systems for cleaning, validating, and filtering data to ensure accuracy, reduce bias, and align with ethical/safety guidelines;
* Human-in-the-Loop Feedback: Design feedback loops where experts validate or enrich data, improving LLM-based extraction reliability;
* Scalability & Optimization: Architect cost-efficient, high-throughput data pipelines that are robust to noisy or incomplete sources;
* Research & Prototyping: Experiment with emerging tools and methods in the LLM + multimodal space, exploring new ways to enhance information coverage and extraction reliability;
* Collaboration: Partner with data engineers and other data scientists to integrate collected data into larger AI and analytics systems.
About You:
* A Master's degree (or PhD) in Computer Science, Data Science, Machine Learning, Statistics, or a related field;
* Proficiency in Python and experience with libraries for web scraping, OCR (e.g., Tesseract, EasyOCR), and NLP (e.g., Hugging Face Transformers);
* A deep understanding of LLM capabilities in multimodal and extraction contexts, including prompt engineering and few-shot learning;
* A strong background in unstructured data processing: APIs, web scraping, HTML parsing, OCR, image/document analysis;
* Strong analytical problem-solving skills, with a track record of turning noisy data into high-quality datasets for ML;
* Excellent communication and documentation skills, with the ability to influence across technical and product teams.
Benefits:
* Fully paid parental leave for all new parents;
* 25 days of paid holiday per year, plus one additional day for each year of tenure (up to 5), on top of annual holidays and an extra day off on your birthday;
* A home office stipend and a monthly flexible work allowance to help cover the costs of working from home;
* An extended career growth plan offering outstanding opportunities for personal and career development;
* Full remote work policy with flexible scheduling options;
* Fitness subsidies for access to onsite gyms and fitness studios;
* Digital wellness resources, such as online meditation classes and mental health support services.
Culture:
We value diversity, equity, and belonging at our company. We strive to create a collaborative and inclusive environment where everyone feels welcome and empowered to contribute their best work.