AI/ML Engineer - Web Data Quality - Remote
About Us
At Zyte, we eat data for breakfast and you can work from anywhere. Founded in 2010, we are a globally distributed team of over 250 Zytans across 28 countries, on a mission to enable our customers to extract the data they need to innovate and grow. We believe all businesses deserve a smooth pathway to data and lead the way in building powerful, easy‑to‑use tools to collect, format, and deliver web data quickly, dependably, and at scale.
Roles & Responsibilities
* Design and implement AI‑driven quality checks: build models to detect anomalies, identify schema drift, and classify data errors in real time.
* Automate and scale QA: replace manual and rule‑based validation with ML‑powered solutions that continuously improve.
* Leverage GenAI for validation: use embedding models, LLMs, and prompt‑driven pipelines to perform semantic checks on scraped data.
* Develop monitoring & alerting pipelines: quantify data quality via KPIs, dashboards, and automated reports for stakeholders.
* Experiment & innovate: research and prototype new AI techniques for QA, e.g. using embeddings, synthetic data, and reinforcement learning to stress‑test scrapers.
* Collaborate cross‑functionally: work with developers, product managers, and account teams to integrate AI‑based QA into production workflows.
* Communicate insights: present findings with clear visualizations, metrics, and evidence‑based recommendations to technical and non‑technical audiences.
Requirements
* Proficiency in Python and the PyData stack (NumPy, pandas, scikit‑learn; PyTorch or TensorFlow preferred).
* 3+ years of experience in a data science, applied ML, or data engineering role (ideally with exposure to QA or data validation at scale).
* Hands‑on experience with GenAI tools: LLM APIs (OpenAI, Anthropic, Google), prompt engineering, cost/token optimization.
* Strong ML fundamentals: anomaly detection, classification, clustering, embeddings, evaluation metrics.
* Experience with big data frameworks (Spark, BigQuery, or similar).
* Ability to work with very large datasets (millions of records or more).
* Version control skills with Git (GitHub/Bitbucket).
* Excellent communication in English, both technical and non‑technical.
Desired Skills
* Prior experience in data quality automation or web data QA.
* Familiarity with LangChain, MCP, Marvin, or similar orchestration frameworks.
* Experience building QA dashboards or visualization layers.
* Background in statistics or applied mathematics.
* Previous remote/distributed work experience.
Benefits
As a new Zytan, you will:
* Become part of a self‑motivated, progressive, multi‑cultural team.
* Have the freedom and flexibility to work from where you do your best work.
* Attend conferences and meet team members from across the globe.
* Work with cutting‑edge open source technologies and tools.
Seniority level
Mid‑Senior level
Employment type
Full‑time
Job function
Quality Assurance
Industries
IT Services and IT Consulting