As a senior data engineer, you will be responsible for transforming raw data into high-quality features used in machine learning models. This involves collaborating with cross-functional teams to deliver AI solutions that drive business value.
Key Responsibilities
* Design and implement robust data infrastructure for large-scale data processing, ensuring scalability and reliability.
* Create Python-based microservices for delivering processed data and features, leveraging Apache Spark and PySpark for efficient data processing.
* Develop and maintain internal systems for CI/CD workflows, experiment tracking, and data versioning, using cloud-based ecosystems.
* Apply data quality measures to ensure accuracy and reliability in data processing workflows, identifying areas for improvement and implementing corrective actions.
Requirements
* Demonstrated expertise in developing software with Python in large-scale environments, with a strong focus on data engineering principles.
* Strong proficiency in designing and managing structured and unstructured datasets, ensuring data consistency and integrity.
* Practical experience with Apache Kafka, cloud-based ecosystems, and infrastructure solutions, enabling seamless data integration and processing.
* Advanced knowledge of distributed system design and architecture, allowing for efficient data processing and scaling.
* Comprehensive hands-on experience with Apache Spark and PySpark, used to produce data insights that inform business decisions.