Design, build, and maintain scalable data platforms;
Collect, process, and analyze large and complex data sets from various sources;
Develop and implement data processing workflows using frameworks such as Apache Spark and Apache Beam;
Collaborate with cross-functional teams to ensure data accuracy and integrity;
Ensure data security and privacy through proper implementation of access controls and data encryption;
Extract data from various sources, including databases, file systems, and APIs;
Monitor system performance and optimize for high availability and scalability.
Qualifications:
Experience with cloud platforms and services for data engineering (GCP);
Proficiency in programming languages such as Python, Java, or Scala;
Experience with big data tools such as Spark, Flink, Kafka, Elasticsearch, Hadoop, Hive, Sqoop, Flume, Impala, Kafka Streams, Kafka Connect, Druid, etc.;
Knowledge of data modeling and database design principles;
Familiarity with data integration and ETL tools (e.g., Apache Kafka, Talend);
Understanding of distributed systems and data processing architectures;
Strong SQL skills and experience with relational and NoSQL databases;
Familiarity with other cloud data services (e.g., AWS S3, Azure Data Factory);
Experience with version control tools such as Git.