BSc or MSc in Computer Science, Electrical/Computer Engineering, or a related technical discipline.
Minimum 5 years of production-level experience manipulating big data with a high-level programming language (e.g. Python, Java, or Scala), solving complex problems and delivering quality outcomes (we use Python).
Working experience building robust data pipelines using open-source distributed computing frameworks (Apache Spark, Apache Flink, Dask).
Working experience in designing, constructing, cataloging and optimizing data lake infrastructures (e.g. MinIO / Amazon S3, Hive Metastore / Glue Data Catalog).
Experience with Cloud Technologies and Serverless Computing (we use AWS).
Familiarity with using Docker for local development and with tuning applications deployed on a Kubernetes cluster.
Familiarity with running SQL analytic workloads against cloud data warehouses (e.g. Amazon Redshift) or data lake query engines (e.g. Presto, Amazon Athena).
Excellent understanding of software testing, agile development methodology and version control.
Excellent understanding of big data file formats (Apache Parquet/Avro/ORC) and how to leverage their embedded metadata.
We are a multinational company, so fluency in English is a must.
We thrive on team collaboration and are on the lookout for team players.
We encourage everyone to think outside the box; curiosity and a willingness to learn new technologies and to evolve, both as an individual and as a team member, are highly appreciated.
What would be great to have (a strong plus)
Working experience in building scalable data streaming applications (e.g. Spark Streaming, Apache Flink, Amazon Kinesis Data Streams).
Working experience with a workflow orchestration tool (e.g. Airflow, Luigi).
Professional exposure to Amazon SQS/SNS, Apache Kafka, or other message brokers.
Knowledge of NoSQL databases, mainly key-value data stores (Redis) and document-oriented databases (MongoDB).