Senior Platform Engineer (Kubernetes & Data Infrastructure)

9 January 2026

Job Description

About Sybilion

Sybilion builds AI-driven market forecasting for process industries (chemicals, packaging, pulp & paper, textiles, and broader manufacturing). We help procurement, supply chain, and commercial teams make better buy/sell decisions by turning messy external signals and internal operational data into clear, defensible forecasts that teams trust and act on.

Our stack includes Python-based microservices, PostgreSQL data infrastructure, and ML/AI workflows that support forecasting models and decision tooling.

About the Role

We’re hiring someone to own both our platform and data infrastructure: Kubernetes administration, Linux systems, CI/CD, observability, and PostgreSQL administration for our data lakes and ML pipelines. You’ll keep production reliable, fast, secure, and scalable, while supporting the day-to-day needs of our engineers and ML workflows.

This is an on-site role in Maia (Porto). We value in-person collaboration and move quickly.

What You’ll Do

Platform / Kubernetes / Systems

  • Design, deploy, and operate Kubernetes clusters in production (networking, storage, security)
  • Operate Linux server infrastructure (Ubuntu/RHEL), patching, hardening, and reliability
  • Manage Docker image lifecycle (builds, optimisation, registry management, security scanning)
  • Implement and maintain CI/CD pipelines for microservices deployments and infrastructure changes
  • Build and maintain Infrastructure as Code (Terraform, Ansible, Helm) and Git workflows
  • Operate and improve monitoring, logging, and alerting (Prometheus/Grafana, ELK/EFK/Loki, etc.)
  • Manage secrets and credentials securely (Vault, Sealed Secrets, or equivalent)
  • Ensure high availability, capacity planning, incident response, and disaster recovery readiness
  • Support GPU-enabled workloads and ML/LLM deployments (resource allocation, utilisation, scaling)

PostgreSQL / Data Infrastructure

  • Administer and optimise PostgreSQL databases and data lake infrastructure (performance, reliability, cost)
  • Own backup/recovery and disaster recovery procedures (including point-in-time recovery)
  • Design schemas, indexing strategies, and query optimisation approaches; analyse execution plans
  • Manage migrations and versioning (schema changes, rollout strategies, rollback plans)
  • Implement replication/failover/clustering patterns for high availability
  • Own database security: access controls, encryption at rest/in transit, audit logging, compliance needs

Python Microservices / Data Pipelines / ML Workflows

  • Support deployment and troubleshooting of Python microservices (FastAPI/Flask/Django or similar)
  • Help maintain Python environments and dependency management (pip/poetry/conda/mamba)
  • Support ETL/ELT pipelines feeding our data lake and ML training workflows
  • Implement data quality checks and validation where needed
  • Partner with engineers and the ML team to improve runtime performance, reliability, and operational visibility

Must-Have Experience (Required)

  • 5+ years of hands-on production experience with Linux, Docker, Kubernetes, and PostgreSQL
  • Strong Kubernetes administration skills (clusters, networking, ingress, storage, RBAC, security)
  • Strong PostgreSQL administration skills (performance tuning, backups, replication/HA, security)
  • Strong Linux systems skills (operations, troubleshooting, hardening)
  • CI/CD experience (GitHub Actions/GitLab CI/Jenkins or similar)
  • Infrastructure as Code experience (Terraform and/or Ansible; Helm for Kubernetes)
  • Observability experience (metrics, logs, alerting; root-cause analysis)
  • Solid Python literacy for debugging services and automating operational tasks
  • Strong communication skills in English and comfort working independently end-to-end
  • Willingness to participate in an on-call rotation for critical systems

Preferred (Nice to Have)

  • Startup background (you’ve worked in small teams, moved fast, and owned outcomes end-to-end)
  • Experience running ML infrastructure (MLflow, Kubeflow, Airflow, KServe/TorchServe, etc.)
  • GPU cluster experience (NVIDIA GPU Operator or similar) and model serving optimisation
  • Experience with service mesh (Istio/Linkerd)
  • Experience with cloud managed databases (AWS RDS, GCP Cloud SQL, Azure Database)
  • Familiarity with data lake / warehouse patterns and data versioning (DVC/MLflow tracking)
  • Experience with Redis/MongoDB or other complementary data systems

Soft Skills We Value

  • Strong problem-solving and analytical mindset
  • Calm, structured incident handling and good judgement under pressure
  • Proactive improvement orientation (you spot issues before they become outages)