Description

After several years spent building rich web applications, I have specialized for the past 5 years in developing high-value, data-centric applications.

My expertise covers every stage of the data lifecycle, from ingestion into the system through preparation, exposure, and use for purposes such as business intelligence, data science experimentation, and real-time processing.

As tech lead, I took part in developing a streaming data ingestion application that lets operators process marketing tracking information as it arrives. We based our architecture on Apache Kafka, whose resilience allowed us to build a responsive, high-performance distributed system.

As Data Team Leader, I also supervised the team in charge of exposing video consumption data for a major French media group. An architecture based on Spark and Delta Lake, monitored through Superset, gave us control over raw data ingestion and its quality.

Orchestrating tasks with Apache Airflow let us take advantage of the elasticity of the Amazon cloud and launch EMR clusters on demand to guarantee data freshness, a critical need for our clients.

Guaranteeing an application's robustness, correctness, and ability to evolve is impossible without solid foundations in software design and architecture, which is why I apply, and enjoy sharing, software craftsmanship best practices. Test-Driven Design, loose coupling, hexagonal architecture, and continuous integration are essential notions for me: I put them into practice daily and share them with the teams I support.

Languages

English: Fluent
French: Native or bilingual

Workplace preferences

Can work on-site

Lyon (up to 10km)

Bloom social analytics
Lead Data Engineer
SOCIAL NETWORKS
May 2021 - March 2023 (1 year and 10 months)
Managed the development of the new data processing platform within a team of 4 developers.
The new platform scales to run data analysis, enrichment, and graph computation for multiple projects in parallel, each containing from 1M to 40M documents to process.
With the new architecture, the average processing time dropped from 14 hours to 3 hours and the number of failures during the processing workflow fell sharply. End users now receive fully processed data within a reasonable delivery time, which lets them analyze more data more comfortably.
The new architecture mixes streaming and batch processing to provide very fast orchestration of each analysis step.

## Streaming architecture

Microservices following the streaming enrichment pattern, with Kafka as both data source and output
Any subset of the data flowing through a microservice can be invalidated easily, without manual operations or topic cleanup
Zero data loss thanks to an at-least-once consumption strategy
Autoscaling managed by Kubernetes
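
The at-least-once strategy can be illustrated with a minimal sketch. It does not reproduce the platform's code and uses no Kafka client: `InMemoryLog`, `consume`, and `enrich` are hypothetical names, and the in-memory log only stands in for one partition and its committed offset. The point it shows is the ordering: a record is processed first and its offset committed second, so a crash between the two leads to a replay and never to a loss. Writes are keyed by record id so that a replay overwrites the earlier result.

```python
from dataclasses import dataclass


@dataclass
class InMemoryLog:
    """Hypothetical stand-in for one Kafka partition plus its committed offset."""
    records: list
    committed: int = 0  # offset of the next record to read after a restart

    def poll(self):
        # Yield (offset, record) pairs starting from the last committed offset.
        for offset in range(self.committed, len(self.records)):
            yield offset, self.records[offset]

    def commit(self, offset):
        # Mark everything up to and including `offset` as done.
        self.committed = offset + 1


def enrich(record):
    # Illustrative enrichment step: attach the text length to the record.
    return {**record, "length": len(record["text"])}


def consume(log, sink, crash_at=None):
    """Process first, commit second: a crash in between causes a replay."""
    for offset, record in log.poll():
        sink[record["id"]] = enrich(record)  # keyed write, so a replay overwrites
        if crash_at == offset:
            raise RuntimeError("simulated crash before commit")
        log.commit(offset)


log = InMemoryLog([{"id": i, "text": "doc" * i} for i in range(1, 5)])
sink = {}
try:
    consume(log, sink, crash_at=2)
except RuntimeError:
    pass  # the consumer restarts from the last committed offset
consume(log, sink)
```

After the simulated crash, the record at offset 2 is processed a second time, and the sink still ends up with exactly one entry per document.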

## Batch analysis architecture

Highly scalable data platform using AWS EMR autoscaling
Predictable workflows with failure recovery using Airflow pipelines
Handles multiple data sources such as Amazon S3, RDS/PostgreSQL, Elasticsearch, and Kafka
Idempotent jobs
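
As an illustration of what an idempotent job can look like, here is a minimal sketch that assumes a date-partitioned output on a local file system. The function `run_job`, the `date=...` directory layout, and the row fields are hypothetical, not taken from the platform. Each run recomputes a whole partition from its input and swaps the result in, so a retry after a failure yields the same files instead of appending duplicates.

```python
import json
import os
import tempfile
from pathlib import Path


def run_job(rows, output_root, run_date):
    """Recompute one date partition and replace its content in a single step."""
    partition = Path(output_root) / f"date={run_date}"
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / "part-0.json"

    # Pure transformation: the output depends only on the input rows.
    result = sorted(
        ({"id": r["id"], "score": r["value"] * 2} for r in rows if r["date"] == run_date),
        key=lambda r: r["id"],
    )

    # Write to a temporary file, then swap it in, so a retry never appends
    # to or duplicates the output of a previous attempt.
    fd, tmp_name = tempfile.mkstemp(dir=partition, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(result, tmp)
    os.replace(tmp_name, target)
    return target


rows = [
    {"id": 2, "date": "2023-01-01", "value": 5},
    {"id": 1, "date": "2023-01-01", "value": 3},
    {"id": 3, "date": "2023-01-02", "value": 7},
]
root = tempfile.mkdtemp()
first = run_job(rows, root, "2023-01-01").read_text()
second = run_job(rows, root, "2023-01-01").read_text()  # simulated retry
```

Running the job twice for the same date leaves a single output file with identical content, which is what makes retries from an orchestrator safe.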
Apache Spark, Apache Kafka, Go, Scala, AWS, Python, Kubernetes, Airflow, Hadoop, MongoDB, Elasticsearch, AWS S3, Docker
Pernod Ricard
Cloud Data Architect
WINE AND SPIRITS
January 2021 - May 2021 (4 months)
Provided infrastructure support, guidelines, and best practices to the data science teams as they built their data platforms.
As part of its ambition to become an innovative, data-driven company, Pernod Ricard created the Data Center of Excellence in 2020 to strengthen its ability to develop well-suited cloud data platforms.
As Data Architect, I was in charge of providing the right architecture and tooling so that data science teams could shorten their delivery time by speeding up their development and model training phases, and of designing a robust architecture to move from an advanced MVP to a scalable, production-grade product.
Microsoft Azure, Snowflake, Python 3, Azure Databricks, Azure Data Lake, Azure Functions, Docker, Azure DevOps
Bedrock
Data Team Leader
FILM AND AV
September 2019 - November 2020 (1 year and 2 months)
Lyon, France
Lead of the core-data team:
technical architecture design
organization and tracking of development work:
raw data ingestion: Spark / Delta Lake (Scala) / AWS EMR
partner data integration: Spark / Scala / AWS EMR / AWS EKS
transformation of raw data into Core (Gold) data and exposure of that data to clients: Apache Airflow (Python), Amazon Athena, Apache Superset
support for an external contractor implementing alerting algorithms to strengthen data quality
evaluation of a dedicated data warehouse solution (Snowflake) for the self-service analytics offering

Data Engineer, A/B test team (Spark/Scala/EMR):
redesign of the A/B test application following the Functional Data Engineering model
implementation of a generic KPI computation system
migration from Hadoop to AWS
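
To give an idea of what a generic KPI computation can look like, here is a small sketch in plain Python (the actual system ran on Spark/Scala). Everything in it is an assumption made for illustration: the `KPIS` registry, the `compute_kpis` function, and the event fields `variant`, `watch_minutes`, and `converted`. Each KPI is declared once as a pure function over the events of one variant, and the same loop applies every registered KPI to every A/B variant.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical registry: each KPI name maps to a pure function over the
# events of a single variant.
KPIS = {
    "sessions": len,
    "avg_watch_minutes": lambda events: mean(e["watch_minutes"] for e in events),
    "conversion_rate": lambda events: sum(e["converted"] for e in events) / len(events),
}


def compute_kpis(events, kpis=KPIS):
    """Group events by A/B variant and apply every registered KPI to each group."""
    by_variant = defaultdict(list)
    for event in events:
        by_variant[event["variant"]].append(event)
    return {
        variant: {name: fn(group) for name, fn in kpis.items()}
        for variant, group in by_variant.items()
    }


events = [
    {"variant": "A", "watch_minutes": 10, "converted": True},
    {"variant": "A", "watch_minutes": 20, "converted": False},
    {"variant": "B", "watch_minutes": 30, "converted": True},
]
report = compute_kpis(events)
```

Adding a KPI then means adding one entry to the registry, with no change to the grouping logic.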
Spark, AWS, Delta Lake, Airflow, EMR, Scala, Hadoop, Athena, Hive, Java