About me
I am a research and development data scientist at Metafora Biosystems, a biotechnology company based at the Cochin Hospital in Paris. I work on a variety of R&D problems for Metaflow, an analysis solution for flow cytometry and mass cytometry datasets.
I am a former PhD student in Machine Learning / Statistics at University Paris-Saclay, at the Institut de Mathématiques d’Orsay, under the supervision of Gilles Blanchard and Marc Glisse. I was also part of the DataShape team (INRIA).
My thesis was in collaboration with Metafora.
Research Interests
- Flow cytometry data.
- Kernel Mean Embedding and kernel methods in general.
- Wasserstein metrics for Machine Learning.
- Label Shift and Quantification Learning.
Thesis abstract
In supervised classification, it is not uncommon that the information sought is not local (the label associated with each data point) but global: the proportions of the different labels within the sample, estimated directly.
This problem, which we have chosen to refer to as label shift quantification but which is also known by many other names in the literature, has seen a proliferation of publications since the mid-2000s.
However, these works often proceed in parallel, coming from communities with limited dialogue, resulting in a scattered bibliography.
In this manuscript, we first provide an overview of these diverse works with a twofold aim: on the one hand, to bridge the gap between these communities by presenting results from different research areas, and on the other hand, to contextualise the subsequent work, with a particular focus on efforts to unify methods.
Second, we propose a framework that unifies several classical methods from the literature based on mean vectorisations. We examine the theoretical guarantees of these methods and demonstrate their robustness when the central assumption of label shift is violated. We also extend this work by focusing on kernel-based vectorisations using covariance information rather than just the mean.
Finally, we explore the use of a specific vectorisation based on Random Fourier Features in applications related to flow cytometry.
The manuscript is available here.
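To make the idea of quantification by mean vectorisation more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation used in the thesis: the helper names (`rff_features`, `project_simplex`, `estimate_proportions`), the Gaussian toy data, the kernel bandwidth and the projected-gradient solver are all choices made for this example. Each class is summarised by the mean of its Random Fourier Features on labelled source data, and the target proportions are estimated as the point of the simplex whose mixture of class means is closest to the target mean.

```python
import numpy as np

def rff_features(X, W, b):
    # Random Fourier Features approximating a Gaussian kernel.
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def project_simplex(v):
    # Euclidean projection of v onto the probability simplex.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def estimate_proportions(X_src, y_src, X_tgt, W, b, n_iter=2000):
    classes = np.unique(y_src)
    # Columns of M are the per-class mean embeddings on the source sample.
    M = np.stack(
        [rff_features(X_src[y_src == c], W, b).mean(axis=0) for c in classes],
        axis=1,
    )
    mu_tgt = rff_features(X_tgt, W, b).mean(axis=0)
    # Projected gradient descent on 0.5 * ||M @ alpha - mu_tgt||^2.
    step = 1.0 / np.linalg.norm(M.T @ M, 2)
    alpha = np.full(len(classes), 1.0 / len(classes))
    for _ in range(n_iter):
        grad = M.T @ (M @ alpha - mu_tgt)
        alpha = project_simplex(alpha - step * grad)
    return classes, alpha

# Toy data (assumption): two well-separated Gaussian classes in 2D.
rng = np.random.default_rng(0)

def sample(n0, n1):
    X = np.vstack([rng.normal(-2.0, 1.0, (n0, 2)), rng.normal(2.0, 1.0, (n1, 2))])
    y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
    return X, y

X_src, y_src = sample(1000, 1000)   # balanced labelled source sample
X_tgt, _ = sample(1600, 400)        # target sample with 80% / 20% proportions

n_features, bandwidth = 200, 1.0    # illustrative values
W = rng.normal(0.0, 1.0 / bandwidth, (2, n_features))
b = rng.uniform(0.0, 2.0 * np.pi, n_features)

classes, alpha = estimate_proportions(X_src, y_src, X_tgt, W, b)
print(dict(zip(classes.tolist(), alpha.round(3).tolist())))
```

The target labels are never used: only the mean embedding of the unlabelled target sample enters the estimate, which is what makes this a quantification method rather than a classifier.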
Publications
Complete list on ArXiv.
- Label Shift Quantification with Robustness Guarantees via Distribution Feature Matching (with G. Blanchard and B. Chérief-Abdellatif). ArXiv preprint and journal version. (RT Track – Best Student Paper)
Talks
Invited talks
- DataShape Seminar, 2024 (slides)
- Journées de Statistique de la Société Française de Statistique, 2024 (slides)
- Journées de Statistique de la Société Française de Statistique, 2023 (slides)
- DataShape Seminar, 2023 (slides)
- Séminaire des doctorants de l'équipe Probabilités et Statistiques de l'Institut de Mathématiques d'Orsay, 2023 (slides)
- ECML/PKDD 2023, Label Shift Quantification with Robustness Guarantees via Distribution Feature Matching (RT Track – Best Student Paper) (slides)
- Workshop Efficient Statistical Testing for high-dimensional models (FAST-BIG) (slides)
PhD thesis defence (slides).
Poster presentations
- ECML/PKDD 2023, Label Shift Quantification with Robustness Guarantees via Distribution Feature Matching (RT Track – Best Student Paper) (Poster)
Teaching
From September 2022 to June 2023, I was a teaching assistant at IUT Sceaux (part of University Paris-Saclay).
- Outil Mathématiques de gestion 1, B.U.T. GEA, IUT Sceaux, 2022-2023, taught by Patrick Pamphile.
- Outil Statistiques de gestion 2, B.U.T. GEA, IUT Sceaux, 2022-2023, taught by Patrick Pamphile.
Seminar
I co-organize the seminar for master's students in Statistics and Machine Learning at Université Paris-Saclay.
CV
Education
- 2024-present, R&D Data Scientist at Metafora Biosystems
- 2021-2024, PhD, University Paris-Saclay
- 2020-2021, MSc, University Paris-Saclay (Master Mathématiques de l’Intelligence Artificielle)