About me

I am a research and development data scientist at Metafora Biosystems, a biotechnology company based at the Cochin Hospital in Paris. I work on a variety of R&D problems for Metaflow, an analysis solution for flow-cytometry and mass-cytometry datasets.

I am a former PhD student in Machine Learning / Statistics at University Paris-Saclay, at the Institut de Mathématiques d’Orsay, under the supervision of Gilles Blanchard and Marc Glisse. I was also part of the Datashape team (INRIA).
My thesis was in collaboration with Metafora.

Research Interests

  • Flow cytometry data.
  • Kernel Mean Embedding and kernel methods in general.
  • Wasserstein metrics for Machine Learning.
  • Label Shift and Quantification Learning.

Thesis abstract

In supervised classification, it is not uncommon that the information sought is not local (the label associated with each data point) but global: the proportions of the different labels within the sample.
This problem, which we have chosen to refer to as label shift quantification but which is also known by many other names in the literature, has seen a proliferation of publications since the mid-2000s.
However, these works often proceed in parallel, coming from communities with limited dialogue, resulting in a scattered bibliography.
In this manuscript, we first provide an overview of these diverse works with a twofold aim: to bridge the gap between these communities by presenting results from different research areas, and to contextualise the subsequent work, with a particular focus on efforts to unify methods.
Second, we propose a framework that unifies several classical methods from the literature based on mean vectorisations. We examine the theoretical guarantees of these methods and demonstrate their robustness when the central assumption of label shift is violated. We also extend this work by focusing on kernel-based vectorisations using covariance information rather than just the mean.
Finally, we explore the use of a specific vectorisation based on Random Fourier Features in applications related to flow cytometry.

The manuscript is available here.

Publications

Complete list on ArXiv.

  • Label Shift Quantification with Robustness Guarantees via Distribution Feature Matching (with G. Blanchard and B. Chérief-Abdellatif). ArXiv preprint and Journal version. (RT Track – Best Student Paper)

Talks

Invited talks

PhD thesis defence (slides).

Poster presentations

  • ECML/PKDD 2023, Label Shift Quantification with Robustness Guarantees via Distribution Feature Matching (RT Track – Best Student Paper) (Poster)

Teaching

From September 2022 to June 2023, I was a teaching assistant at IUT Sceaux (part of University Paris-Saclay).

  • Outil Mathématiques de gestion 1, B.U.T. GEA, IUT Sceaux, 2022-2023, taught by Patrick Pamphile.
  • Outil Statistiques de gestion 2, B.U.T. GEA, IUT Sceaux, 2022-2023, taught by Patrick Pamphile.

Seminar

I co-organize the seminar for master's students in Statistics and Machine Learning at Université Paris-Saclay.

CV

Curriculum Vitae

Education

  • 2024–present, R&D Data Scientist at Metafora Biosystems
  • 2021–2024, PhD, University Paris-Saclay
  • 2020–2021, MSc, University Paris-Saclay (Master Mathématiques de l’Intelligence Artificielle)