Research & Internship

Topology-Aware Machine Learning for Biomedical Data

Context

Chronic Lymphocytic Leukemia (CLL) is a complex and heterogeneous disease with a highly variable clinical course. Traditional statistical approaches often struggle to capture the underlying structure of high-dimensional biological data such as genomic or transcriptomic signals.

This project is positioned at the intersection of applied mathematics, machine learning, and biomedical data analysis.


Objective

The goal of this internship is to develop an integrative analysis pipeline combining Topological Data Analysis (TDA) and Machine Learning (ML) to model relationships between biological markers and clinical outcomes.

The project explores this relationship in both directions:

  • Prediction (Biology → Clinical)
    Extract robust topological features from biological data and use them to predict disease progression and clinical outcomes.

  • Inference (Clinical → Biology)
    Reconstruct plausible biological representations from observed clinical trajectories using generative and inverse modeling approaches.


Methodology

  • Integration of multi-omics and clinical datasets
  • Construction of topological representations (simplicial complexes, persistence diagrams)
  • Transformation into vectorized features (persistence images, landscapes)
  • Machine learning models (random forests, neural networks, kernel methods)
  • Dimensionality reduction (UMAP, t-SNE)
  • Exploration of generative and invertible models (VAEs, conditional GANs, invertible neural networks (INNs))
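
To make the topological steps above concrete, here is a minimal sketch of the "persistence diagram, then vectorized features" part of the pipeline. It is an illustration under stated assumptions, not the internship's actual implementation: it covers only connected components (H0) of a Vietoris-Rips filtration, it uses the Python standard library instead of GUDHI or Ripser, the function names `h0_persistence` and `landscape` are hypothetical, and the point cloud is synthetic toy data.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """(birth, death) pairs of connected components (H0) in the
    Vietoris-Rips filtration of a point cloud, via union-find.
    Assumed convention: an edge appears at scale = pairwise distance."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pairwise distance is a scale at which an edge appears.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    diagram = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            # All components are born at scale 0; one dies at each merge.
            diagram.append((0.0, length))
    if points:
        diagram.append((0.0, math.inf))  # the last component never dies
    return diagram


def landscape(diagram, grid, k=1):
    """Sample the k-th persistence landscape on `grid`.
    Infinite bars are ignored so the feature vector stays finite."""
    finite = [(b, d) for b, d in diagram if math.isfinite(d)]
    values = []
    for t in grid:
        # Each bar contributes a "tent" function peaking at its midpoint.
        tents = sorted(
            (max(0.0, min(t - b, d - t)) for b, d in finite), reverse=True
        )
        values.append(tents[k - 1] if len(tents) >= k else 0.0)
    return values


# Synthetic toy data (not patient data): two well-separated pairs of points.
cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
diagram = h0_persistence(cloud)
features = landscape(diagram, grid=[0.5, 2.0, 4.5, 7.0])
print(diagram)
print(features)
```

On this toy cloud the diagram contains two short bars (one per pair), one long finite bar for the gap between the pairs, and one infinite bar. The sampled landscape is a fixed-length vector, which is the kind of input the machine learning models listed above expect. A real analysis would also compute higher-dimensional features (loops, voids) with a dedicated TDA library.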

Skills & Tools

Mathematics
- Topological Data Analysis (TDA)
- High-dimensional data analysis
- Dynamical systems (conceptual exposure)

Machine Learning
- Representation learning
- Predictive modeling
- Model evaluation and validation

Tools
- Python
- scikit-learn
- PyTorch / TensorFlow
- GUDHI, Ripser


Key Learning Outcomes

  • Working with complex, high-dimensional, and structured data
  • Designing hybrid pipelines combining mathematical methods and ML
  • Understanding how the choice of data representation affects predictive performance
  • Bridging theoretical concepts with applied machine learning

Positioning

This work reflects my interest in applied research at the intersection of mathematics and machine learning, especially for complex real-world data.

It complements my portfolio projects, which focus on business-oriented machine learning (churn prediction, risk modeling, ML pipelines), by adding a strong research and modeling dimension.