Thesis • MSc Data Science, King’s College London
Enhancing Tree-Based Generative Modelling with Differential Privacy
A robust framework for generating high-fidelity synthetic tabular data using Adversarial Random Forests
London, UK • Jan 2024 – Sep 2024
- Contribution: Developed DP-ARF, a privacy-preserving generative model that extends Adversarial Random Forests by applying the calibrated Laplace and Exponential mechanisms during density estimation to synthesize tabular data.
- Evaluation: Benchmarked against non-private models on the Healthcare, Insurance, and Adult datasets using PCA, Wasserstein distance, and classifier accuracy; the best privacy-utility trade-off was observed at epsilon = 0.5, with significant resistance to reconstruction attacks.
Python • Scikit-Learn • Pandas • NumPy • SciPy • Matplotlib • Seaborn • arfpy • Differential Privacy (Laplace & Exponential Mechanisms) • Adversarial Random Forests
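
Illustration: a minimal sketch of how Laplace noise can be added to per-leaf category counts before they are normalised into probabilities. This is not the thesis implementation. The function names, the sensitivity of 1 (one record added or removed), and the example counts are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_leaf_probabilities(counts, epsilon, sensitivity=1.0, seed=None):
    """Hypothetical helper: noisy, normalised category counts for one leaf."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # Laplace mechanism: b = sensitivity / epsilon
    # Clipping at zero is post-processing, so it does not weaken the guarantee.
    noisy = [max(c + laplace_noise(scale, rng), 0.0) for c in counts]
    total = sum(noisy)
    if total == 0.0:
        # Assumed fallback: uniform distribution if every count was clipped.
        return [1.0 / len(counts)] * len(counts)
    return [n / total for n in noisy]


# Example counts are made up; epsilon = 0.5 matches the setting named above.
probs = private_leaf_probabilities([40, 35, 25], epsilon=0.5, seed=0)
```

A smaller epsilon gives a larger noise scale, so the sampled probabilities drift further from the true leaf frequencies.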