Volume 10 - Issue 4
A Kernel Density Estimation Method to Generate Synthetic Shifted Datasets in Privacy-Preserving Task
- Muhammad Syafiq Mohd Pozi
School of Computing, Universiti Utara Malaysia, 06010, Sintok, Kedah, Malaysia
syafiq.pozi@uum.edu.my
- Mohd. Hasbullah Omar
School of Computing, Universiti Utara Malaysia, 06010, Sintok, Kedah, Malaysia
mhomar@uum.edu.my
Keywords: Privacy Preservation, Dataset Shift, Data Anonymization, Differential Privacy
Abstract
A comprehensive analytic task requires access to a complete dataset in the first place.
However, due to privacy concerns, requests to share a full dataset with third parties
can rarely be fulfilled. Methods based on systematic synthetic data generation
have recently been explored and identified as a suitable approach
to addressing this privacy concern. In this work, a privacy-preserving, probability-based
synthetic data generation framework for supervised data analytics is proposed. Using a generative
model that captures and represents the probability density function of the dataset features, a new
privacy-preserving synthetic dataset is synthesized such that the new dataset is statistically different
from the original dataset. We then simulate a supervised learning task using two different machine
learning classifiers to compare the utility of the original dataset and the new privacy-preserving
synthesized dataset. The experimental results show that the proposed synthetic generation
model can produce a new privacy-preserving synthesized dataset that has data utility similar to
that of the original dataset.
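
The generation step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the normal reference bandwidth rule, the independent per-feature and per-class modeling, and the names `rule_of_thumb_bandwidth`, `sample_kde`, and `synthesize` are all assumptions made for illustration. Sampling from a Gaussian kernel density estimate amounts to picking an observed value at random and adding kernel noise to it.

```python
import random
import statistics


def rule_of_thumb_bandwidth(values):
    """Normal reference rule for a 1-D Gaussian kernel (assumed, not from the paper)."""
    n = len(values)
    sigma = statistics.stdev(values) if n > 1 else 0.0
    return 1.06 * sigma * n ** (-1 / 5)


def sample_kde(values, n_samples, rng):
    """Draw from a 1-D Gaussian KDE: pick an observed value, then add kernel noise."""
    h = rule_of_thumb_bandwidth(values)
    return [rng.gauss(rng.choice(values), h) for _ in range(n_samples)]


def synthesize(rows, labels, seed=0):
    """Return a synthetic dataset with the same class counts as the input.

    Each feature is modeled independently within each class, so correlations
    between features are ignored (a simplification chosen for this sketch).
    """
    rng = random.Random(seed)
    synth_rows, synth_labels = [], []
    for cls in sorted(set(labels)):
        class_rows = [r for r, y in zip(rows, labels) if y == cls]
        n = len(class_rows)
        n_features = len(class_rows[0])
        # One list of n synthetic values per feature column.
        columns = [
            sample_kde([r[j] for r in class_rows], n, rng)
            for j in range(n_features)
        ]
        # Transpose the columns back into rows.
        synth_rows.extend(list(t) for t in zip(*columns))
        synth_labels.extend([cls] * n)
    return synth_rows, synth_labels
```

To compare utility as the abstract describes, one would train the same classifier once on the original rows and once on the output of `synthesize`, then evaluate both on the same held-out data. Note that this sketch provides no formal privacy guarantee on its own.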