Betterdata uses synthetic data to keep real data safe

Betterdata, a Singapore-based startup that uses programmable synthetic data to keep real data secure, announced today it has raised $1.55 million. The seed round, which it says was oversubscribed, was led by Investible with participation from Franklin Templeton, Xcel Next, Singapore University of Technology and Design, Bon Auxilium, Tenity, Plug and Play and Entrepreneur First.

The startup was founded in 2021 by Dr. Uzair Javaid, its CEO, and chief technologist Kevin Yee, with the goal of making data sharing faster and more secure as data protection regulations increased around the world. The company is currently in research and development partnerships with two major universities in Singapore and the United States (it can’t publicly disclose who they are) and its clients include Shanghai Pudong Development Bank.

Betterdata says it differs from traditional data-sharing methods, which rely on data anonymization that destroys data, because it uses generative AI and privacy engineering instead.

Yee explained to TechCrunch that programmatic synthetic data uses deep learning generative models, including the generative adversarial networks used in deepfakes, the transformers used in ChatGPT and the diffusion models used in Stable Diffusion, to create new datasets and augment existing ones.

These synthetic datasets have similar characteristics and structure to real-world data without disclosing sensitive or private information about individuals.

“The idea is to create a fictional version of a real dataset that can be used safely for a variety of purposes including safeguarding confidential data, reducing bias and also improving machine learning models,” he said.

Programmatic synthetic data helps developers in several ways: protecting sensitive data; complying with data protection regulations like GDPR and HIPAA; increasing data availability between teams; creating more data to train, test and validate machine learning models; and addressing data imbalance by creating more records for underrepresented groups or classes.
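To make the data imbalance use case concrete, here is a minimal Python sketch of the general idea. It is not Betterdata's method: the company describes using deep generative models, while this illustration swaps in a far simpler stand-in that models each numeric feature as an independent Gaussian per class. The function name `balance_with_synthetic`, the field names and the sample records are all hypothetical.

```python
import random
import statistics
from collections import defaultdict


def balance_with_synthetic(records, label_key, seed=0):
    """Return the original records plus synthetic rows so that every
    class ends up with as many rows as the largest class.

    Hypothetical helper for illustration only: it assumes numeric
    features and treats each one as an independent Gaussian per class.
    """
    rng = random.Random(seed)

    # Group the real rows by class label.
    by_class = defaultdict(list)
    for row in records:
        by_class[row[label_key]].append(row)

    target = max(len(rows) for rows in by_class.values())
    synthetic = []

    for label, rows in by_class.items():
        features = [key for key in rows[0] if key != label_key]

        # Estimate a mean and standard deviation for each feature.
        stats = {}
        for feature in features:
            values = [r[feature] for r in rows]
            mean = statistics.fmean(values)
            std = statistics.stdev(values) if len(values) > 1 else 0.0
            stats[feature] = (mean, std)

        # Sample new rows until this class reaches the target size.
        for _ in range(target - len(rows)):
            new_row = {f: rng.gauss(m, s) for f, (m, s) in stats.items()}
            new_row[label_key] = label
            synthetic.append(new_row)

    return records + synthetic


# Made-up example: six ordinary transactions and two flagged ones.
real = [
    {"amount": 12.0, "age": 34.0, "fraud": 0},
    {"amount": 15.5, "age": 41.0, "fraud": 0},
    {"amount": 9.9, "age": 29.0, "fraud": 0},
    {"amount": 22.0, "age": 52.0, "fraud": 0},
    {"amount": 18.3, "age": 38.0, "fraud": 0},
    {"amount": 11.1, "age": 45.0, "fraud": 0},
    {"amount": 950.0, "age": 23.0, "fraud": 1},
    {"amount": 1200.0, "age": 27.0, "fraud": 1},
]
balanced = balance_with_synthetic(real, "fraud")
```

After the call, `balanced` holds the eight real rows followed by four synthetic rows for the underrepresented class. Independent Gaussians ignore correlations between features and offer no privacy guarantee on their own, which is part of why production systems turn to richer generative models.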
