The buzzword of the technology world right now is none other than Machine Learning, and for good reason. Companies deploy ML models for purposes including data security, financial trading predictions, healthcare diagnosis, marketing personalisation, fraud detection, recommendation systems and autonomous cars. To perform these tasks, the models are fed large amounts of consumer data.
AI models might sometimes remember specifics about the data they have been trained on and 'leak' these details later. During the pandemic, for example, data collected from people's use of mobile phones, emails, banking, social media and postal services was used to track the spread of the virus. If compromised, however, the same medical data could fuel COVID fraud scams, as has happened in the past.
This is where differential privacy comes in: a framework for assessing and limiting the risk of such data leakage.
Classical and conventional techniques for ensuring privacy, such as hashing, are now outdated. A real-world example: in 2006, the streaming service Netflix held a competition in which contestants were challenged to predict how a user would rate a film based on their previous movie ratings and the kinds of movies they had seen.
The competition had to be cancelled early over privacy concerns. Why? Researchers from the University of Texas were able to successfully re-identify the hashed users. The Netflix team's belief that simply hashing user data would prevent privacy attacks was completely incorrect, as they failed to consider one of the essential threats: the linkage attack, in which an 'anonymised' dataset is cross-referenced with publicly available data (in this case, IMDb ratings) to re-identify individuals.
Due to the over-parameterisation of deep neural networks, Machine Learning models can sometimes inadvertently memorise individual training samples, resulting in undesirable data breaches. In practice, differential privacy is a delicate balancing act between privacy preservation and model utility.
Differential privacy, to be precise, is a method of publicly sharing information about a dataset by describing the patterns of groups within it while intentionally withholding information about individuals, typically by adding carefully calibrated random noise to query results. The technique allows companies to customise the level of privacy (governed by a parameter known as the privacy budget, epsilon) and leave attackers with only partially correct data. As a result, it has some major advantages over traditional anonymisation approaches.
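To make the mechanism concrete, the sketch below shows the classic Laplace mechanism applied to a counting query, one standard way differential privacy is implemented (not the specific method of any vendor named here). The dataset, the predicate and the epsilon values are illustrative assumptions.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Differentially private counting query via the Laplace mechanism.

    A counting query has sensitivity 1: adding or removing one individual
    changes the true count by at most 1, so noise is drawn from
    Laplace(scale = sensitivity / epsilon).
    """
    true_count = sum(1 for record in data if predicate(record))
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical dataset: ages of individuals.
ages = [23, 35, 45, 52, 61, 29, 41, 38, 57, 33]

# Smaller epsilon = stronger privacy = more noise in the released answer.
for epsilon in (0.1, 1.0, 10.0):
    noisy = laplace_count(ages, lambda age: age > 40, epsilon)
    print(f"epsilon={epsilon}: noisy count of people over 40 = {noisy:.2f}")
```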
However, the method has certain limitations. By design, it prevents an analyst from learning details about a specific individual; for instance, it is of little use to banks looking for individual instances of fraudulent activity. Also, while the noise added via DP is often negligible for a large dataset, it can severely distort the analysis of a small one.
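The effect of dataset size can be seen in a quick sketch (the record counts and epsilon below are made-up values): the Laplace noise has a fixed scale of sensitivity/epsilon regardless of how many records are queried, so its relative impact on a count shrinks as the dataset grows.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
epsilon, sensitivity = 1.0, 1.0  # illustrative privacy budget and query sensitivity

for n_records in (100, 10_000, 1_000_000):
    true_count = n_records // 2          # suppose half the records match a query
    noise = rng.laplace(0.0, sensitivity / epsilon)
    relative_error = abs(noise) / true_count
    print(f"n={n_records:>9}: noise={noise:+.2f}, relative error={relative_error:.6%}")
```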
Multiple differential privacy tools from big tech firms have been open-sourced. These include Opacus from Facebook, TensorFlow Privacy from Google, Diffprivlib v0.4 from IBM, PyDP from OpenMined, and OpenDP from Harvard and Microsoft. According to a white paper published by the Simons Institute at the University of California, Berkeley, differential privacy is a viable alternative to traditional anonymisation techniques, and policymakers should collaborate closely with researchers to develop recommendations.
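As a brief illustration of how such libraries are used, the sketch below computes a differentially private mean with IBM's Diffprivlib; the salary data, bounds and epsilon are assumptions for demonstration, and the exact parameter names may vary between library versions.

```python
import numpy as np
from diffprivlib import tools as dp_tools

# Hypothetical salary data; bounds must be supplied so the library can
# clip values and calibrate noise without inspecting the raw data.
salaries = np.random.default_rng(1).uniform(30_000, 120_000, size=5_000)

private_mean = dp_tools.mean(salaries, epsilon=0.5, bounds=(30_000, 120_000))
print(f"Differentially private mean salary: {private_mean:,.0f}")
print(f"True mean salary (for comparison):  {salaries.mean():,.0f}")
```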