What can you tell me about work on Trust in AI that IBM Research India is driving?

IBM is helping companies achieve greater trust, transparency and confidence in business predictions and outcomes by continuously developing state-of-the-art algorithms to build fair, explainable and protected AI.

IBM Research India is at the forefront of developing and delivering differentiated capabilities to infuse trust into the Data and AI lifecycle. IBM has pioneered a trusted AI infrastructure based on four pillars of trust: fairness, explainability, robustness, and assurance or lineage. The lab has co-led the development of the open-source toolkit AI Fairness 360, which enables developers to detect and mitigate bias in AI models, while AI Explainability 360 allows different personas to obtain explanations from AI models. To further the mission of creating responsible AI-powered technology, IBM has moved these toolkits to the Linux Foundation AI Foundation, making them open to all developers and data scientists.
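
As an illustration of how a developer might use AI Fairness 360, here is a minimal sketch that checks a toy loan dataset for bias with the disparate impact metric and then mitigates it by reweighing the training examples. The data, column names, and group definitions are hypothetical, chosen to mirror the loan example discussed later in this interview.

```python
# Minimal sketch: detecting and mitigating bias with AI Fairness 360 (AIF360).
# The dataframe, column names, and group definitions are illustrative only.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Toy loan data: label 1 = approved; protected attribute 'gender' (1 = male, 0 = female)
df = pd.DataFrame({
    "income":   [55, 60, 40, 45, 70, 30, 65, 35],
    "gender":   [1, 1, 1, 1, 0, 0, 0, 0],
    "approved": [1, 1, 1, 0, 1, 0, 1, 0],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["approved"],
    protected_attribute_names=["gender"],
)
privileged = [{"gender": 1}]
unprivileged = [{"gender": 0}]

# Step 1: detect bias with a dataset-level fairness metric
metric = BinaryLabelDatasetMetric(
    dataset, unprivileged_groups=unprivileged, privileged_groups=privileged
)
print("Disparate impact before mitigation:", metric.disparate_impact())

# Step 2: mitigate by reweighing examples so favourable outcomes are balanced
# across groups before any model is trained on this data
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
dataset_transf = rw.fit_transform(dataset)
metric_transf = BinaryLabelDatasetMetric(
    dataset_transf, unprivileged_groups=unprivileged, privileged_groups=privileged
)
print("Disparate impact after reweighing:", metric_transf.disparate_impact())
```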

These algorithms are made available to our enterprise customers through IBM products and services. IBM Research India works closely with product development teams to incorporate capabilities that continuously monitor whether models are acting in a biased fashion and how that bias can be removed. Similarly, customers can use the AI explainers available in IBM products and services. These are out-of-the-box capabilities that can be invoked through a GUI, without writing any code.

How do you make AI fair, explainable and protected?

Fairness has emerged as one of the core requirements for AI models. Business owners need to be sure that a model will not discriminate based on protected attributes such as age, gender, or location. Typically, the fairness of a model is checked with metrics like disparate impact, which measures whether favourable outcomes follow similar patterns across different groups of the population. For example, take a model that approves or rejects loan applications. If the model approves 75% of loans where Gender = MALE versus only 50% where Gender = FEMALE, then the model is acting in a biased fashion and needs to be investigated for bias mitigation.
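
In code, the disparate impact check for this loan example is simply a ratio of approval rates. The 0.8 threshold in the sketch below is a commonly used rule of thumb (the "four-fifths rule"), not a universal standard.

```python
# Disparate impact for the loan-approval example above:
# the ratio of the favourable-outcome rate for the unprivileged group
# to that of the privileged group. Values below ~0.8 are often flagged
# for investigation (the "four-fifths" rule of thumb).
approval_rate_male = 0.75    # P(approved | Gender = MALE)
approval_rate_female = 0.50  # P(approved | Gender = FEMALE)

disparate_impact = approval_rate_female / approval_rate_male
print(f"Disparate impact: {disparate_impact:.2f}")  # 0.67 -> below 0.8, investigate for bias
```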

AI explainability is the process of explaining the decisions made by AI models. Unless AI outcomes can be explained, business owners will find it difficult to act on those decisions. Explainability is particularly challenging because different personas need different kinds of explanations. A customer will want to know why their loan application was rejected and what can be done to improve the chances next time. The explainer may point to a low credit score as the reason for rejection and also suggest an indicative value that would increase the approval probability. This is a highly customized, local explanation that is valid for this customer only. A risk officer in the bank, however, will want a global explanation across hundreds of applications to see whether, at an aggregate level, the model is paying attention to the right variables like salary and credit score, and not to irrelevant attributes like age or location. These global explanations allow a risk officer to get a broad picture of how the model works without getting into the details of specific customers.
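
To make the local-versus-global distinction concrete, here is a minimal sketch using open-source scikit-learn tooling rather than the IBM products. The model, features, and data are hypothetical: the local part asks how one applicant's approval probability shifts when each feature is nudged, while the global part measures which features matter in aggregate.

```python
# Illustrative sketch of local vs. global explanations (not the IBM toolkits).
# Model, feature names, and data are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "salary":       rng.normal(50_000, 15_000, 500),
    "credit_score": rng.normal(650, 80, 500),
    "age":          rng.integers(21, 70, 500),
})
# Hypothetical ground truth: approval depends only on salary and credit score
y = ((X["salary"] > 45_000) & (X["credit_score"] > 620)).astype(int)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Local explanation: for one applicant, how does the approval probability move
# if each feature is nudged towards the approved-group average?
applicant = X.iloc[[7]]
base_prob = model.predict_proba(applicant)[0, 1]
for feature in X.columns:
    what_if = applicant.copy()
    what_if[feature] = X.loc[y == 1, feature].mean()
    delta = model.predict_proba(what_if)[0, 1] - base_prob
    print(f"{feature}: shifts approval probability by {delta:+.2f}")

# Global explanation: which features matter in aggregate, across all applicants?
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(dict(zip(X.columns, result.importances_mean.round(3))))
```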

Hosted models need to be protected against different types of threats such as model extraction, evasion, inference, and poisoning. Model extraction refers to an attack where the adversary attempts to steal a model through queries, whereas in a poisoning attack the adversary provides wrong feedback to confuse the model. The inference attack is of special significance since it can allow an attacker to learn sensitive and private information about the model's training data simply by accessing the trained model. IBM's Adversarial Robustness Toolbox (ART) provides capabilities to test models against these different classes of attacks, understand the vulnerabilities in a model, and harden it to protect against adversaries.
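
As a rough illustration of how a model can be probed for evasion vulnerabilities with ART, here is a minimal sketch using a scikit-learn SVM and the Fast Gradient Method attack. The dataset, model, and attack budget are illustrative choices, not a prescription, and a real assessment would cover several attack classes.

```python
# Illustrative sketch: probing a model for evasion vulnerabilities with the
# Adversarial Robustness Toolbox (ART). Dataset, model, and eps are illustrative.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

X, y = load_iris(return_X_y=True)
model = SVC(kernel="linear").fit(X, y)

# Wrap the trained model so ART attacks can query it
classifier = SklearnClassifier(model=model, clip_values=(X.min(), X.max()))

# Craft adversarial examples with a small perturbation budget (eps)
attack = FastGradientMethod(estimator=classifier, eps=0.3)
X_adv = attack.generate(x=X)

clean_acc = np.mean(model.predict(X) == y)
adv_acc = np.mean(model.predict(X_adv) == y)
print(f"Accuracy on clean inputs:       {clean_acc:.2f}")
print(f"Accuracy on adversarial inputs: {adv_acc:.2f}")  # a large drop flags a vulnerability
```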

Can you give me a couple of examples of how you're achieving results in ensuring trustworthy AI? 

One of the largest retailers used AI models to make hiring decisions. They wanted to make sure the AI model is fair and does not discriminate based on gender, ethnicity, etc. They use IBM Watson OpenScale to check the model for bias and to explain its behaviour, so that hiring managers can trust and act on the model's recommendations.

One of the largest banks in the world wanted to govern the end-to-end lifecycle of their AI models. They wanted to ensure that their AI models are not biased and do not drift. Using IBM Watson OpenScale, which can automatically detect and mitigate harmful bias, the bank is alerted immediately if a model starts acting in a biased manner or if its performance starts to deteriorate. Beyond bias detection, AI explainers were used to explain the decisions behind loan approvals.

In another instance, for customer care, an AI model was used to prioritize customer complaints. The model looked at the text of the complaint to assign a priority. But how do we know whether the model is assigning the correct priority? We used AI explainers to understand which words contribute to the severity or prioritization: which words, if added or removed, would cause the model to change a complaint's classification from severe to not severe? And did complaints with similar characteristics receive the same outcome?
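
To make the word-level what-if analysis concrete, here is a minimal sketch of a leave-one-word-out perturbation. `severity_model` is a hypothetical text classifier with a scikit-learn-style predict_proba API; this is just one simple way to approximate word contributions, not the specific explainer used in the engagement.

```python
# Illustrative sketch of the word-level "what-if" analysis described above:
# remove each word in turn and measure how much the predicted severity changes.
# `severity_model` is a hypothetical classifier with a predict_proba-style API.
def word_contributions(complaint_text, severity_model):
    words = complaint_text.split()
    base_score = severity_model.predict_proba([complaint_text])[0, 1]  # P(severe)
    contributions = {}
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        score = severity_model.predict_proba([reduced])[0, 1]
        contributions[word] = base_score - score  # drop in severity when the word is removed
    # Words whose removal lowers the severity the most contribute the most to it
    return sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical usage: words like "unauthorised" or "fraud" would be expected near
# the top if the model is paying attention to the right signals.
# print(word_contributions("unauthorised transaction debited twice", severity_model))
```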

The system was deployed at another large bank to increase confidence in its AI models and improve overall business process efficiency.

What are the necessary prerequisites for IBM to get an endeavour like Trustworthy/Reliable AI up and running?

IBM has followed a three-pronged strategy to establish a market-leading position in Trusted AI.

Diverse Technical Teams - The Trusted AI teams at IBM have members with deep technical skills and diverse backgrounds spanning core AI, optimization, systems, high-performance computing, design, social science, policy, and data. Each member is a technical expert in their own field, and together they complement each other to create differentiated offerings and tools. Most of our teams are also geographically dispersed, which helps garner a good representation of local issues in the system design.


Participate in Policy Workgroups - We work with the government and the industry ecosystem at large to help shape Trustworthy AI policies. These workgroups help us bring the learnings we have accumulated from working with customers in front of policymakers and other participants. Such dialogues and exchanges of ideas will eventually lead to a better policy and regulatory framework for trusted AI.


Engage with Different Stakeholders - As I mentioned earlier, we work with external developers and enterprise customers through open source and product teams at IBM, which helps us evangelize these ideas and get feedback on our work. Additionally, we work with many academic partners to develop new course curricula and material for Trusted and Responsible AI. We regularly deliver lectures, keynotes, and tutorials on these topics to prepare the next generation of AI developers with the right skills.


What do you reckon is the turnaround time for making systems fair and unbiased? 

With the right set of tools like AI Fairness 360 and Watson OpenScale, enterprises can set up the machinery for bias detection and mitigation very quickly; with Watson OpenScale it can even be done without any programming. However, infusing trust in AI should not be an afterthought, and developers should think about these issues from day zero. There are multiple complexities that you encounter along the way. For example, most AI models do not consider gender or ethnicity during training, yet they can still be biased because of other features (like income) that correlate with those attributes. Therefore, any serious bias mitigation system should account for hidden and indirect biases. We should give fairness metrics a fair chance! They should be treated as first-class metrics alongside accuracy, performance, precision, recall, etc.
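
One way to surface such hidden, indirect bias is to check how well the non-protected features can recover the protected attribute. The sketch below uses synthetic data in which income correlates with a protected group, echoing the example above; the data, features, and probe model are hypothetical.

```python
# Illustrative probe for hidden, indirect bias: even if the protected attribute
# is excluded from training, correlated features (like income) can act as proxies.
# The data below is synthetic; in practice this check would run on the real training set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
group = rng.integers(0, 2, 1000)                      # protected attribute, not a training feature
income = rng.normal(40_000 + 15_000 * group, 8_000)   # income correlates with the group
tenure = rng.integers(0, 20, 1000).astype(float)      # an uncorrelated feature
features = np.column_stack([income, tenure])

# If the model's input features recover the protected attribute well above chance,
# bias can leak into the model even though the attribute itself was dropped.
probe = make_pipeline(StandardScaler(), LogisticRegression())
proxy_accuracy = cross_val_score(probe, features, group, cv=5).mean()
print(f"Protected attribute recoverable with accuracy {proxy_accuracy:.2f} (chance is ~0.50)")
```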

What are the biggest challenges in this field right now? 

Overall, the topic of Trusted AI is getting a lot of attention from industry, academia, and policymakers, which augurs well. There are plenty of tools out there that can help everyone get started on this journey. However, what we lack is end-to-end Data and AI governance. Towards this, IBM Research has proposed the concept of AI FactSheets. Like nutrition labels on food products or information sheets for appliances, an AI FactSheet provides information about a model's important characteristics: how it operates, how it was trained and tested, its performance metrics, fairness and robustness checks, intended uses, maintenance, and other critical details. FactSheets increase the transparency of AI models by allowing all stakeholders to see different facts about the data and the model, and standardizing and publicizing this information is essential to building consumer and enterprise trust in AI services across the industry. Just as consumers can look at the nutrition content of a food and decide whether to consume it, our vision is to bring the same rigour and discipline to AI models through FactSheets, wherein developers, risk officers, LoB owners, and others can look at an AI FactSheet and decide confidently whether the model suits their needs.
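
To make the idea concrete, here is a minimal sketch of the kind of facts a factsheet might record for the loan model discussed earlier. The fields and values are illustrative placeholders, not IBM's formal FactSheets schema.

```python
# Illustrative placeholder only: the kind of facts an AI FactSheet might record
# for the loan model discussed earlier. Fields and values are hypothetical and
# do not follow IBM's formal FactSheets schema.
import json

loan_model_factsheet = {
    "model_name": "loan-approval-classifier",
    "intended_use": "Rank retail loan applications for human review",
    "training_data": "Historical loan applications, anonymised before use",
    "evaluation_metrics": {"accuracy": 0.91, "precision": 0.88, "recall": 0.84},
    "fairness_checks": {"disparate_impact_gender": 0.93, "acceptable_minimum": 0.80},
    "robustness_checks": {"accuracy_drop_under_evasion_attack": 0.05},
    "maintenance": "Retrained quarterly; bias and drift monitored in production",
    "reviewed_by": ["developer", "model validator", "risk officer"],
}

print(json.dumps(loan_model_factsheet, indent=2))
```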

Another area that needs more investment is helping and enabling different personas with Trusted AI tools. While there are multiple innovations on the algorithmic side (which typically help developers and scientists), there is a lack of rigour and tooling for other personas such as model validators, policy creators, data stewards, risk officers, and LoB owners.

Can you touch upon some of the future-forward elements of Trusted AI? What can be considered the next level for this solution?

At IBM Research, we are working on two complementary aspects of AI Trust. 

First, trust in the data lifecycle. Approximately 80% of the time in the AI lifecycle is spent collecting, curating, cleaning, and preparing data for AI. There is a big gap in tools that help AI teams prepare data for AI tasks. To address this gap, we have already built a state-of-the-art toolkit that not only assesses data quality but also provides mechanisms to improve it. The toolkit supports time series and tabular data, and we are constantly adding support for other types of data like text and images. With this toolkit, issues in the data can be detected and corrected before model training starts, so the AI team can have much more trust and confidence in the quality of the data.
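
As an illustration of the kinds of checks such an assessment performs before training, here is a minimal sketch in plain pandas. This is not the IBM toolkit's own API; the function and column names are hypothetical.

```python
# Illustrative sketch of typical data-quality checks run before model training.
# Not the IBM toolkit's API; the function and column names are hypothetical.
import pandas as pd

def assess_data_quality(df: pd.DataFrame, label_column: str) -> dict:
    return {
        # Completeness: fraction of missing values per column
        "missing_fraction": df.isna().mean().round(3).to_dict(),
        # Uniqueness: exact duplicate rows can inflate patterns and leak across splits
        "duplicate_rows": int(df.duplicated().sum()),
        # Label quality: heavy class imbalance may call for rebalancing or reweighting
        "label_distribution": df[label_column].value_counts(normalize=True).round(3).to_dict(),
        # Validity: constant columns carry no signal and can hide pipeline bugs
        "constant_columns": [c for c in df.columns if df[c].nunique(dropna=False) <= 1],
    }

# Hypothetical usage on a loans table:
# print(assess_data_quality(pd.read_csv("loans.csv"), label_column="approved"))
```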

Second, model testing. Before deploying models in production, it is essential to test them thoroughly to identify any shortcomings. While testing is well understood for traditional codebases, it is still at a nascent stage for AI models. Our teams are addressing this challenge by building algorithms that generate millions of realistic test cases to test various properties of AI models, such as fairness, robustness, security, and privacy. These test cases are then prioritized and executed to find failure points. We are also investing in techniques that help developers improve the model by focusing on the failed test cases.
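
As one simplified flavour of such generated tests, the sketch below creates counterfactual "twin" applications that differ only in a protected attribute and counts how often the model's decision flips. The model, data, and binary encoding are hypothetical, and IBM's actual test-generation algorithms go well beyond this single property.

```python
# Illustrative sketch of one class of generated model tests: individual-fairness
# (metamorphic) test cases that flip only the protected attribute and check that
# the prediction does not change. Model and columns are hypothetical.
import pandas as pd

def generate_fairness_tests(applications: pd.DataFrame, protected: str) -> pd.DataFrame:
    """Create counterfactual twins that differ only in the protected attribute."""
    twins = applications.copy()
    twins[protected] = 1 - twins[protected]   # assumes a binary 0/1 encoding
    return twins

def run_fairness_tests(model, applications: pd.DataFrame, protected: str) -> float:
    twins = generate_fairness_tests(applications, protected)
    original = model.predict(applications)
    flipped = model.predict(twins)
    failures = (original != flipped).sum()
    return failures / len(applications)       # fraction of test cases where the decision flips

# Hypothetical usage:
# failure_rate = run_fairness_tests(loan_model, applications_df, protected="gender")
# print(f"{failure_rate:.1%} of generated test cases expose a fairness failure")
```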

We believe the data quality and model testing toolkits complement existing toolkits like AIF360 and AIX360. With all of these, we will be able to infuse trust across the complete AI lifecycle: data, models, and deployments.
