Explainable artificial intelligence

[1][2] The main focus is on the reasoning behind the decisions or predictions made by the AI algorithms,[3] to make them more understandable and transparent.

[5] XAI counters the "black box" tendency of machine learning, where even the AI's designers cannot explain why it arrived at a specific decision.

This is especially important in domains like medicine, defense, finance, and law, where it is crucial to understand decisions and build trust in the algorithms.

[11] Many researchers argue that, at least for supervised machine learning, the way forward is symbolic regression, where the algorithm searches the space of mathematical expressions to find the model that best fits a given dataset.
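
As a schematic illustration of this idea, the sketch below searches a small, hand-picked set of candidate expressions for the one that best fits a toy dataset; practical symbolic regression systems search a far larger expression space, typically with genetic programming or dedicated libraries. The dataset, candidate list, and scoring here are assumptions made for the example.

```python
# Toy symbolic regression: pick the candidate expression with the lowest
# mean squared error on the data. Real systems search a vastly larger space.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=50)
y = x**2 + x                        # hidden target the search should recover

candidates = [                      # (name, callable) pairs forming a tiny search space
    ("x",        lambda v: v),
    ("x + 1",    lambda v: v + 1),
    ("x**2",     lambda v: v**2),
    ("x**2 + x", lambda v: v**2 + v),
    ("sin(x)",   lambda v: np.sin(v)),
]

def mse(fn):
    return float(np.mean((fn(x) - y) ** 2))

best_name, best_fn = min(candidates, key=lambda c: mse(c[1]))
print("best expression:", best_name, "MSE:", mse(best_fn))
```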

A human can audit rules in an XAI to get an idea of how likely the system is to generalize to future real-world data outside the test set.

[32][33] One transparency project, the DARPA XAI program, aims to produce "glass box" models that are explainable to a "human-in-the-loop" without greatly sacrificing AI performance.

These tools aim to ensure that the system operates in accordance with ethical and legal standards, and that its decision-making processes are transparent and accountable.

Treating the model as a black box and analyzing how marginal changes to the inputs affect the result sometimes provides a sufficient explanation.
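
A minimal sketch of this kind of perturbation-based analysis is shown below: each input feature is nudged by a small amount and the resulting change in the model's output is recorded. The placeholder model and input values are assumptions for illustration.

```python
# Black-box sensitivity analysis via finite differences: perturb each feature
# slightly and measure how much the model's output moves.
import numpy as np

def black_box_model(x):
    # Placeholder model: any callable mapping a feature vector to a score.
    return 3.0 * x[0] - 0.5 * x[1] ** 2 + np.sin(x[2])

def marginal_effects(model, x, eps=1e-3):
    """Estimate the effect of a small change in each input feature."""
    base = model(x)
    effects = []
    for i in range(len(x)):
        perturbed = x.copy()
        perturbed[i] += eps
        effects.append((model(perturbed) - base) / eps)
    return np.array(effects)

x = np.array([1.0, 2.0, 0.5])
print(marginal_effects(black_box_model, x))   # one sensitivity value per feature
```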

[44] Scholars sometimes use the term "mechanistic interpretability" to refer to the process of reverse-engineering artificial neural networks to understand their internal decision-making mechanisms and components, similar to how one might analyze a complex machine or computer program.

[47] Studying the interpretability of the most advanced foundation models often involves searching for an automated way to identify "features" in generative pretrained transformers.
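
One technique used for this in the recent interpretability literature is training a sparse autoencoder on a model's internal activations, so that individual learned directions tend to correspond to recognizable features. The sketch below trains such an autoencoder on synthetic stand-in activations; the dimensions, sparsity penalty, and training loop are assumptions chosen for illustration, not a specific published recipe.

```python
# Minimal sparse autoencoder sketch on stand-in "activations".
import torch
import torch.nn as nn

d_model, d_features = 64, 256            # activation width and (overcomplete) dictionary size
acts = torch.randn(10_000, d_model)      # stand-in for transformer residual-stream activations

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model, d_features):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        return self.decoder(f), f

sae = SparseAutoencoder(d_model, d_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                          # sparsity penalty strength (assumed)

for step in range(200):
    batch = acts[torch.randint(0, len(acts), (256,))]
    recon, f = sae(batch)
    loss = ((recon - batch) ** 2).mean() + l1_coeff * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```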

Enhancing the ability to identify and edit features is expected to significantly improve the safety of frontier AI models.

MYCIN, developed in the early 1970s as a research prototype for diagnosing bacteremia infections of the bloodstream, could explain[56] which of its hand-coded rules contributed to a diagnosis in a specific case.

For instance, SOPHIE could explain the qualitative reasoning behind its electronics troubleshooting, even though it ultimately relied on the SPICE circuit simulator.

"[58]: 164–165 By the 1990s researchers began studying whether it is possible to meaningfully extract the non-hand-coded rules being generated by opaque trained neural networks.

[59] Researchers developing clinical expert systems with neural network-powered decision support for clinicians sought to develop dynamic explanations that make these technologies more trusted and trustworthy in practice.

[9] In the 2010s, public concerns about racial and other bias in the use of AI for criminal sentencing decisions and creditworthiness assessments may have led to increased demand for transparent artificial intelligence.

[63][17][16][64][65][66] This includes layerwise relevance propagation (LRP), a technique for determining which features in a particular input vector contribute most strongly to a neural network's output.
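
A minimal sketch of the LRP epsilon rule for a tiny fully connected ReLU network is given below; the weights and input are random stand-ins, and production implementations (for example in the Captum library) handle many more layer types.

```python
# Layer-wise relevance propagation (epsilon rule) for a toy 3-4-2 ReLU network.
import numpy as np

def lrp_dense(a_in, W, b, R_out, eps=1e-6):
    """Redistribute relevance R_out from a dense layer's output to its input."""
    z = a_in @ W + b                      # pre-activations of the layer
    z = z + eps * np.sign(z)              # epsilon stabiliser
    s = R_out / z                         # relevance per unit of pre-activation
    return a_in * (W @ s)                 # relevance of each input unit

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

x = np.array([1.0, -0.5, 2.0])
h = np.maximum(0.0, x @ W1 + b1)          # forward pass through the hidden layer
out = h @ W2 + b2

R_out = np.zeros_like(out)
R_out[np.argmax(out)] = out[np.argmax(out)]   # start from the predicted class score
R_h = lrp_dense(h, W2, b2, R_out)             # relevance of hidden units
R_x = lrp_dense(x, W1, b1, R_h)               # relevance of input features
print("input relevances:", R_x)
```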

[67][68] Other techniques explain some particular prediction made by a (nonlinear) black-box model, a goal referred to as "local interpretability".
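
A common way to obtain such local explanations, in the spirit of LIME, is to sample points near the instance of interest, query the black-box model, and fit a simple linear surrogate whose coefficients serve as the explanation. The black-box model and sampling scheme in the sketch below are assumptions for illustration.

```python
# Local surrogate explanation: fit a linear model around one instance.
import numpy as np

def black_box(X):
    # Placeholder nonlinear model returning one score per row.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_linear_explanation(model, x, n_samples=500, scale=0.1):
    rng = np.random.default_rng(0)
    # Sample a neighbourhood around the instance being explained.
    X = x + rng.normal(scale=scale, size=(n_samples, len(x)))
    y = model(X)
    # Ordinary least-squares linear surrogate (LIME additionally weights
    # samples by their proximity to the instance).
    A = np.hstack([np.ones((n_samples, 1)), X - x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]                       # local feature attributions

x = np.array([0.3, -1.2])
print(local_linear_explanation(black_box, x))
```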

Several groups found that neurons can be aggregated into circuits that perform human-comprehensible functions, some of which reliably arise across different networks trained independently.

[88] As regulators, official bodies, and general users come to depend on AI-based dynamic systems, clearer accountability will be required for automated decision-making processes to ensure trust and transparency.

[90][91] The European Union introduced a right to explanation in the General Data Protection Regulation (GDPR) to address potential problems stemming from the rising importance of algorithms.

[92] In France the Loi pour une République numérique (Digital Republic Act) grants subjects the right to request and receive information pertaining to the implementation of algorithms that process data about them.

For example, competitor firms could replicate aspects of the original AI system in their own product, thus reducing competitive advantage.

[95] Many of the approaches XAI uses provide explanations only in general terms and do not take into account the diverse backgrounds and knowledge levels of their users.

While these explanations increased both their self-reported and objective understanding, they had no impact on the users' level of trust, and the users remained skeptical.

[100][101] Critiques of XAI rely on developed concepts of mechanistic and empiric reasoning from evidence-based medicine to suggest that AI technologies can be clinically validated even when their function cannot be understood by their operators.

The goals of XAI amount to a form of lossy compression that will become less effective as AI models grow in their number of parameters.

Peters, Procaccia, Psomas and Zhou[105] present an algorithm for explaining the outcomes of the Borda rule using O(m²) explanations, and prove that this is tight in the worst case.
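
For context, the Borda rule itself can be computed directly from voters' rankings, as in the short sketch below (this illustrates the voting rule being explained, not the explanation algorithm from the cited paper); the preference profile is assumed toy data.

```python
# Borda rule: a candidate gets m-1 points for each first place, m-2 for each
# second place, and so on; the highest total score wins.
from collections import defaultdict

profile = [                 # each ranking lists candidates from most to least preferred
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["b", "c", "a"],
]

scores = defaultdict(int)
m = len(profile[0])
for ranking in profile:
    for position, candidate in enumerate(ranking):
        scores[candidate] += m - 1 - position

print(dict(scores))                       # {'a': 4, 'b': 3, 'c': 2}
print("Borda winner:", max(scores, key=scores.get))
```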

Yang, Hausladen, Peters, Pournaras, Fricker and Helbing[106] present an empirical study of explainability in participatory budgeting.

Given a coalitional game, their algorithm decomposes it into sub-games, for which it is easy to generate verbal explanations based on the axioms characterizing the Shapley value.
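
For reference, the Shapley value of a small coalitional game can be computed exactly by averaging each player's marginal contribution over all orderings, as in the sketch below; the characteristic function is an assumed toy example, not taken from the cited work.

```python
# Exact Shapley values for a three-player coalitional game.
from itertools import permutations

players = ["A", "B", "C"]

def value(coalition):
    # Characteristic function of a toy game (assumed): value of each coalition.
    v = {frozenset(): 0, frozenset("A"): 1, frozenset("B"): 2, frozenset("C"): 2,
         frozenset("AB"): 4, frozenset("AC"): 4, frozenset("BC"): 5, frozenset("ABC"): 8}
    return v[frozenset(coalition)]

shapley = {p: 0.0 for p in players}
orderings = list(permutations(players))
for order in orderings:
    coalition = set()
    for player in order:
        before = value(coalition)
        coalition.add(player)
        shapley[player] += (value(coalition) - before) / len(orderings)

print(shapley)   # each player's average marginal contribution; totals sum to value(ABC)
```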

Grokking is an example of a phenomenon studied in interpretability. It involves a model that initially memorizes all the answers (overfitting), but later adopts an algorithm that generalizes to unseen data.[45]