It gained significant attention in 2023, amid rapid progress in generative AI and public concerns voiced by researchers and CEOs about potential dangers.
[8] They also discuss speculative risks from losing control of future artificial general intelligence (AGI) agents,[9] or from AI enabling perpetually stable dictatorships.
[16] Risks from AI began to be seriously discussed at the start of the computer age. As Norbert Wiener wrote in 1960: "Moreover, if we move in the direction of making machines which learn and whose behavior is modified by experience, we must face the fact that every degree of independence we give the machine is a degree of possible defiance of our wishes." In 1988 Blay Whitby published a book outlining the need for AI to be developed along ethical and socially responsible lines.
[18] From 2008 to 2009, the Association for the Advancement of Artificial Intelligence (AAAI) commissioned a study to explore and address potential long-term societal influences of AI research and development.
The panel was generally skeptical of the radical views expressed by science-fiction authors but agreed that "additional research would be valuable on methods for understanding and verifying the range of behaviors of complex computational systems to minimize unexpected outcomes".
In his view, the rise of AGI could create a range of societal problems, from the displacement of the workforce by AI and the manipulation of political and military structures to the possibility of human extinction.
[23] His argument that future advanced systems may pose a threat to human existence prompted Elon Musk,[24] Bill Gates,[25] and Stephen Hawking[26] to voice similar concerns.
[59][60] Scholars[6] and government agencies have expressed concerns that AI systems could be used to help malicious actors build weapons,[61] manipulate public opinion,[62][63] or automate cyber attacks.
[66] Neural networks have often been described as black boxes,[67] meaning that it is difficult to understand why they make the decisions they do as a result of the massive number of computations they perform.
[71] It is sometimes a legal requirement to provide an explanation for why a decision was made in order to ensure fairness, for example for automatically filtering job applications or credit score assignment.[74]
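As a purely illustrative sketch (not drawn from the cited sources), the following code shows one simple way such a per-decision explanation can be produced for a hypothetical linear credit-scoring model: each applicant receives "reason codes" listing the features that most lowered their score. All data, feature names, and model choices here are invented.

```python
# Illustrative sketch only: per-decision "reason codes" for a hypothetical
# linear credit-scoring model. All data and feature names are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["income", "debt_ratio", "late_payments", "account_age"]

# Synthetic records standing in for historical credit data.
X = rng.normal(size=(500, 4))
y = (X[:, 0] - X[:, 1] - X[:, 2] + 0.5 * X[:, 3] + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def reason_codes(applicant, top_k=2):
    """Return the features that pushed this applicant's score down the most."""
    contributions = model.coef_[0] * (applicant - X.mean(axis=0))
    order = np.argsort(contributions)  # most negative (score-lowering) first
    return [(feature_names[i], round(float(contributions[i]), 3)) for i in order[:top_k]]

applicant = X[0]
print("approval probability:", model.predict_proba(applicant.reshape(1, -1))[0, 1])
print("main factors lowering the score:", reason_codes(applicant))
```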
Finally, some have argued that the opaqueness of AI systems is a significant source of risk, and that a better understanding of how they function could prevent high-consequence failures in the future.
[76][77] For example, researchers identified a neuron in the CLIP artificial intelligence system that responds to images of people in Spider-Man costumes, sketches of Spider-Man, and the word 'spider'.
In both cases, the goal is to understand what is going on in an intricate system, though ML researchers have the benefit of being able to take perfect measurements and perform arbitrary ablations.[92][93]
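As an illustration of this kind of measurement (a hedged sketch, not the method used in the cited work), the code below records the activation of a single hidden unit in an openly available CLIP vision encoder across different inputs using a forward hook; the layer and unit index are chosen arbitrarily.

```python
# Hedged sketch: measuring one hidden unit's response in an open CLIP vision
# encoder with a forward hook. The layer and unit index here are arbitrary;
# the cited work used more elaborate feature-visualization methods.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

activations = {}

def record(module, inputs, output):
    # Mean activation of unit 42 across all image patches.
    activations["unit_42"] = output[..., 42].mean().item()

# Attach the hook to an MLP layer deep in the vision transformer.
handle = model.vision_model.encoder.layers[10].mlp.fc2.register_forward_hook(record)

# Placeholder images; a real probe would compare photos, costumes, sketches,
# and rendered text of the same concept.
for color in ("red", "blue"):
    image = Image.new("RGB", (224, 224), color=color)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        model.get_image_features(**inputs)
    print(color, activations["unit_42"])

handle.remove()
```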
Empirical research showed in 2024 that advanced large language models (LLMs) such as OpenAI o1 or Claude 3 sometimes engage in strategic deception to achieve their goals or to prevent their goals from being changed.
[94][95] Today, some of these issues affect existing commercial systems such as LLMs,[96][97][98] robots,[99] autonomous vehicles,[100] and social media recommendation engines.
Risks often arise from 'structural' or 'systemic' factors such as competitive pressures, diffusion of harms, fast-paced development, high levels of uncertainty, and inadequate safety culture.
[126] Some scholars have compared AI race dynamics to the Cold War, where the careful judgment of a small number of decision-makers often spelled the difference between stability and catastrophe.
[132] In this scenario, countries or companies race to build more capable AI systems and neglect safety, leading to a catastrophic accident that harms everyone involved.
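The incentive structure can be illustrated with a toy model (not taken from the cited sources): each competitor divides effort between capability and safety, cutting safety improves its chance of deploying first, and each competitor bears only a small share of the harm if the deployed system fails.

```python
# Toy model of racing dynamics (illustrative only, not from the cited sources):
# two competitors each pick a safety level in [0, 1]. Less safety effort means
# more capability effort and a better chance of deploying first, but the
# deployed system then fails with probability (1 - winner's safety). Each
# competitor internalizes only a small share of the harm from such a failure.
import itertools

PRIZE = 10.0          # value of deploying first
PRIVATE_COST = -2.0   # harm a competitor bears itself if a catastrophe occurs
SOCIAL_COST = -100.0  # harm borne by everyone

def expected_payoffs(safety_a, safety_b):
    cap_a, cap_b = 1 - safety_a, 1 - safety_b
    p_a_wins = cap_a / (cap_a + cap_b) if cap_a + cap_b else 0.5
    p_disaster = p_a_wins * (1 - safety_a) + (1 - p_a_wins) * (1 - safety_b)
    payoff_a = p_a_wins * PRIZE + p_disaster * PRIVATE_COST
    social_harm = p_disaster * SOCIAL_COST
    return payoff_a, social_harm

for sa, sb in itertools.product([0.2, 0.5, 0.8], repeat=2):
    payoff_a, social_harm = expected_payoffs(sa, sb)
    print(f"safety A={sa}, B={sb}: A's payoff={payoff_a:5.2f}, "
          f"expected societal harm={social_harm:6.1f}")
```

With these particular numbers, each competitor's expected payoff rises as it cuts its own safety effort even though the expected societal harm grows, which is the race-to-the-bottom concern in miniature.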
The large-scale training data, while vast, does not guarantee diversity and often reflects the worldviews of privileged demographics, leading to models that perpetuate existing biases and stereotypes.
This situation is exacerbated by the tendency of these models to produce seemingly coherent and fluent text, which can mislead users into attributing meaning and intent where none exists, a phenomenon described as 'stochastic parrots'.
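A minimal, entirely hypothetical sketch of how such skew can surface in text data is shown below: it counts how often role words co-occur with gendered words in a toy corpus; a model fitted to text with these statistics will tend to reproduce them.

```python
# Minimal sketch (hypothetical corpus) of how skewed training text produces
# skewed associations: count how often role words co-occur in a sentence
# with gendered words. A model trained on such text tends to reproduce
# these statistics.
from collections import Counter

corpus = [
    "the doctor said he would review the results",
    "the nurse said she would check on the patient",
    "the engineer explained his design",
    "the doctor finished his shift",
    "the nurse updated her notes",
]
roles = {"doctor", "nurse", "engineer"}
gendered = {"he": "male", "his": "male", "she": "female", "her": "female"}

counts = Counter()
for sentence in corpus:
    words = sentence.split()
    for role in roles & set(words):
        for w in words:
            if w in gendered:
                counts[(role, gendered[w])] += 1

print(counts)  # e.g. ('doctor', 'male'): 2, ('nurse', 'female'): 2, ...
```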
On the foundational side, researchers have argued that AI could transform many aspects of society due to its broad applicability, comparing it to electricity and the steam engine.
[132][149] Allan Dafoe, the head of long-term governance and strategy at DeepMind, has emphasized the dangers of racing and the potential need for cooperation: "it may be close to a necessary and sufficient condition for AI safety and alignment that there be a high degree of caution prior to deploying advanced powerful systems; however, if actors are competing in a domain with large returns to first-movers or relative advantage, then they will be pressured to choose a sub-optimal level of caution".
[133] A research stream focuses on developing approaches, frameworks, and methods to assess AI accountability, guiding and promoting audits of AI-based systems.
By embedding deontological principles, AI systems can be guided to avoid actions that cause harm, ensuring their operations remain within ethical boundaries.
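One simple, purely hypothetical way to realize this idea in software is to check every action an agent proposes against hard rules before execution, regardless of the action's expected reward; the sketch below uses invented rule names and action fields.

```python
# Hypothetical sketch of embedding rule-like ("deontological") constraints:
# every action an agent proposes is checked against hard rules before it is
# executed, regardless of how much expected reward the action promises.
RULES = [
    lambda action: not action.get("deceives_user", False),
    lambda action: not action.get("causes_physical_harm", False),
    lambda action: action.get("reversible", True),
]

def permitted(action):
    """An action is allowed only if it violates none of the rules."""
    return all(rule(action) for rule in RULES)

proposed = {"name": "send_misleading_report", "deceives_user": True,
            "expected_reward": 9.7}
if permitted(proposed):
    print("executing", proposed["name"])
else:
    print("blocked by constraint check:", proposed["name"])
```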
This perspective aligns with ongoing efforts in international policy-making and regulatory frameworks, which aim to address the complex challenges posed by advanced AI systems worldwide.
In March 2021, the US National Security Commission on Artificial Intelligence reported that advances in AI may make it increasingly important to "assure that systems are aligned with goals and values, including safety, robustness and trustworthiness".
In September 2021, the United Kingdom published its 10-year National AI Strategy,[166] which states the British government "takes the long-term risk of non-aligned Artificial General Intelligence, and the unforeseeable changes that it would mean for ... the world, seriously".
[171][172] The National Science Foundation also supports the Center for Trustworthy Machine Learning and is providing millions of dollars in funding for empirical AI safety research.
The UK also signed an agreement with 10 other countries and the EU to form an international network of AI safety institutes to promote collaboration and share information and resources.
[180] To avoid contributing to racing dynamics, OpenAI has also stated in its charter that "if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project".[181] In addition, industry leaders such as Demis Hassabis, CEO of DeepMind, and Yann LeCun, director of Facebook AI, have signed open letters such as the Asilomar Principles[32] and the Autonomous Weapons Open Letter.