AI Safety Unsolved Problems


AI safety

Artificial intelligence (AI) safety is an interdisciplinary field focused on preventing accidents, misuse, and other harmful consequences arising from AI systems. Problems listed here are considered unsolved if no answer is known or if experts disagree significantly about a proposed solution.

Risk

How likely are the various pathways through which AI could cause significant, catastrophic, or existential harm? ^[1]^[2]

What follows after creating artificial general intelligence? ^[3]^[4]

What follows after creating superintelligence? ^[5]^[6]

Alignment

What are the human values or intentions that AI should be aligned to? ^[7]^[8]

How do we align increasingly capable systems? ^[9]^[10]

How can we understand and verify the objectives and reasoning processes of complex AI models? ^[11]^[12]

Control

Can a sufficiently intelligent AI be controlled? ^[5]^[13]^[14]

Ethics

How can algorithmic biases be overcome? ^[15]^[16]

How can the environmental impact of AI be reduced? ^[17]^[18]

How can the moral status of AI systems be evaluated? ^[19]^[20]

Governance

How can AI be safely developed, evaluated, and deployed? ^[21]^[22]

How can society balance innovations in AI with the prevention of irreversible harms? ^[23]^[24]

Who is responsible for the actions of an AI model? ^[25]^[26]

References

[1] Turchin, Alexey; Denkenberger, David (3 May 2018). "Classification of global catastrophic risks connected with artificial intelligence". AI & Society. 35 (1): 147–163. doi:10.1007/s00146-018-0845-5. ISSN 0951-5666.
[2] Chin, Ze Shen (2025). "Dimensional Characterization and Pathway Modeling for Catastrophic AI Risks". arXiv:2508.06411 [cs.CY].
[3] Ord, Toby (2020). The Precipice: Existential Risk and the Future of Humanity. New York: Hachette Books. p. 468. ISBN 9780316484916. Retrieved 29 October 2025.
[4] McLean, Scott; Read, Gemma J. M.; Thompson, Jason; Baber, Chris; Stanton, Neville A.; Salmon, Paul M. (2021). "The risks associated with artificial general intelligence: a systematic review". Journal of Experimental & Theoretical Artificial Intelligence. 35 (4): 1–17. doi:10.1080/0952813X.2021.1964003. Retrieved 29 October 2025.
[5] Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies (First ed.). Oxford: Oxford University Press. ISBN 9780199678112.
[6] PauseAI. "The extinction risk of superintelligent AI". PauseAI. Retrieved 29 October 2025.
[7] World Economic Forum (8 October 2024). "AI Value Alignment: Guiding Artificial Intelligence Towards Shared Human Goals". World Economic Forum. Retrieved 27 October 2025.
[8] Mitchell, Melanie (13 December 2022). "What Does It Mean to Align AI With Human Values?". Quanta Magazine. Retrieved 29 October 2025.
[9] Ji, Jiaming; Qiu, Tianyi; Chen, Boyuan (2023). "AI Alignment: A Comprehensive Survey". arXiv:2310.19852 [cs.AI].
[10] Grey, Markov; Segerie, Charbel-Raphaël (2025). "Scalable Oversight". AI Safety Atlas. Retrieved 29 October 2025.
[11] Tegmark, Max; Omohundro, Steve (2023). "Provably safe systems: the only path to controllable AGI". arXiv:2309.01933 [cs.CY].
[12] Grey, Markov; Segerie, Charbel-Raphaël (2025). "Chapter 9 – Interpretability". AI Safety Atlas. Retrieved 29 October 2025.
[13] Shlegeris, Buck; Greenblatt, Ryan (7 May 2024). "The case for ensuring that powerful AIs are controlled". Redwood Research Blog. Retrieved 30 October 2025.
[14] Yampolskiy, Roman V. (2020). "On Controllability of AI". arXiv:2008.04071 [cs.CY].
[15] Varsha, P. S. (2023). "How can we manage biases in artificial intelligence systems – A systematic literature review". International Journal of Information Management Data Insights. 3 (1). doi:10.1016/j.jjimei.2023.100165. Retrieved 30 October 2025.
[16] Ferrara, Emilio (2024). "Fairness and Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, and Mitigation Strategies". Sci. 6 (1): 3. doi:10.3390/sci6010003.
[17] Artificial Intelligence (AI) end-to-end: The Environmental Impact of the Full AI Lifecycle Needs to be Comprehensively Assessed – Issue Note (Report). United Nations Environment Programme. September 2024. Retrieved 30 October 2025.
[18] Ren, Shaolei; Wierman, Adam (15 July 2024). "The Uneven Distribution of AI's Environmental Impacts". Harvard Business Review. Retrieved 30 October 2025.
[19] "Moral Status of Digital Minds". 80,000 Hours. Centre for Effective Altruism. 2023. Retrieved 30 October 2025.
[20] Shulman, Carl; Bostrom, Nick (2021). "Sharing the World with Digital Minds". In Steve Clarke; Hazem Zohny; Julian Savulescu. Rethinking Moral Status. Oxford University Press. pp. 306–326. doi:10.1093/oso/9780192894076.003.0018. ISBN 978-0-19-289407-6. Retrieved 30 October 2025.
[21] Ren, Richard; Basart, Steven; Khoja, Adam (2024). "Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?". arXiv:2407.21792 [cs.LG].
[22] Papagiannidis, Emmanouil; Mikalef, Patrick; Conboy, Kieran (2025). "Responsible artificial intelligence governance: A review and research framework". Journal of Strategic Information Systems. 34 (2): 101885. doi:10.1016/j.jsis.2024.101885. Retrieved 27 October 2025.
[23] Bengio, Yoshua; Hinton, Geoffrey; Yao, Andrew; et al. (2024). "Managing extreme AI risks amid rapid progress …". Science. 384 (6698): 842–845. arXiv:2310.17688. Bibcode:2024Sci...384..842B. doi:10.1126/science.adn0117. PMID 38768279. Retrieved 26 October 2025.
[24] "The Bletchley Declaration by Countries Attending the AI Safety Summit, 1–2 November 2023". UK Government. 2 November 2023. Retrieved 29 October 2025.
[25] Recommendation on the Ethics of Artificial Intelligence (Programme and meeting document). Paris: UNESCO. 2022. SHS/BIO/PI/2021/1. Retrieved 27 October 2025.
[26] Coeckelbergh, Mark (2020). "Artificial Intelligence, Responsibility, and Moral Status". AI & Society. 35 (4): 1033–1040. doi:10.1007/s00146-019-00931-5 (inactive 30 October 2025). Retrieved 30 October 2025.

This article, "AI Safety Unsolved Problems", is from Wikipedia. The list of its authors can be seen in its history and/or on the page Edithistory:AI Safety Unsolved Problems. Articles copied from the Draft namespace on Wikipedia can be found in Wikipedia's Draft namespace, not the main one.
