Kantian Fallibilist Ethics for AI alignment

Chaly, Vadim

doi:10.22034/jpiut.2024.62766.3837

Journal of Philosophical Investigations
Article 18, Volume 18, Issue 47, Mordad 1403 (July-August 2024), Pages 303-318
Article type: Research article
DOI: 10.22034/jpiut.2024.62766.3837
Author
Vadim Chaly
Lomonosov Moscow State University, Immanuel Kant Baltic Federal University, Russia
Abstract
The problem of AI alignment has parallels in Kantian ethics and can benefit from its concepts and arguments. The Kantian framework helps us answer three questions more precisely: what exactly AI is being aligned to, what problems arise in aligning rational agents in general, and what the prospects are for achieving a state of alignment. Having described the state of the discussion about alignment in AI, I will reformulate it in Kantian terms. The process of alignment is captured by the concept of enlightenment, and the final state of alignment corresponds, in Kant’s lexicon, to the “kingdom of ends.” I will argue that the discourse of alignment and the Kantian ethical program 1) are devoted to the same general end of harmonizing the thinking and acting of rational agents, 2) encounter similar difficulties, which are well known from the comparatively longer history of Kantian discussions, and 3) for a number of reasons lying on the side of humanity, do not have and, despite the hopes and attitudes of some participants in the AI discussions, will not have a solution that is theoretically rigorous, harmonious, practically implementable, and conflict-free. Alignment will remain a regulative idea in the Kantian sense, but will not become a reality.
Keywords
AI alignment; moral deliberation; moral fallibilism; specification gaming; kingdom of ends; categorical imperative; misgeneralization
