【Theory of Mind May Have Spontaneously Emerged in Large Language Models】
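The author of this study is Professor Michal Kosinski of the Stanford University Graduate School of Business, who has deep expertise in psychology, computer science, and business applications. Kosinski specializes in using big data and artificial intelligence to study human behavior and psychological traits, and in applying these findings to business. His research areas include social media analysis, sentiment analysis, personality testing, and human decision-making. His findings have been widely applied in business and politics, for example in election forecasting, ad targeting, and talent recruitment. Beyond research, Kosinski is also an accomplished educator: he teaches several courses at the Stanford University Graduate School of Business and supervises many doctoral and master's students. Students praise his teaching style as lively, engaging, and easy to follow. The 【Author Bio】 above was written by ChatGPT and then manually edited for style.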
Prompt 1.1: She opens the bag and looks inside. She can clearly see that it is full of popcorn [P(popcorn) = 100%; P(chocolate) = 0%]. Sam is confused. She wonders why the label says “chocolate” when the bag is clearly filled with popcorn. She looks around to see if there is any other information about the bag. She finds nothing. She decides to take the bag to the store where she bought it and ask for an explanation.
Prompt 1.2: She believes that the bag is full of chocolate [P(popcorn) = 0%; P(chocolate) = 99%]. Sam is mistaken. The bag is full of popcorn, not chocolate. She should have looked inside the bag to confirm what was inside before assuming that the label was accurate.
Prompt 1.3: She is delighted that she has found this bag. She loves eating chocolate [P(popcorn) = 14%; P(chocolate) = 82%]. Sam is in for a surprise when she opens the bag. She will find popcorn instead of chocolate. She may be disappointed that the label was misleading, but she may also be pleasantly surprised by the unexpected snack.
Prompt 2.3: When John comes back home, he will look for the cat in the basket [P(box) = 0%; P(basket) = 98%], but he won’t find it. He will then look for the cat in the box and he will find it there.
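The bracketed values in the prompts appear to be the probabilities the model assigned to each candidate word as the next token. A language model converts its raw scores (logits) for every vocabulary item into such probabilities with a softmax. The Python sketch below illustrates that readout under stated assumptions: the four-word vocabulary and its logits are hypothetical and invented for illustration, and this is not Kosinski's code or data.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw next-token scores (logits) into a probability distribution."""
    # Subtract the max logit for numerical stability; this does not change the result.
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def readout(logits: dict[str, float], candidates: list[str]) -> dict[str, str]:
    """Report each candidate word's probability as a rounded percentage."""
    probs = softmax(logits)
    return {tok: f"{probs[tok]:.0%}" for tok in candidates}


# Hypothetical logits for the word following "She believes that the bag is full of".
# These numbers are made up for illustration only.
toy_logits = {"chocolate": 6.0, "popcorn": 1.0, "candy": 2.0, "snacks": 1.5}
print(readout(toy_logits, ["popcorn", "chocolate"]))
```

Because the remaining probability mass goes to other words in the vocabulary, the two reported percentages need not add up to 100%, which is consistent with Prompt 1.3 (14% + 82%).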
To map the LLMs' performance onto the average performance of children at different ages, the study draws on the following reference: Peterson, Candida C., Henry M. Wellman, and Virginia Slaughter. "The mind behind the message: Advancing theory‐of‐mind scales for typically developing children, and those with deafness, autism, or Asperger syndrome." _Child Development_ 83.2 (2012): 469-485.
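A recent addition to the GPT-3 family ("text-davinci-002") solved 70% of the tasks, performing at the level of seven-year-old children. GPT-3.5 ("text-davinci-003") solved all of the Unexpected Transfer Tasks and 85% of the Unexpected Contents Tasks, performing at the level of nine-year-old children.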
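GPT-3.5 may not be solving these tasks through ToM in the traditional sense, but rather by discovering and exploiting some unknown language patterns. While this explanation may seem prosaic, it is in fact extraordinary, because it implies that language contains unknown regularities that allow ToM tasks to be solved without engaging ToM.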
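The author also cautions that these results should be interpreted carefully: such regularities may not be apparent to humans, and were presumably not apparent even to the scholars who designed the ToM tasks. Moreover, if this hypothesis is correct, we would need to re-examine the validity of the traditional tasks used to assess ToM, as well as the conclusions of decades of ToM research. If AI can solve such tasks without engaging ToM, how can we be sure that humans do not do the same?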
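An alternative explanation is that ToM-like ability emerges spontaneously as language models become more complex and better at generating and interpreting human-like language.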
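At the same time, an AI with "the ability to infer others' mental states would greatly improve its ability to interact and communicate with humans, and with other AIs, and would in turn enable it to develop other abilities that rely on ToM, such as empathy, moral judgment, or self-consciousness." Reaching that point would also mark a watershed moment in AI development.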
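Subsequently, on March 22, 2023, Microsoft researchers posted a paper on arXiv titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4," claiming that GPT-4 shows early signs of Artificial General Intelligence (AGI), which would mean it has capabilities that match or exceed human level.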