
GPT-4 passed the Turing test the imitation game 'Can machines th

2023-07-26 12:59:25

TJKCB
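A quiet, pure heart; sensing things and people; writing plainly and freshly. Idle books and idle talk nourish an idle mind; an idle pen idly records idle people. A life free of worry learns to cherish; in mutual support, every word rings true.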

GPT-4 passed the Turing test 

GPT-4 aced most of the machine benchmarks it was given, including reading comprehension, mathematics and coding, OpenAI reported⁴.

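The Turing test, which Alan Turing called the imitation game, was his way of approaching the question "Can machines think?" using human judges. Rather than relying on it, researchers typically assess AI systems with benchmarks designed to evaluate performance on specific capabilities, such as language ability, common-sense reasoning and mathematical capacity. Increasingly, teams are also turning to academic and professional examinations designed for people.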

It’s the kind of game that researchers familiar with LLMs could probably still win, however. Chollet says he’d find it easy to detect an LLM — by taking advantage of known weaknesses of the systems. “If you put me in a situation where you asked me, ‘Am I chatting to an LLM right now?’ I would definitely be able to tell you,” says Chollet.

The key, he says, is to take the LLM outside of its comfort zone. He suggests presenting it with scenarios that are variations on ones the LLM will have seen a lot in its training data. In many cases, the LLM answers by spitting out words that are most likely to be associated with the original question in its training data, rather than by giving the correct answer to the new scenario.


The company also set GPT-4 around 30 exams, including: various subject-specific tests designed for US high-school students, known as Advanced Placement; an exam to assess the current state of US physicians’ clinical knowledge; and a standard test used in the selection process for US graduate studies, called the GRE. In the Uniform Bar Exam, which forms part of the qualification process for lawyers in many US states, GPT-4 attained a score that would place it in the top 10% of people, OpenAI reported (see ‘AI system performance — selected results’).

 

The world’s best artificial intelligence (AI) systems can pass tough exams, write convincingly human essays and chat so fluently that many find their output indistinguishable from people’s. What can’t they do? Solve simple visual logic puzzles.

In a test consisting of a series of brightly coloured blocks arranged on a screen, most people can spot the connecting patterns. But GPT-4, the most advanced version of the AI system behind the chatbot ChatGPT and the search engine Bing, gets barely one-third of the puzzles right in one category of patterns and as little as 3% correct in another, according to a report by researchers this May¹.

https://www.nature.com/articles/d41586-023-02361-7?

  • NEWS FEATURE
  • 25 July 2023

ChatGPT broke the Turing test — the race is on for new ways to assess AI

Large language models mimic human chatter, but scientists disagree on their ability to reason.
  • Celeste Biever

https://www.nature.com/articles/d41586-023-02366-2

Researchers whose first language is not English can spend around twice as long reading an English-language scientific journal article as native speakers. For a PhD student working on their thesis, that can mean spending up to 19 additional working days per year just reading papers.

https://www.nature.com/articles/d41586-023-02320-2?

    The team found that among scientists who had published only one paper in English, those from countries with generally low English proficiency spent a median of 29.8% more time writing it than did native speakers; those from countries with moderate English proficiency spent a median of 50.6% more time. Similarly, the researchers found that those from countries with generally low English proficiency spend a median of 90.8% more time reading scientific articles than do native speakers. They also learnt that non-native speakers spend more time preparing to give oral presentations at international conferences, and that many avoid this type of commitment owing to language barriers.

    Amano, who is Japanese, says he has always struggled to communicate in English. After many years working in the United Kingdom and Australia, his English is improving, and people might think his papers are similar to those written by a native English speaker. “But behind the scenes, I have to spend so much time to reach that level,” he says. That extra effort is exactly what he wanted to quantify in this study.  

    Heightened rejection

    Amano and his colleagues also examined the peer-review process. Non-native English speakers reported having their papers rejected specifically because of writing issues 2.5 times as often as native speakers. That sounds familiar to Lina Pérez-Angel, a Colombian palaeoclimatologist at Brown University in Providence, Rhode Island. “I have had reviewers that explicitly said that my English puts in doubt the quality of the research, or mostly gave me feedback on my English in a harsh way that made me think it was based on my Latinx/Hispanic-sounding last name,” she says.

    Conferences could consider allowing researchers to present in their native language, using a translator, and could publish abstracts in multiple languages. “Non-native English speakers constitute almost 95% of the world’s population,” Amano says. “If we don’t support those 95%, I’m sure we can’t solve many global challenges.”

    Nature 619, 678-679 (2023)

    doi: https://doi.org/10.1038/d41586-023-02320-2
