OpenAIs GPT-4 exhibits human-level performance on professional benchmarks

 OpenAIs GPT-4 exhibits human-level performance on professional benchmarks

generative pre-trained transformer news — OpenAIs GPT-4 exhibits human-level performance on professional benchmarks Multimodal AI model can process images and text, pass bar exams.

Benj Edwards – Mar 14, 2023 6:47 pm UTC EnlargeArs Technica reader comments 163 with Share this story Share on Facebook Share on Twitter Share on Reddit

On Tuesday, OpenAI announced GPT-4, a large multimodal model that can accept text and image inputs while returning text output that “exhibits human-level performance on various professional and academic benchmarks,” according to OpenAI. Also on Tuesday, Microsoft announced that Bing Chat has been running on GPT-4 all along. Further ReadingAI-powered Bing Chat loses its mind when fed Ars Technica article

If it performs as claimed, GPT-4 potentially represents the opening of a new era in artificial intelligence. “It passes a simulated bar exam with a score around the top 10% of test takers,” writes OpenAI in its announcement. “In contrast, GPT-3.5s score was around the bottom 10%.”

OpenAI plans to release GPT-4’s text capability through ChatGPT and its commercial API, but with a waitlist at first. GPT-4 is currently available to subscribers of ChatGPT Plus. Also, the firm is testing GPT-4’s image input capability with a single partner, Be My Eyes, an upcoming smartphone app that can recognize a scene and describe it.

Along with the introductory website, OpenAI also released a technical paper describing GPT-4’s capabilities and a system model card describing its limitations in detail. Enlarge / A screenshot of GPT-4’s introduction to ChatGPT Plus customers from March 14, 2023.Benj Edwards / Ars Technica

GPT stands for “generative pre-trained transformer,” and GPT-4 is part of a series of foundational language models extending back to the original GPT in 2018. Following the original release, OpenAI announced GPT-2 in 2019 and GPT-3 in 2020. A further refinement called GPT-3.5 arrived in 2022. In November, OpenAI released ChatGPT, which at that time was a fine-tuned conversational model based on GPT-3.5. Advertisement

AI models in the GPT series have been trained to predict the next token (a fragment of a word) in a sequence of tokens using a large body of text pulled largely from the Internet. During training, the neural network builds a statistical model that represents relationships between words and concepts. Over time, OpenAI has increased the size and complexity of each GPT model, which has resulted in generally better performance, model-over-model, compared to how a human would complete text in the same scenario, although it varies by task.

As far as tasks go, GPT-4’s performance is notable. As with its predecessors, it can follow complex instructions in natural language and generate technical or creative works, but it can do so with more depth: It supports generating and processing up to 32,768 tokens (around 25,000 words of text), which allows for much longer content creation or document analysis than previous models.

While analyzing GPT-4’s capabilities, OpenAI made the model take tests like the Uniform Bar Exam, the Law School Admission Test (LSAT), the Graduate Record Examination (GRE) Quantitative, and various AP subject tests. On many of the tasks, it scored at a human level. That means if GPT-4 were a person being judged solely on test-taking ability, it could get into law schooland likely many universities as well.

??Well this is something else.

GPT-4 passes basically every exam. And doesn’t just pass…
The Bar Exam: 90%
LSAT: 88%
GRE Quantitative: 80%, Verbal: 99%
Every AP, the SAT… Ethan Mollick (@emollick) March 14, 2023 Page: 1 2 Next → reader comments 163 with Share this story Share on Facebook Share on Twitter Share on Reddit Benj Edwards Benj Edwards is an AI and Machine Learning Reporter for Ars Technica. For over 16 years, he has written about technology and tech history for sites such as The Atlantic, Fast Company, PCMag, PCWorld, Macworld, How-To Geek, and Wired. In 2005, he created Vintage Computing and Gaming. He also hosted The Culture of Tech podcast and contributes to Retronauts. Mastodon: Twitter @benjedwards Advertisement Channel Ars Technica ← Previous story Next story → Related Stories Today on Ars