Tag Archives: multimodal AI

ChatGPT update enables its AI to “see, hear, and speak,” according to OpenAI

On Monday, OpenAI announced a significant update to ChatGPT that enables its GPT-3.5 and GPT-4 AI models to analyze images and react to them as part of a text conversation. Also, the ChatGPT mobile app will add speech synthesis options that, when paired with its existing speech recognition features, will enable…

Meta’s “massively multilingual” AI model translates up to 100 languages, speech or text

On Tuesday, Meta announced SeamlessM4T, a multimodal AI model for speech and text translations. As a neural network that can process both text and audio, it can perform text-to-speech, speech-to-text, speech-to-speech, and text-to-text translations for “up to 100 languages,” according to Meta. Its goal is to help people who…

Google’s PaLM-E is a generalist robot brain that takes commands

Image: A robotic arm controlled by PaLM-E reaches for a bag of chips in a demonstration video. (Google Research)

On Monday, a group of AI researchers from Google and the Technical University of Berlin unveiled PaLM-E, a multimodal embodied visual-language model (VLM) with 562 billion parameters that…

Microsoft unveils AI model that understands image content, solves visual puzzles

Image: An AI-generated image of an electronic brain with an eyeball. (Ars Technica)

On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ tests, and understand natural language instructions. The…