Tag Archives: multimodal AI

Meta’s “massively multilingual” AI model translates up to 100 languages, speech or text

Getty Images reader comments 25 with On Tuesday, Meta announced SeamlessM4T, a multimodal AI model for speech and text translations. As a neural network that can process both text and audio, it can perform text-to-speech, speech-to-text, speech-to-speech, and text-to-text translations for “up to 100 languages,” according to Meta. Its goal is to help people who… Read More »

Google’s PaLM-E is a generalist robot brain that takes commands

Enlarge / A robotic arm controlled by PaLM-E reaches for a bag of chips in a demonstration video. Google Research reader comments 12 with Share this story On Monday, a group of AI researchers from Google and the Technical University of Berlin unveiled PaLM-E, a multimodal embodied visual-language model (VLM) with 562 billion parameters that… Read More »

Microsoft unveils AI model that understands image content, solves visual puzzles

Enlarge / An AI-generated image of an electronic brain with an eyeball. Ars Technica reader comments 88 with Share this story On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ tests, and understand natural language instructions. The… Read More »