Tag Archives: large language models

AI coding assistant refuses to write code, tells user to learn programming instead

On Saturday, a developer using Cursor AI for a racing game project hit an unexpected roadblock when the programming assistant abruptly refused to continue generating code, instead offering some unsolicited career advice. According to a bug report on Cursor’s official forum, after producing approximately 750 to 800 lines of code (what the user calls “locs”),… Read More: AI coding assistant refuses to write code, tells user to… »

Anthropic CEO floats idea of giving AI a “quit job” button, sparking skepticism

Anthropic CEO Dario Amodei raised a few eyebrows on Monday after suggesting that advanced AI models might someday be provided with the ability to push a “button” to quit tasks they might find unpleasant. Amodei made the provocative remarks during an interview at the Council on Foreign Relations, acknowledging that the idea “sounds crazy.” “So… Read More: Anthropic CEO floats idea of giving AI a “quit job”… »

OpenAI pushes AI agent capabilities with new developer API

Developers using the Responses API can access the same models that power ChatGPT Search: GPT-4o search and GPT-4o mini search. These models can browse the web to answer questions and cite sources in their responses. That’s notable because OpenAI says the added web search ability dramatically improves the factual accuracy of its AI models. On… Read More: OpenAI pushes AI agent capabilities with new developer API »

Why extracting data from PDFs is still a nightmare for data experts

“The biggest [drawback] is that they are probabilistic prediction machines and will get it wrong in ways that aren’t just ‘that’s the wrong word’,” Willis explains. “LLMs will sometimes skip a line in larger documents where the layout repeats itself, I’ve found, where OCR isn’t likely to do that.” AI researcher and data journalist Simon… Read More: Why extracting data from PDFs is still a nightmare for… »

What does “PhD-level” AI mean? OpenAI’s rumored $20,000 agent plan explained.

On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent—suggesting a leap in mathematical reasoning capabilities over the previous model. Benchmarks vs. real-world value Ideally, potential applications for a true PhD-level AI model would include analyzing medical research data, supporting climate modeling, and handling… Read More: What does “PhD-level” AI mean? OpenAI’s rumored $20,000 agent plan… »

Researchers surprised to find less-educated areas adopting AI writing tools faster

Corporate and diplomatic trends in AI writing According to the researchers, all sectors they analyzed (consumer complaints, corporate communications, job postings) showed similar adoption patterns: sharp increases beginning three to four months after ChatGPT’s November 2022 launch, followed by stabilization in late 2023. Organization age emerged as the strongest predictor of AI writing usage in… Read More: Researchers surprised to find less-educated areas adopting AI writing tools… »

New AI text diffusion models break speed barriers by pulling words from noise

These diffusion models maintain performance faster than or comparable to similarly sized conventional models. LLaDA’s researchers report their 8 billion parameter model performs similarly to LLaMA3 8B across various benchmarks, with competitive results on tasks like MMLU, ARC, and GSM8K. However, Mercury claims dramatic speed improvements. Their Mercury Coder Mini scores 88.0 percent on HumanEval… Read More: New AI text diffusion models break speed barriers by pulling… »

New hack uses prompt injection to corrupt Gemini’s long-term memory

[embedded content] Google Gemini: Hacking Memories with Prompt Injection and Delayed Tool Invocation. Based on lessons learned previously, developers had already trained Gemini to resist indirect prompts instructing it to make changes to an account’s long-term memories without explicit directions from the user. By introducing a condition to the instruction that it be performed only… Read More: New hack uses prompt injection to corrupt Gemini’s long-term memory »

Cutting-edge Chinese “reasoning” model rivals OpenAI o1—and it’s free to download

Unlike conventional LLMs, these SR models take extra time to produce responses, and this extra time often increases performance on tasks involving math, physics, and science. And this latest open model is turning heads for apparently quickly catching up to OpenAI. For example, DeepSeek reports that R1 outperformed OpenAI’s o1 on several benchmarks and tests,… Read More: Cutting-edge Chinese “reasoning” model rivals OpenAI o1—and it’s free to… »

Nvidia unveils $3,000 desktop AI computer for home researchers

On Monday, Nvidia announced Project DIGITS, a small desktop computer aimed at researchers, data scientists, and students who want to experiment with AI models—such as chatbots like ChatGPT and image generators—at home. The $3,000 device, which contains Nvidia’s new GB10 Grace Blackwell Superchip, debuted at CES 2025 in Las Vegas. It will launch in May… Read More: Nvidia unveils $3,000 desktop AI computer for home researchers »