Leading AI models excel at basic tasks but lack scientific reasoning: Study

NewsDrum Desk

New Delhi, Oct 14 (PTI) Leading Artificial Intelligence (AI) models excel at basic tasks but lack scientific reasoning, researchers from the Indian Institute of Technology Delhi (IIT Delhi) and Friedrich Schiller University Jena (FSU Jena), Germany, have found.

The research, published in "Nature Computational Science", revealed that while leading AI models show promise in basic scientific tasks, they exhibit fundamental limitations that could pose risks if deployed without proper oversight in research environments.

The research team, led by NM Anoop Krishnan, associate professor at IIT Delhi, and Kevin Maik Jablonka, professor at FSU Jena, has developed "MaCBench" -- the first comprehensive benchmark specifically designed to evaluate how vision-language models handle real-world chemistry and materials science tasks. The results revealed a striking paradox.

While AI models achieved near-perfect performance in basic perception tasks like equipment identification, they struggled significantly with spatial reasoning, cross-modal information synthesis, and multi-step logical inference -- capabilities essential for genuine scientific discovery.

"Our findings represent a crucial reality check for the scientific community. While these AI systems show remarkable capabilities in routine data processing tasks, they are not yet ready for autonomous scientific reasoning. The strong correlation we observed between model performance and internet data availability suggests these systems may be relying more on pattern matching than genuine scientific understanding," Krishnan told PTI.

Krishnan explained that one of the most concerning findings emerged from laboratory safety assessments.

"While models excelled at identifying laboratory equipment with 77 pc accuracy, they performed poorly when evaluating safety hazards in similar laboratory setups, achieving only 46 pc accuracy. This disparity between equipment recognition and safety reasoning is particularly alarming," said Kevin Maik Jablonka.

"It suggests that current AI models cannot bridge the gaps in tacit knowledge that are crucial for safe laboratory operations. Scientists must understand these limitations before integrating AI into safety-critical research environments," he added.

The research team's innovative approach included extensive ablation studies that isolated specific failure modes. They discovered that models performed substantially better when identical information was presented as text rather than images, indicating incomplete multimodal integration -- a fundamental requirement for scientific work.

The study's implications extend far beyond chemistry and materials science, suggesting broader challenges for AI deployment across scientific disciplines. It indicates that developing reliable AI scientific assistants will require fundamental advances in training methodologies that emphasize genuine understanding over pattern matching.

"Our work provides a roadmap for both the capabilities and limitations of current AI systems in science. While these models show promise as assistive tools for routine tasks, human oversight remains essential for complex reasoning and safety-critical decisions. The path forward requires better uncertainty quantification and frameworks for effective human-AI collaboration," said Indrajeet Mandal, IIT Delhi PhD scholar. PTI GJS MG