Evaluating AI Performance in Academic Settings: A Comparative Study of ChatGPT-4 and Gemini

Document Type: Original Article

Authors

Lecturer, Al-Gazeera High Institute for Computer and Information Systems, Cairo, Egypt

Abstract

This study presents a systematic comparison of ChatGPT-4 and Gemini in addressing academic queries across four disciplines: Python programming, financial accounting, business administration, and medical sciences. Through a mixed-methods analysis of 40 standardized questions (balanced between numerical and narrative formats), we evaluate the models' accuracy, reasoning capabilities, and limitations. Results show ChatGPT-4's superior performance, with 82.5% overall accuracy (85% numerical, 80% narrative) versus Gemini's 68.8% (72.5% numerical, 65% narrative). While both models handle straightforward queries competently, ChatGPT-4 exhibits markedly better contextual interpretation and explanatory depth on complex narrative questions. Gemini, though faster in response generation, is more prone to errors in technical domains. Notably, both systems struggle with implicit assumptions, particularly in advanced accounting problems, where error rates reach 15-20%. These findings underscore ChatGPT-4's current advantage as an educational support tool while emphasizing the necessity of human oversight for quality control. The study contributes practical evaluation metrics and implementation guidelines for academic institutions adopting AI technologies.
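For readers checking the arithmetic behind the headline figures, the overall accuracies are consistent with an equal-weight average of the two per-category accuracies. The sketch below reproduces them under that assumption; the function and variable names are illustrative and not taken from the study's materials.

```python
# Minimal sketch of the accuracy aggregation implied by the reported figures.
# Assumption: the overall score is the equal-weight mean of the numerical and
# narrative category accuracies; names below are illustrative, not from the study.

def overall_accuracy(numerical_pct: float, narrative_pct: float) -> float:
    """Equal-weight average of the two per-category accuracies."""
    return (numerical_pct + narrative_pct) / 2

reported = {
    "ChatGPT-4": {"numerical": 85.0, "narrative": 80.0},
    "Gemini": {"numerical": 72.5, "narrative": 65.0},
}

for model, scores in reported.items():
    overall = overall_accuracy(scores["numerical"], scores["narrative"])
    print(f"{model}: {overall:.1f}% overall "
          f"({scores['numerical']}% numerical, {scores['narrative']}% narrative)")
# ChatGPT-4: 82.5% overall (85.0% numerical, 80.0% narrative)
# Gemini: 68.8% overall (72.5% numerical, 65.0% narrative)
```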
