This study conducts a systematic comparison of ChatGPT-4 and Gemini in addressing academic queries across four disciplines: Python programming, financial accounting, business administration, and medical sciences. Through a mixed-methods analysis of 40 standardized questions (balanced between numerical and narrative formats), we evaluate the models' accuracy, reasoning capabilities, and limitations. Results reveal ChatGPT-4's superior performance with 82.5% overall accuracy (85% numerical, 80% narrative) versus Gemini's 68.8% (72.5% numerical, 65% narrative). While both models demonstrate competence in straightforward queries, ChatGPT-4 exhibits significantly better contextual interpretation and explanatory depth for complex narrative questions. Gemini, though faster in response generation, shows higher susceptibility to errors in technical domains. Notably, both systems face challenges in handling implicit assumptions, particularly in advanced accounting problems, where error rates reach 15-20%. These findings underscore ChatGPT-4's current advantage as an educational support tool while emphasizing the necessity of human oversight for quality control. The study contributes practical evaluation metrics and implementation guidelines for academic institutions adopting AI technologies.
APA
Embark, A., & Amin, Y. (2025). Evaluating AI Performance in Academic Settings: A Comparative Study of ChatGPT-4 and Gemini. Journal of Artificial Intelligence in Engineering Practice, 2(1), 17-30. doi: 10.21608/jaiep.2025.395670.1017