AI Detection in Formative Assessment: A comprehensive experimental study across various subjects

As part of this study, I solved approximately 100 numerical questions (±10) using different AI tools.
Artificial Intelligence (AI) tools such as ChatGPT Pro, Gemini Pro, and Perplexity have become extremely popular among engineering students. They are widely used for explanations, quick problem-solving, and revision. As these tools become more integrated into student life, it is crucial to understand how reliable they actually are, especially for formative assessments that require numerical precision, diagram interpretation, fatigue analysis, and table-dependent calculations.
In this study, I evaluated how well different AI platforms could solve real quiz questions from MAS236 (Machine Elements & FEM), MAS237 (Actuation Systems), and MAS-178 (Mathematics). The objective was not just to measure accuracy but to understand why AI answers go wrong, and whether prompt quality or input structure affects correctness.
This blog presents the motivation, methodology, scenarios, results, and insights from my experiment.
Formative assessments help students learn, not merely score marks. But with AI now capable of solving complex questions instantly, a major concern arises:
If students rely on AI for numerical engineering problems, will they still learn the fundamental concepts?
Mechanical Engineering subjects, especially Machine Design, Actuation Systems, and Mathematics, require:
• precise reading of engineering figures
• correct use of standard textbook tables
• accurate application of fatigue and stress formulas (see the example below)
• decimal-level numerical accuracy
• understanding of assumptions behind formulas
These elements make Machine Design a good field for testing AI accuracy.
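To give a sense of the kind of calculation these quizzes involve, consider a standard fatigue check from Machine Elements. This is an illustration of the type of formula in play, not one of the actual quiz questions: the modified Goodman criterion,

\[
\frac{\sigma_a}{S_e} + \frac{\sigma_m}{S_{ut}} = \frac{1}{n}
\]

where σ_a is the alternating stress, σ_m the mean stress, S_ut the ultimate tensile strength, and S_e the corrected endurance limit, which itself depends on surface, size, and loading factors read from textbook tables. A small error in any table value, or rounding too early, propagates directly into the safety factor n, which is exactly why decimal-level accuracy and correct table use matter so much here.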
My experiment helps clarify whether AI can truly replace the analytical thinking needed for Machine Design or whether it remains only a supporting tool for understanding.
The following AI tools were tested:
• ChatGPT Pro
• Gemini Pro
• Perplexity AI
Each AI was given the same questions under controlled conditions to compare consistency in reasoning and numerical accuracy across the three subjects.
Three structured testing scenarios were used:

Scenario A — All questions at once.
Outcome: Solution steps correct, but many numerical answers incorrect.

Scenario B — One question at a time, with access to all diagrams.
Outcome: Correct diagram interpretation, but still incorrect final values.

Scenario C — One question at a time, with only its specific diagram.
Outcome: Better clarity, but numerical errors remained.

On top of these scenarios, explicit follow-up instructions such as “Double check calculations” and “Use correct fatigue formulas” were added.
Outcome: Improved reasoning, but no improvement in numerical accuracy.

Across all subjects and scenarios, the AI tools failed to consistently match the exact quiz answers.
The AI tools did not give exact answers for questions that required diagrams or values from textbook tables. Questions that could be solved directly with formulas were answered correctly on most AI platforms.
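Finally, for anyone who wants to rerun this kind of comparison more systematically, here is a minimal sketch of how the three scenarios (plus the follow-up instructions) could be scripted. It is only an outline under my own assumptions: ask_model() is a hypothetical placeholder for whichever tool or API you use, the Question fields are names I chose for illustration, and none of the actual quiz questions or diagrams are included.

```python
# Minimal sketch of the three prompting scenarios described above.
# ask_model() is a hypothetical placeholder, not a real library call: swap in
# whichever tool you are testing (ChatGPT Pro, Gemini Pro, Perplexity AI).

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Question:
    text: str                      # the numerical problem statement
    diagram: Optional[str] = None  # path to the figure it refers to, if any

def ask_model(prompt: str, attachments: Optional[List[str]] = None) -> str:
    """Placeholder: send one prompt (plus optional diagram files) to an AI tool
    and return its answer as text."""
    raise NotImplementedError("Replace with a real API call or manual copy-paste.")

def scenario_a(questions: List[Question]) -> str:
    """Scenario A: every question submitted in one combined prompt."""
    combined = "\n\n".join(f"Q{i + 1}. {q.text}" for i, q in enumerate(questions))
    diagrams = [q.diagram for q in questions if q.diagram]
    return ask_model("Solve all of the following quiz questions:\n\n" + combined, diagrams)

def scenario_b(questions: List[Question]) -> List[str]:
    """Scenario B: one question at a time, with access to all diagrams."""
    all_diagrams = [q.diagram for q in questions if q.diagram]
    return [ask_model(q.text, all_diagrams) for q in questions]

def scenario_c(questions: List[Question]) -> List[str]:
    """Scenario C: one question at a time, with only its own diagram."""
    return [ask_model(q.text, [q.diagram] if q.diagram else None)
            for q in questions]

def with_followup(prompt: str) -> str:
    """Append the explicit follow-up instructions that were also tried."""
    return prompt + "\n\nDouble check calculations. Use correct fatigue formulas."
```

The point of scripting it this way is simply to keep the prompts identical across tools and scenarios, so that any differences in the answers come from the model rather than from the wording of the question.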