Artificial intelligence in practice: measuring its medical accuracy in oculoplastics consultations
MAIO 137 Neuhouser PDF

Supplementary Files

MAIO 137 Neuhouser Appendix


How to Cite

Neuhouser AJ, Kamboj A, Mokhtarzadeh A, Harrison AR. Artificial intelligence in practice: measuring its medical accuracy in oculoplastics consultations. MAIO [Internet]. 2024 May 10 [cited 2024 Jul. 25];6(1):1-11. Available from:

Copyright notice

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright (c) 2024 Adam J. Neuhouser, Alisha Kamboj, Ali Mokhtarzadeh, Andrew R. Harrison


artificial intelligence; Chat GPT; DALLE; oculoplastics; patient information


Purpose: The aim of this study was to investigate the medical accuracy of responses produced by Chat Generative Pretrained Transformer 4 (Chat GPT-4) and DALLE-2 in relation to common questions encountered during oculoplastic consultations.

Methods: The 5 most frequently discussed oculoplastic procedures on social media were selected for evaluation using Chat GPT-4 and DALLE-2. Questions were formulated from common patient concerns and entered into Chat GPT-4, and the responses were assessed on a 3-point scale. For procedure imagery, descriptions were submitted to DALLE-2, and the resulting images were graded for anatomical and surgical accuracy. Grading was completed by 5 oculoplastic surgeons through a 110-question survey.

Results: Overall, 87.3% of Chat GPT-4’s responses achieved a score of 2 or 3 points, denoting a good to high level of accuracy. Across all procedures, questions about pain, bruising, procedure risk, and adverse events garnered high scores. Conversely, responses regarding specific case scenarios, procedure longevity, and procedure definitions were less accurate. Images produced by DALLE-2 were notably subpar, often failing to accurately depict surgical outcomes and realistic details.
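
As an illustration of how a headline figure such as the share of responses scoring 2 or 3 might be computed, the following is a minimal sketch. It assumes the ratings from all graders are pooled before the percentage is taken; the abstract does not state how scores were aggregated. The ratings shown are hypothetical and are not the study's data, and the function name `share_at_or_above` is introduced here for illustration only.

```python
from collections import Counter


def share_at_or_above(scores, threshold=2):
    """Return the percentage of scores that are >= threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    hits = sum(1 for s in scores if s >= threshold)
    return 100.0 * hits / len(scores)


# Hypothetical ratings on a 3-point scale (NOT the study's data):
# each inner list holds one grader's scores for the same four responses.
ratings_by_grader = [
    [3, 2, 1, 3],
    [3, 3, 2, 3],
    [2, 2, 1, 3],
    [3, 2, 2, 3],
    [3, 3, 1, 2],
]

# Assumption: pool every grader's rating into one list before summarizing.
all_scores = [s for grader in ratings_by_grader for s in grader]
pct = share_at_or_above(all_scores)

print(f"{pct:.1f}% of pooled ratings scored 2 or 3")
print("Score distribution:", dict(Counter(all_scores)))
```

A different aggregation, such as averaging the graders' scores per response before applying the threshold, could give a different percentage from the same ratings.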

Conclusions: Chat GPT-4 demonstrated a creditable level of accuracy in addressing common oculoplastic procedure concerns. However, its limitations in handling case-based scenarios suggest that it is best suited as a supplementary source of information rather than a primary diagnostic or consultative tool. The current state of medical imagery generated by means of artificial intelligence lacks anatomical accuracy. Significant technological advancements are necessary before such imagery can complement oculoplastic consultations effectively.


