Artificial Intelligence in Inflammatory Bowel Disease: A Pilot Study on ChatGPT and Copilot's Impact on Health Literacy in Ulcerative Colitis and Crohn's Disease
Background:
Approximately 3 million Americans have inflammatory bowel disease (IBD), but health literacy among these patients is notably low, with rates as low as 24%. We aimed to increase awareness of IBD by using artificial intelligence (AI) to make the vast amount of online information about the disease easier to understand. AI is increasingly becoming a key resource for accessing information. Chat Generative Pre-trained Transformer (ChatGPT) by OpenAI and Copilot by Microsoft are conversational language models used for learning, writing, productivity, and general assistance. By simplifying complex medical information, and thereby lowering its Flesch-Kincaid Grade Level (FKGL), AI tools can make that information more accessible to IBD patients with lower health literacy.
Methods:
We used content from the website of the Crohn’s and Colitis Foundation (CCF), a non-profit organization dedicated to improving the lives of individuals with ulcerative colitis (UC) and Crohn’s disease (CD). We compiled a set of questions and answers from the site covering the definition, symptomatology, diagnosis, treatment, and lifestyle modifications in UC and CD. We asked ChatGPT and Copilot to simplify the text to a sixth-grade reading level or below. To assess the reading level of the question-and-answer text, we analyzed both the original and modified versions using the FKGL, the Flesch Reading Ease Score (FRES), reading level (RL), and average word count. We compared the original and modified scores with a paired t-test in SPSS software. Additionally, we compared the FKGL scores of ChatGPT and Copilot using an independent t-test.
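The abstract does not name the calculator that produced the FKGL, FRES, and RL values. As a minimal sketch, assuming the standard published Flesch and Flesch-Kincaid coefficients, the two scores can be computed as below. The function names are hypothetical, the vowel-group syllable counter is a crude heuristic assumed for illustration (dedicated readability tools count syllables differently, so their scores will not match exactly), and `flesch_band` uses the conventional Flesch reading-ease table, which may or may not be how the study assigned RL. The example passages are hypothetical and not taken from the CCF website.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (assumed for illustration): count runs of vowels,
    drop one for a silent final 'e' (but not '-le'), minimum of 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) for a block of plain text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    syllables = sum(count_syllables(w) for w in words)
    return max(len(sentences), 1), len(words), syllables


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade required."""
    s, w, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


def fres(text: str) -> float:
    """Flesch Reading Ease Score: nominally 0-100, higher means easier."""
    s, w, syl = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_band(score: float) -> str:
    """Map a FRES value to the conventional Flesch reading-level bands."""
    bands = [(90, "5th grade"), (80, "6th grade"), (70, "7th grade"),
             (60, "8th-9th grade"), (50, "10th-12th grade"), (30, "college")]
    for cutoff, label in bands:
        if score >= cutoff:
            return label
    return "college graduate"


# Hypothetical passages, not taken from the CCF website.
original = "Inflammatory bowel disease comprises chronic immunologically mediated conditions."
simplified = "IBD is a long-term illness of the gut."
for label, passage in (("original", original), ("simplified", simplified)):
    score = fres(passage)
    print(f"{label}: FKGL={fkgl(passage):.1f}, FRES={score:.1f} ({flesch_band(score)})")
```

Because both formulas reward shorter sentences and fewer syllables per word, any simplification that shortens sentences or swaps in shorter words lowers FKGL and raises FRES at the same time.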
Results:
Before AI modification, the mean FKGL was 11.78 for UC and 10.34 for CD; after modification, it was 8.75 (ChatGPT) and 7.3 (Copilot) for UC, and 9.07 (ChatGPT) and 7.25 (Copilot) for CD. The largest improvement was observed in the question regarding UC symptomatology, where ChatGPT lowered the RL from college to 10th-12th grade, and Copilot reduced it to a seventh-grade RL. FRES ranges from 0 to 100, with higher scores indicating that the text is easier to comprehend. Before modification, FRES ranged from 28 to 56 for UC and from 34 to 56 for CD; after modification, it ranged from 45 to 60 (ChatGPT) and 52 to 70 (Copilot) for UC, and from 51 to 70 (ChatGPT) and 54 to 70 (Copilot) for CD, indicating improved ease of reading. The changes in both FKGL and FRES were statistically significant (<italic>P-</italic>value < 0.05). However, the difference in FKGL scores between the 2 AI models was not statistically significant (<italic>P-</italic>value > 0.05).
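The abstract reports only P-value thresholds, not the per-question scores or the SPSS output. As a minimal sketch of the two comparisons, assuming the classic Student formulas, the t statistics can be computed with the Python standard library as below. The helper names and the scores in the usage lines are hypothetical, not the study's data. The independent test here pools the variances (equal variances assumed); SPSS's independent-samples output also reports an unequal-variance version, and the abstract does not say which row was used. Turning t into a P-value requires the t distribution; `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` return both the statistic and the P-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for scores measured on the
    same passages before and after simplification (hypothetical helper)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [b - a for b, a in zip(before, after)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


def independent_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic with pooled variance, i.e. assuming
    equal variances in both groups (hypothetical helper)."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least 2 values")
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical FKGL values for three passages (NOT the study's data).
t, df = paired_t([12.0, 11.0, 10.0], [9.0, 9.0, 7.0])
print(f"paired: t = {t:.2f}, df = {df}")  # prints "paired: t = 8.00, df = 2"
```

The paired test fits the before/after comparison because each passage serves as its own control; the ChatGPT-versus-Copilot comparison was run as an independent test in the study, as stated in the Methods.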
Conclusions:
Our study has shown that AI can make medical information about UC and CD more comprehensible and accessible to readers with varying literacy levels. The improvement in readability after AI modification was statistically significant (<italic>P-</italic>value < 0.05), indicating that the text was successfully simplified. Both AI models also reduced the average word count by 30-50%, making the text shorter as well as easier to read. The non-significant difference in FKGL scores between ChatGPT and Copilot suggests that neither model offered a distinct advantage in simplifying medical information. Our pilot study indicates that incorporating AI into health communication strategies may improve public health literacy and enable individuals to make informed decisions about their well-being.