Peer Reviewed

Original Research

AI in Cosmetic Surgery: A New Look at Virtual Abdominoplasty and Buttock Augmentation

January 2025
1937-5719
ePlasty 2025;25:e3
© 2024 HMP Global. All Rights Reserved.
Any views and opinions expressed are those of the author(s) and/or participants and do not necessarily reflect the views, policy, or position of ePlasty or HMP Global, their employees, and affiliates.

Abstract

Background. Online before-and-after photos commonly guide patient expectations in body contouring surgeries. However, recent artificial intelligence (AI) advancements allow for lifelike "photos" of hypothetical individuals, which patients can use in their decision-making. The accuracy of AI models, trained on divergent image sets, in showing realistic figures, cosmetic defects, and surgical outcomes is questionable. This study sought to evaluate the quality of these images.

Methods. We utilized AI platforms GetIMG, Leonardo, and Perchance to create pre- and post-surgery visuals for abdominoplasty and buttock augmentation. Expert board-certified plastic surgeons and plastic surgery residents assessed the images across 11 criteria, focusing on realism and clinical value. ANOVA and Tukey honestly significant difference post-hoc tests were executed for data analysis.

Results. Realism and clinical value scores among AI models (mean ± standard deviation) were not significantly different, indicating comparable performance (GetIMG 3.83 ± 0.81, Leonardo 3.30 ± 0.69, Perchance 2.68 ± 0.77; P > .05). Perchance significantly underperformed in size and volume accuracy (P = .02) and pathological feature recognition (P = .01 and .03). No consistent underperforming metric was identified when evaluated. The phenomenon of the "uncanny valley" was also identified.

Conclusions. Despite some realistic and accurate surgical predictions, most AI-generated images were anatomically unrealistic, demonstrated inaccurate postoperative results, and invoked the "uncanny valley" effect. Given the uniformly poor performance, patients should avoid using these images for surgical decisions because they may create unrealistic expectations. Surgeons are advised to use real patient photos for consultations. Future research should compare AI images with actual before-and-after photos and include a larger pool of experts for evaluation.

Introduction

In the rapidly evolving landscape of modern medicine, the advent of artificial intelligence (AI) has heralded a new era of innovation and integration, pushing the boundaries of what is possible in various professional domains.1-4 AI's application has expanded from enhancing operational efficiencies to facilitating complex tasks that demand precision and expertise.5-7 Particularly in medical science, there has been a notable surge in AI-related research and literature, especially in the last decade,8,9 reflecting growing interest in and reliance on this technology for diverse medical applications.

Despite the proliferation of AI technologies and their celebrated success in text generation models like ChatGPT, their application in the generation of medical imagery, especially pre- and postoperative visuals for surgical purposes, remains underexplored. The medical community has yet to fully embrace AI's potential in creating specific, demand-driven images, a gap that presents both challenges and opportunities. The emergence of AI tools like GetIMG, Leonardo, and Perchance offers a glimpse into a future where the generation of copyright-free, authentic images could significantly alleviate the pervasive issue of copyright infringement while democratizing access to high-quality visual resources globally.

The field of body contouring surgery, with its intrinsic reliance on a detailed understanding of human anatomy and aesthetics for procedures such as abdominoplasty and buttock augmentation, stands to benefit immensely from AI's capabilities in visual representation. Traditionally, the depiction of surgical outcomes and anatomical details has relied on artistic illustrations, medical imaging techniques, and textbook images.10-12 However, the potential of AI to revolutionize this aspect of surgical planning and patient education through the creation of lifelike, detailed pre- and postoperative images is immense.

Yet, the application of AI in this sensitive and precision-oriented domain raises critical questions regarding the accuracy and reliability of the images produced, especially when used for educational purposes, surgical planning, and research. The possibility of inaccuracies in anatomical representation poses significant risks, potentially leading to educational gaps and surgical inaccuracies.

This study aims to investigate the effectiveness and precision of AI in generating pre- and postoperative images for cosmetic body contouring surgeries such as abdominoplasty and buttock augmentation. These procedures are immensely popular and central to aesthetic enhancement,13,14 making the accuracy of visual representations crucial. By evaluating the capabilities of easily accessible AI tools such as GetIMG, Leonardo, and Perchance, this research seeks to determine the extent to which AI-generated images can provide visually and anatomically accurate representations that are clinically useful. This exploration marks a pioneering effort to assess the practicality and accuracy of AI in producing pre- and postoperative imagery, setting a foundation for further advancements in the integration of AI technologies in medical visualization and surgical planning.

Methods

Image creation

In this research, we used 3 AI platforms, GetIMG, Leonardo, and Perchance, to create pre- and postoperative images for abdominoplasty and buttock augmentation. Each platform was selected for its ability to produce realistic, high-definition images from text-based instructions. Detailed prompts, tailored to each surgical procedure and describing both the expected preoperative state and the anticipated postoperative result, were crafted using ChatGPT-4. The prompts contained no patient data, in keeping with ethical standards and privacy requirements. The prompts were refined iteratively so that the resulting images were relevant to the surgical context and free of unnecessary elements. The goal was to impartially assess the capacity of AI to replicate the intricacies of surgical outcomes.

Surgeon assessment process

To evaluate the AI-created images, we assembled a panel of 4 evaluators experienced in these procedures: 2 board-certified plastic surgeons and 2 plastic surgery residents. The panel assessed each image against 11 criteria grouped under 2 primary aspects: realism and clinical value. Realism was judged on 7 criteria: size and volume accuracy, anatomical correctness, correct simulation of age, color fidelity, texture mapping, symmetry analysis, and shadows and lighting consistency. Clinical value was assessed on 4 criteria: pathological feature recognition, postoperative result prediction, surgical relevance, and healing and scarring prediction.

Evaluation metrics

The assessment framework consisted of 11 distinct metrics:

  1. Size and volume accuracy: Measuring the precision in depicting expected dimensions and volumes.
  2. Anatomical correctness: The ability to generate images with accurate anatomical details.
  3. Correct simulation of age: The capacity to authentically reflect age-specific characteristics.
  4. Color fidelity: Realism in color compared to real-world appearances.
  5. Texture mapping: The AI's proficiency in replicating skin textures.
  6. Symmetry analysis: Analyzing the balance and symmetry, essential for aesthetic reconstructions.
  7. Shadows and lighting consistency: Evaluating the realism of shadow and lighting effects for enhanced depth.
  8. Pathological feature recognition: The AI's capability to accurately depict relevant pathological conditions.
  9. Postoperative result prediction: Assessing how accurately the AI predicts postoperative outcomes.
  10. Surgical relevance: The relevance of the images to surgical aims and planning.
  11. Healing and scarring prediction: The AI's ability to predict scarring and healing processes post-surgery.

Scoring framework

A detailed scoring framework was developed to quantitatively evaluate each criterion, allowing for precise measurement of each metric. For example, anatomical correctness could be scored on a scale from 1 to 5, with 5 representing the highest level of accuracy.
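As an illustration of how such scores might be summarized, the following Python sketch pools hypothetical ratings for one model into the realism and clinical value categories and reports each as mean ± standard deviation. The 1-to-5 scale applied to every criterion, the pooling of all ratings within a category, the identifier names, and the ratings themselves are assumptions made for the example; they are not the study's data or its documented procedure.

```python
from statistics import mean, stdev

# The 7 realism and 4 clinical value criteria named in the assessment framework.
# The identifier spellings are hypothetical labels chosen for this sketch.
REALISM = [
    "size_volume_accuracy", "anatomical_correctness", "age_simulation",
    "color_fidelity", "texture_mapping", "symmetry", "shadows_lighting",
]
CLINICAL_VALUE = [
    "pathological_features", "postoperative_prediction",
    "surgical_relevance", "healing_scarring",
]


def summarize(ratings, criteria):
    """Return (mean, sample SD) over all ratings given for the listed criteria."""
    pooled = [score for name in criteria for score in ratings[name]]
    return mean(pooled), stdev(pooled)


# Hypothetical 1-to-5 ratings from 4 evaluators for one model (not study data).
example = {name: [4, 3, 4, 3] for name in REALISM}
example.update({name: [2, 3, 2, 3] for name in CLINICAL_VALUE})

realism_mean, realism_sd = summarize(example, REALISM)
clinical_mean, clinical_sd = summarize(example, CLINICAL_VALUE)
print(f"Realism: {realism_mean:.2f} ± {realism_sd:.2f}")
print(f"Clinical value: {clinical_mean:.2f} ± {clinical_sd:.2f}")
```

Pooling every rating within a category is only one possible choice; averaging per evaluator first would give the same mean here but a different standard deviation.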

Statistical analysis

Following the expert evaluations, we analyzed the scores using analysis of variance (ANOVA). We tested for significant differences in 3 comparisons: between the AI models in realism and clinical value, between the AI models on each of the 11 metrics, and between the metrics themselves independent of the AI model. Where ANOVA identified a significant difference, we applied the Tukey honestly significant difference (HSD) post-hoc test to limit false positives, determine which pairwise comparisons were statistically significant, and identify the AI models or metrics with the weakest performance.
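For readers less familiar with the test, the following Python sketch shows how a one-way ANOVA F statistic is computed when 3 models are compared on a single metric. The ratings are hypothetical and the function name is our own; the sketch stops at the F statistic and its degrees of freedom, because converting F to a P value requires the F distribution, which is not part of the Python standard library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual ratings around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ratings from 4 evaluators per model (not study data).
scores = {
    "GetIMG": [4, 4, 5, 3],
    "Leonardo": [3, 4, 3, 4],
    "Perchance": [2, 3, 2, 3],
}
f_stat, df_b, df_w = one_way_anova_f(list(scores.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

In practice, a statistics library would normally supply both the P value and the post-hoc comparisons; SciPy, for example, provides `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`. Which software was used in this study is not stated.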

Ethical consideration

The study strictly avoided the use of real patient data or imagery, thus bypassing the need for approval from the institutional review board (IRB) of Rutgers New Jersey Medical School.

Results

Images

Figures 1, 2, and 3 show the AI-generated pre- and postoperative images for abdominoplasty, and Figures 4, 5, and 6 show the AI-generated pre- and postoperative images for buttock augmentation.

Figure 1. GetIMG generative adversarial network (GAN): generated image for pre- and postoperative abdominoplasty patient.

Figure 2. Leonardo generative adversarial network (GAN): generated image for pre- and postoperative abdominoplasty patient.

Figure 3. Perchance generative adversarial network (GAN): generated image for pre- and postoperative abdominoplasty patient.

Figure 4. GetIMG generative adversarial network (GAN): generated image for pre- and postoperative buttock augmentation patient.

Figure 5. Leonardo generative adversarial network (GAN): generated image for pre- and postoperative buttock augmentation patient.

Figure 6. Perchance generative adversarial network (GAN): generated image for pre- and postoperative buttock augmentation patient.

Statistical analysis

In this analysis, we evaluated the realism and clinical value of 3 AI models as detailed in Figure 7. The ANOVA tests performed on these ratings yielded P values of .29 and .65 for realism and clinical value, respectively, indicating no significant differences in the performance of the models in these areas.

Figure 7. Comparative performance of the AI models GetIMG, Leonardo, and Perchance in the main categories of realism and clinical value, including their respective subcategories.

This study also assessed the comprehensive performance of each AI model across the 11 metrics, with the observed results (mean [± standard deviation]) detailed in Figure 7 and Table 1. ANOVA tests for these criteria revealed significant differences, notably in size and volume accuracy (P = .01) and pathological feature recognition (P = .01), suggesting marked variations in performance among the models in these areas. Post-hoc analysis using the Tukey HSD test identified significant differences between Perchance and the other 2 models in size and volume accuracy (P = .02) and pathological feature recognition (P = .01 and .03), indicating Perchance's lower performance in these metrics. However, the performance among the models did not significantly differ in other metrics.

Table 1. Comparative Performance Evaluation of AI Models GetIMG, Leonardo, and Perchance

Additionally, the study highlighted the overall average performance (mean [± standard deviation]) for each evaluation metric across the analyzed AI models, as presented in Figure 8 and Table 2.

Figure 8. Mean performance scores (± SD) for each evaluation metric, averaged across all AI models. The error bars represent the standard deviations, indicating the variability in performance scores.

Table 2. Overall Average Performance for Each Evaluation Metric

Further ANOVA identified a significant difference among the metrics' means (P < .01). Subsequent Tukey HSD post-hoc testing showed that the healing and scarring prediction metric performed worse than most others (P ≤ .04 for each of those comparisons). However, this metric did not differ significantly (P > .05) from the pathological feature recognition, texture mapping, and surgical relevance metrics, as outlined in Figure 9 and Table 3.

Figure 9. Tukey honestly significant difference post-hoc analysis: comparisons involving healing and scarring prediction. The chart displays the P value for each comparison between the healing and scarring prediction metric and the 10 other evaluation metrics. The red dashed line indicates the significance threshold at a P value of .05, with bars crossing this line representing nonsignificant differences.

Table 3. Tukey Honestly Significant Difference Post-Hoc Analysis of Metrics

Panelists' feedback further highlighted the occurrence of the "uncanny valley" effect in several images, which refers to the discomfort or eeriness felt when artificial representations approach, but do not fully achieve, lifelike accuracy.

Discussion

The exploration of generative AI's utility in plastic surgery, particularly through the lenses of realism and clinical value, offers a nuanced understanding of its current capabilities and limitations. The results of this study shed light on the interplay between the recent technological advancements and clinical practice in body contouring plastic surgery.

The absence of a significant difference among the AI models GetIMG, Leonardo, and Perchance in overall realism and clinical value, together with their mediocre scores, sheds light on the dichotomy between AI's technological sophistication and its clinical utility. While some of the models did modestly well in generating visually compelling and realistic images, their value in the surgical planning process is questionable. Moreover, for size and volume accuracy and pathological feature recognition, GetIMG and Leonardo did not significantly outperform each other, but both had an advantage over Perchance in these critical surgical planning areas. Across the metrics, healing and scarring prediction scored significantly lower than most others, although it did not differ significantly from pathological feature recognition, texture mapping, or surgical relevance.

While AI-generated images offer a valuable visual aid in predicting surgical outcomes, several discrepancies exist between these predictions and actual postoperative results. The AI images tend to lack detailed skin texture changes and scarring, which are common after surgery. Additionally, the AI may overly smooth contours, failing to reflect residual skin laxity or asymmetry. In terms of abdominal definition, the AI-generated outcomes often present an overly smooth and flat appearance, whereas some residual folds or irregularities are typically observed post-surgery. Furthermore, AI images do not account for variations in skin tone and potential discoloration. The realism of fat distribution is another area where AI falls short, as it might not accurately depict the uneven fat removal and natural imperfections. AI predictions also frequently overestimate skin tightness, ignoring potential sagging or looseness. The side profile changes and detailing of surgical effects, such as minor scars or skinfold variations, are also not accurately captured by AI. Lastly, AI-generated images often predict perfect symmetry, while in reality, slight asymmetries are common. Highlighting these discrepancies is crucial for ethical patient counseling and setting realistic expectations, ensuring patients are fully informed about the possible outcomes and limitations of AI predictions in aesthetic surgery. These visual examples support our findings that AI-generated images lack the realism and clinical accuracy necessary for reliable use in surgical planning.

It is crucial to emphasize that the training data sources of these neural networks are not publicly disclosed.15 Hence, anatomical images may have been used to train these networks, but perhaps not in sufficient quantity to produce medically precise outcomes, especially for the procedures examined here. Given the heightened anticipation around AI technologies, it is important to test these tools thoroughly, as we have done in this study, to counter the misconception that they ensure flawless results. This underscores the importance of a collaborative development methodology, in which the iterative improvement of AI models is informed directly by the clinical insights of plastic surgery experts in the specific domains the developers intend to target. By aligning model development with clinical objectives, developers can significantly improve the effectiveness and trustworthiness of AI in preoperative planning, patient education, and realistic outcome simulation.

Ethical considerations

The integration of AI in predicting surgical outcomes, particularly in aesthetic surgery, brings forth several ethical considerations that must be addressed to ensure responsible use and patient safety.

Using AI to predict surgical outcomes necessitates a clear and thorough informed consent process. Patients must be informed that AI-generated images are based on predictive models and may not accurately represent individual outcomes. Emphasizing the experimental nature of these predictions is crucial to avoid creating unrealistic expectations. Patients should be made aware of the potential discrepancies between AI predictions and actual surgical results, ensuring they have a realistic understanding of what AI technology can and cannot do.

The use of AI in medical imaging requires careful attention to data privacy and confidentiality. It is imperative to ensure that any patient data used in training AI models are anonymized and handled in compliance with relevant data protection regulations, such as the General Data Protection Regulation (GDPR) in Europe and the Health Insurance Portability and Accountability Act (HIPAA) in the United States. Protecting patient data from unauthorized access and potential misuse is a fundamental ethical obligation.

The accuracy and reliability of AI predictions are critical ethical concerns. AI models must undergo rigorous validation to ensure they provide clinically relevant and accurate predictions. This involves continuous testing against actual surgical outcomes and updates to the models as new data become available. Overreliance on unvalidated AI predictions can lead to poor surgical planning and unsatisfactory patient outcomes. Therefore, AI tools should be used as an adjunct to, rather than a replacement for, professional clinical judgment.

AI models are only as good as the data on which they are trained. There is a risk of inherent bias in AI predictions if the training data are not representative of the diverse patient population. This can lead to skewed predictions that may not be applicable to all demographic groups. It is essential to ensure that AI models are trained on diverse datasets that include various ages, genders, ethnicities, and body types to provide fair and equitable predictions.

AI-generated images can significantly influence patient expectations and decision-making. There is a potential risk that patients might place undue confidence in AI predictions, leading to disappointment if the actual surgical results differ. It is crucial for surgeons to manage these expectations by clearly communicating the limitations of AI-generated images and emphasizing the variability of individual surgical outcomes. Surgeons should use AI predictions as tools to facilitate discussions rather than definitive forecasts.

Another key point to mention is that the ethical dimensions of integrating AI into plastic surgery extend beyond patient consent and data privacy to include considerations of equity and access. Ensuring that the benefits of AI technologies are accessible across diverse patient populations is crucial to avoid exacerbating existing disparities in health care. Furthermore, practical challenges such as integrating AI tools into electronic health records and training clinicians on AI utilization need systematic solutions. Addressing these challenges requires a coordinated effort involving health care institutions, technology developers, regulatory bodies, and professional associations.

It is crucial to consider these ethical dimensions to ensure that AI technologies are used responsibly in clinical settings. Patient autonomy and informed consent are paramount, and surgeons must ensure that patients understand the limitations of AI predictions and the potential for inaccuracies.

Limitations

This study, while pioneering in assessing the utility of generative AI in body contouring plastic surgery, encounters several potential limitations that merit consideration. First, the utilization of ChatGPT-4 in the prompt engineering process, while groundbreaking, introduces a degree of unpredictability regarding the prompts' quality and specificity, which may have an impact on the output of the AI image generation platforms. Second, the evaluation framework, despite being exhaustive in nature, is subject to the interpretive variability inherent in subjective evaluation, even among seasoned plastic surgeons. Also, the relatively small sample size of evaluators (4 experts) may not fully capture the diversity of opinion present in the broader plastic surgery community, potentially limiting the generalizability of the findings. Lastly, it is important to note that the study's scope, which was restricted to 3 AI image generation platforms, might not have covered the entire range of available technologies, potentially overlooking other platforms that could possess different or superior capabilities.

Furthermore, the presence of the uncanny valley effect in our study is noteworthy due to the potential impact this psychological phenomenon may have on the expectations and satisfaction of patients.16,17 In health care contexts, it is imperative for developers to acknowledge its unsettling impact to guarantee that AI-generated depictions are precise and reassuring for patients, rather than inducing anxiety that would exacerbate the preoperative stress patients already endure.

These limitations highlight areas for future research, including the development of more standardized prompt generation methods, expanding evaluator sample sizes, and broadening the range of AI technologies assessed.

Future research should also aim to incorporate dynamic assessments of the surgeries such as videos. These evaluations can capture potential nerve injuries and other motion-related outcomes that AI programs might predict. Including such assessments will provide a more comprehensive understanding of the AI's capabilities and limitations in predicting surgical outcomes.

Another important area for future investigation is a direct comparison between AI-generated results and those produced by expert surgeons using existing preoperative surgical planning platforms. Directly comparing AI-generated images with expert-generated images would allow a better assessment of the realism and clinical accuracy of AI tools. This comparison would also help identify areas where AI models need further refinement to meet the standards set by human expertise.

Conclusions

The utilization of generative AI tools for producing pre- and postoperative images does not meet the necessary level of precision needed for patients and plastic surgeons to use in consultations. This study emphasizes the significance of collaboration between technology developers and cosmetic body contouring specialists to harness the immense potential of AI in surgical visualization. Additional investigation and meticulous deliberation are necessary to safely use AI in this domain. Our study establishes a fundamental basis for subsequent investigations that seek to improve the precision, prognostic capability, and clinical applicability of images generated by AI.

Acknowledgments

Affiliations: 1Division of Plastic and Reconstructive Surgery, Rutgers New Jersey Medical School, Newark, New Jersey; 2Founder and Researcher, Arclivia, a platform for innovation and research in AI integration; 3Department of Architecture and Territory, Mediterranean University of Reggio Calabria, Calabria, Italy; 4Department of Landscape Architecture, International Credit Hours Engineering Programs of Ain Shams University, Cairo, Egypt

Correspondence: Ashley Ignatiuk, MD, MSC, FRCSC; ai253@njms.rutgers.edu

Meeting presentation: Accepted for podium presentation at PSTM 2024

Ethics: The study strictly avoided the use of real patient or animal data or imagery, thus bypassing the need for approval from the institutional review board (IRB) of Rutgers New Jersey Medical School.

Disclosures: The authors declare that they have no conflict of interest.

References

1. Dzobo K, Adotey S, Thomford NE, Dzobo W. Integrating artificial and human intelligence: a partnership for responsible innovation in biomedical engineering and medicine. OMICS. 2020;24(5):247-263.

2. Hutchinson P. Reinventing innovation management: the impact of self-innovating artificial intelligence. IEEE Trans Eng Manag. 2020;68(2):628-639.

3. Adir O, Poley M, Chen G, et al. Integrating artificial intelligence and nanotechnology for precision cancer medicine. Adv Mater. 2020 Apr;32(13):1901989.

4. Dwivedi YK, Hughes L, Ismagilova E, et al. Artificial intelligence (AI): multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. Int J Inform Manag. 2021;57:101994.

5. Jha J, Vishwakarma AK, Chaithra N, et al. Artificial intelligence and applications. 2023. 1st International Conference on Intelligent Computing and Research Trends (ICRT).

6. Johnson KB, Wei W, Weeraratne D, et al. Precision medicine, AI, and the future of personalized health care. Clin Transl Sci. 2021;14(1):86-93.

7. Salehi H, Burgueño R. Emerging artificial intelligence methods in structural engineering. Eng Struct. 2018;171:170-189.

8. Duan Y, Edwards JS, Dwivedi YK. Artificial intelligence for decision making in the era of Big Data–evolution, challenges and research agenda. Int J Inform Manag. 2019;48:63-71.

9. Tran BX, Vu GT, Ha GH, et al. Global evolution of research in artificial intelligence in health and medicine: a bibliometric study. J Clin Med. 2019 Mar 14;8(3):360.

10. Card EB, Mauch JT, Lin IC. Learner drawing and sculpting in surgical education: a systematic review. J Surg Res. 2021 Nov;267:577-585.

11. Rohrich RJ, Sullivan D. Philadelphia's legacy of the revolution: American surgical art. Plast Reconstr Surg. 2004 Sept 15;114(4):961-963.

12. Netter FH. Atlas of Human Anatomy. Professional edition e-book. Elsevier; 2014.

13. Sinno S, Chang JB, Brownstone ND, Saadeh PB, Wall Jr S. Determining the safety and efficacy of gluteal augmentation: a systematic review of outcomes and complications. Plast Reconstr Surg. 2016;137(4):1151-1156.

14. Hurvitz KA, Olaya WA, Nguyen A, Wells JH. Evidence-based medicine: abdominoplasty. Plast Reconstr Surg. 2014;133(5):1214-1221.

15. Ramesh A, Pavlov M, Goh G, et al. Zero-shot text-to-image generation. Presented at the International Conference on Machine Learning; July 2021.

16. Moore RK. A Bayesian explanation of the ‘uncanny valley’ effect and related psychological phenomena. Sci Rep. 2012;2(1):864.

17. Ratajczyk D. Uncanny valley in video games: an overview. Homo Ludens. 2019(1 [12]):135-148.