Optimizing AI Models for Doping Detection with LOOCV to Address Data Imbalance in Mass Spectrometry
- Authors
- Park, Hana; Son, Junghyun
- Issue Date
- 2025-02-17
- Publisher
- Manfred Donike Institute
- Citation
- 43rd Cologne Workshop on Dope Analysis 2025
- Abstract
- The application of artificial intelligence (AI) in analytical chemistry has advanced significantly, improving data processing, pattern recognition, and decision-making for complex datasets. In doping analysis, the continuous expansion of the World Anti-Doping Agency (WADA) list of prohibited substances makes conventional screening and result review increasingly difficult, particularly with respect to maintaining analytical efficiency and accuracy. To address these limitations, this study applies AI-based methodologies to enhance the detection of banned substances in mass spectrometry (MS) data. A major obstacle in doping analysis is extreme class imbalance: positive cases make up only approximately 1% of all samples. Machine learning models trained on such highly skewed datasets often show poor sensitivity to the rare positive cases. To overcome this issue, we systematically evaluated multiple machine learning models and employed robust validation strategies to improve model generalizability. Specifically, six machine learning algorithms were trained and assessed using both K-fold cross-validation and leave-one-out cross-validation (LOOCV): logistic regression, K-nearest neighbors (KNN), support vector machine (SVM), Gaussian naive Bayes, random forest (RF), and extreme gradient boosting. Comparative analysis showed that LOOCV outperformed K-fold cross-validation, improving sensitivity to positive cases in the imbalanced dataset. Notably, the RF and KNN models trained with LOOCV achieved 100% on all classification metrics, highlighting the effectiveness of LOOCV as a validation strategy for doping datasets with extreme class imbalance. Furthermore, implementing this AI-based framework reduced manual intervention, improved processing efficiency, and enhanced consistency and reliability in doping detection workflows.
- URI
- https://pubs.kist.re.kr/handle/201004/151811
- Appears in Collections:
- KIST Conference Paper > Others
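The validation comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and does not reproduce their data or results. Everything in it is an assumption made for illustration: the synthetic two-feature dataset standing in for MS features, the 1% positive rate taken from the abstract's approximate figure, the hand-written 1-nearest-neighbor classifier, and the helper names `make_dataset`, `knn_predict`, `k_fold_splits`, `cross_validate`, and `sensitivity`. It uses only the Python standard library and shows how LOOCV is the limiting case of K-fold cross-validation in which the number of folds equals the number of samples.

```python
import random


def make_dataset(n=400, n_pos=4, seed=0):
    """Synthetic stand-in for MS features (assumption): about 1% positives."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = 1 if i < n_pos else 0
        center = 3.0 if label else 0.0  # positives are shifted away from negatives
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=1):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x))
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds; k == n is LOOCV."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(data, k):
    """Pool the out-of-fold predictions and return confusion-matrix counts."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for train_idx, test_idx in k_fold_splits(len(data), k):
        train = [data[i] for i in train_idx]
        for i in test_idx:
            x, y = data[i]
            pred = knn_predict(train, x)
            if y == 1:
                counts["tp" if pred == 1 else "fn"] += 1
            else:
                counts["fp" if pred == 1 else "tn"] += 1
    return counts


def sensitivity(counts):
    """Recall on the positive class: TP / (TP + FN)."""
    positives = counts["tp"] + counts["fn"]
    return counts["tp"] / positives if positives else float("nan")


if __name__ == "__main__":
    data = make_dataset()
    for name, k in (("5-fold", 5), ("LOOCV", len(data))):
        counts = cross_validate(data, k)
        print(name, counts, "sensitivity =", sensitivity(counts))
```

With so few positives, an unstratified K-fold split can leave individual folds with no positive samples at all, whereas LOOCV tests every sample exactly once against a model trained on all the others. In practice, a library such as scikit-learn provides equivalent splitters (`LeaveOneOut`, `StratifiedKFold`); the hand-rolled version here only keeps the sketch self-contained.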