SVOH: Rigorous Selection Approach for Optimal Hyperparameter Values


Authors : Kopoin NDiffon Charlemagne; Koffi Dagou Augustin; Zouneme Boris Stéphane

Volume/Issue : Volume 9 - 2024, Issue 10 - October


Google Scholar : https://tinyurl.com/5448vz8u

Scribd : https://tinyurl.com/mrnecepp

DOI : https://doi.org/10.38124/ijisrt/IJISRT24OCT497



Abstract : This paper addresses a model selection problem. We consider k-fold cross-validation (KCV) applied to the Gaussian support vector machine (SVM) classification algorithm. In cross-validation, the number of subsets k is generally fixed a priori, without any experiment. However, the value of k affects the trade-off between the estimation error and the approximation error of the model. As a result, k can strongly influence the optimal values of the SVM classifier's hyperparameters and, consequently, the performance of the selected model and its ability to generalize. We propose SVOH (Selection of Optimal Hyperparameter Values), a rigorous approach for finding the hyperparameter values of the Gaussian SVM, in the context of protein-protein interaction (PPI) prediction, where pairs of proteins must be classified as interacting or non-interacting. The proposed approach treats k as an influential parameter of the model and therefore performs learning to find an optimal value of k.

Keywords : Machine Learning, Model Selection, Cross-Validation, Prediction of Protein-Protein Interactions
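
The idea of examining k rather than fixing it in advance can be illustrated with a minimal sketch. The code below is not the SVOH procedure, which the abstract does not specify. It only shows, under stated assumptions, how the hyperparameters picked by a grid search can be recorded for several values of k. The function names, the toy data, and the Gaussian-kernel voting classifier are all hypothetical. The classifier is a lightweight stand-in for a Gaussian SVM, which has further parameters such as the regularization constant C.

```python
import math
import random
from itertools import product


def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(fit, X, y, k, params, seed=0):
    """Mean validation error rate over the k folds."""
    errors = []
    for held_out in kfold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train], **params)
        wrong = sum(predict(X[i]) != y[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / k


def best_params_per_k(fit, X, y, k_values, grid, seed=0):
    """For each candidate k, return (lowest CV error, hyperparameters reaching it)."""
    names = sorted(grid)
    results = {}
    for k in k_values:
        best = None
        for combo in product(*(grid[name] for name in names)):
            params = dict(zip(names, combo))
            err = cv_error(fit, X, y, k, params, seed)
            if best is None or err < best[0]:  # ties keep the first combination
                best = (err, params)
        results[k] = best
    return results


def fit_gaussian_vote(X_train, y_train, gamma):
    """Stand-in classifier (NOT an SVM): Gaussian-kernel weighted vote, labels in {-1, +1}."""
    def predict(x):
        score = 0.0
        for xi, label in zip(X_train, y_train):
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
            score += label * math.exp(-gamma * sq_dist)
        return 1 if score >= 0 else -1
    return predict


# Hypothetical toy data: two well-separated clusters on a line.
X = [(0.1 * i, 0.0) for i in range(10)] + [(5.0 + 0.1 * i, 0.0) for i in range(10)]
y = [-1] * 10 + [1] * 10

results = best_params_per_k(fit_gaussian_vote, X, y,
                            k_values=[4, 5, 10], grid={"gamma": [0.1, 1.0]})
for k, (err, params) in results.items():
    print(k, err, params)
```

Any classifier factory with the same `fit(X, y, **params)` shape could replace the stand-in, for instance a wrapper around a library's Gaussian SVM. How to choose among the per-k results is left open here, since the selection criterion is the substance of the paper and is not given in the abstract.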


