⚠ Official Notice: www.ijisrt.com is the official website of the International Journal of Innovative Science and Research Technology (IJISRT) Journal for research paper submission and publication. Please beware of fake or duplicate websites using the IJISRT name.



Performance Evaluation of LVFace-B with Fusion Embedding Optimization on Pose-Variant Face Recognition


Authors : Ramneet Singh Chadha; Jugesh; Jasmehar Singh

Volume/Issue : Volume 11 - 2026, Issue 3 - March


Google Scholar : https://tinyurl.com/ycxc692w

Scribd : https://tinyurl.com/yc2cy4ce

DOI : https://doi.org/10.38124/ijisrt/26mar1065



Abstract : Because face recognition is commonplace and non-intrusive, it is used extensively in contemporary biometric systems. Recent advances in deep learning, particularly Vision Transformers (ViTs), have significantly improved recognition performance on common benchmarks. Real-world face recognition, however, must be computationally efficient and robust to large pose variations (such as frontal versus profile views). This study evaluates the state-of-the-art LVFace-B model, a ViT-based model with Progressive Cluster Optimization, on a challenging pose-variant dataset (Celebrities in Frontal-Profile in the Wild). End-to-end performance is evaluated with MediaPipe's BlazeFace, a face detector that operates at over 275 frames per second on a mobile CPU. A Fusion Embedding strategy is also presented, in which multiple embeddings of the same identity are averaged into a single representative vector. Three identification scenarios are analyzed: a single embedding per identity, multiple embeddings per identity, and a fused mean embedding per identity. Experiments show that fusion embedding attains the highest accuracy (Rank-1 = 96.98%) while significantly reducing computational demands. The results indicate that averaging embeddings improves robustness to pose changes and offers a practical compromise for large-scale 1:N search. Because it balances speed and accuracy, the proposed method is suited to real-time use.

Keywords : Computer Vision, Vision Transformer, Face Recognition, LVFace, Fusion Embedding.
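
The fused mean embedding scenario described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 512-dimensional embedding size, the L2 normalization before and after averaging, and the synthetic data are all assumptions made for illustration. Real embeddings would come from a face recognition model such as LVFace-B applied to detected face crops.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def fuse_embeddings(embeddings):
    """Average one identity's embeddings into a single unit-length template.

    Normalizing before and after the mean is an assumption of this sketch;
    the abstract only states that the embeddings are averaged.
    """
    unit = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    return l2_normalize(unit.mean(axis=0))


def build_gallery(embeddings_by_identity):
    """Return (labels, matrix) with exactly one fused row per identity."""
    labels = sorted(embeddings_by_identity)
    matrix = np.stack(
        [fuse_embeddings(embeddings_by_identity[name]) for name in labels]
    )
    return labels, matrix


def rank1_identify(probe, labels, matrix):
    """1:N search: return the best-matching label and its cosine similarity."""
    scores = matrix @ l2_normalize(np.asarray(probe, dtype=np.float64))
    best = int(np.argmax(scores))
    return labels[best], float(scores[best])


# Synthetic stand-in data (hypothetical): 5 identities, 5 noisy views each.
rng = np.random.default_rng(0)
dim = 512  # assumed embedding size
centers = {f"id_{i}": l2_normalize(rng.normal(size=dim)) for i in range(5)}
enrolled = {
    name: center + 0.02 * rng.normal(size=(5, dim))
    for name, center in centers.items()
}

labels, gallery = build_gallery(enrolled)
probe = centers["id_3"] + 0.02 * rng.normal(size=dim)
print(rank1_identify(probe, labels, gallery))
```

With N identities and K enrolled embeddings each, the fused gallery holds N vectors instead of N × K, so each probe needs N similarity computations rather than N × K. That reduction is the source of the lower search cost in the fused scenario.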

References :

  1. J. You et al., “LVFACE: Progressive cluster optimization for large vision models in face recognition,” arXiv (Cornell University), Jan. 2025, doi: 10.48550/arxiv.2501.13420.
  2. S. Sengupta, J. -C. Chen, C. Castillo, V. M. Patel, R. Chellappa and D. W. Jacobs, "Frontal to profile face verification in the wild," 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 2016, pp. 1-9, doi: 10.1109/WACV.2016.7477558.
  3. V. Bazarevsky, Y. Kartynnik, A. Vakunov, K. Raveendran, and M. Grundmann, “BlazeFace: Sub-millisecond Neural face Detection on mobile GPUs,” arXiv (Cornell University), Jul. 2019, doi: 10.48550/arxiv.1907.05047.
  4. C. Lugaresi et al., “MediaPipe: A framework for building perception Pipelines,” arXiv (Cornell University), Jun. 2019, doi: 10.48550/arxiv.1906.08172.
  5. Md. I. Hossain, Sama-E-Shan, and H. Kabir, “An efficient way to recognize faces using mean embeddings,” 2021 International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), vol. 10, pp. 1–10, Feb. 2021, doi: 10.1109/icaect49130.2021.9392401.
  6. Wikipedia contributors. (2025, September 17). Cosine similarity. Wikipedia. https://en.wikipedia.org/wiki/Cosine_similarity
  7. T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, Dec. 2005, doi: 10.1016/j.patrec.2005.10.010.
  8. A. K. Jain, A. A. Ross, and K. Nandakumar, Introduction to Biometrics. 2011. doi: 10.1007/978-0-387-77326-1.
  9. B. DeCann and A. Ross, “Relating ROC and CMC curves via the biometric menagerie,” IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), Arlington, VA, USA, 2013, pp. 1–8, Sep. 2013, doi: 10.1109/btas.2013.6712705.
  10. N. Damer, A. Opel, and A. Nouak, “CMC curve properties and biometric source weighting in multi-biometric score-level fusion,” 17th International Conference on Information Fusion (FUSION), Salamanca, Spain, 2014, pp. 1–6, Jul. 2014, [Online]. Available: https://publica.fraunhofer.de/handle/publica/387491
  11. “Multiclass Receiver Operating Characteristic (ROC),” Scikit-learn. https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
  12. A. Nemavhola, C. Chibaya, and S. Viriri, “A systematic review of CNN architectures, databases, performance metrics, and applications in face recognition,” Information, vol. 16, no. 2, p. 107, Feb. 2025, doi: 10.3390/info16020107.
  13. Deng, Jiankang et al. “ArcFace: Additive Angular Margin Loss for Deep Face Recognition.” IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2018): 5962-5979.
  14. M. Kim, A. Jain, and X. Liu, “50 years of automated face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, pp. 1–20, Jan. 2026, doi: 10.1109/tpami.2026.3664269.
  15. J. Dan et al., "TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective," 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023, pp. 20585-20596, doi: 10.1109/ICCV51070.2023.01887.

