Authors :
Karan Rathi; Manas Bisht
Volume/Issue :
Volume 8 - 2023, Issue 5 - May
Google Scholar :
https://bit.ly/3TmGbDi
Scribd :
https://bit.ly/41WaJym
DOI :
https://doi.org/10.5281/zenodo.7950988
Abstract :
Music genre classification is one example of
content-based analysis of music signals. Historically,
hand-engineered features were used to automate this
task, reaching 61% accuracy on 10-genre classification.
Even so, this falls short of the 70% accuracy that
humans achieve on the same task. Here, we propose a
novel approach that combines knowledge of the
neurophysiology of the auditory system with research on
human perception in musical genre classification. The
technique involves training a simple convolutional neural
network (CNN) to classify a short segment of the music
signal. The genre of a song is then identified by
splitting it into short segments and combining the
CNN's predictions for the individual segments. The
filters learned in the CNN match the spectro-temporal
receptive field (STRF) in humans, and after training,
this approach reaches human-level (70%) accuracy.
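
The segment-and-combine step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the per-segment predictions are combined, so averaging the class probabilities is an assumption here, and the names `split_into_chunks`, `aggregate_predictions`, `classify_song`, and the `chunk_classifier` callable (a stand-in for the trained CNN) are hypothetical.

```python
import numpy as np


def split_into_chunks(signal, chunk_len):
    """Split a 1-D signal into non-overlapping chunks, dropping any short tail."""
    signal = np.asarray(signal)
    n_chunks = len(signal) // chunk_len
    return signal[: n_chunks * chunk_len].reshape(n_chunks, chunk_len)


def aggregate_predictions(chunk_probs):
    """Average per-chunk class probabilities (an assumed combination rule)
    and return the winning class index together with the averaged vector."""
    mean_probs = np.asarray(chunk_probs, dtype=float).mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs


def classify_song(signal, chunk_len, chunk_classifier):
    """Classify a whole song from the predictions on its chunks.

    chunk_classifier is a hypothetical callable standing in for the trained
    CNN: it maps one chunk to a vector of class probabilities.
    """
    chunks = split_into_chunks(signal, chunk_len)
    if len(chunks) == 0:
        raise ValueError("signal is shorter than one chunk")
    chunk_probs = np.stack([chunk_classifier(chunk) for chunk in chunks])
    return aggregate_predictions(chunk_probs)
```

Majority voting over the per-segment argmax would be an equally plausible combination rule; averaging probabilities is used here only because it keeps the confidence of each segment in the final decision.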