Authors :
Chou-Cheng Chen
Volume/Issue :
Volume 7 - 2022, Issue 11 - November
Google Scholar :
https://bit.ly/3IIfn9N
DOI :
https://doi.org/10.5281/zenodo.7439934
Abstract :
- Chinese newspapers and magazines contain a large amount of
health information, but readers must spend considerable time
studying it. Common methods of extracting information from
articles include machine learning, text mining, word cloud
sampling, and purpose-built algorithms. A high-quality machine
learning model for information extraction must be trained on a
large amount of good data. Text mining achieves high precision
and recall only after many keywords have been collected to
identify relevant sentences. Both machine learning and text
mining therefore take a significant amount of time to prepare.
Word cloud systems can quickly identify which words are widely
used in an article, but the extracted information is often
fragmented. Accordingly, the author has created an elegant
algorithm that extracts health information from Chinese news
by counting nouns. First, the title or subtitle of each health
news article from Chinese websites was labeled. Second, the
text was split into sentences at commas, periods, and question
marks. Third, the segmented words were tagged with their parts
of speech using natural language processing. Fourth, each
sentence was scored by its nouns: a noun that also appears in
the title counted as 3 points, a noun that also appears in the
subtitle counted as 2 points, and every other noun counted as
1 point. Finally, high-scoring sentences were selected
according to the user's query.
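
The scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that sentences have already been segmented and tagged by an external part-of-speech tagger, that noun tags begin with "n" (a convention used by some Chinese tag sets), and that only the full-width marks ，。？ act as delimiters. The names `split_sentences`, `score_sentence`, and `top_sentences` are hypothetical, and returning the top `k` sentences is only one possible reading of selection "via the query of the user".

```python
import re
from typing import List, Set, Tuple

# Assumption: sentences end at the full-width comma, period, or question mark.
DELIMITERS = re.compile(r"[，。？]")

# A token is a (word, part-of-speech tag) pair produced by an external tagger.
Token = Tuple[str, str]


def split_sentences(text: str) -> List[str]:
    """Split text at the delimiters and drop empty fragments."""
    return [part.strip() for part in DELIMITERS.split(text) if part.strip()]


def score_sentence(tokens: List[Token],
                   title_nouns: Set[str],
                   subtitle_nouns: Set[str]) -> int:
    """Score one tagged sentence: 3 per title noun, 2 per subtitle noun, 1 per other noun."""
    score = 0
    for word, tag in tokens:
        if not tag.startswith("n"):  # assumption: noun tags start with "n"
            continue
        if word in title_nouns:
            score += 3
        elif word in subtitle_nouns:
            score += 2
        else:
            score += 1
    return score


def top_sentences(tagged_sentences: List[List[Token]],
                  title_nouns: Set[str],
                  subtitle_nouns: Set[str],
                  k: int) -> List[int]:
    """Return the indices of the k highest-scoring sentences (ties keep text order)."""
    scored = [(score_sentence(tokens, title_nouns, subtitle_nouns), index)
              for index, tokens in enumerate(tagged_sentences)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in ranked[:k]]
```

Under these assumptions, a sentence containing one title noun and one subtitle noun scores 3 + 2 = 5, so it ranks above a sentence with two ordinary nouns, which scores 2.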