I am processing sentences with Python and MeCab.
As shown in the code, the condition keeps only nouns and verbs (nouns that are numbers or symbols are excluded).
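The filtering condition described above could look roughly like the following. This is a minimal sketch, not the asker's code. It assumes an IPAdic-style feature string, with the part of speech in field 0 and the first subcategory in field 1. The helper name keep_token and the regular expression used for "numbers and symbols" are hypothetical.

```python
import re

# Hypothetical pattern for "numbers and symbols": a surface made up only of
# ASCII/full-width digits, or only of non-word characters (e.g. "＠", "!?").
NUM_OR_SYMBOL = re.compile(r"^(?:[0-9０-９]+|[^\w]+)$")


def keep_token(surface: str, feature: str) -> bool:
    """Return True for verbs and for nouns that are not numbers/symbols.

    Assumes an IPAdic-style feature string such as "名詞,数,*,...",
    where field 0 is the part of speech and field 1 its first subcategory.
    """
    fields = feature.split(",")
    pos = fields[0]
    sub1 = fields[1] if len(fields) > 1 else ""
    if pos == "動詞":  # verb
        return True
    if pos == "名詞":  # noun, minus the "数" (number) subcategory and symbol-only surfaces
        return sub1 != "数" and not NUM_OR_SYMBOL.match(surface)
    return False  # particles, adjectives, 記号, BOS/EOS, etc.
```

With the mecab-python3 binding, the (surface, feature) pairs could come from walking tagger.parseToNode(text) and reading node.surface and node.feature on each node.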
However, when I count the parts of speech in the output, other parts of speech show up, as well as words that should have been excluded.
Why is this?
Does MeCab really behave this inconsistently?
Or is my code wrong?
Thanks in advance for your help.
What does `feature` contain? And how do you create the MeCab instance?
`feature` is as in the code: text parsed by `mecab = MeCab.Tagger("-b5242880")` is basically split into words and their part-of-speech information (`surface` and `feature` here). The `mecab` instance is created as above. `df` is the DataFrame containing the text, and `small_list` is just the list of words that passed the conditions.
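A minimal sketch of the one-pass split into surface and feature described here, assuming MeCab's default output format (one "surface<TAB>feature" line per token, with "EOS" closing each sentence). The names split_mecab_output and collect_tokens are hypothetical. The parse function is passed in so the sketch runs without MeCab installed; the commented usage assumes the mecab-python3 binding and a text column in df.

```python
from typing import Callable, Iterable, List, Tuple

Token = Tuple[str, str]  # (surface, feature)


def split_mecab_output(output: str) -> List[Token]:
    """Split MeCab's default output into (surface, feature) pairs.

    Each token line is "surface<TAB>feature"; the "EOS" line that ends a
    sentence and any blank lines are skipped.
    """
    tokens: List[Token] = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, feature = line.partition("\t")
        tokens.append((surface, feature))
    return tokens


def collect_tokens(
    texts: Iterable[str],
    parse: Callable[[str], str],
    keep: Callable[[str, str], bool],
) -> List[Token]:
    """Parse each text once and keep the (surface, feature) pairs that
    pass keep(surface, feature).

    The feature from this first, in-context analysis is stored with each
    surface, so it never has to be recomputed later.
    """
    kept: List[Token] = []
    for text in texts:
        for surface, feature in split_mecab_output(parse(text)):
            if keep(surface, feature):
                kept.append((surface, feature))
    return kept


# Usage with mecab-python3 (not run here; the "text" column is hypothetical):
# import MeCab
# tagger = MeCab.Tagger("-b5242880")
# kept = collect_tokens(df["text"], tagger.parse,
#                       lambda s, f: f.split(",")[0] in ("名詞", "動詞"))
# small_list = [surface for surface, _ in kept]
```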
If you only collect the `surface` of words that meet the requirements, how do you count their `feature` after filtering?
I simply parse the remaining words again.
Consider how "Yes" is analyzed (the result depends on the dictionary).
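Then, if you analyze that surface form again on its own, you of course get this result. MeCab picks an analysis using the surrounding words (connection costs between adjacent morphemes), so a word parsed in isolation can be split or tagged differently than it was inside the sentence.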
There is no particular problem with the code.
There were a few spots I was unsure about, but none of them is an essential mistake.
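Since the differing parts of speech come from the second analysis rather than from the filter, one option, if the aim is only to confirm which parts of speech passed, is to keep each word's feature from the first analysis and count from that. A minimal sketch, assuming (surface, feature) pairs with IPAdic-style features; count_pos is a hypothetical helper name and the sample pairs are hand-written, not real MeCab output.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_pos(tokens: Iterable[Tuple[str, str]]) -> Counter:
    """Count parts of speech for (surface, feature) pairs.

    Reads field 0 of each IPAdic-style feature string recorded during the
    original, in-context analysis; nothing is parsed a second time.
    """
    return Counter(feature.split(",")[0] for _, feature in tokens)


if __name__ == "__main__":
    # Hand-written illustrative pairs, not real MeCab output.
    kept = [
        ("猫", "名詞,一般,*,*,*,*,猫,ネコ,ネコ"),
        ("走る", "動詞,自立,*,*,五段・ラ行,基本形,走る,ハシル,ハシル"),
        ("犬", "名詞,一般,*,*,*,*,犬,イヌ,イヌ"),
    ]
    print(count_pos(kept))  # Counter({'名詞': 2, '動詞': 1})
```

Counting this way reflects exactly what the filter saw, whereas re-parsing small_list measures how MeCab tags each word out of context.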
© 2022 OneMinuteCode. All rights reserved.