Author: 2335286cc | Source: Internet | 2023-02-05 10:43
I have been trying to split a string that contains text in Vietnamese into individual words. For example:
s = "Chào bạn, mình tên Đạt."
should be split into an array:
arr = {"Chào", "bạn", "mình", "tên", "Đạt"}
Normally, for English text, this is easily solved with a single line:
arr = s.split("\\W+");
But since Vietnamese has many letters outside the basic ASCII alphabet, it can't be solved with just that one line. So the question is: is there a regular expression that can replace this "\W+"? (I'm not very good with regular expressions.) If not, is there another way around it?
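To make the problem concrete, here is a minimal Java sketch. It assumes the default behaviour of `java.util.regex`, where `\W` only treats ASCII letters, digits and underscore as word characters, so accented Vietnamese letters end up acting as separators. The class name `VietnameseSplit` and the helper `splitWords` are hypothetical names used only for this illustration, and the Unicode-property pattern is one possible direction, not necessarily the accepted answer.

```java
import java.util.Arrays;

class VietnameseSplit {
    // Hypothetical helper: split on runs of characters that are not
    // Unicode letters (\p{L}), combining marks (\p{M}) or digits (\p{N}).
    // Including \p{M} keeps decomposed (NFD) accents attached to their letter.
    static String[] splitWords(String s) {
        return s.split("[^\\p{L}\\p{M}\\p{N}]+");
    }

    public static void main(String[] args) {
        String s = "Chào bạn, mình tên Đạt.";

        // ASCII-only \W+: accented letters count as non-word characters,
        // so the words are broken apart in the middle.
        System.out.println(Arrays.toString(s.split("\\W+")));

        // Unicode-aware character class: whole words are kept together.
        System.out.println(Arrays.toString(splitWords(s)));
        // prints [Chào, bạn, mình, tên, Đạt]
    }
}
```

Note that `String.split` keeps a leading empty string if the input starts with a separator, so input such as `"  Chào"` would need a trim or a filter first. The source file also has to be compiled with an encoding that preserves the Vietnamese literal (for example UTF-8).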
1 Solution