Suppose I have the following code:

    String word1 = "bar";
    String word2 = "foo";
    String story = "Once upon a time, there was a foo and a bar.";
    story = story.replace("foo", word1);
    story = story.replace("bar", word2);

After this code runs, the value of story will be "Once upon a time, there was a foo and a foo."
A similar problem occurs if I replace them in the opposite order:

    String word1 = "bar";
    String word2 = "foo";
    String story = "Once upon a time, there was a foo and a bar.";
    story = story.replace("bar", word2);
    story = story.replace("foo", word1);

The value of story will be "Once upon a time, there was a bar and a bar."
My goal is to turn story into "Once upon a time, there was a bar and a foo."
How can I accomplish that?
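This is not an easy problem, and the more search/replacement pairs you have, the trickier it gets. You have several options, spread across the spectrum from ugly to elegant and from efficient to wasteful:

- Use StringUtils.replaceEach from Apache Commons, as @AlanHay recommended. This is a good option if you are free to add new dependencies to your project. You might get lucky: the dependency may already be included in your project.
- Use a temporary placeholder, as @Jeroen suggested, and perform the replacement in two steps:
  1. Replace all search patterns with a unique tag that does not occur in the original text.
  2. Replace the placeholders with the real target replacements.
  This is not a great approach, for two reasons: you have to make sure that the tags used in the first step really are unique, and it performs more string replacement operations than necessary.
- Build a regex from all the patterns and use the approach with Matcher and StringBuffer, as suggested by @arshajii. This is not terrible, but not that great either: building the regex is somewhat hackish, and it involves StringBuffer, which fell out of favor a while ago in favor of StringBuilder.
- Use the recursive solution proposed by @mjolka: split the string at the matched patterns and recurse on the remaining segments. This is a fine solution, compact and quite elegant. Its weaknesses are the potentially large number of substring and concatenation operations, and the stack size limits that apply to all recursive solutions.
- Split the text into words and use Java 8 streams to perform the replacements elegantly, as @msandiford suggested. Of course, that only works if you are fine with splitting at word boundaries, which makes it unsuitable as a general solution.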
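Here is my version, based on ideas borrowed from Apache's implementation. It is neither simple nor elegant, but it works, and it should be relatively efficient, without unnecessary steps. In a nutshell, it works like this: repeatedly find the next matching search pattern in the text, and use a StringBuilder to accumulate the unmatched segments and the replacements.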
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class StringUtils {

        public static String replaceEach(String text, String[] searchList, String[] replacementList) {
            // TODO: throw new IllegalArgumentException() if any param doesn't make sense
            //validateParams(text, searchList, replacementList);

            SearchTracker tracker = new SearchTracker(text, searchList, replacementList);
            if (!tracker.hasNextMatch(0)) {
                return text;
            }

            StringBuilder buf = new StringBuilder(text.length() * 2);
            int start = 0;

            do {
                SearchTracker.MatchInfo matchInfo = tracker.matchInfo;
                int textIndex = matchInfo.textIndex;
                String pattern = matchInfo.pattern;
                String replacement = matchInfo.replacement;

                // Copy the unmatched segment, then the replacement
                buf.append(text.substring(start, textIndex));
                buf.append(replacement);

                start = textIndex + pattern.length();
            } while (tracker.hasNextMatch(start));

            return buf.append(text.substring(start)).toString();
        }

        private static class SearchTracker {

            private final String text;

            private final Map<String, String> patternToReplacement = new HashMap<>();
            private final Set<String> pendingPatterns = new HashSet<>();

            private MatchInfo matchInfo = null;

            private static class MatchInfo {
                private final String pattern;
                private final String replacement;
                private final int textIndex;

                private MatchInfo(String pattern, String replacement, int textIndex) {
                    this.pattern = pattern;
                    this.replacement = replacement;
                    this.textIndex = textIndex;
                }
            }

            private SearchTracker(String text, String[] searchList, String[] replacementList) {
                this.text = text;
                for (int i = 0; i < searchList.length; ++i) {
                    String pattern = searchList[i];
                    patternToReplacement.put(pattern, replacementList[i]);
                    pendingPatterns.add(pattern);
                }
            }

            // Finds the earliest match at or after start; patterns that no
            // longer occur are dropped so they are not searched for again.
            boolean hasNextMatch(int start) {
                int textIndex = -1;
                String nextPattern = null;

                for (String pattern : new ArrayList<>(pendingPatterns)) {
                    int matchIndex = text.indexOf(pattern, start);
                    if (matchIndex == -1) {
                        pendingPatterns.remove(pattern);
                    } else {
                        if (textIndex == -1 || matchIndex < textIndex) {
                            textIndex = matchIndex;
                            nextPattern = pattern;
                        }
                    }
                }

                if (nextPattern != null) {
                    matchInfo = new MatchInfo(nextPattern, patternToReplacement.get(nextPattern), textIndex);
                    return true;
                }
                return false;
            }
        }
    }
Unit tests:

    @Test
    public void testSingleExact() {
        assertEquals("bar", StringUtils.replaceEach("foo", new String[]{"foo"}, new String[]{"bar"}));
    }

    @Test
    public void testReplaceTwice() {
        assertEquals("barbar", StringUtils.replaceEach("foofoo", new String[]{"foo"}, new String[]{"bar"}));
    }

    @Test
    public void testReplaceTwoPatterns() {
        assertEquals("barbaz", StringUtils.replaceEach("foobar",
                new String[]{"foo", "bar"},
                new String[]{"bar", "baz"}));
    }

    @Test
    public void testReplaceNone() {
        assertEquals("foofoo", StringUtils.replaceEach("foofoo", new String[]{"x"}, new String[]{"bar"}));
    }

    @Test
    public void testStory() {
        assertEquals("Once upon a foo, there was a bar and a baz, and another bar and a cat.",
                StringUtils.replaceEach("Once upon a baz, there was a foo and a bar, and another foo and a cat.",
                        new String[]{"foo", "bar", "baz"},
                        new String[]{"bar", "baz", "foo"})
        );
    }
Use the replaceEach() method from Apache Commons StringUtils:

    StringUtils.replaceEach(story, new String[]{"foo", "bar"}, new String[]{"bar", "foo"})
You could try something like this, using Matcher#appendReplacement and Matcher#appendTail:

    // Requires: java.util.regex.Matcher, java.util.regex.Pattern
    String word1 = "bar";
    String word2 = "foo";
    String story = "Once upon a time, there was a foo and a bar.";

    Pattern p = Pattern.compile("foo|bar");
    Matcher m = p.matcher(story);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        /* do the swap... */
        switch (m.group()) {
        case "foo":
            m.appendReplacement(sb, word1);
            break;
        case "bar":
            m.appendReplacement(sb, word2);
            break;
        default:
            /* error */
            break;
        }
    }
    m.appendTail(sb);

    System.out.println(sb.toString());

Output:

    Once upon a time, there was a bar and a foo.
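The snippet above hard-codes the two words in a switch. As a rough sketch of how the same Matcher loop might be generalised to any number of pairs, the class below builds the alternation from a map, escaping each key with Pattern.quote and each replacement with Matcher.quoteReplacement. The class and method names are made up for illustration, and the StringBuilder overload of appendReplacement assumes Java 9 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library.
class RegexReplaceSketch {

    static String replaceEach(String text, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return text; // an empty alternation would match everywhere
        }
        // Longest keys first, so that "fool" is tried before "foo".
        String regex = replacements.keySet().stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));

        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder sb = new StringBuilder(); // Java 9+ overload
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the replacement literal
            m.appendReplacement(sb, Matcher.quoteReplacement(replacements.get(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> swaps = new HashMap<>();
        swaps.put("foo", "bar");
        swaps.put("bar", "foo");
        System.out.println(replaceEach("Once upon a time, there was a foo and a bar.", swaps));
    }
}
```

Because every match is replaced in a single pass over the original text, a replacement is never itself replaced again.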
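Search for the first word to be replaced. If it is in the string, recurse on the part of the string before the occurrence and on the part of the string after the occurrence.
Otherwise, continue with the next word to be replaced.
A naive implementation might look like this:

    public static String replaceAll(String input, String[] search, String[] replace) {
        return replaceAll(input, search, replace, 0);
    }

    private static String replaceAll(String input, String[] search, String[] replace, int i) {
        if (i == search.length) {
            return input;
        }
        int j = input.indexOf(search[i]);
        if (j == -1) {
            // search[i] does not occur: move on to the next word
            return replaceAll(input, search, replace, i + 1);
        }
        // The part before the match can only contain later words;
        // the part after it may still contain search[i].
        return replaceAll(input.substring(0, j), search, replace, i + 1) +
               replace[i] +
               replaceAll(input.substring(j + search[i].length()), search, replace, i);
    }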
Sample usage:

    String input = "Once upon a baz, there was a foo and a bar.";
    String[] search = new String[] { "foo", "bar", "baz" };
    String[] replace = new String[] { "bar", "baz", "foo" };
    System.out.println(replaceAll(input, search, replace));

Output:

    Once upon a foo, there was a bar and a baz.
A less naive version:

    public static String replaceAll(String input, String[] search, String[] replace) {
        StringBuilder sb = new StringBuilder();
        replaceAll(sb, input, 0, input.length(), search, replace, 0);
        return sb.toString();
    }

    private static void replaceAll(StringBuilder sb, String input, int start, int end,
                                   String[] search, String[] replace, int i) {
        while (i < search.length && start < end) {
            int j = indexOf(input, search[i], start, end);
            if (j == -1) {
                i++;
            } else {
                // Handle the segment before the match with the remaining words
                replaceAll(sb, input, start, j, search, replace, i + 1);
                sb.append(replace[i]);
                start = j + search[i].length();
            }
        }
        sb.append(input, start, end);
    }
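Unfortunately, Java's String has no indexOf(String str, int fromIndex, int toIndex) method. I have omitted the implementation of indexOf here because I am not certain it is correct, but it can be found on ideone, along with some rough timings of the various solutions posted here.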
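For readers who want to run the less naive version, here is one possible way to fill in the missing helper. It is not the implementation posted on ideone, only a sketch that assumes a match must lie entirely within the range from fromIndex (inclusive) to toIndex (exclusive). The class name is hypothetical, and the two replaceAll methods are copied from the answer so that the block is self-contained.

```java
// Hypothetical wrapper class; only indexOf is new.
class RangeReplaceSketch {

    // Returns the first index j with fromIndex <= j and j + str.length() <= toIndex
    // at which str occurs in input, or -1 if there is none.
    static int indexOf(String input, String str, int fromIndex, int toIndex) {
        int last = toIndex - str.length();
        for (int j = fromIndex; j <= last; j++) {
            if (input.regionMatches(j, str, 0, str.length())) {
                return j;
            }
        }
        return -1;
    }

    static String replaceAll(String input, String[] search, String[] replace) {
        StringBuilder sb = new StringBuilder();
        replaceAll(sb, input, 0, input.length(), search, replace, 0);
        return sb.toString();
    }

    private static void replaceAll(StringBuilder sb, String input, int start, int end,
                                   String[] search, String[] replace, int i) {
        while (i < search.length && start < end) {
            int j = indexOf(input, search[i], start, end);
            if (j == -1) {
                i++;
            } else {
                replaceAll(sb, input, start, j, search, replace, i + 1);
                sb.append(replace[i]);
                start = j + search[i].length();
            }
        }
        sb.append(input, start, end);
    }

    public static void main(String[] args) {
        String input = "Once upon a baz, there was a foo and a bar.";
        String[] search = { "foo", "bar", "baz" };
        String[] replace = { "bar", "baz", "foo" };
        System.out.println(replaceAll(input, search, replace));
    }
}
```

The sketch assumes non-empty search strings; an empty one would match at every position and the loop would never advance.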
Here is a Java 8 streams possibility that might be interesting to some:

    // Requires: java.util.Arrays, java.util.HashMap, java.util.Map,
    //           java.util.stream.Collectors
    String word1 = "bar";
    String word2 = "foo";

    String story = "Once upon a time, there was a foo and a bar.";

    // Map is from untranslated word to translated word
    Map<String, String> wordMap = new HashMap<>();
    wordMap.put(word1, word2);
    wordMap.put(word2, word1);

    // Split on word boundaries so we retain whitespace.
    String translated = Arrays.stream(story.split("\\b"))
        .map(w -> wordMap.getOrDefault(w, w))
        .collect(Collectors.joining());

    System.out.println(translated);
Here is an approximation of the same algorithm in Java 7:

    // Requires: java.util.HashMap, java.util.Map
    String word1 = "bar";
    String word2 = "foo";
    String story = "Once upon a time, there was a foo and a bar.";

    // Map is from untranslated word to translated word
    Map<String, String> wordMap = new HashMap<>();
    wordMap.put(word1, word2);
    wordMap.put(word2, word1);

    // Split on word boundaries so we retain whitespace.
    StringBuilder translated = new StringBuilder();
    for (String w : story.split("\\b")) {
        String tw = wordMap.get(w);
        translated.append(tw != null ? tw : w);
    }

    System.out.println(translated);
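A one-liner in Java 8:

    // Requires: java.util.regex.Pattern, java.util.stream.Collectors,
    //           com.google.common.collect.ImmutableMap (Guava)
    story = Pattern
        .compile(String.format("(?<=%1$s)|(?=%1$s)", "foo|bar"))
        .splitAsStream(story)
        .map(w -> ImmutableMap.of("bar", "foo", "foo", "bar").getOrDefault(w, w))
        .collect(Collectors.joining());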
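For the lookaround constructs used in the regex ((?<= and (?=), see http://www.regular-expressions.info/lookaround.html
If the words can contain special regex characters, use Pattern.quote to escape them.
I use Guava's ImmutableMap for conciseness, but obviously any other Map will do the job as well.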
Use an intermediate value (one that does not yet occur in the sentence).

    story = story.replace("foo", "lala");
    story = story.replace("bar", "foo");
    story = story.replace("lala", "bar");
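In response to the criticism: if you use a long enough, uncommon string such as zq515sqdqs5d5sq1dqs4d1q5dqqé"&é5d4sqjshsjddjhodfqsqc,nvùq^μU; d&€SDQ:d:;)àçàçlala as the placeholder, the chance that a user will ever enter it is so small that I won't even debate it. The only way a user could know to enter it is by knowing the source code, and at that point you have a whole other level of worries.
Yes, maybe there are fancy regex ways. I prefer something readable that I know will not break on me either.
I am also repeating the excellent advice given by @David Conrad in the comments:
Don't use some string cleverly (stupidly) chosen to be unlikely. Use characters from the Unicode Private Use Area, U+E000..U+F8FF. Remove any such characters first, since they should not legitimately be in the input (they only have application-specific meaning within some application), then use them as placeholders when replacing.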
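A minimal sketch of that advice for the two-word swap might look like the following. The class name, the method name, and the choice of U+E000 as the placeholder are assumptions made for illustration; the character class in replaceAll strips the whole U+E000..U+F8FF range before the placeholder is used.

```java
// Hypothetical helper illustrating the Private Use Area placeholder idea.
class PlaceholderSwapSketch {

    // Any character from U+E000..U+F8FF would do; U+E000 is an arbitrary pick.
    private static final String PLACEHOLDER = "\uE000";

    static String swap(String text, String a, String b) {
        // Step 0: drop private-use characters so the placeholder cannot collide
        String cleaned = text.replaceAll("[\\uE000-\\uF8FF]", "");
        return cleaned
                .replace(a, PLACEHOLDER)  // a -> placeholder
                .replace(b, a)            // b -> a
                .replace(PLACEHOLDER, b); // placeholder -> b
    }

    public static void main(String[] args) {
        System.out.println(swap("Once upon a time, there was a foo and a bar.", "foo", "bar"));
    }
}
```

With more than two words, each search word would need its own private-use character.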
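Here is a less complicated answer using a Map.

    private static String replaceEach(String str, Map<String, String> map) {
        Object[] keys = map.keySet().toArray();

        // Step 1: replace each key with a numbered placeholder ("%0", "%1", ...).
        // Caveat: this assumes the placeholders do not occur in the input and
        // that there are at most 10 keys ("%1" is a prefix of "%10").
        for (int x = 0; x < keys.length; x++) {
            str = str.replace((String) keys[x], "%" + x);
        }

        // Step 2: replace each placeholder with the value mapped to its key.
        for (int x = 0; x < keys.length; x++) {
            str = str.replace("%" + x, map.get(keys[x]));
        }
        return str;
    }

And the method is called like this:

    Map<String, String> replaceStr = new HashMap<>();
    replaceStr.put("Raffy", "awesome");
    replaceStr.put("awesome", "Raffy");
    String replaced = replaceEach("Raffy is awesome, awesome awesome is Raffy Raffy", replaceStr);

The output is: awesome is Raffy, Raffy Raffy is awesome awesome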
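If you want to replace words in a sentence that are separated by whitespace, as in the example, you can use this simple algorithm:

1. Split story on whitespace.
2. Replace each element: if it is foo, replace it with bar, and vice versa.
3. Join the array back into one string.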
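The three steps above could be sketched as follows. The class and method names are hypothetical, and the sketch assumes that words are separated by single spaces. Note that a word with punctuation attached, such as "bar." at the end of the example sentence, is not equal to "bar" and is therefore left unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for the split / replace / join steps.
class WordSwapSketch {

    static String swapWords(String story, Map<String, String> swaps) {
        // 1. Split on spaces (limit -1 keeps trailing empty strings)
        String[] words = story.split(" ", -1);
        // 2. Replace each element that is exactly one of the words
        for (int i = 0; i < words.length; i++) {
            words[i] = swaps.getOrDefault(words[i], words[i]);
        }
        // 3. Join the array back into one string
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        Map<String, String> swaps = new HashMap<>();
        swaps.put("foo", "bar");
        swaps.put("bar", "foo");
        System.out.println(swapWords("there was a foo and a bar here", swaps));
    }
}
```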
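If splitting on whitespace is not acceptable, you can follow this alternative algorithm. You need to use the longer string first: if the strings are foo and fool, use fool first and then foo.

1. Split on the word foo.
2. In each element of the array, replace bar with foo.
3. Join the array back together, adding bar after each element except the last.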
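As a rough sketch of these steps, assuming exactly two words are being swapped: the class and method names below are made up, and the caller is expected to pass the longer word as the first argument when one word contains the other.

```java
import java.util.regex.Pattern;

// Hypothetical helper for the split / replace / join-with-the-other-word steps.
class SplitJoinSwapSketch {

    static String swap(String story, String first, String second) {
        // 1. Split on the first word (quoted so it is treated literally;
        //    limit -1 keeps trailing empty strings)
        String[] parts = story.split(Pattern.quote(first), -1);
        // 2. In each piece, replace the second word with the first
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replace(second, first);
        }
        // 3. Join the pieces, putting the second word where the first used to be
        return String.join(second, parts);
    }

    public static void main(String[] args) {
        System.out.println(swap("Once upon a time, there was a foo and a bar.", "foo", "bar"));
        // Longer word first when one contains the other:
        System.out.println(swap("a fool and a foo", "fool", "foo"));
    }
}
```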