I have two methods that read a string and create a Character object for each character:

    static void newChar(String string) {
        int len = string.length();
        System.out.println("Reading " + len + " characters");
        for (int i = 0; i < len; i++) {
            // Explicitly allocate a new wrapper object
            Character cur = new Character(string.charAt(i));
        }
    }

and

    static void justChar(String string) {
        int len = string.length();
        for (int i = 0; i < len; i++) {
            // Rely on autoboxing to produce the wrapper object
            Character cur = string.charAt(i);
        }
    }
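When I run the methods on a string of 18,554,760 characters, I get wildly different run times. The output I get is:

    newChar took: 20 ms
    justChar took: 41 ms

With smaller input (4,638,690 characters) the times barely differ:

    newChar took: 12 ms
    justChar took: 13 ms

Why is `new` so much more efficient in this case?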
EDIT:

My benchmark code is pretty hacky.

    // Time a single call of each method using wall-clock milliseconds
    long start = System.currentTimeMillis();
    newChar(largeString);
    long end = System.currentTimeMillis();
    long diff = end - start;
    System.out.println("New char took: " + diff + " ms");

    start = System.currentTimeMillis();
    justChar(largeString);
    end = System.currentTimeMillis();
    diff = end - start;
    System.out.println("just char took: " + diff + " ms");
Aleksey Ship.. 22
Well, I'm not sure whether Marko replicated the original mistake on purpose. TL;DR: the new instance is never used, so it gets eliminated. Adjusting the benchmark reverses the result. Don't trust faulty benchmarks; learn from them.

Here is the JMH benchmark:

    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @Warmup(iterations = 3, time = 1)
    @Measurement(iterations = 3, time = 1)
    @Fork(3)
    @State(Scope.Thread)
    public class Chars {

        // Source needs to be @State field to avoid constant optimizations
        // on sources. Results need to be sunk into the Blackhole to
        // avoid dead-code elimination
        private String string;

        @Setup
        public void setup() {
            string = "12345678901234567890";
            for (int i = 0; i < 10; i++) {
                string += string;
            }
        }

        @GenerateMicroBenchmark
        public void newChar_DCE(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                Character c = new Character(string.charAt(i));
            }
        }

        @GenerateMicroBenchmark
        public void justChar_DCE(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                Character c = Character.valueOf(string.charAt(i));
            }
        }

        @GenerateMicroBenchmark
        public void newChar(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                Character c = new Character(string.charAt(i));
                bh.consume(c);
            }
        }

        @GenerateMicroBenchmark
        public void justChar(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                Character c = Character.valueOf(string.charAt(i));
                bh.consume(c);
            }
        }

        @GenerateMicroBenchmark
        public void newChar_prim(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                char c = new Character(string.charAt(i));
                bh.consume(c);
            }
        }

        @GenerateMicroBenchmark
        public void justChar_prim(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                char c = Character.valueOf(string.charAt(i));
                bh.consume(c);
            }
        }
    }
...and these are the results:

    Benchmark                     Mode   Samples      Mean   Mean error   Units
    o.s.Chars.justChar            avgt         9    93.051        0.365   us/op
    o.s.Chars.justChar_DCE        avgt         9    62.018        0.092   us/op
    o.s.Chars.justChar_prim       avgt         9    82.897        0.440   us/op
    o.s.Chars.newChar             avgt         9   117.962        4.679   us/op
    o.s.Chars.newChar_DCE         avgt         9    25.861        0.102   us/op
    o.s.Chars.newChar_prim        avgt         9    41.334        0.183   us/op
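DCE stands for "dead-code elimination", and that is what the original benchmark suffers from. If we remove that effect, which in JMH means sinking the values into the Blackhole, the scores reverse. So, in retrospect, this seems to indicate that the `new Character()` in the original code benefits greatly from DCE, while `Character.valueOf` is not as successful. I'm not sure we should discuss why, because it has no bearing on real-world use cases, where the produced Characters are actually used.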
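To make the dead-code point concrete without JMH, here is a minimal plain-Java sketch. The class `SinkSketch`, its methods, and the `sink` field are hypothetical names introduced for illustration, and the `volatile` write is only a crude stand-in for what JMH's Blackhole does, not a replacement for it. In `discarded` the boxed value is never read, so the JIT is free to drop the work; in `consumed` every value feeds a result that the caller can observe.

```java
final class SinkSketch {
    // Hypothetical sink: a volatile write keeps the result observable.
    static volatile int sink;

    // The boxed value is never read, so the JIT may remove the boxing as dead code.
    static void discarded(String s) {
        for (int i = 0; i < s.length(); i++) {
            Character c = Character.valueOf(s.charAt(i));
        }
    }

    // Every value contributes to the returned sum, so the loop cannot be removed as dead code.
    static int consumed(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            Character c = Character.valueOf(s.charAt(i));
            sum += c; // the value is actually read here
        }
        sink = sum;
        return sum;
    }

    public static void main(String[] args) {
        discarded("abc");
        System.out.println(consumed("abc")); // 97 + 98 + 99 = 294
    }
}
```

Timing these two methods by hand would repeat the mistake of the original benchmark; the sketch only shows which shape of code leaves the result unused.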
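You can go two routes from here:

- Get the assembly for the benchmark methods to confirm the conjecture above. See PrintAssembly.
- Run with more threads. The difference between returning a cached Character and instantiating a new one should diminish as we increase the number of threads and hit the "allocation wall".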
UPD: Following up on Marko's question, it seems the major impact comes from eliminating the allocation itself, whether through EA or DCE; see the `*_prim` tests.

UPD2: Looked into the assembly. The same run with `-XX:-DoEscapeAnalysis` confirms that the major effect is due to eliminating the allocation, as an effect of escape analysis:

    Benchmark                     Mode   Samples      Mean   Mean error   Units
    o.s.Chars.justChar            avgt         9    94.318        4.525   us/op
    o.s.Chars.justChar_DCE        avgt         9    61.993        0.227   us/op
    o.s.Chars.justChar_prim       avgt         9    82.824        0.634   us/op
    o.s.Chars.newChar             avgt         9   118.862        1.096   us/op
    o.s.Chars.newChar_DCE         avgt         9    97.530        2.485   us/op
    o.s.Chars.newChar_prim        avgt         9   101.905        1.871   us/op
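This proves the original DCE conjecture incorrect. EA is the major contributor. The DCE results are still faster, because we do not pay the cost of unboxing, or of treating the returned value in any way at all. The benchmarks are nevertheless faulty in that regard.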
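One way to look at the escape-analysis effect outside JMH is to measure how many bytes the loop allocates. The sketch below is an illustration under stated assumptions: the class `AllocationProbe`, its methods, and the `sink` field are hypothetical names, and it relies on the HotSpot-specific `com.sun.management.ThreadMXBean.getThreadAllocatedBytes`, which other JVMs may not provide. Running it once with default flags and once with `-XX:-DoEscapeAnalysis` should make the difference in allocated bytes visible, though the exact figures depend on the JVM version and on what the measurement itself allocates.

```java
import java.lang.management.ManagementFactory;

final class AllocationProbe {
    // Hypothetical sink so the computed sum stays observable.
    static volatile int sink;

    // Boxes each char with new Character and reads the value back, like newChar_prim.
    @SuppressWarnings({"deprecation", "removal"})
    static int boxAndSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = new Character(s.charAt(i)); // the instance never leaves this method
            sum += c;
        }
        return sum;
    }

    // Bytes allocated by the current thread during one call of boxAndSum.
    // Assumes a HotSpot JVM, where the platform ThreadMXBean implements the com.sun interface.
    @SuppressWarnings("deprecation")
    static long allocatedBytes(String s) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        sink = boxAndSum(s);
        long after = bean.getThreadAllocatedBytes(id);
        return after - before;
    }

    public static void main(String[] args) {
        // Same input shape as the benchmark: 20 characters doubled ten times.
        StringBuilder sb = new StringBuilder("12345678901234567890");
        for (int i = 0; i < 10; i++) {
            sb.append(sb.toString());
        }
        String s = sb.toString();

        // Warm up so the JIT has a chance to compile the loop before measuring.
        for (int i = 0; i < 2_000; i++) {
            sink = boxAndSum(s);
        }
        System.out.println("allocated bytes: " + allocatedBytes(s));
    }
}
```

This reports allocation rather than time, so it sidesteps most of the timing pitfalls discussed here, but it is still no substitute for inspecting the generated assembly.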
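Your measurement does expose a real effect.

It is mostly incidental, because your benchmark has many technical flaws, and the effect it exposes is probably not the one you have in mind.

The `new Character()` approach is faster if and only if HotSpot's Escape Analysis succeeds in proving that the resulting instance can safely be allocated on the stack instead of the heap. The effect is therefore not nearly as general as your question implies.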
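The reason `new Character()` is faster is locality of reference: your instance is on the stack, and every access to it is a CPU cache hit. When you reuse a cached instance, you must

- access a remote `static` field;
- dereference it into a remote array;
- dereference an array entry into a remote `Character` instance;
- access the `char` contained in that instance.

Each dereference is a potential CPU cache miss. Furthermore, it forces part of the cache to be redirected towards those remote locations, causing more cache misses on the input string and/or the stack locations.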
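The chain of dereferences listed above can be pictured with a simplified stand-in for the cache. This is a sketch, not the JDK source: `CacheLookupSketch`, `Box`, and `CACHE` are hypothetical names, and the only property taken from the real API is the documented one, namely that `Character.valueOf` always caches values in the range `'\u0000'` to `'\u007F'`.

```java
final class CacheLookupSketch {
    // Hypothetical stand-in for a boxed char; not the JDK's Character class.
    static final class Box {
        final char value;
        Box(char value) { this.value = value; }
    }

    // Step 1: a static field that holds a reference to the cache array.
    static final Box[] CACHE = new Box[128];
    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new Box((char) i);
        }
    }

    // Returns a shared instance for chars up to 127, a fresh one otherwise.
    static Box valueOf(char c) {
        if (c <= 127) {
            return CACHE[c]; // steps 2 and 3: load the array, then the entry
        }
        return new Box(c);
    }

    public static void main(String[] args) {
        char viaCache = valueOf('a').value; // step 4: read the char inside the instance
        System.out.println(viaCache);

        // The real API: autoboxing goes through Character.valueOf,
        // so both references point at the same cached instance.
        Character boxed = 'a';
        System.out.println(boxed == Character.valueOf('a')); // true
    }
}
```

The same identity check is why `justChar` in the question, which relies on autoboxing, behaves like the `Character.valueOf` variants in the benchmarks.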
I ran this code with `jmh`:

    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public class Chars {
        static String string = "12345678901234567890";
        static {
            for (int i = 0; i < 10; i++) string += string;
        }

        @GenerateMicroBenchmark
        public void newChar() {
            int len = string.length();
            for (int i = 0; i < len; i++) new Character(string.charAt(i));
        }

        @GenerateMicroBenchmark
        public void justChar() {
            int len = string.length();
            for (int i = 0; i < len; i++) Character.valueOf(string.charAt(i));
        }
    }

This retains the essence of your code but eliminates some systematic errors, such as warm-up and compilation times. These are the results:

    Benchmark              Mode   Thr   Cnt   Sec     Mean   Mean error     Units
    o.s.Chars.justChar     avgt     1     3     5   39.062        6.587   usec/op
    o.s.Chars.newChar      avgt     1     3     5   19.114        0.653   usec/op
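This would be my best guess as to what is going on:

- in `newChar` you are creating a new instance of `Character`. HotSpot's Escape Analysis can prove that the instance never escapes, so it allows stack allocation or, in the special case of `Character`, can eliminate the allocation altogether, because the data in it is provably never used;
- in `justChar` you involve a lookup into the `Character` cache array, which has some cost.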
In response to Aleks's criticism, I added some more methods to the benchmark. The main effect remains stable, but we get finer-grained detail about the lesser optimization effects.

    @GenerateMicroBenchmark
    public int newCharUsed() {
        int len = string.length(), sum = 0;
        for (int i = 0; i < len; i++) sum += new Character(string.charAt(i));
        return sum;
    }

    @GenerateMicroBenchmark
    public int justCharUsed() {
        int len = string.length(), sum = 0;
        for (int i = 0; i < len; i++) sum += Character.valueOf(string.charAt(i));
        return sum;
    }

    @GenerateMicroBenchmark
    public void newChar() {
        int len = string.length();
        for (int i = 0; i < len; i++) new Character(string.charAt(i));
    }

    @GenerateMicroBenchmark
    public void justChar() {
        int len = string.length();
        for (int i = 0; i < len; i++) Character.valueOf(string.charAt(i));
    }

    @GenerateMicroBenchmark
    public void newCharValue() {
        int len = string.length();
        for (int i = 0; i < len; i++) new Character(string.charAt(i)).charValue();
    }

    @GenerateMicroBenchmark
    public void justCharValue() {
        int len = string.length();
        for (int i = 0; i < len; i++) Character.valueOf(string.charAt(i)).charValue();
    }
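- the basic versions are `justChar` and `newChar`;
- the `...Value` methods add the `charValue` call to the basic version;
- the `...Used` methods add the `charValue` call (implicitly) and use the value, to preclude any dead-code elimination.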
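The "implicit" `charValue` call in the `...Used` methods comes from unboxing: adding a `Character` to an `int` first converts it back to a `char`. A small sketch of the equivalence, using a hypothetical class `UnboxingSketch` whose two methods differ only in whether the call is spelled out:

```java
final class UnboxingSketch {
    // Compound assignment on an int unboxes the Character implicitly.
    static int implicitSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += Character.valueOf(s.charAt(i));
        }
        return sum;
    }

    // The same loop with the unboxing call written explicitly.
    static int explicitSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += Character.valueOf(s.charAt(i)).charValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(implicitSum("abc") == explicitSum("abc")); // true
    }
}
```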
    Benchmark                   Mode   Thr   Cnt   Sec      Mean   Mean error     Units
    o.s.Chars.justChar          avgt     1     3     1   246.847        5.969   usec/op
    o.s.Chars.justCharUsed      avgt     1     3     1   370.031       26.057   usec/op
    o.s.Chars.justCharValue     avgt     1     3     1   296.342       60.705   usec/op
    o.s.Chars.newChar           avgt     1     3     1   123.302       10.596   usec/op
    o.s.Chars.newCharUsed       avgt     1     3     1   172.721        9.055   usec/op
    o.s.Chars.newCharValue      avgt     1     3     1   123.040        5.095   usec/op

- there is evidence of some dead-code elimination (DCE) in both the `justChar` and `newChar` variants, but it is only partial;
- with the `newChar` variant, adding `charValue` has no effect, so apparently it is DCE'd;
- with `justChar`, `charValue` does have an effect, so it seems not to be eliminated;
- DCE has a minor overall effect, as witnessed by the stable difference between `newCharUsed` and `justCharUsed`.