是什么让这个函数运行得慢得多呢?-Whatmakesthisfunctionrunmuchslower?

作者：新野野 | 来源：互联网 | 2023-09-23 23:13

Ivebeentryingtomakeanexperimenttoseeifthelocalvariablesinfunctionsarestoredonast

I've been trying to make an experiment to see if the local variables in functions are stored on a stack.

我一直在尝试做一个实验，看看函数中的局部变量是否存储在堆栈中。

So I wrote a little performance test

我写了一个性能测试

function test(fn, times){
    var i = times;
    var t = Date.now()
    while(i--){
        fn()
    }
    return Date.now() - t;
} 
ene
function straight(){
    var a = 1
    var b = 2
    var c = 3
    var d = 4
    var e = 5
    a = a * 5
    b = Math.pow(b, 10)
    c = Math.pow(c, 11)
    d = Math.pow(d, 12)
    e = Math.pow(e, 25)
}
function inversed(){
    var a = 1
    var b = 2
    var c = 3
    var d = 4
    var e = 5
    e = Math.pow(e, 25)
    d = Math.pow(d, 12)
    c = Math.pow(c, 11)
    b = Math.pow(b, 10)
    a = a * 5
}

I expected to get inversed function work much faster. Instead an amazing result came out.

我期望逆函数能更快地工作。相反，一个惊人的结果出来了。

Untill I test one of the functions it runs 10 times faster than after testing the second one.

直到我测试其中一个函数，它运行速度比测试第二个函数快10倍。

Example:

例子:

> test(straight, 10000000)
30
> test(straight, 10000000)
32
> test(inversed, 10000000)
390
> test(straight, 10000000)
392
> test(inversed, 10000000)
390

Same behaviour when tested in alternative order.

同样的行为，当以不同的顺序测试时。

> test(inversed, 10000000)
25
> test(straight, 10000000)
392
> test(inversed, 10000000)
394

I've tested it both in the Chrome browser and in Node.js and I've got absolutely no clue why would it happen. The effect lasts till I refresh the current page or restart Node REPL.

我在Chrome浏览器和Node上都测试过。我完全不知道为什么会这样。效果持续到刷新当前页面或重新启动节点REPL。

What could be a source of such significant (~12 times worse) performance?

是什么原因导致了如此显著的(甚至是糟糕的12倍)性能呢?

PS. Since it seems to work only in some environemnts please write the environment You're using to test it.

由于它似乎只适用于某些环境，请编写用于测试的环境。

Mine were:

我的是:

OS: Ubuntu 14.04
Node v0.10.37
Chrome 43.0.2357.134 (Official Build) (64-bit)

操作系统:Ubuntu 14.04 Node v0.10.37 Chrome 43.0.2357.134(官方版本)(64位)

/Edit
On Firefox 39 it takes ~5500 ms for each test regardless of the order. It seems to occur only on specific engines.

/在firefox39上进行编辑，不管顺序如何，每次测试大约需要5500毫秒。它似乎只发生在特定的引擎上。

/Edit2
Inlining the function to the test function makes it run always the same time.
Is it possible that there is an optimization that inlines the function parameter if it's always the same function?

/Edit2将函数嵌入到测试函数中，使它始终运行在相同的时间。如果函数参数总是相同的，是否可能有一个优化来嵌入函数参数?

3 个解决方案

#1

Once you call test with two different functions fn() callsite inside it becomes megamorphic and V8 is unable to inline at it.

一旦你调用了两个不同功能的测试，fn()的callsite就变成了megamorphic，而V8则无法内联。

Function calls (as opposed to method calls o.m(...)) in V8 are accompanied by one element inline cache instead of a true polymorphic inline cache.

在V8中，函数调用(与方法调用o.m(…)相反)伴随着一个元素内联缓存，而不是真正的多态内联缓存。

Because V8 is unable to inline at fn() callsite it is unable to apply a variety of optimizations to your code. If you look at your code in IRHydra (I uploaded compilation artifacts to gist for your convinience) you will notice that first optimized version of test (when it was specialized for fn = straight) has a completely empty main loop.

因为V8无法在fn() callsite中内联，所以它无法对代码应用各种优化。如果您在IRHydra中查看您的代码(我将编译工件上传到了gist，以满足您的方便)，您将注意到第一个优化版本的测试(当它专门用于fn = straight时)有一个完全空的主循环。

enter image description here

V8 just inlined straight and removed all the code your hoped to benchmark with Dead Code Elimination optimization. On an older version of V8 instead of DCE V8 would just hoist the code out of the loop via LICM - because the code is completely loop invariant.

V8只是内联，并删除了你希望用死代码消除优化作为基准的所有代码。在旧版本的V8而不是DCE V8中，只需通过LICM将代码从循环中移除——因为代码是完全循环不变的。

When straight is not inlined V8 can't apply these optimizations - hence the performance difference. Newer version of V8 would still apply DCE to straight and inversed themselves turning them into empty functions

当直线不是内联的时候，V8无法应用这些优化——因此性能不同。新版本的V8仍然会将DCE应用到直上并将其自身转化为空函数

enter image description here

so the performance difference is not that big (around 2-3x). Older V8 was not aggressive enough with DCE - and that would manifest in bigger difference between inlined and not-inlined cases, because peak performance of inlined case was solely result of aggressive loop-invariant code motion (LICM).

所以性能差异不是很大(大约2-3x)。老版本的V8在DCE上不够激进——这将在内联和非内联情况之间表现出更大的差异，因为内联情况的峰值性能仅仅是主动循环不变量代码运动(LICM)的结果。

On related note this shows why benchmarks should never be written like this - as their results are not of any use as you end up measuring an empty loop.

在相关的说明上，这说明了为什么基准不应该像这样写——因为它们的结果在您最终测量一个空循环时没有任何用处。

If you are interested in polymorphism and its implications in V8 check out my post "What's up with monomorphism" (section "Not all caches are the same" talks about the caches associated with function calls). I also recommend reading through one of my talks about dangers of microbenchmarking, e.g. most recent "Benchmarking JS" talk from GOTO Chicago 2015 (video) - it might help you to avoid common pitfalls.

如果您对V8中的多态及其含义感兴趣，请查看我的文章“单态是怎么回事”(有关与函数调用相关的缓存的章节“不是所有缓存都是相同的”)。我还建议大家通读我的一篇关于微基准测试的危险的演讲，例如最近的“标杆JS”演讲(视频)——它可能会帮助你避免常见的陷阱。

#2

You're misunderstanding the stack.

你误解了堆栈。

While the "real" stack indeed only has the Push and Pop operations, this doesn't really apply for the kind of stack used for execution. Apart from Push and Pop, you can also access any variable at random, as long as you have its address. This means that the order of locals doesn't matter, even if the compiler doesn't reorder it for you. In pseudo-assembly, you seem to think that

虽然“真实”堆栈实际上只有Push和Pop操作，但这并不适用于用于执行的堆栈类型。除了Push和Pop，您还可以随机访问任何变量，只要您有它的地址。这意味着，即使编译器没有为您重新排序，但局部变量的顺序并不重要。在伪汇编中，您似乎这样认为

var x = 1;
var y = 2;

x = x + 1;
y = y + 1;

translates to something like

转化为类似

push 1 ; x
push 2 ; y

; get y and save it
pop tmp
; get x and put it in the accumulator
pop a
; add 1 to the accumulator
add a, 1
; store the accumulator back in x
push a
; restore y
push tmp
; ... and add 1 to y

In truth, the real code is more like this:

事实上，真正的代码是这样的:

push 1 ; x
push 2 ; y

add [bp], 1
add [bp+4], 1

If the thread stack really was a real, strict stack, this would be impossible, true. In that case, the order of operations and locals would matter much more than it does now. Instead, by allowing random access to values on the stack, you save a lot of work for both the compilers, and the CPU.

如果线程堆栈真的是一个真实的、严格的堆栈，那么这将是不可能的，是真的。在这种情况下，经营秩序和当地居民将比现在重要得多。相反，通过允许对堆栈上的值进行随机访问，可以为编译器和CPU节省大量工作。

To answer your actual question, I'm suspecting neither of the functions actually does anything. You're only ever modifying locals, and your functions aren't returning anything - it's perfectly legal for the compiler to completely drop the function bodies, and possibly even the function calls. If that's indeed so, whatever performance difference you're observing is probably just a measurement artifact, or something related to the inherent costs of calling a function / iterating.

为了回答你的实际问题，我怀疑这两个函数都没有做任何事情。您只修改了局部变量，而您的函数没有返回任何内容——编译器完全删除函数体是完全合法的，甚至可能删除函数调用。如果确实如此，那么您所观察到的性能差异可能只是一个度量工件，或者与调用函数/迭代的固有成本相关的东西。

#3

Inlining the function to the test function makes it run always the same time.
Is it possible that there is an optimization that inlines the function parameter if it's always the same function?

将函数内联到测试函数，使其始终运行。如果函数参数总是相同的，是否可能有一个优化来嵌入函数参数?

Yes, this seems to be exactly what you are observing. As already mentioned by @Luaan, the compiler likely drops the bodies of your straight and inverse functions anyway because they are not having any side effects, but only manipulating some local variables.

是的，这似乎正是你所观察到的。正如@Luaan已经提到的，无论如何，编译器可能会丢弃直接函数和逆函数的主体，因为它们没有任何副作用，而只是操作一些局部变量。

When you are calling test(…, 100000) for the first time, the optimising compiler realises after some iterations that the fn() being called is always the same, and does inline it, avoiding the costly function call. All that it does now is 10 million times decrementing a variable and testing it against 0.

当您第一次调用test(…，100000)时，经过一些迭代之后，优化编译器意识到被调用的fn()总是相同的，并且进行内联，避免了代价高昂的函数调用。它现在所做的就是1000万次对一个变量进行递减并对其进行0检验。

But when you are calling test with a different fn then, it has to de-optimise. It may later do some other optimisations again, but now knowing that there are two different functions to be called it cannot inline them any more.

但是当您使用不同的fn调用test时，它必须去优化。它可能稍后会再做一些其他的优化，但是现在知道有两个不同的函数可以被调用，它不能再内联它们了。

Since the only thing you're really measuring is the function call, that leads to the grave differences in your results.

因为您真正要度量的是函数调用，这将导致结果的严重差异。

An experiment to see if the local variables in functions are stored on a stack

一个实验，看看函数中的局部变量是否存储在堆栈中

Regarding your actual question, no, single variables are not stored on a stack (stack machine), but in registers (register machine). It doesn't matter in which order they are declared or used in your function.

对于您的实际问题，不，单个变量不是存储在堆栈(堆栈机器)中，而是存储在寄存器(寄存器机器)中。在函数中声明或使用它们的顺序并不重要。

Yet, they are stored on the stack, as part of so called "stack frames". You'll have one frame per function call, storing the variables of its execution context. In your case, the stack might look like this:

但是，它们作为所谓的“堆栈帧”的一部分存储在堆栈中。每个函数调用都有一个框架，存储执行上下文的变量。在您的例子中，堆栈可能是这样的:

[straight: a, b, c, d, e]
[test: fn, times, i, t]
…

推荐阅读

chrome
JavaScript简介及语言特点

本文介绍了JavaScript的起源和发展历程，以及其在前端验证和服务器端开发中的应用。同时，还介绍了ECMAScript标准、DOM对象和BOM对象的作用及特点。最后，对JavaScript作为解释型语言和编译型语言的区别进行了说明。 ... [详细]

蜡笔小新 2023-12-10 17:45:49
merge
OrbitDBPeer 2 Peer Database using CRDTs

2019独角兽企业重金招聘Python工程师标准Apeer-to-peerdatabaseforthedecentralizedwebOrbitDBisaserverless ... [详细]

蜡笔小新 2023-10-16 15:17:57
text
Google Contacts API未返回输出 - Google Contacts API not returning output

Itwasworkingcorrectly,butyesterdayitstartedgiving401.IhavetriedwithGooglecontactsAPI ... [详细]

蜡笔小新 2023-10-15 16:40:16
get
用NAN写一个nodejs的c++扩大

NAN引见NAN的全称为NativeAbstractionforNode.js,其表现上是一个Node.js包。装置后，就获得一堆C++头文件，内里是一堆宏。它主要为Node.js ... [详细]

蜡笔小新 2023-10-13 13:32:56
text
在类中定义数组时出错 - Error on defining arrays in class

Iamtryingtomakeaclassthatwillreadatextfileofnamesintoanarray,thenreturnthatarra ... [详细]

蜡笔小新 2023-12-14 17:38:12
byte
java boolean 大小_java boolean 大小

先看官方文档TheJavaTutorialshavebeenwrittenforJDK8.Examplesandpracticesdescribedinthispagedontta ... [详细]

蜡笔小新 2023-12-12 13:36:56
blob
在虚拟服务器上安装oracle 10g客户端的问题及解决方法

本文讨论了在VMWARE5.1的虚拟服务器Windows Server 2008R2上安装oracle 10g客户端时出现的问题，并提供了解决方法。错误日志显示了异常访问违例，通过分析日志中的问题帧，找到了解决问题的线索。文章详细介绍了解决方法，帮助读者顺利安装oracle 10g客户端。 ... [详细]

蜡笔小新 2023-12-11 13:08:10
version
Java 11相对于Java 8，OptaPlanner性能提升有多大？

本文通过基准测试比较了Java 11和Java 8对OptaPlanner的性能提升。测试结果表明，在相同的硬件环境下，Java 11相对于Java 8在垃圾回收方面表现更好，从而提升了OptaPlanner的性能。 ... [详细]

蜡笔小新 2023-12-11 10:59:22
text
Apache Shiro 身份验证绕过漏洞 (CVE202011989) 详细解析及防范措施

本文详细解析了Apache Shiro 身份验证绕过漏洞 (CVE202011989) 的原理和影响，并提供了相应的防范措施。Apache Shiro 是一个强大且易用的Java安全框架，常用于执行身份验证、授权、密码和会话管理。在Apache Shiro 1.5.3之前的版本中，与Spring控制器一起使用时，存在特制请求可能导致身份验证绕过的漏洞。本文还介绍了该漏洞的具体细节，并给出了防范该漏洞的建议措施。 ... [详细]

蜡笔小新 2023-12-09 19:58:36
text
常用的CSS属性及用法整理

本文整理了常用的CSS属性及用法，包括背景属性、边框属性、尺寸属性、可伸缩框属性、字体属性和文本属性等，方便开发者查阅和使用。 ... [详细]

蜡笔小新 2023-12-09 03:01:43
export
图解 Google V8 # 19 ：异步编程（二）：V8 是如何实现 async/await 的？

说明图解GoogleV8学习笔记前端异步编程的方案史1、什么是回调地狱？如果在代码中过多地使用异步回调函数，会将整个代码逻辑打乱，从 ... [详细]

蜡笔小新 2023-10-16 19:08:34
export
技术周报 · 20210507

技术周报·2021-05-07-小编推荐向现代Javascript转型原文标题：Publish,ship,andinstallmodernJavaScriptforfaste ... [详细]

蜡笔小新 2023-10-16 17:21:04
export
基于STM32的智能循迹小车设计（基础版）

基于STM32的智能循迹小车设计（基础版）硬件准备1、小车底盘+四直流电机（带轮）2、STM32F103C8T6核心板3、12V8700mAh锂电池（可以用几节18650锂电池）4 ... [详细]

蜡笔小新 2023-10-16 10:39:17
export
前端简史之纵横：Node东出

引💡Ajax的出现，带来了jQuery时代，而jQuery时代也伴随着Node风暴淡淡退出了历史舞台。如果说Ajax给前端带来了从网页静 ... [详细]

蜡笔小新 2023-10-15 15:03:55
export
最简便的 JavaScript 代码检查工具安装方式

前两天发了一篇用mingw编译javascriptv8，讲述我为了在Windows下给javascript做lint检查，费劲去编译google的jav ... [详细]

蜡笔小新 2023-10-14 08:14:05

新野野

这个家伙很懒，什么也没留下！

Tags | 热门标签

RankList | 热门文章