这个让我困惑了3天.我有一个应用程序需要评估一组具有很少元素的整数多项式(多个args).我已经有一个用Java编写的实现,我目前正在移植到C++.
在测试期间,我注意到C++版本比Java版本慢了几个数量级.我当然知道JIT-ing,这种情况对于这种编译器特别适合,但我看到的远离我的预期.
示例代码如下所示,您需要提升以编译C++代码(但只需要简单的时间测量就可以使用该依赖关系).
choeger@daishi ~/uebb % clang++ -O3 -std=c++11 polytest.cpp -lboost_timer -lboost_system choeger@daishi ~/uebb % ./a.out 0.011694s wall, 0.010000s user + 0.000000s system = 0.010000s CPU (85.5%) Ideal Result: 1e+07 0.421986s wall, 0.420000s user + 0.000000s system = 0.420000s CPU (99.5%) Result: 1e+07 choeger@daishi ~/uebb % javac PolyTest.java choeger@daishi ~/uebb % java PolyTest evals: 10000000 runtime: 17ms Ideal Result: 1.0E7 evals: 10000000 runtime: 78ms Result: 1.0E7
显然,C++版本(使用clang-3.3编译)在纯计算能力方面运行得稍微快一点,但Java(openjdk 1.7.0.60)在解释多项式时表现更好.到目前为止,我的猜测是,由于迭代过小(在样本1元素中)向量,我的C++代码不是很理想.我认为当涉及到缓存命中未命中时,JVM在这方面做得更好.
有没有办法让我的C++版本表现更好?我有没有看到不同的原因?并且作为旁注:有没有办法测量C++和Java进程的缓存一致性?
C++代码如下所示:
#include#include #include using namespace std; struct Product { int factor; vector fields; }; class SumOfProducts { public: vector sum; /** * evaluate the polynomial with arguments separated by width */ inline double eval(const double* arg, const int width) const { double res = 0.0; for (Product p : sum) { double prod = p.factor; for (int f : p.fields) { prod *= arg[f*width]; } res += prod; } return res; }; }; double idealBenchmark(const double* arg, const int width) { boost::timer::auto_cpu_timer t; double res = 0.0; // run 10M evaluations for (long l = 0; l < 10000000; l++) { res = res + arg[width] * arg[width]; } return res; } double benchmark(const double* arg, const SumOfProducts& poly) { boost::timer::auto_cpu_timer t; double res = 0.0; // run 10M evaluations for (long l = 0; l < 10000000; l++) { res = res + poly.eval(arg, 1); } return res; } int main() { //simple polynomial: x_1^2 Product p; p.factor = 1; p.fields.push_back(1); p.fields.push_back(1); SumOfProducts poly; poly.sum.push_back(p); double arg[] = { 0, 1 }; double res = idealBenchmark(arg, 1); cout << "Ideal Result: " << res << endl; res = benchmark(arg, poly); cout << "Result: " << res << endl; }
像这样的Java版本:
public class PolyTest { static class Product { public final int factor; public final int[] fields; public Product(int pFactor, int[] pFields) { factor = pFactor; fields = pFields; } } static class SumOfProducts { final Product[] sum; public SumOfProducts(Product[] pSum) { sum = pSum; } /** * evaluate the polynomial with arguments separated by width */ double eval(final double[] arg, final int width) { double res = 0.0; for (Product p : sum) { double prod = p.factor; for (int f : p.fields) { prod *= arg[f*width]; } res += prod; } return res; } } static double idealBenchmark(final double[] arg, final int width) { final long start = System.currentTimeMillis(); double res = 0.0; long evals = 0; // run 10M evaluations for (long l = 0; l < 10000000; l++) { evals++; res = res + arg[width] * arg[width]; } System.out.println("evals: " + evals + " runtime: " + (System.currentTimeMillis() - start) + "ms"); return res; } static double benchmark(final double[] arg, final SumOfProducts poly) { final long start = System.currentTimeMillis(); double res = 0.0; long evals = 0; // run 10M evaluations for (long l = 0; l < 10000000; l++) { evals++; res = res + poly.eval(arg, 1); } System.out.println("evals: " + evals + " runtime: " + (System.currentTimeMillis() - start) + "ms"); return res; } public static void main(String[] args) { //simple polynomial: x_1^2 Product p = new Product(1, new int[]{1, 1}); SumOfProducts poly = new SumOfProducts(new Product[]{p}); double arg[] = { 0, 1 }; double res = idealBenchmark(arg, 1); System.out.println("Ideal Result: " + res); res = benchmark(arg, poly); System.out.println("Result: " + res); } }
juanchopanza.. 15
你在这里制作昂贵的副本:
for (Product p : sum)
每个副本意味着完全复制std::vector
每个元素的数据成员.改为使用引用:
for (const Product& p : sum)
请注意,我制作了它们const
,因为您不需要更改范围的元素.
对于初学者,您应该更改此行
for (Product p : sum)
成为
for (Product const& p: sum)
每次迭代都会分配,复制和取消分配Product
包含其新内容的新内容std::vector<int>
.我没有看到任何其他的,但由于它接近内循环,我预计会有很大的影响.
你在这里制作昂贵的副本:
for (Product p : sum)
每个副本意味着完全复制std::vector<int>
每个元素的数据成员.改为使用引用:
for (const Product& p : sum)
请注意,我制作了它们const
,因为您不需要更改范围的元素.