I am using the clj-pdf library to generate a PDF of 40,000 pages, each with the same two images on it. It takes about 1 minute 30 seconds to generate, which is slower than what we used to get with Python. What can I do to make it faster?

Here is the REPL session:

    user=> (defn gen-pdf
      #_=>   []
      #_=>   (println (new java.util.Date))
      #_=>   (pdf [{}
      #_=>         (for [i (range 80000)]
      #_=>           (do [:paragraph
      #_=>                [:image "sample_logos/batman.jpeg"]
      #_=>                [:image "sample_logos/superman.jpeg"]]))]
      #_=>        "super.pdf")
      #_=>   (println (new java.util.Date)))
    #'user/gen-pdf
    user=> (gen-pdf)
    #inst "2013-12-26T07:03:05.695-00:00"
    #inst "2013-12-26T07:04:23.175-00:00"
    nil
    user=>
Answer by edbond (score 5):
Update: the author of clj-pdf kindly added support for references to the library. Here is the updated code, using clj-pdf version "1.11.9":

    (defn gen-pdf
      []
      (time
        (pdf [{:references {:batman   [:image "sample_logos/batman.jpeg"]
                            :superman [:image "sample_logos/superman.jpeg"]}}
              (for [i (range 80000)]
                [:paragraph
                 [:reference :batman]
                 [:reference :superman]])]
             "super.pdf")))

It completes in 12 seconds on my machine.
I ran your example with [clj-pdf "1.11.7"]. It took about 68 seconds and produced a 5.4 GB file.

Then I created a Python example:

    from reportlab.pdfgen import canvas
    from datetime import datetime

    batman = "sample_logos/batman.jpeg"
    superman = "sample_logos/superman.jpeg"
    n = 80000

    def hello(c):
        # n pages with the first image, one image per page
        for i in range(0, n):
            c.drawImage(batman, 0, 0)
            c.showPage()
        # then n pages with the second image
        for i in range(0, n):
            c.drawImage(superman, 0, 0)
            c.showPage()

    t1 = datetime.now()
    c = canvas.Canvas("super_py.pdf")
    hello(c)
    c.save()
    t2 = datetime.now()
    print(t2 - t1)

It is roughly equivalent. With Python 2.7.5+ and reportlab 2.7 it took 53 seconds and produced a 108 MB file.
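To see why the two file sizes differ so much, here is a rough calculation that uses only the figures quoted above. It assumes decimal units (1 GB = 10**9 bytes) and counts 160,000 image placements in both runs: 80,000 paragraphs with two images each for clj-pdf, and 2 x 80,000 single-image pages for reportlab. The variable names are just for illustration.

```python
# Back-of-the-envelope check using only figures quoted in this thread.
# Assumption: decimal units, so 5.4 GB = 5400 * 10**6 bytes.

PLACEMENTS = 80000 * 2  # image placements in either run


def bytes_per_placement(total_bytes, placements=PLACEMENTS):
    """Average number of output bytes per image placement."""
    return total_bytes / float(placements)


clj_pdf_bytes = 5400 * 10**6   # clj-pdf 1.11.7 output
reportlab_bytes = 108 * 10**6  # reportlab 2.7 output

clj_per = bytes_per_placement(clj_pdf_bytes)
rl_per = bytes_per_placement(reportlab_bytes)

print("clj-pdf:   %.0f bytes per placement" % clj_per)
print("reportlab: %.0f bytes per placement" % rl_per)
print("ratio:     %.0fx" % (clj_per / rl_per))
```

That works out to 33,750 bytes per placement for clj-pdf against 675 for reportlab. The first figure is in the range of a small JPEG, which would fit a fresh copy of the image being written for every :image element, while the second looks like per-page overhead only. The sizes of the two JPEGs are not given here, so treat this as a plausibility check rather than a measurement.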
Reportlab reuses the same image, so I changed clj-pdf to allow passing an iText Image in the :image tag; see https://github.com/yogthos/clj-pdf/blob/master/src/clj_pdf/core.clj#L461

I added another condition that passes an Image instance through as-is:

    (let [img (cond
                (instance? Image img-data)
                img-data
                (instance? java.awt.Image img-data)
                (Image/getInstance (.createImage ...
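The idea behind this change is to load an image once and hand the same object to every element that needs it. A minimal sketch of that pattern in Python follows. `ImageCache` and its `get` method are hypothetical names made up for this illustration; they are not part of clj-pdf, iText or reportlab, and raw file bytes stand in for a decoded image object.

```python
import os
import shutil
import tempfile


class ImageCache(object):
    """Hypothetical helper: read each file at most once, return the same object after that."""

    def __init__(self):
        self._by_path = {}
        self.loads = 0  # how many times a file was actually read from disk

    def get(self, path):
        if path not in self._by_path:
            with open(path, "rb") as f:
                self._by_path[path] = f.read()
            self.loads += 1
        return self._by_path[path]


# Demo with a throwaway file standing in for a JPEG.
tmp_dir = tempfile.mkdtemp()
logo_path = os.path.join(tmp_dir, "logo.jpeg")
with open(logo_path, "wb") as f:
    f.write(b"\xff\xd8 placeholder, not a real JPEG \xff\xd9")

cache = ImageCache()
placements = [cache.get(logo_path) for _ in range(1000)]

print(cache.loads)                                  # number of disk reads
print(all(p is placements[0] for p in placements))  # do all placements share one object?

shutil.rmtree(tmp_dir)  # clean up the temporary directory
```

With a thousand placements the file is read once and every placement points at the same object. A writer that recognises the shared object can then store the image data a single time.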
and changed the code to

    (defn gen-pdf
      []
      (let [batman   (Image/getInstance "sample_logos/batman.jpeg")
            superman (Image/getInstance "sample_logos/superman.jpeg")]
        (time
          (pdf [{}
                (for [i (range 80000)]
                  [:paragraph
                   ;; [:image "sample_logos/batman.jpeg"]
                   ;; [:image "sample_logos/superman.jpeg"]
                   [:image batman]
                   [:image superman]])]
               "super.pdf"))))

With this optimization I was able to generate the PDF in 17 seconds, with a file size of 70 MB.
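For a side-by-side view, the snippet below tabulates the timings and file sizes reported in this answer relative to the clj-pdf 1.11.7 run. The labels are my own shorthand, the sizes assume decimal units (5.4 GB written as 5400 MB), and no file size was reported for the 1.11.9 run. These are single runs rather than a controlled benchmark.

```python
# Figures as reported in this thread: (label, seconds, size in MB or None).
runs = [
    ("clj-pdf 1.11.7, path per :image", 68, 5400),
    ("reportlab 2.7", 53, 108),
    ("clj-pdf patched, shared Image", 17, 70),
    ("clj-pdf 1.11.9, :references", 12, None),  # size not reported
]

baseline_seconds = runs[0][1]
baseline_mb = runs[0][2]


def speedup(seconds):
    """How many times faster than the 1.11.7 baseline."""
    return baseline_seconds / float(seconds)


def shrink(size_mb):
    """How many times smaller than the 1.11.7 baseline, or None if unknown."""
    if size_mb is None:
        return None
    return baseline_mb / float(size_mb)


for label, seconds, size_mb in runs:
    s = shrink(size_mb)
    size_text = "n/a" if s is None else "%.1fx smaller" % s
    print("%-34s %4.1fx faster, %s" % (label, speedup(seconds), size_text))
```

Passing shared Image instances is 4x faster than the baseline and yields a file roughly 77x smaller; the :references version is about 5.7x faster.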