
Lazy IO + parallelism: converting an image to grayscale

Posted by 小丸子派大星_127 on 2023-01-06 15:51


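I am trying to add parallelism to a program that converts a .bmp into a grayscale .bmp. The parallel code typically performs 2-4x worse than the serial code. I have been tweaking the parBuffer and chunk sizes, but I still cannot reason about what is happening. Looking for guidance.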

The entire source file used here: http://lpaste.net/106832

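We use Codec.BMP to read in a stream of pixels represented by type RGBA = (Word8, Word8, Word8, Word8). To convert to grayscale, simply map a "luma" transform over all pixels.

The serial implementation is literally: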

toGray :: [RGBA] -> [RGBA]
toGray x = map luma x
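The post does not show luma itself, so here is a minimal sketch of what such a transform could look like. The Rec. 601 weights (0.299, 0.587, 0.114) and the integer scaling by 1000 are assumptions for illustration, not the definition from the linked source file.

```haskell
import Data.Word (Word8)

-- Pixel representation taken from the post.
type RGBA = (Word8, Word8, Word8, Word8)

-- Hypothetical luma: Rec. 601 weights in integer arithmetic, scaled by 1000.
-- The weights sum to 1000, so the result always fits in a Word8.
luma :: RGBA -> RGBA
luma (r, g, b, a) = (y, y, y, a)
  where
    y = fromIntegral ((299 * w r + 587 * w g + 114 * w b) `div` 1000)
    -- Widen to Int so the weighted sum cannot overflow.
    w :: Word8 -> Int
    w = fromIntegral

-- Serial conversion, as in the post (written point-free).
toGray :: [RGBA] -> [RGBA]
toGray = map luma
```

Under these assumed weights, alpha is passed through unchanged and the three colour channels are all set to the same gray value.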

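The test input .bmp is 5184 x 3456 (71.7 MB).

The serial implementation runs in ~10s, or ~550ns per pixel. ThreadScope looks clean:

Why is this so fast? I suppose it has something to do with lazy ByteString (even though Codec.BMP uses strict ByteString - is there an implicit conversion happening here?) and fusion.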

Adding parallelism

The first attempt at adding parallelism was via parList. Oh boy. The program used ~4-5GB of memory and the system started swapping.
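The parList code is not shown in the post, so the following is only a guess at its shape. The name toGrayParList and the placeholder luma (a plain channel average) are hypothetical. What is certain is that parList walks the entire list and creates one spark per element before returning anything, so the whole pixel list has to be resident at once, which would be consistent with the memory blow-up described above.

```haskell
import Control.Parallel.Strategies (parList, rseq, withStrategy)
import Data.Word (Word8)

type RGBA = (Word8, Word8, Word8, Word8)

-- Placeholder transform (plain channel average); the real luma is not shown.
luma :: RGBA -> RGBA
luma (r, g, b, a) = (y, y, y, a)
  where
    y = fromIntegral ((fromIntegral r + fromIntegral g + fromIntegral b) `div` (3 :: Int))

-- Hypothetical reconstruction of the parList attempt: one spark per pixel,
-- and the full list spine is forced before the result is returned.
toGrayParList :: [RGBA] -> [RGBA]
toGrayParList = withStrategy (parList rseq) . map luma
```

With 5184 x 3456 = 17,915,904 pixels, each held as a boxed tuple inside a list cell, the fully materialised list alone is large before any sparks are counted.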

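I then read the "Parallelizing Lazy Streams with parBuffer" section of Simon Marlow's O'Reilly book and tried parBuffer with a large buffer size. This still did not produce the desired performance. The sparks were extremely small.

I then tried to increase the spark size by chunking the lazy list and then sticking with parBuffer for the parallelism: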
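For reference, a per-pixel parBuffer version might look like the sketch below. The name toGrayParBuffer, the buffer-size parameter, and the placeholder luma are hypothetical. parBuffer keeps a rolling window of n sparks and leaves the list lazy, but here each spark covers only a single pixel, which is why the sparks end up so small.

```haskell
import Control.Parallel.Strategies (parBuffer, rseq, withStrategy)
import Data.Word (Word8)

type RGBA = (Word8, Word8, Word8, Word8)

-- Placeholder transform (plain channel average); the real luma is not shown.
luma :: RGBA -> RGBA
luma (r, g, b, a) = (y, y, y, a)
  where
    y = fromIntegral ((fromIntegral r + fromIntegral g + fromIntegral b) `div` (3 :: Int))

-- Hypothetical per-pixel variant: at most n sparks are outstanding ahead of
-- the consumer, and each spark evaluates one pixel to weak head normal form.
toGrayParBuffer :: Int -> [RGBA] -> [RGBA]
toGrayParBuffer n = withStrategy (parBuffer n rseq) . map luma
```

The work per spark is a handful of arithmetic operations, so the cost of creating and scheduling the spark can easily exceed the cost of the work itself.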

toGrayPar :: [RGBA] -> [RGBA]
toGrayPar x = concat $ (withStrategy (parBuffer 500 rpar) . map (map luma))
                       (chunk 8000 x)

chunk :: Int -> [a] -> [[a]]
chunk _ [] = []
chunk n xs = as : chunk n bs where
  (as,bs) = splitAt n xs

But this still does not produce the desired performance:

  18,934,235,760 bytes allocated in the heap
  15,274,565,976 bytes copied during GC
     639,588,840 bytes maximum residency (27 sample(s))
     238,163,792 bytes maximum slop
            1910 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     35277 colls, 35277 par   19.62s   14.75s     0.0004s    0.0234s
  Gen  1        27 colls,    26 par   13.47s    7.40s     0.2741s    0.5764s

  Parallel GC work balance: 30.76% (serial 0%, perfect 100%)

  TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)

  SPARKS: 4480 (2240 converted, 0 overflowed, 0 dud, 2 GC'd, 2238 fizzled)

  INIT    time    0.00s  (  0.01s elapsed)
  MUT     time   14.31s  ( 14.75s elapsed)
  GC      time   33.09s  ( 22.15s elapsed)
  EXIT    time    0.01s  (  0.12s elapsed)
  Total   time   47.41s  ( 37.02s elapsed)

  Alloc rate    1,323,504,434 bytes per MUT second

  Productivity  30.2% of total user, 38.7% of total elapsed

gc_alloc_block_sync: 7433188
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 1017408

[Image: PAR1]
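One way to sanity-check the spark numbers in the statistics above: 5184 x 3456 = 17,915,904 pixels split into chunks of 8000 gives 2240 chunks, and SPARKS: 4480 is exactly twice that, which would be consistent with each chunk being sparked once by parBuffer and once more by rpar. Since rpar only evaluates its argument to weak head normal form, a spark on map luma chunk does little more than produce the first list cell, and the rest is left to whichever thread consumes the result. An experiment worth trying (not measured here) is to force each chunk fully inside its spark with rdeepseq. The name toGrayParDeep and the placeholder luma are hypothetical.

```haskell
import Control.Parallel.Strategies (parBuffer, rdeepseq, withStrategy)
import Data.Word (Word8)

type RGBA = (Word8, Word8, Word8, Word8)

-- Placeholder transform (plain channel average); the real luma is not shown.
luma :: RGBA -> RGBA
luma (r, g, b, a) = (y, y, y, a)
  where
    y = fromIntegral ((fromIntegral r + fromIntegral g + fromIntegral b) `div` (3 :: Int))

-- Same chunking helper as in the post. Assumes n > 0; a non-positive n
-- would never consume the input and so never terminate.
chunk :: Int -> [a] -> [[a]]
chunk _ [] = []
chunk n xs = as : chunk n bs
  where
    (as, bs) = splitAt n xs

-- Hypothetical variant: rdeepseq makes each spark evaluate its whole chunk
-- to normal form, instead of stopping at the first cons cell.
toGrayParDeep :: [RGBA] -> [RGBA]
toGrayParDeep =
  concat . withStrategy (parBuffer 500 rdeepseq) . map (map luma) . chunk 8000
```

Whether this helps overall is a separate question: the GC figures above (33.09s of GC against 14.31s of mutator time) suggest that allocation from the boxed list-of-tuples representation may dominate regardless of how the sparks are shaped.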

How can I better reason about what is going on here?
