Fastest way to get a hash from a list in Python

葉の鋼琴曲 posted on 2023-02-12 15:24

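I have a long list of integers that I want to turn into an MD5 hash. What is the fastest way to do this? I have tried the options below, and they are broadly similar. I'm just wondering whether I have missed an obviously faster method.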

import random
import hashlib
import cPickle as pickle

r = [random.randrange(1, 1000) for _ in range(0, 1000000)]

def method1(r):
    p = pickle.dumps(r, -1)
    return hashlib.md5(p).hexdigest()

def method2(r):
    p = str(r)
    return hashlib.md5(p).hexdigest()

def method3(r):
    p = ','.join(map(str, r))
    return hashlib.md5(p).hexdigest()

Then timing them in IPython:

timeit method1(r)
timeit method2(r)
timeit method3(r)

gives me this:

In [8]: timeit method1(r)
10 loops, best of 3: 68.7 ms per loop

In [9]: timeit method2(r)
10 loops, best of 3: 176 ms per loop

In [10]: timeit method3(r)
1 loops, best of 3: 270 ms per loop

So option 1 is the best I have. But I have to do this many times, and it is currently the rate-determining step in my code.

I'm using Python 2.7. Are there any tips or tricks for getting a unique hash from a large list faster than this?

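1 answer
  • You may find this useful. It uses my own custom benchmarking framework (based on timeit) to collect and print the results. Since the variation in speed is mainly due to the need to convert the list r into something hashlib.md5() can work with, I have updated the suite of test cases to show how storing the values in an array.array instead, as @DSM suggested in the comments, speeds things up dramatically. Note that because the integers in the list are all relatively small, I store them in an array of short (2-byte) values.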

    from __future__ import print_function
    import sys
    import timeit
    
    setup = """
    import array
    import random
    import hashlib
    import marshal
    import cPickle as pickle
    import struct
    
    r = [random.randrange(1, 1000) for _ in range(0, 1000000)]
    ra = array.array('h', r)   # create an array of shorts equivalent
    
    def method1(r):
        p = pickle.dumps(r, -1)
        return hashlib.md5(p).hexdigest()
    
    def method2(r):
        p = str(r)
        return hashlib.md5(p).hexdigest()
    
    def method3(r):
        p = ','.join(map(str, r))
        return hashlib.md5(p).hexdigest()
    
    def method4(r):
        fmt = '%dh' % len(r)
        buf = struct.pack(fmt, *r)
        return hashlib.md5(buf).hexdigest()
    
    def method5(r):
        a = array.array('h', r)
        return hashlib.md5(a).hexdigest()
    
    def method6(r):
        m = marshal.dumps(r)
        return hashlib.md5(m).hexdigest()
    
    # using pre-built array...
    def pb_method1(ra):
        p = pickle.dumps(ra, -1)
        return hashlib.md5(p).hexdigest()
    
    def pb_method2(ra):
        p = str(ra)
        return hashlib.md5(p).hexdigest()
    
    def pb_method3(ra):
        p = ','.join(map(str, ra))
        return hashlib.md5(p).hexdigest()
    
    def pb_method4(ra):
        fmt = '%dh' % len(ra)
        buf = struct.pack(fmt, *ra)
        return hashlib.md5(buf).hexdigest()
    
    def pb_method5(ra):
        return hashlib.md5(ra).hexdigest()
    
    def pb_method6(ra):
        m = marshal.dumps(ra)
        return hashlib.md5(m).hexdigest()
    """
    
    statements = {
        "pickle.dumps(r, -1)": """
            method1(r)
        """,
        "str(r)": """
            method2(r)
        """,
        "','.join(map(str, r))": """
            method3(r)
        """,
        "struct.pack(fmt, *r)": """
            method4(r)
        """,
        "array.array('h', r)": """
            method5(r)
        """,
        "marshal.dumps(r)": """
            method6(r)
        """,
    # versions using pre-built array...
        "pickle.dumps(ra, -1)": """
            pb_method1(ra)
        """,
        "str(ra)": """
            pb_method2(ra)
        """,
        "','.join(map(str, ra))": """
            pb_method3(ra)
        """,
        "struct.pack(fmt, *ra)": """
            pb_method4(ra)
        """,
        "ra (pre-built)": """
            pb_method5(ra)
        """,
        "marshal.dumps(ra)": """
            pb_method6(ra)
        """,
    }
    
    N = 10
    R = 3
    
    timings = [(
        idea,
        min(timeit.repeat(statements[idea], setup=setup, repeat=R, number=N)),
        ) for idea in statements]
    
    longest = max(len(t[0]) for t in timings)  # length of longest name
    
    print('fastest to slowest timings (Python {}.{}.{})\n'.format(*sys.version_info[:3]),
          '  ({:,d} calls, best of {:d})\n'.format(N, R))
    
    ranked = sorted(timings, key=lambda t: t[1])  # sort by speed (fastest first)
    for timing in ranked:
        print("{:>{width}} : {:.6f} secs, rel speed {rel:>8.6f}x".format(
              timing[0], timing[1], rel=timing[1]/ranked[0][1], width=longest))
    

    Results:

    fastest to slowest timings (Python 2.7.6)
       (10 calls, best of 3)
    
            ra (pre-built) : 0.037906 secs, rel speed 1.000000x
         marshal.dumps(ra) : 0.177953 secs, rel speed 4.694626x
          marshal.dumps(r) : 0.695606 secs, rel speed 18.350932x
       pickle.dumps(r, -1) : 1.266096 secs, rel speed 33.401179x
       array.array('h', r) : 1.287884 secs, rel speed 33.975950x
      pickle.dumps(ra, -1) : 1.955048 secs, rel speed 51.576558x
      struct.pack(fmt, *r) : 2.085602 secs, rel speed 55.020743x
     struct.pack(fmt, *ra) : 2.357887 secs, rel speed 62.203962x
                    str(r) : 2.918623 secs, rel speed 76.996860x
                   str(ra) : 3.686666 secs, rel speed 97.258777x
     ','.join(map(str, r)) : 4.701531 secs, rel speed 124.032173x
    ','.join(map(str, ra)) : 4.968734 secs, rel speed 131.081303x
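
The fastest entry above hashes a pre-built array directly, so the conversion cost is paid once rather than on every call. A minimal sketch of that idea in isolation follows. The helper name `md5_of_ints` is hypothetical and not part of the original benchmark, and the typecode 'h' assumes every value fits in a signed short, as it does for the 1 to 999 range used here.

```python
import array
import hashlib
import random


def md5_of_ints(values, typecode='h'):
    """Return the MD5 hex digest of the raw bytes of an integer sequence.

    Hypothetical helper: an existing array.array is hashed as-is (no copy),
    anything else is first converted with the given typecode. A value that
    does not fit the typecode makes array.array raise OverflowError.
    """
    if isinstance(values, array.array):
        buf = values
    else:
        buf = array.array(typecode, values)
    # hashlib reads the array's memory through the buffer interface
    return hashlib.md5(buf).hexdigest()


r = [random.randrange(1, 1000) for _ in range(100000)]
ra = array.array('h', r)   # build once, then reuse for every hash

# Hashing the pre-built array gives the same digest as converting each time
assert md5_of_ints(ra) == md5_of_ints(r)
```

Because the digest is computed over the array's raw memory, it depends on the typecode and on the platform's byte order, so it is only comparable with hashes produced the same way. It will not match the digests from the pickle, str, or join variants.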
    

    Answered 2023-02-12 15:26