Our software decompresses certain byte data through a GZipStream, which reads from a MemoryStream. The data is decompressed in 4 KB blocks and written into another MemoryStream.
We have realized that the memory allocated by the process is much higher than the actual decompressed data.

Example: a compressed byte array of 2,425,536 bytes gets decompressed to 23,050,718 bytes. The memory profiler we use shows that the method MemoryStream.set_Capacity(Int32 value) allocated 67,104,936 bytes. That is a factor of 2.9 between reserved and actually written memory.

Note: MemoryStream.set_Capacity is called from MemoryStream.EnsureCapacity, which is itself called from MemoryStream.Write in our function.

Why does MemoryStream reserve such a large capacity, even though it only appends blocks of 4 KB?
Here is the code snippet which decompresses the data:

```csharp
private byte[] Decompress(byte[] data)
{
    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (MemoryStream resultStream = new MemoryStream())
    {
        byte[] buffer = new byte[4096];
        int iCount = 0;

        while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            resultStream.Write(buffer, 0, iCount);
        }

        return resultStream.ToArray();
    }
}
```
Note: in case it is relevant, this is the system configuration:

Windows XP 32-bit,
.NET 3.5,
compiled with Visual Studio 2008
Scott Chambe.. 44
Because this is the algorithm it uses to expand its capacity:

```csharp
public override void Write(byte[] buffer, int offset, int count)
{
    //... Removed error checking for example

    int i = _position + count;
    // Check for overflow
    if (i < 0)
        throw new IOException(Environment.GetResourceString("IO.IO_StreamTooLong"));

    if (i > _length)
    {
        bool mustZero = _position > _length;
        if (i > _capacity)
        {
            bool allocatedNewArray = EnsureCapacity(i);
            if (allocatedNewArray)
                mustZero = false;
        }
        if (mustZero)
            Array.Clear(_buffer, _length, i - _length);
        _length = i;
    }

    //...
}

private bool EnsureCapacity(int value)
{
    // Check for overflow
    if (value < 0)
        throw new IOException(Environment.GetResourceString("IO.IO_StreamTooLong"));

    if (value > _capacity)
    {
        int newCapacity = value;
        if (newCapacity < 256)
            newCapacity = 256;
        if (newCapacity < _capacity * 2)
            newCapacity = _capacity * 2;
        Capacity = newCapacity;
        return true;
    }
    return false;
}

public virtual int Capacity
{
    //...

    set
    {
        //...

        // MemoryStream has this invariant: _origin > 0 => !expandable (see ctors)
        if (_expandable && value != _capacity)
        {
            if (value > 0)
            {
                byte[] newBuffer = new byte[value];
                if (_length > 0)
                    Buffer.InternalBlockCopy(_buffer, 0, newBuffer, 0, _length);
                _buffer = newBuffer;
            }
            else
            {
                _buffer = null;
            }
            _capacity = value;
        }
    }
}
```
So each time it hits the capacity limit, it doubles the capacity. It does this because the Buffer.InternalBlockCopy operation is slow for large arrays; if it had to resize on every Write call, performance would drop significantly.
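The doubling policy can be checked against the numbers in the question. The following Python snippet is only a sketch of EnsureCapacity's arithmetic (not the actual .NET code), summing every buffer the policy would allocate for 4 KB writes totalling 23,050,718 bytes:

```python
CHUNK = 4096            # write size used in the question
TOTAL = 23_050_718      # uncompressed size from the question

def total_allocated(total, chunk):
    """Sum every buffer that the doubling policy would allocate."""
    capacity = allocated = written = 0
    while written < total:
        count = min(chunk, total - written)
        needed = written + count
        if needed > capacity:                        # EnsureCapacity
            capacity = max(needed, 256, capacity * 2)
            allocated += capacity                    # new byte[capacity]
        written += count
    return allocated

print(total_allocated(TOTAL, CHUNK))  # 67104768
```

The simulated total of 67,104,768 bytes (about 2.91x the decompressed size) is within a few hundred bytes of the 67,104,936 the profiler reported; note that most of that total is garbage from earlier, discarded buffers, while the peak reservation is only the final 33,554,432-byte buffer.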
There are a couple of things you can do to improve performance: you can set the initial capacity to at least the size of the compressed array, and you can then grow by a factor smaller than 2.0 to reduce the amount of memory being used.
```csharp
const double ResizeFactor = 1.25;

private byte[] Decompress(byte[] data)
{
    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    // Set the initial size to the compressed size + 25%.
    using (MemoryStream resultStream = new MemoryStream((int)(data.Length * ResizeFactor)))
    {
        byte[] buffer = new byte[4096];
        int iCount = 0;

        while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            if (resultStream.Capacity < resultStream.Length + iCount)
                // Resize to 125% instead of 200%.
                resultStream.Capacity = (int)(resultStream.Capacity * ResizeFactor);

            resultStream.Write(buffer, 0, iCount);
        }

        return resultStream.ToArray();
    }
}
```

(Note: the `double` results have to be cast to `int` before being passed to the `MemoryStream` constructor and `Capacity` setter, otherwise the code does not compile.)
If you want, you can use an even fancier algorithm, such as resizing based on the current compression ratio:
```csharp
const double MinResizeFactor = 1.05;

private byte[] Decompress(byte[] data)
{
    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    // Set the initial size to the compressed size + the minimum resize factor.
    using (MemoryStream resultStream = new MemoryStream((int)(data.Length * MinResizeFactor)))
    {
        byte[] buffer = new byte[4096];
        int iCount = 0;

        while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            if (resultStream.Capacity < resultStream.Length + iCount)
            {
                // The +1 prevents divide-by-zero errors; it may not be necessary in practice.
                double sizeRatio = ((double)resultStream.Position + iCount) / (compressedStream.Position + 1);

                // Resize to the minimum resize factor of the current capacity, or the
                // compressed stream length times the compression ratio + min resize
                // factor, whichever is larger.
                resultStream.Capacity = (int)Math.Max(resultStream.Capacity * MinResizeFactor,
                                                      (sizeRatio + (MinResizeFactor - 1)) * compressedStream.Length);
            }

            resultStream.Write(buffer, 0, iCount);
        }

        return resultStream.ToArray();
    }
}
```
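To see what the gentler growth factor buys, here is a hypothetical back-of-the-envelope comparison in Python using the question's sizes. It tracks the peak capacity reserved by a 2.0x doubling policy (starting empty) versus a 1.25x policy pre-sized to the compressed length; the growth arithmetic mirrors the C# above but is only an illustration:

```python
COMPRESSED = 2_425_536    # compressed size from the question
TOTAL = 23_050_718        # uncompressed size from the question

def peak_capacity(initial, factor, total, chunk=4096):
    """Final (peak) buffer size when growing by `factor` whenever a write won't fit."""
    capacity, written = initial, 0
    while written < total:
        count = min(chunk, total - written)
        while capacity < written + count:
            capacity = max(int(capacity * factor), 256, written + count)
        written += count
    return capacity

doubling = peak_capacity(0, 2.0, TOTAL)                        # 33,554,432
gentle = peak_capacity(int(COMPRESSED * 1.25), 1.25, TOTAL)    # 28,236,945
print(doubling, gentle)
```

For this input the 1.25x policy tops out around 28.2 MB instead of 33.5 MB; much of the saving comes simply from pre-sizing the stream to the compressed length, at the cost of extra copy operations when the guess is too small.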
Accepting this answer because it gives a detailed answer to my question, plus additional advice on how to save memory. I ended up getting rid of the `MemoryStream resultStream` entirely by extracting the uncompressed data size from the last four bytes of the GZip data (which originates from a GZip file) and creating a byte array of exactly that size as the target - but I would have gone this way if the other approach had not worked.
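The trick described in that comment relies on the gzip format itself: the last four bytes of a gzip member are the ISIZE field, the length of the original input modulo 2^32, stored little-endian. A small Python sketch (the question's code is C#, but the byte layout is the same):

```python
import gzip
import struct

def uncompressed_size(gzip_bytes: bytes) -> int:
    # ISIZE: length of the original data modulo 2**32,
    # stored little-endian in the final 4 bytes of the gzip stream
    return struct.unpack("<I", gzip_bytes[-4:])[0]

data = b"example" * 10_000
print(uncompressed_size(gzip.compress(data)))  # 70000 == len(data)
```

Because of the modulo, this is only reliable for single-member streams smaller than 4 GB, which is fine for the sizes in the question.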
usr.. 16
MemoryStream doubles its internal buffer when it runs out of space. This can lead to 2x waste. I don't know why you are seeing more than that, but this basic behavior is to be expected.

If you don't like this behavior, write your own stream that stores its data in smaller chunks (e.g. a List<byte[1024 * 64]>). Such an algorithm limits its waste to 64 KB.
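A chunked stream along those lines can be sketched as follows (Python for illustration; the suggestion above is a C# List of 64 KB byte arrays, but the bookkeeping is identical). Reserved memory never exceeds the written length by more than one partially filled block:

```python
class ChunkedStream:
    """Append-only stream stored as fixed-size blocks; waste <= one block."""
    BLOCK = 64 * 1024

    def __init__(self):
        self._blocks = []
        self._length = 0

    def write(self, data: bytes):
        view = memoryview(data)
        while view:
            offset = self._length % self.BLOCK
            if offset == 0:
                # Grow by exactly one block; existing data is never copied.
                self._blocks.append(bytearray(self.BLOCK))
            n = min(len(view), self.BLOCK - offset)
            self._blocks[-1][offset:offset + n] = view[:n]
            view = view[n:]
            self._length += n

    def to_bytes(self) -> bytes:
        return b"".join(self._blocks)[:self._length]

    @property
    def reserved(self) -> int:
        return len(self._blocks) * self.BLOCK
```

Producing the final contiguous array still requires one copy at the end (to_bytes), but during writing the waste stays under 64 KB, versus up to 2x the data size with a doubling MemoryStream.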
It looks like you are looking at the total amount of allocated memory, not the last call. Since the memory stream doubles its size on each reallocation, it grows roughly twofold every time - so the total allocated memory is approximately a sum of powers of 2, like:

sum(i = 0..k) 2^i = 2^(k+1) - 1

(where k is the number of reallocations, roughly k = 1 + log2(StreamSize)).

That is about what you are seeing.