Buffer allocated on LOH by XmlTextReader when using Async mode #61459

Closed
@chrisdcmoore

Description

Desired labels: area-System.Xml

The choice of XmlReader.AsyncBufferSize and how it is used results in the allocation of a char[] on the Large Object Heap by XmlTextReaderImpl.InitTextReaderInputAsync(...) when the reader's settings have Async = true.

I'm not sure whether this is intentional or an oversight. Presumably reducing this buffer size would benefit users of this code who typically operate on "small, simple" XML, but could regress performance for those who operate on "big, complicated" XML containing structures that require the buffers to be resized?

Minimal repro: running this under the VS performance profiler shows the allocation of a System.Char[65537], which is 131,098 bytes and therefore presumably ends up on the LOH:

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using System.Xml;

    public class Program
    {
        public static async Task Main(string[] args)
        {
            // Async = true makes the reader size its buffers from XmlReader.AsyncBufferSize
            using var xmlReader = XmlReader.Create(new StringReader(@"<?xml version = ""1.0"" encoding = ""utf-8""?><myRoot><selfClosing /></myRoot>"), new XmlReaderSettings { Async = true });
            while (await xmlReader.ReadAsync())
            {
                Console.WriteLine(xmlReader.Name);
            }
        }
    }

Configuration

.NET 5, Windows 10, 64-bit

Regression?

I don't think so.

Data

Screenshot of the performance profiling of the minimal code repro above (it doesn't contain any important text that I haven't already included elsewhere in this issue):

[Screenshot: xmlreader-async-loh]

Analysis

The constant is defined as follows:

    internal const int AsyncBufferSize = 64 * 1024; //64KB

There is a comment suggesting it is "64KB", but whilst that is roughly true when the constant sizes the byte[] buffer, using the same number (plus 1) to size the char[] buffer makes it larger than the 85,000-byte threshold for allocation on the LOH, since sizeof(char) = 2 * sizeof(byte) = 2 bytes.

    // take over the byte buffer allocated in XmlReader.Create, if available
    int bufferSize;
    if (bytes != null)
    {
        _ps.bytes = bytes;
        _ps.bytesUsed = byteCount;
        bufferSize = _ps.bytes.Length;
    }
    else
    {
        // allocate the byte buffer
        if (_laterInitParam != null && _laterInitParam.useAsync)
        {
            bufferSize = AsyncBufferSize;
        }
        else
        {
            bufferSize = XmlReader.CalcBufferSize(stream);
        }
        if (_ps.bytes == null || _ps.bytes.Length < bufferSize)
        {
            _ps.bytes = new byte[bufferSize];
        }
    }
    // allocate char buffer
    if (_ps.chars == null || _ps.chars.Length < bufferSize + 1)
    {
        _ps.chars = new char[bufferSize + 1];
    }
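To put numbers on the above, here is a minimal sketch of the arithmetic. The class and member names (`LohCheck`, `CharPayloadBytes`, `LohThresholdBytes`) are made up for illustration and are not part of the runtime; the 85,000-byte figure is the documented default LOH threshold. It also uses `GC.GetGeneration`, which reports the highest generation for objects on the LOH, as a rough check that doesn't need a profiler.

```csharp
using System;

public static class LohCheck
{
    // Default LOH threshold: objects of 85,000 bytes or more are allocated on the LOH.
    public const int LohThresholdBytes = 85_000;

    // Same value as the XmlReader.AsyncBufferSize constant quoted above.
    public const int AsyncBufferSize = 64 * 1024;

    // Element data only; the array header adds a few more bytes on top of this.
    public static long CharPayloadBytes(int length) => (long)length * sizeof(char);

    public static void Main()
    {
        int charCount = AsyncBufferSize + 1; // mirrors new char[bufferSize + 1]

        Console.WriteLine($"byte[{AsyncBufferSize}] payload: {AsyncBufferSize} bytes");
        Console.WriteLine($"char[{charCount}] payload: {CharPayloadBytes(charCount)} bytes");
        Console.WriteLine($"LOH threshold: {LohThresholdBytes} bytes");

        // An array on the LOH is reported as GC.MaxGeneration straight after allocation.
        bool onLoh = GC.GetGeneration(new char[charCount]) == GC.MaxGeneration;
        Console.WriteLine($"char[{charCount}] reported in max generation: {onLoh}");
    }
}
```

The byte[] stays under the threshold at 65,536 bytes, whereas the char[] needs (65,536 + 1) * 2 = 131,074 bytes for its elements alone.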

In the Async = false case, it looks like the code chooses between smaller buffer sizes (DefaultBufferSize, BiggerBufferSize, or the stream length itself) depending on the length of the stream (if available):

    internal static int CalcBufferSize(Stream input)
    {
        // determine the size of byte buffer
        int bufferSize = DefaultBufferSize;
        if (input.CanSeek)
        {
            long len = input.Length;
            if (len < bufferSize)
            {
                bufferSize = (int)len;
            }
            else if (len > MaxStreamLengthForDefaultBufferSize)
            {
                bufferSize = BiggerBufferSize;
            }
        }
        // return the byte buffer size
        return bufferSize;
    }
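For comparison, here is a rough sketch of what the synchronous path would produce for a few stream lengths. The three constant values below are my assumption (they are not quoted in this issue and should be checked against XmlReader.cs), and `BufferSizeSketch` and `CharBufferBytes` are hypothetical names used only for this illustration.

```csharp
using System;
using System.IO;

public static class BufferSizeSketch
{
    // ASSUMED values, not confirmed in this issue; verify against XmlReader.cs.
    public const int DefaultBufferSize = 4096;
    public const int BiggerBufferSize = 8192;
    public const int MaxStreamLengthForDefaultBufferSize = 64 * 1024;

    // Same logic as the CalcBufferSize method quoted above.
    public static int CalcBufferSize(Stream input)
    {
        int bufferSize = DefaultBufferSize;
        if (input.CanSeek)
        {
            long len = input.Length;
            if (len < bufferSize)
            {
                bufferSize = (int)len;
            }
            else if (len > MaxStreamLengthForDefaultBufferSize)
            {
                bufferSize = BiggerBufferSize;
            }
        }
        return bufferSize;
    }

    // Element bytes of the matching char buffer, i.e. new char[bufferSize + 1].
    public static int CharBufferBytes(int bufferSize) => (bufferSize + 1) * sizeof(char);

    public static void Main()
    {
        foreach (int len in new[] { 100, 10_000, 100_000 })
        {
            using var stream = new MemoryStream(new byte[len]);
            int size = CalcBufferSize(stream);
            Console.WriteLine($"stream length {len}: byte[{size}], char[{size + 1}] = {CharBufferBytes(size)} bytes");
        }
    }
}
```

Under those assumed constants, the largest char buffer the synchronous path allocates up front is far below the LOH threshold, which is what makes the Async = true case stand out.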
