Description
Desired labels: area-System.Xml
The choice of XmlReader.AsyncBufferSize and how it is used results in the allocation of a char[] on the Large Object Heap (LOH) by XmlTextReaderImpl.InitTextReaderInputAsync(...) when the reader's settings have Async = true.
I'm not sure whether this is intentional or an oversight. Presumably, reducing this buffer size would benefit users who typically process "small, simple" XML, but could regress performance for those who process "big, complicated" XML containing structures that require the buffers to be resized?
Minimal repro: running this with the VS performance profiler shows the allocation of a System.Char[65537], which is 131,098 bytes and therefore presumably ends up on the LOH:
using System;
using System.IO;
using System.Threading.Tasks;
using System.Xml;

public class Program
{
    public static async Task Main(string[] args)
    {
        // Async = true is what triggers the large char[] allocation.
        using var xmlReader = XmlReader.Create(new StringReader(@"<?xml version = ""1.0"" encoding = ""utf-8""?><myRoot><selfClosing /></myRoot>"), new XmlReaderSettings { Async = true });
        while (await xmlReader.ReadAsync())
        {
            Console.WriteLine(xmlReader.Name);
        }
    }
}
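For anyone without a profiler to hand, a rough cross-check is to allocate an array of the same size and ask the GC which generation it reports. This is only a sketch: the LohCheck class and its LooksLikeLoh helper are hypothetical names, and it assumes the default GC settings, under which large-object-heap allocations are reported as belonging to the oldest generation immediately after allocation. It does not exercise XmlReader itself.

```csharp
using System;

public static class LohCheck
{
    // Hypothetical helper: a freshly allocated object that the GC already
    // reports in the oldest generation was most likely placed on the LOH.
    public static bool LooksLikeLoh(object o) => GC.GetGeneration(o) == GC.MaxGeneration;

    public static void Main()
    {
        // Same element count as the allocation seen in the profiler.
        var sameSizeAsRepro = new char[65537];

        // Arbitrary smaller size, chosen here only for contrast.
        var small = new char[4096];

        Console.WriteLine($"char[65537] looks like LOH: {LooksLikeLoh(sameSizeAsRepro)}");
        Console.WriteLine($"char[4096] looks like LOH:  {LooksLikeLoh(small)}");
    }
}
```

With default settings I would expect the first check to be true and the second to be false, but the profiler trace remains the authoritative evidence.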
Configuration
.NET 5, Windows 10, 64-bit
Regression?
I don't think so.
Data
Screenshot of the performance profiling of the minimal code repro above (it doesn't contain any important text that I haven't already included elsewhere in this issue).
Analysis
The constant is defined at the following location
There is a comment suggesting it is "64KB", but whilst that will be roughly true when using this to size the byte[] buffer, using the same number (plus 1) to size the char[] buffer will result in it being larger than the 85K threshold for being allocated on the LOH, since sizeof(char) = 2 * sizeof(byte) = 2 bytes.
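The arithmetic can be spelled out as a small sketch. The BufferSizeMath class and PayloadBytes helper are hypothetical names used for illustration; the constant value of 64 * 1024 is inferred from the "64KB" comment and the observed Char[65537], and 85,000 bytes is the default LOH threshold. The figures are payload sizes only and ignore the array object header.

```csharp
using System;

public static class BufferSizeMath
{
    // Inferred from the "64KB" comment and the Char[65537] allocation (size + 1).
    public const int AsyncBufferSize = 64 * 1024;

    // Default threshold at or above which allocations go to the LOH.
    public const int LohThresholdBytes = 85_000;

    // Payload size of an array, excluding the object header.
    public static long PayloadBytes(int elementCount, int elementSize) =>
        (long)elementCount * elementSize;

    public static void Main()
    {
        long byteBuffer = PayloadBytes(AsyncBufferSize, sizeof(byte));
        long charBuffer = PayloadBytes(AsyncBufferSize + 1, sizeof(char));

        Console.WriteLine($"byte[{AsyncBufferSize}] = {byteBuffer} bytes, at or over threshold: {byteBuffer >= LohThresholdBytes}");
        Console.WriteLine($"char[{AsyncBufferSize + 1}] = {charBuffer} bytes, at or over threshold: {charBuffer >= LohThresholdBytes}");
    }
}
```

So the same constant yields 65,536 bytes for the byte[] buffer, which stays under the threshold, but 131,074 bytes of payload for the char[] buffer, which does not.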
In the Async = false case, it looks like the code decides on two different (smaller) buffer sizes depending on the length of the stream (if available):
runtime/src/libraries/System.Private.Xml/src/System/Xml/Core/XmlReader.cs
Lines 1779 to 1798 in a761b9f