
TypeNameParser API #97566

Closed
@adamsitnik

Background and Motivation

For years, various teams at Microsoft and outside of it have been implementing their own type name parsers.

None of them were good for untrusted input. We want to put an end to it and provide a single, public API for parsing type names from untrusted input.

The new APIs need to ship in an out-of-band (OOB) package that supports older target framework monikers (TFMs), because we have first-party customers running on .NET Framework who are going to use them.

Proposed API

The parser has two modes:

  • Default: ECMA-335 compliant mode where everything that the standard allows for is allowed (used by CLR APIs like Type.GetType).
  • Strict: extends the ECMA-335 standard limitations with a set of opinionated rules based on the most up-to-date security knowledge (for example: no escaping, no null or path characters, etc.).

To prevent unbounded recursion for inputs like typeof(List<List<List<List<List<...>>>>>).FullName, the parser introduces the concept of complexity:

  • Represents the total amount of work that needs to be performed to fully inspect a given type name, including any generic arguments or underlying or nested types.
  • There's not really a parallel concept to this in reflection. Think of it as the total number of TypeName instances that would be created if you were to totally deconstruct this instance and visit each intermediate TypeName that occurs as part of deconstruction.
  • int and Person each have complexities of 1 because they're standalone types.
  • int[] has a complexity of 2 because to fully inspect it involves inspecting the array type itself, plus unwrapping the underlying type (int) and inspecting that.
  • Dictionary<string, List<int[][]>> has complexity 8 because fully visiting it involves inspecting 8 TypeName instances total:
    • Dictionary<string, List<int[][]>> (the original type)
    • Dictionary`2 (the generic type definition)
    • string (a type argument of Dictionary)
    • List<int[][]> (a type argument of Dictionary)
    • List`1 (the generic type definition)
    • int[][] (a type argument of List)
    • int[] (the underlying type of int[][])
    • int (the underlying type of int[])
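The counting rule above can be sketched with a small stand-in model. This is only an illustration: `Node`, `Underlying` and `GenericArgs` are hypothetical names invented for this sketch and are not the proposed `TypeName` class or its implementation. It assumes each node contributes 1, plus the complexity of its underlying type and of each generic argument.

```csharp
#nullable enable
using System;
using System.Linq;

// Build the examples from the list above by hand (no parsing involved).
var intName = new Node("int");
var intArray = new Node("int[]", Underlying: intName);
var jagged = new Node("int[][]", Underlying: intArray);
var list = new Node("List<int[][]>",
    Underlying: new Node("List`1"),
    GenericArgs: new[] { jagged });
var dict = new Node("Dictionary<string, List<int[][]>>",
    Underlying: new Node("Dictionary`2"),
    GenericArgs: new[] { new Node("string"), list });

Console.WriteLine(intName.TotalComplexity);  // 1
Console.WriteLine(intArray.TotalComplexity); // 2
Console.WriteLine(dict.TotalComplexity);     // 8

// Hypothetical stand-in for a parsed type name; NOT the proposed TypeName type.
sealed record Node(string Name, Node? Underlying = null, Node[]? GenericArgs = null)
{
    // 1 for this node, plus everything reachable through the
    // underlying type and the generic arguments.
    public int TotalComplexity =>
        1
        + (Underlying?.TotalComplexity ?? 0)
        + (GenericArgs?.Sum(a => a.TotalComplexity) ?? 0);
}
```

A real parser would presumably track this count while parsing and stop as soon as the configured limit is exceeded, rather than building the whole tree first; the sketch only shows how the number is derived.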

Returned information matches the System.Type APIs: Name, FullName, AssemblyQualifiedName etc.

namespace System.Reflection.Metadata;

public sealed class TypeName : IEquatable<TypeName>
{
    internal TypeName() { }
    
    /// <summary>
    /// The assembly-qualified name of the type; e.g., "System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089".
    /// </summary>
    /// <remarks>
    /// If <see cref="GetAssemblyName()"/> returns null, simply returns <see cref="FullName"/>.
    /// </remarks>
    public string AssemblyQualifiedName { get; }
    
    /// <summary>
    /// If this type is a nested type (see <see cref="IsNestedType"/>), gets
    /// the containing type. If this type is not a nested type, returns null.
    /// </summary>
    /// <remarks>
    /// For example, given "Namespace.Containing+Nested", unwraps the outermost type and returns "Namespace.Containing".
    /// </remarks>
    public TypeName? ContainingType { get; }
    
    /// <summary>
    /// The full name of this type, including namespace, but without the assembly name; e.g., "System.Int32".
    /// Nested types are represented with a '+'; e.g., "MyNamespace.MyType+NestedType".
    /// </summary>
    public string FullName { get; }
    
    public bool IsArray { get; }
    public bool IsConstructedGenericType { get; }
    public bool IsElementalType { get; }
    public bool IsManagedPointerType { get; } // name inconsistent with Type.IsByRef
    public bool IsNestedType { get; }
    public bool IsSzArrayType { get; }
    public bool IsUnmanagedPointerType { get; } // name inconsistent with Type.IsPointer
    public bool IsVariableBoundArrayType { get; }
    
    /// <summary>
    /// The name of this type, without the namespace and the assembly name; e.g., "Int32".
    /// Nested types are represented without a '+'; e.g., "MyNamespace.MyType+NestedType" is just "NestedType".
    /// </summary>
    public string Name { get; }

    /// <summary>
    /// Represents the total amount of work that needs to be performed to fully inspect
    /// this instance, including any generic arguments or underlying types.
    /// </summary>
    /// <remarks>
    /// <para>There's not really a parallel concept to this in reflection. Think of it
    /// as the total number of <see cref="TypeName"/> instances that would be created if
    /// you were to totally deconstruct this instance and visit each intermediate <see cref="TypeName"/>
    /// that occurs as part of deconstruction.</para>
    /// <para>"int" and "Person" each have complexities of 1 because they're standalone types.</para>
    /// <para>"int[]" has a complexity of 2 because to fully inspect it involves inspecting the
    /// array type itself, <em>plus</em> unwrapping the underlying type ("int") and inspecting that.</para>
    /// <para>
    /// "Dictionary&lt;string, List&lt;int[][]&gt;&gt;" has complexity 8 because fully visiting it
    /// involves inspecting 8 <see cref="TypeName"/> instances total:
    /// <list type="bullet">
    /// <item>Dictionary&lt;string, List&lt;int[][]&gt;&gt; (the original type)</item>
    /// <item>Dictionary`2 (the generic type definition)</item>
    /// <item>string (a type argument of Dictionary)</item>
    /// <item>List&lt;int[][]&gt; (a type argument of Dictionary)</item>
    /// <item>List`1 (the generic type definition)</item>
    /// <item>int[][] (a type argument of List)</item>
    /// <item>int[] (the underlying type of int[][])</item>
    /// <item>int (the underlying type of int[])</item>
    /// </list>
    /// </para>
    /// </remarks>
    public int TotalComplexity { get; }
    
    /// <summary>
    /// If this type is not an elemental type (see <see cref="IsElementalType"/>), gets
    /// the underlying type. If this type is an elemental type, returns null.
    /// </summary>
    /// <remarks>
    /// For example, given "int[][]", unwraps the outermost array and returns "int[]".
    /// Given "Dictionary&lt;string, int&gt;", returns the generic type definition "Dictionary&lt;,&gt;".
    /// </remarks>
    public TypeName? UnderlyingType { get; }

    public static TypeName Parse(ReadOnlySpan<char> typeName, TypeNameParseOptions? options = null);

    public static bool TryParse(ReadOnlySpan<char> typeName, [NotNullWhenAttribute(true)] out TypeName? result, TypeNameParseOptions? options = null);
    
    public int GetArrayRank();
    
    /// <summary>
    /// Returns the name of the assembly that contains this type, or null if this <see cref="TypeName"/> was not
    /// created from a fully-qualified name.
    /// </summary>
    /// <remarks>Since <see cref="AssemblyName"/> is mutable, this method returns a copy of it.</remarks>
    public Reflection.AssemblyName? GetAssemblyName();
    
    /// <summary>
    /// If this <see cref="TypeName"/> represents a constructed generic type, returns a span
    /// of all the generic arguments. Otherwise it returns an empty span.
    /// </summary>
    /// <remarks>
    /// <para>For example, given "Dictionary&lt;string, int&gt;", returns a 2-element span containing
    /// string and int.</para>
    /// </remarks>
    public ReadOnlySpan<TypeName> GetGenericArguments();
}

public sealed class TypeNameParseOptions
{
    public TypeNameParseOptions() { }
    public bool AllowFullyQualifiedName { get; set; } = true;
    
    /// <summary>
    /// Limits the maximum value of <see cref="TypeName.TotalComplexity"/> that the parser can handle.
    /// </summary>
    public int MaxTotalComplexity { get; set; } = 10;
    
    /// <summary>
    /// Extends the ECMA-335 standard limitations with a set of opinionated rules based on the most up-to-date security knowledge.
    /// </summary>
    /// <remarks>
    /// When parsing an AssemblyName, only the Version, Culture and PublicKeyToken attributes are allowed.
    /// The comparison is also case-sensitive (unlike the <see cref="AssemblyName(string)"/> constructor).
    /// </remarks>
    public bool StrictValidation { get; set; } = false;
}

Usage Examples

Sample serialization binder (whole code with tests can be found here):

public override Type? BindToType(string assemblyName, string typeName)
{
    // Fast path for common primitive type names and user-defined type names
    // that use the same syntax and casing as System.Type.FullName API.
    if (TryGetTypeFromFullName(typeName, out Type type))
    {
        return type;
    }

    TypeNameParseOptions options = new()
    {
        AllowFullyQualifiedName = false,
        MaxTotalComplexity = 10
    };

    if (!TypeName.TryParse(typeName.AsSpan(), out TypeName? parsed, options))
    {
        // we can throw any exception, log the information etc
        throw new InvalidOperationException($"Invalid type name: '{typeName}'");
    }

    return GetTypeFromParsedTypeName(parsed);
}

private Type? GetTypeFromParsedTypeName(TypeName parsed)
{
    if (TryGetTypeFromFullName(parsed.FullName, out Type type))
    {
        return type;
    }
    else if (parsed.IsArray)
    {
        TypeName arrayElementTypeName = parsed.UnderlyingType; // equivalent of type.GetElementType()
        Type arrayElementType = GetTypeFromParsedTypeName(arrayElementTypeName); // recursive call allows for creating arrays of arrays etc

        return parsed.IsSzArrayType
            ? arrayElementType.MakeArrayType()
            : arrayElementType.MakeArrayType(parsed.GetArrayRank());
    }
    else if (parsed.IsConstructedGenericType)
    {
        TypeName genericTypeDefinitionName = parsed.UnderlyingType; // equivalent of type.GetGenericTypeDefinition()
        Type genericTypeDefinition = GetTypeFromParsedTypeName(genericTypeDefinitionName);
        Debug.Assert(genericTypeDefinition.IsGenericTypeDefinition);

        ReadOnlySpan<TypeName> genericArgs = parsed.GetGenericArguments();
        Type[] typeArguments = new Type[genericArgs.Length];
        for (int i = 0; i < genericArgs.Length; i++)
        {
            typeArguments[i] = GetTypeFromParsedTypeName(genericArgs[i]); // recursive call allows for generics of generics like "List<int?>"
        }
        return genericTypeDefinition.MakeGenericType(typeArguments);
    }

    throw new ArgumentException($"{parsed.FullName} is not on the allow list.");
}
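The sample above calls TryGetTypeFromFullName without showing it. One way such a helper could look is a plain allow-list lookup, sketched below. Everything here is an assumption made for illustration: the `AllowList` class, the `s_allowed` dictionary and the three allowed types are hypothetical, and the original sample may implement the helper differently (there it is a member of the binder itself).

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

Console.WriteLine(AllowList.TryGetTypeFromFullName("System.Int32", out Type? t)
    ? t.FullName
    : "rejected"); // prints "System.Int32"
Console.WriteLine(AllowList.TryGetTypeFromFullName("System.Diagnostics.Process", out _)
    ? "allowed"
    : "rejected"); // prints "rejected"

// Hypothetical helper class; not part of the proposed API.
static class AllowList
{
    // Keyed by Type.FullName, compared ordinally (case-sensitive), so only
    // names spelled exactly like System.Type.FullName are accepted.
    private static readonly Dictionary<string, Type> s_allowed = new(StringComparer.Ordinal)
    {
        [typeof(int).FullName!] = typeof(int),
        [typeof(string).FullName!] = typeof(string),
        [typeof(List<>).FullName!] = typeof(List<>), // open generic definition, "System.Collections.Generic.List`1"
    };

    public static bool TryGetTypeFromFullName(string fullName, [NotNullWhen(true)] out Type? type)
        => s_allowed.TryGetValue(fullName, out type);
}
```

Because the lookup never calls Type.GetType on the input, a name that is not on the list cannot cause an arbitrary type or assembly to be loaded; it simply falls through to the parsing path or is rejected.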

Risks

Changing the behavior of strict mode in the future can break users.
If we ever do that, it will be just to enforce new security best practices and will break only very, very unusual input.

Initial issue description:

I am working on the design of a new type name parser. It's going to include a new type that represents the parsed type name, the type name parser, and most likely an option bag for its customization. It may also include an AssemblyNameParser, but I am not sure whether this is going to be required (nobody has asked for a standalone assembly name parser).

For brevity I am going to call these types: TypeName, TypeNameParser and TypeNameParserOptions.

public sealed class TypeName
{
    // properties that describe parsed type like its generic arguments or array rank
}
public ref struct TypeNameParser
{
    public static TypeName Parse(ReadOnlySpan<char> name, TypeNameParserOptions? options = null);
}
public class TypeNameParserOptions
{
    // properties that describe customizable settings like max allowed recursion depth
}

Before I submit the proposal, I want to have something that:

  • replaces the internal TypeNameParser used by System.Private.CoreLib
  • replaces the internal TypeNameParser used by ILVerification
  • replaces the internal TypeNameParser used by ILCompiler*
  • replaces the parser built by Levi in his private repo
  • can be shipped from dotnet/runtime as part of .NET 9 shared framework and as OOB for .NET Standard 2.0 (so existing customers who target older TFMs can use it for security purposes)
  • passes all tests everywhere

I'll try to replace the Roslyn parser too, but I can't promise that (please let me know if this is a must have).

The thing I am not sure of is where the mentioned types should belong.

For example, Type.GetType(string name) is currently part of CoreLib. I believe that my proposal should include a new Type method for loading a type from a parsed name:

public class Type
{
    public static Type? GetType(TypeName typeName);
}

That way, those who have parsed the type name and verified it could load the type without parsing the type name again. This leads me to think that TypeName should be part of CoreLib.
But can I at the same time ship this type in an OOB package like System.Reflection.Metadata?

  • it targets net9.0, so I would need to exclude it for this TFM
  • it targets netstandard2.0, which could lead to a situation where a .NET 9 app references an NS2.0 library that references the package, so two types with the same name get loaded and a runtime error occurs when the TypeName from the OOB package is passed to Type.GetType(TypeName) in CoreLib?

I can also simply not extend the Type class, move the loading method to TypeName itself, and reference the TypeNameParser as a link in CoreLib, with something like this:

#if SYSTEM_PRIVATE_CORELIB
    internal
#else
    public
#endif
        struct TypeNameParser

But this would lead to a situation where .NET 9 apps would load two type name parsers: one internal from CoreLib and another, public one from the OOB package.

@jkotas @GrabYourPitchforks are there any better solutions?

Metadata

Labels

  • api-approved: API was approved in API review, it can be implemented
  • area-System.Reflection.Metadata
  • in-pr: There is an active PR which will close this issue when it is merged
