MulticlassClassificationExperiment preFeaturizer not acting on the schema #7522

**System Information (please complete the following information):**
 - OS & Version: Windows 11
 - ML.NET Version: ML.NET v4.0.2 & Auto ML.NET v0.22.2
 - .NET Version:  .NET 10

**Describe the bug**
When a transform such as DropColumns or CustomMapping is passed as the preFeaturizer, its output is not taken into account as the new schema. This causes schema validation errors without even applying the transforms. For instance, the following code works when applied outside of the preFeaturizer, but when the same transform is passed as the preFeaturizer it fails, saying it cannot find the column. With a custom mapping the reverse problem occurs: the output schema is not seen and the column's type is rejected.
```csharp
var transformedData = ctx.Transforms.DropColumns([HybridClassifierInputModel.imageSource]).Fit(fullData).Transform(fullData);
```

This also causes errors after training, because you then need to apply the transform manually before the predictor.
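A minimal sketch of that manual workaround, assuming the transform is fitted up front and then prepended to the best model with `Append` so the combined model accepts the original schema. The class `PreFeaturizerWorkaround`, the method `TrainWithManualDrop`, its parameters, and the 60-second time budget are hypothetical names and values chosen for illustration, not part of my project:

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical helper illustrating the workaround; not the library's intended preFeaturizer flow.
public static class PreFeaturizerWorkaround
{
    public static ITransformer TrainWithManualDrop(
        MLContext ctx, IDataView fullData, string columnToDrop, string labelColumn)
    {
        // Fit and apply the schema-changing transform outside of the experiment.
        ITransformer dropTransformer = ctx.Transforms.DropColumns(columnToDrop).Fit(fullData);
        IDataView transformedData = dropTransformer.Transform(fullData);

        // Split the already-transformed data into train and validation sets.
        var split = ctx.Data.TrainTestSplit(transformedData, testFraction: 0.2);

        // Run the experiment without a preFeaturizer (60 seconds is an arbitrary budget).
        MulticlassClassificationExperiment experiment =
            ctx.Auto().CreateMulticlassClassificationExperiment(maxExperimentTimeInSeconds: 60);

        ExperimentResult<MulticlassClassificationMetrics> result = experiment.Execute(
            trainData: split.TrainSet,
            validationData: split.TestSet,
            labelColumnName: labelColumn);

        // Chain the manual transform in front of the best model so prediction
        // can be run on data that still has the original schema.
        return dropTransformer.Append(result.BestRun.Model);
    }
}
```

This requires materializing the transform before the experiment, which is exactly what I was hoping the preFeaturizer would make unnecessary.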

**To Reproduce**
Steps to reproduce the behavior:
1. Create an IDataView with a schema
2. Set up an experiment using MulticlassExperimentSettings
3. Pass any preFeaturizer that drops a column or otherwise changes the schema significantly. DropColumns and SelectColumns are good examples

**Expected behavior**
I would expect the transformer in preFeaturizer to be applied before schema validation. That would allow column drops, and would help when the data must be massaged or is used in multiple ways.

**Screenshots, Code, Sample Projects**

```csharp
MulticlassExperimentSettings textModelSettings = new MulticlassExperimentSettings()
{
	OptimizingMetric = OptimizingMetric,
	MaxExperimentTimeInSeconds = maxTrainTimeInSeconds,
	//CacheBeforeTrainer = CacheBeforeTrainer.On,
	CacheDirectoryName = Environment.CurrentDirectory, // Skip the disk and store in-memory
};

//var transformedData = ctx.Transforms.DropColumns([HybridClassifierInputModel.imageSource]).Fit(fullData).Transform(fullData);
MulticlassClassificationExperiment experiment = ctx.Auto().CreateMulticlassClassificationExperiment(textModelSettings);

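// The manual transform above is commented out, so the untransformed data is split here and the drop is left to the preFeaturizer.
TrainTestData trainValidationData = ctx.Data.TrainTestSplit(ctx.Data.ShuffleRows(fullData), testFraction: 0.2);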

ExperimentResult<MulticlassClassificationMetrics> result = experiment.Execute(trainData: trainValidationData.TrainSet,
								//preFeaturizer: ctx.Transforms.CustomMapping<HybridClassifierInputModel, TextClassifierInputModel>(HybridToTextCustomAction.CustomAction, nameof(HybridToTextCustomAction), outputSchemaDefinition: SchemaDefinition.Create(typeof(TextClassifierInputModel))),
								preFeaturizer: ctx.Transforms.DropColumns([HybridClassifierInputModel.imageSource]),
								validationData: trainValidationData.TestSet,
								labelColumnName: HybridClassifierInputModel.target,
								progressHandler: new TextCPUMlClassifierProgressHandler<IHybridMlClassifierService>(Logger)); 
```

**Additional context**
I would like to use the same data to train more than one model, but each model needs some small data changes. I have a custom IDataView that attaches to a DbLite. It would be good if the preFeaturizer worked so that the data could be streamed.

