[FEATURE REQUEST]: Implement ML Features #381

**These should all be implemented with https://github.com/dotnet/spark/pull/1031**

=========================================


This issue tracks the implementation of the ML features listed at https://spark.apache.org/docs/latest/ml-features

Bucketizer has been implemented in https://github.com/dotnet/spark/pull/378 but there are more features that should be implemented.
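
For anyone picking up one of the remaining items, here is a rough plain-Python sketch of the behaviour the Spark docs describe for Bucketizer: `n + 1` strictly increasing splits define `n` buckets, each covering `[x, y)`, except the last one, which also includes its upper bound. This is only an illustration of the semantics, not the .NET binding code. The function name `bucketize` is hypothetical, and the "at least three splits" check is my reading of the Spark docs.

```python
from bisect import bisect_right
from typing import Sequence


def bucketize(value: float, splits: Sequence[float]) -> int:
    """Return the index of the bucket that `value` falls into.

    Hypothetical helper that mirrors the documented Bucketizer semantics;
    it is not part of Spark or of the .NET bindings.
    """
    # Assumption: at least three strictly increasing split points are required.
    if len(splits) < 3:
        raise ValueError("need at least three split points")
    if any(a >= b for a, b in zip(splits, splits[1:])):
        raise ValueError("splits must be strictly increasing")
    # Values outside the outermost splits are treated as invalid.
    if value < splits[0] or value > splits[-1]:
        raise ValueError(f"{value} is outside the range covered by the splits")
    # The last bucket is closed on the right, so its upper bound belongs to it.
    if value == splits[-1]:
        return len(splits) - 2
    # Every other bucket is [x, y): find the right-most split that is <= value.
    return bisect_right(splits, value) - 1


if __name__ == "__main__":
    splits = [float("-inf"), 0.0, 10.0, float("inf")]
    for v in (-5.0, 0.0, 9.9, 10.0):
        print(v, "->", bucketize(v, splits))
```

Using `-inf` and `inf` as the outer splits, as in the usage above, is the usual way to make sure every value lands in some bucket.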

- [ ] Feature Extractors
  - [x] TF-IDF
  - [x] Word2Vec (https://github.com/dotnet/spark/pull/491)
  - [x] CountVectorizer (https://github.com/dotnet/spark/pull/608)
  - [x] FeatureHasher (https://github.com/dotnet/spark/pull/652)
- [ ] Feature Transformers
  - [x] Tokenizer (https://github.com/dotnet/spark/pull/574)
  - [x] StopWordsRemover (https://github.com/dotnet/spark/pull/726 thanks @SARAVANA1501)
  - [x] n-gram (in-progress #734)
  - [x] Binarizer (in-progress #744)
  - [ ] PCA (in-progress)
  - [ ] PolynomialExpansion
  - [ ] Discrete Cosine Transform (DCT)
  - [ ] StringIndexer (in-progress)
  - [ ] IndexToString
  - [ ] OneHotEncoderEstimator
  - [ ] VectorIndexer
  - [ ] Normalizer
  - [ ] StandardScaler
  - [ ] MinMaxScaler
  - [ ] MaxAbsScaler
  - [x] Bucketizer
  - [ ] ElementwiseProduct
  - [x] SQLTransformer (https://github.com/dotnet/spark/pull/781 @ramanathanv)
  - [ ] VectorAssembler
  - [ ] VectorSizeHint
  - [ ] QuantileDiscretizer
  - [ ] Imputer
- [ ] Feature Selectors
  - [ ] VectorSlicer
  - [ ] RFormula
  - [ ] ChiSqSelector
- [ ] Locality Sensitive Hashing
  - [ ] LSH Operations
    - [ ] Feature Transformation
    - [ ] Approximate Similarity Join
    - [ ] Approximate Nearest Neighbour Search
  - [ ] LSH Algorithms
    - [ ] Bucketed Random Projection for Euclidean Distance
    - [ ] MinHash for Jaccard Distance


If anyone else is going to implement one of these, it's probably best to leave a comment here, and I'll keep the list up to date.

