Commit cb1e1a0

Merge pull request #17967 from SamuelMarks:keras.layers.preprocessing-defaults-to
PiperOrigin-RevId: 527024756
2 parents 08f9b1a + a1925ec commit cb1e1a0

9 files changed: +56 -50 lines changed

keras/layers/preprocessing/category_encoding.py

+2-1
Original file line numberDiff line numberDiff line change
@@ -90,7 +90,7 @@ class CategoryEncoding(base_layer.Layer):
9090
inputs to the layer must integers in the range `0 <= value <
9191
num_tokens`, or an error will be thrown.
9292
output_mode: Specification for the output of the layer.
93-
Defaults to `"multi_hot"`. Values can be `"one_hot"`, `"multi_hot"` or
93+
Values can be `"one_hot"`, `"multi_hot"` or
9494
`"count"`, configuring the layer as follows:
9595
- `"one_hot"`: Encodes each individual element in the input into an
9696
array of `num_tokens` size, containing a 1 at the element index. If
@@ -105,6 +105,7 @@ class CategoryEncoding(base_layer.Layer):
105105
- `"count"`: Like `"multi_hot"`, but the int array contains a count of
106106
the number of times the token at that index appeared in the sample.
107107
For all output modes, currently only output up to rank 2 is supported.
108+
Defaults to `"multi_hot"`.
108109
sparse: Boolean. If true, returns a `SparseTensor` instead of a dense
109110
`Tensor`. Defaults to `False`.
110111

keras/layers/preprocessing/discretization.py

+4-3
@@ -164,8 +164,8 @@ class Discretization(base_preprocessing_layer.PreprocessingLayer):
164164
0.01). Higher values of epsilon increase the quantile approximation, and
165165
hence result in more unequal buckets, but could improve performance
166166
and resource consumption.
167-
output_mode: Specification for the output of the layer. Defaults to
168-
`"int"`. Values can be `"int"`, `"one_hot"`, `"multi_hot"`, or
167+
output_mode: Specification for the output of the layer. Values can be
168+
`"int"`, `"one_hot"`, `"multi_hot"`, or
169169
`"count"` configuring the layer as follows:
170170
- `"int"`: Return the discretized bin indices directly.
171171
- `"one_hot"`: Encodes each individual element in the input into an
@@ -180,9 +180,10 @@ class Discretization(base_preprocessing_layer.PreprocessingLayer):
180180
will be `(..., num_tokens)`.
181181
- `"count"`: As `"multi_hot"`, but the int array contains a count of
182182
the number of times the bin index appeared in the sample.
183+
Defaults to `"int"`.
183184
sparse: Boolean. Only applicable to `"one_hot"`, `"multi_hot"`,
184185
and `"count"` output modes. If True, returns a `SparseTensor` instead of
185-
a dense `Tensor`. Defaults to False.
186+
a dense `Tensor`. Defaults to `False`.
186187
187188
Examples:
188189

keras/layers/preprocessing/hashed_crossing.py

+4-4
@@ -51,15 +51,15 @@ class HashedCrossing(base_layer.Layer):
5151
5252
Args:
5353
num_bins: Number of hash bins.
54-
output_mode: Specification for the output of the layer. Defaults to
55-
`"int"`. Values can be `"int"`, or `"one_hot"` configuring the layer as
56-
follows:
54+
output_mode: Specification for the output of the layer. Values can be
55+
`"int"`, or `"one_hot"` configuring the layer as follows:
5756
- `"int"`: Return the integer bin indices directly.
5857
- `"one_hot"`: Encodes each individual element in the input into an
5958
array the same size as `num_bins`, containing a 1 at the input's bin
6059
index.
60+
Defaults to `"int"`.
6161
sparse: Boolean. Only applicable to `"one_hot"` mode. If True, returns a
62-
`SparseTensor` instead of a dense `Tensor`. Defaults to False.
62+
`SparseTensor` instead of a dense `Tensor`. Defaults to `False`.
6363
**kwargs: Keyword arguments to construct a layer.
6464
6565
Examples:

keras/layers/preprocessing/hashing.py

+9-9
@@ -109,17 +109,16 @@ class Hashing(base_layer.Layer):
109109
bin, so the effective number of bins is `(num_bins - 1)` if `mask_value`
110110
is set.
111111
mask_value: A value that represents masked inputs, which are mapped to
112-
index 0. Defaults to None, meaning no mask term will be added and the
113-
hashing will start at index 0.
112+
index 0. `None` means no mask term will be added and the
113+
hashing will start at index 0. Defaults to `None`.
114114
salt: A single unsigned integer or None.
115115
If passed, the hash function used will be SipHash64, with these values
116116
used as an additional input (known as a "salt" in cryptography).
117-
These should be non-zero. Defaults to `None` (in that
118-
case, the FarmHash64 hash function is used). It also supports
119-
tuple/list of 2 unsigned integer numbers, see reference paper for
120-
details.
121-
output_mode: Specification for the output of the layer. Defaults to
122-
`"int"`. Values can be `"int"`, `"one_hot"`, `"multi_hot"`, or
117+
These should be non-zero. If `None`, uses the FarmHash64 hash function.
118+
It also supports tuple/list of 2 unsigned integer numbers, see
119+
reference paper for details. Defaults to `None`.
120+
output_mode: Specification for the output of the layer. Values can bes
121+
`"int"`, `"one_hot"`, `"multi_hot"`, or
123122
`"count"` configuring the layer as follows:
124123
- `"int"`: Return the integer bin indices directly.
125124
- `"one_hot"`: Encodes each individual element in the input into an
@@ -134,9 +133,10 @@ class Hashing(base_layer.Layer):
134133
will be `(..., num_tokens)`.
135134
- `"count"`: As `"multi_hot"`, but the int array contains a count of
136135
the number of times the bin index appeared in the sample.
136+
Defaults to `"int"`.
137137
sparse: Boolean. Only applicable to `"one_hot"`, `"multi_hot"`,
138138
and `"count"` output modes. If True, returns a `SparseTensor` instead of
139-
a dense `Tensor`. Defaults to False.
139+
a dense `Tensor`. Defaults to `False`.
140140
**kwargs: Keyword arguments to construct a layer.
141141
142142
Input shape:

keras/layers/preprocessing/image_preprocessing.py

+11-11
@@ -65,9 +65,9 @@ class Resizing(base_layer.Layer):
6565
height: Integer, the height of the output shape.
6666
width: Integer, the width of the output shape.
6767
interpolation: String, the interpolation method.
68-
Defaults to `"bilinear"`.
6968
Supports `"bilinear"`, `"nearest"`, `"bicubic"`, `"area"`,
7069
`"lanczos3"`, `"lanczos5"`, `"gaussian"`, `"mitchellcubic"`.
70+
Defaults to `"bilinear"`.
7171
crop_to_aspect_ratio: If True, resize the images without aspect
7272
ratio distortion. When the original aspect ratio differs
7373
from the target aspect ratio, the output image will be
@@ -420,9 +420,9 @@ class RandomFlip(base_layer.BaseRandomLayer):
420420
421421
Args:
422422
mode: String indicating which flip mode to use. Can be `"horizontal"`,
423-
`"vertical"`, or `"horizontal_and_vertical"`. Defaults to
424-
`"horizontal_and_vertical"`. `"horizontal"` is a left-right flip and
425-
`"vertical"` is a top-bottom flip.
423+
`"vertical"`, or `"horizontal_and_vertical"`. `"horizontal"` is a
424+
left-right flip and `"vertical"` is a top-bottom flip. Defaults to
425+
`"horizontal_and_vertical"`
426426
seed: Integer. Used to create a random seed.
427427
"""
428428

@@ -1055,9 +1055,9 @@ class RandomZoom(base_layer.BaseRandomLayer):
10551055
result in an output
10561056
zooming out between 20% to 30%.
10571057
`width_factor=(-0.3, -0.2)` result in an
1058-
output zooming in between 20% to 30%. Defaults to `None`,
1058+
output zooming in between 20% to 30%. `None` means
10591059
i.e., zooming vertical and horizontal directions
1060-
by preserving the aspect ratio.
1060+
by preserving the aspect ratio. Defaults to `None`.
10611061
fill_mode: Points outside the boundaries of the input are
10621062
filled according to the given mode
10631063
(one of `{"constant", "reflect", "wrap", "nearest"}`).
@@ -1377,9 +1377,9 @@ class RandomBrightness(base_layer.BaseRandomLayer):
13771377
will be used for upper bound.
13781378
value_range: Optional list/tuple of 2 floats
13791379
for the lower and upper limit
1380-
of the values of the input data. Defaults to [0.0, 255.0].
1381-
Can be changed to e.g. [0.0, 1.0] if the image input
1382-
has been scaled before this layer.
1380+
of the values of the input data.
1381+
To make no change, use [0.0, 1.0], e.g., if the image input
1382+
has been scaled before this layer. Defaults to [0.0, 255.0].
13831383
The brightness adjustment will be scaled to this range, and the
13841384
output values will be clipped to this range.
13851385
seed: optional integer, for fixed RNG behavior.
@@ -1539,9 +1539,9 @@ class RandomHeight(base_layer.BaseRandomLayer):
15391539
`factor=0.2` results in an output with
15401540
height changed by a random amount in the range `[-20%, +20%]`.
15411541
interpolation: String, the interpolation method.
1542-
Defaults to `"bilinear"`.
15431542
Supports `"bilinear"`, `"nearest"`, `"bicubic"`, `"area"`,
15441543
`"lanczos3"`, `"lanczos5"`, `"gaussian"`, `"mitchellcubic"`.
1544+
Defaults to `"bilinear"`.
15451545
seed: Integer. Used to create a random seed.
15461546
15471547
Input shape:
@@ -1661,9 +1661,9 @@ class RandomWidth(base_layer.BaseRandomLayer):
16611661
`factor=0.2` results in an output with width changed
16621662
by a random amount in the range `[-20%, +20%]`.
16631663
interpolation: String, the interpolation method.
1664-
Defaults to `bilinear`.
16651664
Supports `"bilinear"`, `"nearest"`, `"bicubic"`, `"area"`,
16661665
`"lanczos3"`, `"lanczos5"`, `"gaussian"`, `"mitchellcubic"`.
1666+
Defaults to `bilinear`.
16671667
seed: Integer. Used to create a random seed.
16681668
16691669
Input shape:

keras/layers/preprocessing/index_lookup.py

+6-5
@@ -134,10 +134,10 @@ class IndexLookup(base_preprocessing_layer.PreprocessingLayer):
134134
`"tf_idf"`, this argument must be supplied.
135135
invert: Only valid when `output_mode` is `"int"`. If True, this layer will
136136
map indices to vocabulary items instead of mapping vocabulary items to
137-
indices. Default to False.
138-
output_mode: Specification for the output of the layer. Defaults to
139-
`"int"`. Values can be `"int"`, `"one_hot"`, `"multi_hot"`, `"count"`,
140-
or `"tf_idf"` configuring the layer as follows:
137+
indices. Defaults to `False`.
138+
output_mode: Specification for the output of the layer. Values can be
139+
`"int"`, `"one_hot"`, `"multi_hot"`, `"count"`, or `"tf_idf"`
140+
configuring the layer as follows:
141141
- `"int"`: Return the raw integer indices of the input tokens.
142142
- `"one_hot"`: Encodes each individual element in the input into an
143143
array the same size as the vocabulary, containing a 1 at the element
@@ -153,6 +153,7 @@ class IndexLookup(base_preprocessing_layer.PreprocessingLayer):
153153
the number of times the token at that index appeared in the sample.
154154
- `"tf_idf"`: As `"multi_hot"`, but the TF-IDF algorithm is applied to
155155
find the value in each token slot.
156+
Defaults to `"int"`.
156157
pad_to_max_tokens: Only valid when `output_mode` is `"multi_hot"`,
157158
`"count"`, or `"tf_idf"`. If True, the output will have its feature axis
158159
padded to `max_tokens` even if the number of unique tokens in the
@@ -161,7 +162,7 @@ class IndexLookup(base_preprocessing_layer.PreprocessingLayer):
161162
False.
162163
sparse: Boolean. Only applicable to `"one_hot"`, `"multi_hot"`, `"count"`
163164
and `"tf-idf"` output modes. If True, returns a `SparseTensor` instead
164-
of a dense `Tensor`. Defaults to False.
165+
of a dense `Tensor`. Defaults to `False`.
165166
"""
166167

167168
def __init__(

keras/layers/preprocessing/integer_lookup.py

+10-9
@@ -71,18 +71,18 @@ class IntegerLookup(index_lookup.IndexLookup):
7171
only be specified when adapting the vocabulary or when setting
7272
`pad_to_max_tokens=True`. If None, there is no cap on the size of the
7373
vocabulary. Note that this size includes the OOV and mask tokens.
74-
Defaults to None.
74+
Defaults to `None`.
7575
num_oov_indices: The number of out-of-vocabulary tokens to use. If this
7676
value is more than 1, OOV inputs are modulated to determine their OOV
7777
value. If this value is 0, OOV inputs will cause an error when calling
78-
the layer. Defaults to 1.
78+
the layer. Defaults to `1`.
7979
mask_token: An integer token that represents masked inputs. When
8080
`output_mode` is `"int"`, the token is included in vocabulary and mapped
8181
to index 0. In other output modes, the token will not appear in the
8282
vocabulary and instances of the mask token in the input will be dropped.
83-
If set to None, no mask term will be added. Defaults to None.
83+
If set to None, no mask term will be added. Defaults to `None`.
8484
oov_token: Only used when `invert` is True. The token to return for OOV
85-
indices. Defaults to -1.
85+
indices. Defaults to `-1`.
8686
vocabulary: Optional. Either an array of integers or a string path to a
8787
text file. If passing an array, can pass a tuple, list, 1D numpy array,
8888
or 1D tensor containing the integer vocbulary terms. If passing a file
@@ -98,10 +98,10 @@ class IntegerLookup(index_lookup.IndexLookup):
9898
`"tf_idf"`, this argument must be supplied.
9999
invert: Only valid when `output_mode` is `"int"`. If True, this layer will
100100
map indices to vocabulary items instead of mapping vocabulary items to
101-
indices. Default to False.
102-
output_mode: Specification for the output of the layer. Defaults to
103-
`"int"`. Values can be `"int"`, `"one_hot"`, `"multi_hot"`, `"count"`,
104-
or `"tf_idf"` configuring the layer as follows:
101+
indices. Defaults to `False`.
102+
output_mode: Specification for the output of the layer. Values can be
103+
`"int"`, `"one_hot"`, `"multi_hot"`, `"count"`, or `"tf_idf"`
104+
configuring the layer as follows:
105105
- `"int"`: Return the vocabulary indices of the input tokens.
106106
- `"one_hot"`: Encodes each individual element in the input into an
107107
array the same size as the vocabulary, containing a 1 at the element
@@ -119,6 +119,7 @@ class IntegerLookup(index_lookup.IndexLookup):
119119
find the value in each token slot.
120120
For `"int"` output, any shape of input and output is supported. For all
121121
other output modes, currently only output up to rank 2 is supported.
122+
Defaults to `"int"`.
122123
pad_to_max_tokens: Only applicable when `output_mode` is `"multi_hot"`,
123124
`"count"`, or `"tf_idf"`. If True, the output will have its feature axis
124125
padded to `max_tokens` even if the number of unique tokens in the
@@ -127,7 +128,7 @@ class IntegerLookup(index_lookup.IndexLookup):
127128
False.
128129
sparse: Boolean. Only applicable when `output_mode` is `"multi_hot"`,
129130
`"count"`, or `"tf_idf"`. If True, returns a `SparseTensor` instead of a
130-
dense `Tensor`. Defaults to False.
131+
dense `Tensor`. Defaults to `False`.
131132
132133
Examples:
133134

keras/layers/preprocessing/normalization.py

+2-1
@@ -52,11 +52,12 @@ class Normalization(base_preprocessing_layer.PreprocessingLayer):
5252
example, if shape is `(None, 5)` and `axis=1`, the layer will track 5
5353
separate mean and variance values for the last axis. If `axis` is set
5454
to `None`, the layer will normalize all elements in the input by a
55-
scalar mean and variance. Defaults to -1, where the last axis of the
55+
scalar mean and variance. When `-1` the last axis of the
5656
input is assumed to be a feature dimension and is normalized per
5757
index. Note that in the specific case of batched scalar inputs where
5858
the only axis is the batch axis, the default will normalize each index
5959
in the batch separately. In this case, consider passing `axis=None`.
60+
Defaults to `-1`.
6061
mean: The mean value(s) to use during normalization. The passed value(s)
6162
will be broadcast to the shape of the kept axes above; if the value(s)
6263
cannot be broadcast, an error will be raised when this layer's

keras/layers/preprocessing/string_lookup.py

+8-7
@@ -68,11 +68,11 @@ class StringLookup(index_lookup.IndexLookup):
6868
only be specified when adapting the vocabulary or when setting
6969
`pad_to_max_tokens=True`. If None, there is no cap on the size of the
7070
vocabulary. Note that this size includes the OOV and mask tokens.
71-
Defaults to None.
71+
Defaults to `None`.
7272
num_oov_indices: The number of out-of-vocabulary tokens to use. If this
7373
value is more than 1, OOV inputs are hashed to determine their OOV
7474
value. If this value is 0, OOV inputs will cause an error when calling
75-
the layer. Defaults to 1.
75+
the layer. Defaults to `1`.
7676
mask_token: A token that represents masked inputs. When `output_mode` is
7777
`"int"`, the token is included in vocabulary and mapped to index 0. In
7878
other output modes, the token will not appear in the vocabulary and
@@ -93,10 +93,10 @@ class StringLookup(index_lookup.IndexLookup):
9393
`"tf_idf"`, this argument must be supplied.
9494
invert: Only valid when `output_mode` is `"int"`. If True, this layer will
9595
map indices to vocabulary items instead of mapping vocabulary items to
96-
indices. Default to False.
97-
output_mode: Specification for the output of the layer. Defaults to
98-
`"int"`. Values can be `"int"`, `"one_hot"`, `"multi_hot"`, `"count"`,
99-
or `"tf_idf"` configuring the layer as follows:
96+
indices. Defaults to `False`.
97+
output_mode: Specification for the output of the layer. Values can be
98+
`"int"`, `"one_hot"`, `"multi_hot"`, `"count"`, or `"tf_idf"`
99+
configuring the layer as follows:
100100
- `"int"`: Return the raw integer indices of the input tokens.
101101
- `"one_hot"`: Encodes each individual element in the input into an
102102
array the same size as the vocabulary, containing a 1 at the element
@@ -114,6 +114,7 @@ class StringLookup(index_lookup.IndexLookup):
114114
find the value in each token slot.
115115
For `"int"` output, any shape of input and output is supported. For all
116116
other output modes, currently only output up to rank 2 is supported.
117+
Defaults to `"int"`
117118
pad_to_max_tokens: Only applicable when `output_mode` is `"multi_hot"`,
118119
`"count"`, or `"tf_idf"`. If True, the output will have its feature axis
119120
padded to `max_tokens` even if the number of unique tokens in the
@@ -122,7 +123,7 @@ class StringLookup(index_lookup.IndexLookup):
122123
False.
123124
sparse: Boolean. Only applicable when `output_mode` is `"multi_hot"`,
124125
`"count"`, or `"tf_idf"`. If True, returns a `SparseTensor` instead of a
125-
dense `Tensor`. Defaults to False.
126+
dense `Tensor`. Defaults to `False`.
126127
encoding: Optional. The text encoding to use to interpret the input
127128
strings. Defaults to `"utf-8"`.
128129
