Add Regularizers 1 #216
Conversation
Sync with master tensorflow on upstream
Merge main branch to local branch
Update after losses merge
Fix Javadoc errors (tensorflow#152)
pull type def
Metrics Phase 1 (tensorflow#180)
Pull latest tensorflow master
Make sure Tests test TFloat32 and TFloat64
*/
public class L1L2<R extends TNumber> extends Regularizer<R> {

  private final Float l1;
What's the motivation for storing these as Float instead of float? So far, we carefully set them to 0.0 instead of null, and then also carefully treat null as though it were 0.0.
Basically, L1L2, besides standing on its own, is the superclass for L1_L2, L1, and L2. We still need the Float for the constructor because null is a valid value for either l1 or l2, depending on the subclass. However, I will change the internal representation to float.
On second thought, maybe we don't need the Float. Just change the ctors for L1 and L2 to pass 0 for the unused calculation.
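A minimal sketch of that suggestion (the L1L2 constructor signature here is an assumption for illustration, not taken from the PR):

public class L1<R extends TNumber> extends L1L2<R> {
  public L1(Ops tf, float l1) {
    super(tf, l1, 0.0f); // pass 0 for the unused l2 term instead of a null Float
  }
}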
👍 done
if (this.getL2() != null && this.getL2() != 0.f) {
  Operand<R> l2Op = tf.dtypes.cast(tf.constant(this.getL2()), input.type());
  Operand<R> sqr = tf.math.abs(input);
abs -> square
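For reference, a sketch of the requested change, reusing the names from the snippet above (only the squared term changes; the surrounding reduceSum and multiplication by l2 are omitted):

Operand<R> l2Op = tf.dtypes.cast(tf.constant(this.getL2()), input.type());
Operand<R> sqr = tf.math.square(input); // was tf.math.abs(input)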
Fixed.
👍 done
*
* <pre>loss = l2 * reduceSum(square(x))</pre>
*
* <p>The difference between this class and the {@link L1_L2} is use of the default regularization
The default of no penalty doesn't seem useful. I wonder if L1_L2 was created later in Python to provide useful defaults? Is it possible we should omit L1L2 on the Java side?
Actually it looks like l1_l2 was an afterthought in Keras, as it is defined as a method. I am OK with adding the l1_l2 defaults to L1L2 and perhaps adding a static method to L1L2 to create an L1L2 instance with the current l1_l2 defaults.
Perhaps something like:
public static L1L2 l1_l2() {
  return new L1L2(DEFAULT_REGULARIZATION_PENALTY, DEFAULT_REGULARIZATION_PENALTY);
}
That general approach seems fine to me. I do like the L1L2 class name better. The method name l1_l2 is off the Java naming standard and is descriptive only by reference to Python Keras. Perhaps instead overload the name create and use default values for the variant with no loss factors provided?
create or createDefault? Alternatively, I could change the constructors that do not specify the l1, l2 values:
public L1L2(Ops tf) {
  this(tf, DEFAULT_REGULARIZATION_PENALTY, DEFAULT_REGULARIZATION_PENALTY);
}
I would lean toward changing the no-specified-penalty constructors to use the defaults instead of 0.f. But if factory methods felt more comfortable to you, and if you felt that fit some overall direction of the code base, I could be comfortable with that.
Yeah, I started down the createDefault path, but realized the idiomatic pattern is just overloaded create methods.
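A hypothetical illustration of the overloaded-create idea being discussed (names and signatures assumed; not necessarily what was merged):

public static <R extends TNumber> L1L2<R> create(Ops tf) {
  return new L1L2<>(tf, DEFAULT_REGULARIZATION_PENALTY, DEFAULT_REGULARIZATION_PENALTY);
}

public static <R extends TNumber> L1L2<R> create(Ops tf, float l1, float l2) {
  return new L1L2<>(tf, l1, l2);
}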
I am changing L1L2(Ops tf) to use the DEFAULT_REGULARIZATION_PENALTY and eliminating the class L1_L2.
👍 done
Operand<TFloat32> weights = tf.constant(w);
Operand<TFloat32> result = instance.call(weights);
float expected = regularizeL1L2(w, 0.01f, 0.02f);
session.setEpsilon(.09f);
This tolerance seems unreasonably high to me. I imagine it is why this test passes in spite of the use of abs instead of square to compute the L2 loss.
In Python, if I've followed the code path correctly, they use:
def assertAllClose(self, a, b, rtol=1e-6, atol=1e-6, msg=None):
where the relative difference (rtol * abs(b)) and the absolute difference atol are added together to compare against the absolute difference between a and b.
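Transcribed to Java, the check described there amounts to something like the following sketch (not the framework's actual test utility):

static boolean allClose(float a, float b, float rtol, float atol) {
  // passes when |a - b| <= atol + rtol * |b|
  return Math.abs(a - b) <= atol + rtol * Math.abs(b);
}

So the Python default would correspond to roughly allClose(expected, actual, 1e-6f, 1e-6f), rather than an epsilon of 0.09.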
There seems to be a wide tolerance needed when calculating the expected value in Java vs the calculation in TF. Not sure why, but it seems to happen a lot.
That seems concerning to me! For most (reasonably stable) computations, for the first N significant digits, there's only one right answer. Or am I missing something?
I am not sure, but I have seen a wide difference in the number of significant digits in other test cases when I use Java to determine the expected result.
👍 Ok, seems this is a broader issue, not part of this PR. I'll raise an issue.
I do notice that if I use NumPy to do the test calculation beforehand, rather than use Java to do the calculation at runtime, things are closer to TF. I am not sure whether my calculations exactly match NumPy behavior. I have found subtle differences between Java and Python behavior in other areas when it comes to math, which may be part of the problem; negative 0 is one of them.
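As a standalone illustration of the negative-zero quirk mentioned here (not code from the PR):

public class NegativeZeroDemo {
  public static void main(String[] args) {
    System.out.println(-0.0 == 0.0);               // true: == ignores the sign of zero
    System.out.println(Double.compare(-0.0, 0.0)); // -1: compare() orders -0.0 below +0.0
    System.out.println(Math.min(0.0, -0.0));       // -0.0: min() propagates the negative zero
  }
}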
*
* @param <R> the data type of the operands and result
*/
public abstract class Regularizer<R extends TNumber> {
Is there a clear motivation for giving Regularizer a type parameter? For today's existing regularizers, L1 and L2, it would be sufficient to put a type parameter on call. Is there reason to think the Regularizer instance itself needs to be specialized to a certain tensor type?
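A sketch of the shape that suggestion implies (hypothetical signature; the merged code may differ): the type parameter moves from the class to the call method.

import org.tensorflow.Operand;
import org.tensorflow.types.family.TNumber;

public abstract class Regularizer {
  public abstract <R extends TNumber> Operand<R> call(Operand<R> input);
}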
I will fix.
👍 done
if (keepDims) {
  long[] newDims = shape.asArray();
  newDims[axis] = 1;
  final AtomicInteger counter = new AtomicInteger();
Why the AtomicInteger here? It is local.
It is referenced in the lambda on lines 784-789; you can't use an int i because it is not final, and if it were final you couldn't increment it.
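A standalone illustration of that constraint (not code from the PR): a captured local must be effectively final, so a plain int cannot be incremented inside the lambda, while an AtomicInteger can.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class CounterInLambda {
  public static void main(String[] args) {
    // int i = 0; IntStream.range(0, 3).forEach(x -> i++); // does not compile: i is not effectively final
    AtomicInteger counter = new AtomicInteger();
    IntStream.range(0, 3).forEach(x -> counter.getAndIncrement()); // compiles and counts
    System.out.println(counter.get()); // 3
  }
}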
👍 Oh, right!
int xis = nDims - 1 - axis;
long totalSize = shape.size();
long axisSize = shape.size(xis);
final double[] sums = new double[(int) axisSize];
Given that sums is one-dimensional, maybe this method only works for 2d arrays?
ND is a quick and dirty replacement for NumPy, just used in testing. If we ever do a cheap version of NumPy, then this class can be replaced.
Should we move ND into src/test to avoid setting expectations we don't intend to meet?
On my fork, it is in the test directory: test/java/org/tensorflow/framework/utils
Oh, that's where it is!
Perhaps rename this method sum2D and validate a accordingly?
I think this goes to the broader issue: are we going to fully implement a NumPy replacement or not?
👍 Fine -- it's a test class
.elements(newDims.length - 1)
.forEachIndexed(
    (idx, v) -> {
      v.setDouble(sums[counter.getAndAdd(1)]);
forEachIndexed does not guarantee an iteration order, so I don't think keeping a parallel counter works.
Seems to work; I suppose you could take the axis index out of idx and use that. Again, this class is just used for testing.
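A rough standalone sketch of the idx-based alternative using the ndarray API (shapes and helper usage here are assumptions; it has not been tested against the actual ND helper):

import org.tensorflow.ndarray.DoubleNdArray;
import org.tensorflow.ndarray.NdArrays;
import org.tensorflow.ndarray.Shape;

public class IdxDemo {
  public static void main(String[] args) {
    double[] sums = {1.0, 2.0, 3.0};
    DoubleNdArray out = NdArrays.ofDoubles(Shape.of(3, 1));
    // idx carries the coordinates of each slice, so no parallel counter is needed
    out.elements(0).forEachIndexed((idx, row) -> row.setDouble(sums[(int) idx[0]], 0));
  }
}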
👍 Fine -- it's a test class
}

/**
 * Sum all elements of an array based on the specified axis
Perhaps be more specific by saying "along" or "over" instead of "based on"?
OK
👍 Fine -- it's a test class
DoubleNdArray result = sum(a);
if (keepDims) {
  double scalar = result.getDouble(0);
  long[] dims = {1, 1};
Only supports 2d arrays.
By design right now, I just did enough for testing. A NumPy replacement based on ndarray would be a nice feature to have if you want to explore more. This was brought up at yesterday's meeting (TensorFlow SIG JVM Notes).
Perhaps rename this method sum2D and validate a accordingly?
👍 Fine -- it's a test class
Removed generic from Regularizer class and changed the call method to define the generic return based on the weights parameter. Added static method l1_l2() to L1L2 class. Fixed JavaDoc comments.
Modified Float to float for l1 and l2 parameters. Changed ctor L1L2(Ops tf) to use DEFAULT_REGULARIZATION_PENALTY for l1/l2 parameters. Fixed JavaDoc.
I have a few open questions, but I don't see any of those as essential for this PR.
Merge with latest
Resync with origin/master
Sync with tensorflow/java master
Sync with Metrics Phase 2
@JimClarke5, can you please rebase that PR so we can validate that it now passes the quick build successfully?
So I reran the same job and it is still failing. So after your rebase, we'll see if the compilation passes and merge it right away if that's the case, thanks.
@JimClarke5, just a small reminder: if you can find a few minutes to rebase/resubmit that PR so I can merge it, thanks!
Sync with master
@karllessard I just pushed the rebased versions based on the latest master. It seemed to push a lot of core files, so if you don't want that, I will create a clean branch of Regularizers and create a new PR.
@JimClarke5, I think the rebase went fine; only framework files are part of the PR. On the other hand, compilation is failing with the following errors, can you please check?
@karllessard Weird, because I had fixed that error and it compiled on my side. I will look at it again.
Make sure Tests test TFloat32 and TFloat64
Removed generic from Regularizer class and changed the call method to define the generic return based on the weights parameter. Added static method l1_l2() to L1L2 class. Fixed JavaDoc comments.
Modified Float to float for l1 and l2 parameters. Changed ctor L1L2(Ops tf) to use DEFAULT_REGULARIZATION_PENALTY for l1/l2 parameters. Fixed JavaDoc.
# Conflicts: # tensorflow-framework/src/main/java/org/tensorflow/framework/regularizers/RegularizerLoss.java
@karllessard Totally weird, because my local copy was fixed but the upstream copy was not. IntelliJ did not indicate that it needed to be committed. I fooled around with it by cleaning up some code and repushed, and now the upstream copy is the same as the local copy. Try it now.
Hey @JimClarke5, I don't know what happened either, but rerunning the build made it pass this time. I'm merging; thanks again for your contribution!
Oh OK, I just noticed that reply of yours; that explains it ;)
Add Regularizers.
Regularizers are classes that allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are summed into the loss function that the network optimizes.
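A minimal usage sketch, assuming the API shape after the review changes (non-generic L1L2 with float penalties, and a call method that returns the penalty to add to the loss); this is an illustration, not code taken from the PR:

import org.tensorflow.Graph;
import org.tensorflow.Operand;
import org.tensorflow.framework.regularizers.L1L2;
import org.tensorflow.op.Ops;
import org.tensorflow.types.TFloat32;

public class RegularizerExample {
  public static void main(String[] args) {
    try (Graph graph = new Graph()) {
      Ops tf = Ops.create(graph);
      L1L2 regularizer = new L1L2(tf, 0.01f, 0.02f);   // assumed constructor: (Ops, l1, l2)
      Operand<TFloat32> weights = tf.constant(new float[][] {{1f, -2f}, {3f, 4f}});
      // The returned scalar penalty is what gets summed into the training loss.
      Operand<TFloat32> penalty = regularizer.call(weights);
    }
  }
}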