Add Regularizers 1 #216

Merged: 29 commits merged into tensorflow:master from the Regularizers_1 branch on May 2, 2021

Conversation

Contributor

@JimClarke5 JimClarke5 commented Feb 11, 2021

Add Regularizers.

Regularizers are classes that allow you to apply penalties on layer parameters or layer activity during
optimization. These penalties are summed into the loss function that the network optimizes.
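As a rough usage sketch (assuming the Ops-based constructors and the call(weights) method shape discussed in this review; the 0.01f factor is illustrative):

import org.tensorflow.Operand;
import org.tensorflow.framework.regularizers.L2;
import org.tensorflow.op.Ops;
import org.tensorflow.types.TFloat32;

// Compute a weight penalty and add it to the data loss; the penalty is what
// gets summed into the loss function the network optimizes.
static Operand<TFloat32> totalLoss(Ops tf, Operand<TFloat32> dataLoss, Operand<TFloat32> weights) {
    L2 regularizer = new L2(tf, 0.01f);                    // l2 penalty factor
    Operand<TFloat32> penalty = regularizer.call(weights); // l2 * reduceSum(square(weights))
    return tf.math.add(dataLoss, penalty);
}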

*/
public class L1L2<R extends TNumber> extends Regularizer<R> {

private final Float l1;
Contributor

What's the motivation for storing these as Float instead of float? So far, we carefully set them to 0.0 instead of null, and then also carefully treat null as though it were 0.0.

Contributor Author

Basically, L1L2, besides standing on its own, is the superclass for L1_L2, L1, and L2. We still need the Float for the constructor because null is a valid value for either l1 or l2, depending on the subclass.
However, I will change the internal representation to float.

Contributor Author
@JimClarke5 JimClarke5 Feb 14, 2021

On second thought, maybe we don't need the Float. Just change the ctors for L1 and L2 to pass 0 for the unused calculation.
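A hypothetical sketch of that change (assuming L1L2 keeps an (Ops, float, float) constructor, as in this PR):

// L1 passes 0 for the l2 term it does not use, and L2 does the reverse,
// so no Float/null handling is needed.
public class L1 extends L1L2 {
    public L1(Ops tf, float l1) {
        super(tf, l1, 0f);
    }
}

public class L2 extends L1L2 {
    public L2(Ops tf, float l2) {
        super(tf, 0f, l2);
    }
}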

Contributor

👍 done


if (this.getL2() != null && this.getL2() != 0.f) {
Operand<R> l2Op = tf.dtypes.cast(tf.constant(this.getL2()), input.type());
Operand<R> sqr = tf.math.abs(input);
Contributor

abs -> square

Contributor Author

Fixed.

Contributor

👍 done
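For clarity, the corrected lines from the hunk above would read roughly as follows (a sketch; once the Float-to-float change discussed earlier lands, the null check also disappears):

if (this.getL2() != 0.f) {
    Operand<R> l2Op = tf.dtypes.cast(tf.constant(this.getL2()), input.type());
    Operand<R> sqr = tf.math.square(input); // square(x), not abs(x): loss = l2 * reduceSum(square(x))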

*
* <pre>loss = l2 * reduceSum(square(x))</pre>
*
* <p>The difference between this class and the {@link L1_L2} is use of the default regularization
Contributor

The default of no penalty doesn't seem useful. I wonder if L1_L2 was created later in Python to provide useful defaults? Is it possible we should omit L1L2 on the Java side?

Contributor Author

Actually, it looks like l1_l2 was the afterthought in Keras, as it is defined as a method. I am OK with adding the l1_l2 defaults to L1L2, and perhaps adding a static method to L1L2 that creates an L1L2 instance with the current l1_l2 defaults.

Perhaps something like:

public static L1L2 l1_l2() {
    return new L1L2(DEFAULT_REGULARIZATION_PENALTY, DEFAULT_REGULARIZATION_PENALTY);
}

Contributor
@deansher deansher Feb 14, 2021

That general approach seems fine to me. I do like the L1L2 class name better. The method name l1_l2 is off-Java-naming-standard and is descriptive only by reference to Python Keras. Perhaps instead overload the name create and use default values for the variant with no loss factors provided?

Contributor Author

create or createDefault? Alternatively, I could change the constructors that do not specify the l1/l2 values:

 public L1L2(Ops tf) {
    this(tf, DEFAULT_REGULARIZATION_PENALTY, DEFAULT_REGULARIZATION_PENALTY);
  }

Contributor

I would lean toward changing the no-specified-penalty constructors to use the defaults instead of 0.f. But if factory methods felt more comfortable to you, and if you felt that fit some overall direction of the code base, I could be comfortable with that.

Yeah, I started down the createDefault path, but realized the idiomatic pattern is just overloaded create methods.
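For reference, the overloaded create pattern mentioned here might have looked something like this (a hypothetical sketch; the thread below ends up changing the constructors instead):

public static L1L2 create(Ops tf) {
    return new L1L2(tf, DEFAULT_REGULARIZATION_PENALTY, DEFAULT_REGULARIZATION_PENALTY);
}

public static L1L2 create(Ops tf, float l1, float l2) {
    return new L1L2(tf, l1, l2);
}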

Contributor Author

I am changing L1L2(Ops tf) to use DEFAULT_REGULARIZATION_PENALTY and eliminating the L1_L2 class.

Contributor

👍 done

Operand<TFloat32> weights = tf.constant(w);
Operand<TFloat32> result = instance.call(weights);
float expected = regularizeL1L2(w, 0.01f, 0.02f);
session.setEpsilon(.09f);
Contributor

This tolerance seems unreasonably high to me. I imagine it is why this test passes in spite of the use of abs instead of square to compute the L2 loss.

In Python, if I've followed the code path correctly, they use:

  def assertAllClose(self, a, b, rtol=1e-6, atol=1e-6, msg=None):

where

The relative difference (rtol * abs(b)) and the absolute difference atol are added together to compare against the absolute difference between a and b.
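A Java helper with the same combined-tolerance semantics could be sketched like this (hypothetical, not part of this PR):

// |a - b| <= atol + rtol * |b|, matching the assertAllClose behavior quoted above
static boolean allClose(float a, float b, float rtol, float atol) {
    return Math.abs(a - b) <= atol + rtol * Math.abs(b);
}

// e.g. allClose(result, expected, 1e-6f, 1e-6f) rather than a fixed epsilon of 0.09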

Contributor Author

A wide tolerance seems to be needed when comparing values calculated in Java against those calculated in TF. I'm not sure why, but it seems to happen a lot.

Contributor

That seems concerning to me! For most (reasonably stable) computations, for the first N significant digits, there's only one right answer. Or am I missing something?

Contributor Author

I am not sure, but I have seen a wide difference in the number of significant digits in other test cases when I use Java to determine the expected result.

Contributor

👍 Ok, seems this is a broader issue, not part of this PR. I'll raise an issue.

Contributor Author

I do notice that if I use NumPy to do the test calculation beforehand, rather than using Java to do the calculation at runtime, the results are closer to TF. I am not sure whether my calculations exactly match NumPy's behavior. I have found subtle differences between Java and Python math behavior in other areas, which may be part of the problem; negative zero is one of them.
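For example, Java's treatment of negative zero (plain Java facts, shown only to illustrate the kind of subtlety involved):

System.out.println(0.0 == -0.0);               // true
System.out.println(Double.compare(0.0, -0.0)); // 1: compare() orders -0.0 below 0.0
System.out.println(1.0 / -0.0);                // -Infinity, while 1.0 / 0.0 is Infinity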

*
* @param <R> the data type of the operands and result
*/
public abstract class Regularizer<R extends TNumber> {
Contributor

Is there a clear motivation for giving Regularizer a type parameter? For today's existing regularizers, L1 and L2, it would be sufficient to put a type parameter on call. Is there reason to think the Regularizer instance itself needs to be specialized to a certain tensor type?

Contributor Author

I will fix.

Contributor

👍 done
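The resulting shape of the change (an assumed sketch, based on the later commit note that the generic moved from the class to the call method):

import org.tensorflow.Operand;
import org.tensorflow.types.family.TNumber;

public abstract class Regularizer {
    // The tensor type is now bound per call from the weights operand.
    public abstract <R extends TNumber> Operand<R> call(Operand<R> weights);
}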

if (keepDims) {
long[] newDims = shape.asArray();
newDims[axis] = 1;
final AtomicInteger counter = new AtomicInteger();
Contributor

Why the AtomicInteger here? It is local.

Contributor Author

It is referenced in the lambda on lines 784-789; you can't use an int i because it would have to be effectively final, and if it were final you couldn't increment it.

Contributor

👍 Oh, right!
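A minimal illustration of that rule (plain Java, unrelated to the ND code itself):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LambdaCounterExample {
    public static void main(String[] args) {
        // A local captured by a lambda must be effectively final, so a mutable
        // counter needs a holder object such as AtomicInteger.
        // int i = 0;  // would not compile if incremented inside the lambda below
        final AtomicInteger counter = new AtomicInteger();
        Stream.of("a", "b", "c")
            .forEach(s -> System.out.println(counter.getAndIncrement() + ": " + s));
    }
}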

int xis = nDims - 1 - axis;
long totalSize = shape.size();
long axisSize = shape.size(xis);
final double[] sums = new double[(int) axisSize];
Contributor

Given that sums is one-dimensional, maybe this method only works for 2d arrays?

Contributor Author

ND is a quick-and-dirty replacement for NumPy, used only in testing. If we ever do a cheap version of NumPy, this class can be replaced.

Contributor

Should we move ND into src/test to avoid setting expectations we don't intend to meet?

Contributor Author
@JimClarke5 JimClarke5 Feb 14, 2021

On my fork, it is in the test directory: test/java/org/tensorflow/framework/utils

Contributor

Oh, that's where it is!

Contributor

Perhaps rename this method sum2D and validate a accordingly?

Contributor Author

I think this goes to the broader issue: are we going to fully implement a NumPy replacement or not?

Contributor

👍 Fine -- it's a test class
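For context, the 2D-only behavior discussed above amounts to roughly the following in plain Java (a hypothetical NumPy-style helper, not the actual ND code):

// Sum a 2D array along one axis: axis 0 yields per-column sums, axis 1 per-row sums.
static double[] sumAxis2D(double[][] a, int axis) {
    int rows = a.length;
    int cols = a[0].length;
    double[] sums = new double[axis == 0 ? cols : rows];
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            sums[axis == 0 ? c : r] += a[r][c];
        }
    }
    return sums;
}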

.elements(newDims.length - 1)
.forEachIndexed(
(idx, v) -> {
v.setDouble(sums[counter.getAndAdd(1)]);
Contributor

forEachIndexed does not guarantee an iteration order, so I don't think keeping a parallel counter works.

Contributor Author

It seems to work. I suppose you could take the axis index out of idx and use that instead. Again, this class is only used for testing.

Contributor

👍 Fine -- it's a test class

}

/**
* Sum all elements of an array based on the specified axis
Contributor

Perhaps be more specific by saying "along" or "over" instead of "based on"?

Contributor Author

OK

Contributor

👍 Fine -- it's a test class

DoubleNdArray result = sum(a);
if (keepDims) {
double scalar = result.getDouble(0);
long[] dims = {1, 1};
Contributor

Only supports 2d arrays.

Contributor Author

By design right now; I just did enough for testing.

A NumPy replacement based on ndarray would be a nice feature to have if you want to explore it more. This was brought up at yesterday's meeting (TensorFlow SIG JVM Notes).

Contributor

Perhaps rename this method sum2D and validate a accordingly?

Contributor

👍 Fine -- it's a test class

Removed generic from Regularizer class and changed the call method to define the generic return based on the weights parameter.
Added static method l1_l2() to L1L2 class.
Fixed JavaDoc comments.
modified Float to float for l1 and l2 parameters
Change ctor L1L2(Ops tf) to use DEFAULT_REGULARIZATION_PENALTY for l1/l2 parameters
Fix JavaDoc
deansher previously approved these changes Feb 17, 2021
@deansher deansher left a comment

I have a few open questions, but I don't see any of those as essential for this PR.

@karllessard
Collaborator

@JimClarke5 , can you please rebase that PR so we can validate that it now passes the quick build successfully?

@karllessard
Collaborator

I reran the same job and it is still failing. After your rebase, we'll see if the compilation passes, and merge it right away if that's the case. Thanks!

@karllessard
Collaborator

@JimClarke5 , just a small reminder, if you can find a few minutes to rebase/resubmit that PR so I can merge it, thanks!

@JimClarke5
Contributor Author

@karllessard I just pushed the rebased versions based on the latest Master. It seemed to push a lot of core files, so if you don't want that, I will create a clean branch of Regularizers and create a new PR.

@karllessard
Collaborator

@JimClarke5 , I think the rebase went fine, only files of the framework are part of the PR. On the other hand, compilation is failing with the following errors, can you please check?

Error:  COMPILATION ERROR :
[INFO] -------------------------------------------------------------
Error:  /__w/java/java/tensorflow-framework/src/test/java/org/tensorflow/framework/utils/ND.java:[819,31] method sum(org.tensorflow.ndarray.DoubleNdArray) is already defined in class org.tensorflow.framework.utils.ND
Error:  /__w/java/java/tensorflow-framework/src/test/java/org/tensorflow/framework/utils/ND.java:[832,31] method sum(org.tensorflow.ndarray.DoubleNdArray,int) is already defined in class org.tensorflow.framework.utils.ND
Error:  /__w/java/java/tensorflow-framework/src/test/java/org/tensorflow/framework/utils/ND.java:[844,31] method sum(org.tensorflow.ndarray.DoubleNdArray,int,boolean) is already defined in class org.tensorflow.framework.utils.ND

@JimClarke5
Contributor Author

@karllessard weird, because I had fixed that error and it compiled on my side. I will take another look at it.

JimClarke5 added 7 commits May 2, 2021 08:33
Make sure Tests test TFloat32 and TFloat64
Removed generic from Regularizer class and changed the call method to define the generic return based on the weights parameter.
Added static method l1_l2() to L1L2 class.
Fixed JavaDoc comments.
modified Float to float for l1 and l2 parameters
Change ctor L1L2(Ops tf) to use DEFAULT_REGULARIZATION_PENALTY for l1/l2 parameters
Fix JavaDoc
# Conflicts:
#	tensorflow-framework/src/main/java/org/tensorflow/framework/regularizers/RegularizerLoss.java
@JimClarke5
Contributor Author

@karllessard totally weird because my local copy was fixed but the upstream copy was not. IntelliJ did not indicate that it needed to be committed. I fooled around with it by cleaning up some code and repushed and now the upstream copy is the same as the local copy. Try it now.

@karllessard
Collaborator

Hey @JimClarke5, I don't know what happened either, but rerunning the build made it pass this time. I'm merging, thanks again for your contribution!

@karllessard karllessard merged commit e013353 into tensorflow:master May 2, 2021
@karllessard
Collaborator

@karllessard totally weird because my local copy was fixed but the upstream copy was not. IntelliJ did not indicate that it needed to be committed. I fooled around with it by cleaning up some code and repushed and now the upstream copy is the same as the local copy. Try it now.

Oh ok, just noticed that reply of yours, that explains it ;)

@JimClarke5 JimClarke5 deleted the Regularizers_1 branch May 2, 2021 15:00