
Conversation

@Yancey0623 (Contributor):

Fixed #8996


> ### Sparse Update
>
> For an embedding layer, the gradient maybe be very sparse(upper 90% is zero) for each mini-batch.

Contributor: There should be a space between `sparse` and `(`.

Contributor: upper => up to

Contributor: the gradient maybe be very sparse => the gradient may have many rows containing only 0

@Yancey0623 (Author): Done.

> ### Sparse Update
>
> For an embedding layer, the gradient maybe be very sparse(upper 90% is zero) for each mini-batch.
> Fluid use [SelectedRows](../selected_rows.md) to support the sparse variable. Distributed training support `Sparse Update`,

Contributor: the sparse variable => sparse variables.

Contributor: support => supports

Contributor: Distributed training support Sparse Update, which sends a SelectedRows variable to the parameter server to run parameter updates.

@Yancey0623 (Author): Done.
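
For readers unfamiliar with the [SelectedRows](../selected_rows.md) format referenced above: it stores only the rows that are actually present, together with their row indices and the logical dense height. A minimal NumPy sketch of the idea (the names and shapes are illustrative, not Fluid's actual API):

```python
import numpy as np

# SelectedRows-style value: keep only the non-zero rows of a logically
# tall tensor, plus the indices telling where those rows belong.
height = 10000                                     # logical row count (e.g. vocab size)
rows = [0, 4, 7]                                   # indices of the rows actually stored
value = np.ones((len(rows), 8), dtype=np.float32)  # the stored rows themselves

# The equivalent dense tensor is mostly zeros:
dense = np.zeros((height, 8), dtype=np.float32)
dense[rows] = value
```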

> It would save a lot of bandwidth and make the distributed training job have better performance.
> For embedding layers, the gradient may have many rows containing only 0 when training,
> if the gradient use a dense tensor to do parameter optimization,
> it could spend unnessesary memory, slow down the calculations and waste

Contributor: unnessesary => unnecessary
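
To make the bandwidth claim in the quoted text concrete, here is a back-of-the-envelope comparison with hypothetical sizes (the numbers are illustrative only):

```python
# Hypothetical sizes, for illustration only.
vocab_size, emb_dim = 100000, 64            # embedding table shape
touched = 1000                              # rows with a non-zero gradient this step

dense_bytes = vocab_size * emb_dim * 4      # full fp32 gradient: ~25.6 MB
sparse_bytes = touched * (emb_dim * 4 + 8)  # stored rows + int64 row indices: ~0.26 MB
print(dense_bytes / sparse_bytes)           # roughly 97x less traffic per step
```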

> ### Sparse Update
>
> For embedding layers, the gradient may have many rows containing only 0 when training,
> if the gradient use a dense tensor to do parameter optimization,

Contributor: use -> uses.
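
The optimization point in the quoted text can be seen in a small sketch: with a row-sparse gradient, SGD only has to touch the selected rows of the parameter instead of the whole table (again NumPy pseudocode with made-up sizes, not Fluid's optimizer API):

```python
import numpy as np

vocab_size, emb_dim, lr = 10000, 64, 0.1   # made-up sizes
param = np.random.rand(vocab_size, emb_dim).astype(np.float32)

# Row-sparse gradient in SelectedRows style.
rows = np.array([3, 42, 777])              # rows with a non-zero gradient
grad = np.random.rand(len(rows), emb_dim).astype(np.float32)

# Sparse SGD: update only the touched rows, not all vocab_size rows.
param[rows] -= lr * grad
```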

@typhoonzero (Contributor) left a comment: LGTM++

@Yancey0623 merged commit 2cc2fb4 into PaddlePaddle:develop on Mar 13, 2018.
@Yancey0623 deleted the sparse_update_doc branch on March 13, 2018 at 02:40.