Project
https://github.com/PaddlePaddle/Paddle/projects/56
Tasks
- Add distributed lookup table design (with Abacus) (#9075)
- Detailed design doc for lookup remote table in Fluid (#9068)
- Support empty tensor (#9338)
Operators
- prefetch_op: get values from the pserver by ids and output a SelectedRows as the parameter for lookup_table_op. @jacquesqiao: use split_ids_op -> prefetch_op -> concat_op to compose the prefetch step.
  - add split_ids_op (#9370)
  - add prefetch_op (#9495)
  - concat_op should support concatenating SelectedRows: can use sum_op
  - add an RPC interface PrefetchVariable in send_recv.proto for remote table lookup (#9524)
  - use the new PrefetchVariable to support remote table lookup (#9555)
  - the pserver should use the new interface to serve the remote table lookup; in the future, it should read parameters from HDFS (#9593)
  - trainer: split_ids -> send_ids_to_pserver -> recv_result_from_pserver -> concat_result; pserver: recv_from_trainer -> lookup_table -> send_back_to_trainer
- lookup_table_op: this op should take its parameter (SelectedRows) from prefetch_op. When prefetch is used, the initialize op for its parameter W should be removed. (#9575)
- sgd_op: apply the gradient (SelectedRows) to the table parameter (SelectedRows). (#9597)
- distribute_table_initialize_op: initialize a shard of SelectedRows on the parameter server by shard_id. In the future, it may need to read parameters from a distributed file system. (#9787)
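The trainer/pserver flow above can be sketched in plain Python. This is a minimal sketch, not the real Fluid ops: the shard-by-modulo routing rule, `NUM_SHARDS`, and the in-process `shard_lookup` function are assumptions standing in for split_ids_op, the RPC prefetch, and concat_op.

```python
import numpy as np

EMB_DIM = 4
NUM_SHARDS = 2

# Each pserver shard holds part of the embedding table, keyed by row id.
shards = [{} for _ in range(NUM_SHARDS)]

def shard_lookup(shard_id, ids):
    """pserver side: recv_from_trainer -> lookup_table -> send_back_to_trainer."""
    table = shards[shard_id]
    return np.stack([table.setdefault(i, np.zeros(EMB_DIM)) for i in ids])

def prefetch(ids):
    """trainer side: split_ids -> send/recv per shard -> concat results."""
    # split_ids_op: route each id to its shard (assumed rule: id % NUM_SHARDS)
    per_shard = [[] for _ in range(NUM_SHARDS)]
    for i in ids:
        per_shard[i % NUM_SHARDS].append(i)
    # prefetch_op: one "RPC" per shard that actually has ids
    results = {i: row
               for s, shard_ids in enumerate(per_shard) if shard_ids
               for i, row in zip(shard_ids, shard_lookup(s, shard_ids))}
    # concat_op: restore the original id order
    return np.stack([results[i] for i in ids])

out = prefetch([3, 0, 7, 3])
print(out.shape)  # (4, 4)
```

Note how duplicate ids (3 appears twice) map back to the same row after the concat step, which is why the id order must be remembered across the split.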
Sparse Table
Support an auto-grown sparse table and lookup of nonexistent keys.
- Refine SelectedRows to support an auto-grown sparse table; a new class is needed to represent the Table interface used in remote table lookup (#9841)
- lookup_sparse_table op to look up from the sparse table (#10046)
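The auto-grown behaviour can be illustrated with a small sketch. The class and its names are illustrative, not the real Fluid API; the only assumption taken from the tasks above is that a nonexistent key is initialized randomly on first lookup rather than raising an error.

```python
import numpy as np

class AutoGrownTable:
    """Sketch of a sparse table that creates a row on first lookup.

    A nonexistent key does not fail; its row is initialized randomly
    (cf. "Initialize large table value randomly") and then returned.
    """
    def __init__(self, emb_dim, seed=0):
        self.emb_dim = emb_dim
        self.rows = {}            # id -> np.ndarray, grows on demand
        self.rng = np.random.default_rng(seed)

    def lookup(self, ids):
        out = []
        for i in ids:
            if i not in self.rows:  # auto-grow on a nonexistent key
                self.rows[i] = self.rng.normal(size=self.emb_dim)
            out.append(self.rows[i])
        return np.stack(out)

table = AutoGrownTable(emb_dim=8)
vecs = table.lookup([42, 7, 42])     # id 42 is created once, then reused
print(vecs.shape, len(table.rows))   # (3, 8) 2
```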
Transpilers
Dist transpiler support prefetch (#9714). The distributed transpiler should:
- replace lookup_table_op with split_ids_op -> prefetch_op -> concat_op
- add split_ids_op -> send_vars_op to split table@grad and send the pieces to the pserver
- insert a table_optimize_block [sum(splitted_grad) -> sgd_op] into the pserver_program
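The pserver-side optimize block, sum(splitted_grad) -> sgd_op, can be sketched as follows. This is a sketch only: the `(rows, values)` pairs mimic the SelectedRows layout of (row ids, dense values), and summing duplicate rows before applying SGD is the assumed semantics of the sum step.

```python
import numpy as np

def sum_selected_rows(grads):
    """sum_op over SelectedRows: merge splitted grads, adding duplicate rows."""
    merged = {}
    for rows, values in grads:
        for r, v in zip(rows, values):
            merged[r] = merged.get(r, 0) + v
    rows = sorted(merged)
    return rows, np.stack([merged[r] for r in rows])

def sgd_on_table(table, grad_rows, grad_values, lr=0.1):
    """sgd_op with a SelectedRows gradient: update only the touched rows."""
    for r, g in zip(grad_rows, grad_values):
        table[r] = table.get(r, np.zeros_like(g)) - lr * g

table = {0: np.ones(2), 5: np.ones(2)}
g1 = ([0, 5], np.array([[1.0, 1.0], [2.0, 2.0]]))  # grad shard from trainer 1
g2 = ([0],    np.array([[1.0, 1.0]]))              # grad shard from trainer 2
rows, values = sum_selected_rows([g1, g2])
sgd_on_table(table, rows, values)
print(table[0], table[5])  # [0.8 0.8] [0.8 0.8]
```

Only the rows that actually received gradients are touched, which is the point of keeping both the gradient and the table as SelectedRows.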
Problems with the current design
Problem: all prefetch inputs and outputs must share the same variables, because there is only one prefetch thread block and one prefetch op on the pserver, so it can take only one input and one output. As a result, the split_ids_op -> prefetch_op -> concat_op set must be executed one by one and cannot run in parallel. There is also a lot of code in the dist transpiler just to insert and delete ops.
Solution: a better solution may be to have only one prefetch_op and one prefetch_grad_op that do not depend on Variables, but instead use some internal data structure to communicate with the pserver.
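The proposed fix can be sketched with a channel-style structure. This is purely illustrative: a `queue.Queue` stands in for the "internal data structure", and `pserver_worker` for the pserver handler, so each prefetch carries its own reply slot instead of sharing input/output Variables.

```python
import queue
import threading

table = {i: i * 10 for i in range(100)}   # toy pserver table

requests = queue.Queue()   # internal structure instead of shared Variables

def pserver_worker():
    """Single pserver handler loop: each request carries its own reply slot."""
    while True:
        ids, reply = requests.get()
        if ids is None:        # shutdown sentinel
            break
        reply.put([table[i] for i in ids])

def prefetch(ids):
    """Any number of prefetches can be issued; no shared in/out Variables."""
    reply = queue.Queue()
    requests.put((ids, reply))
    return reply.get()

t = threading.Thread(target=pserver_worker)
t.start()
# several prefetches issued from different "ops"
results = [prefetch([1, 2]), prefetch([3]), prefetch([4, 5])]
requests.put((None, None))
t.join()
print(results)  # [[10, 20], [30], [40, 50]]
```

Because each request owns its reply queue, nothing forces the prefetches to serialize on a single shared variable, which is the serialization problem described above.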