Description
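As the title says: the Paddle script runs successfully locally but fails on the MPI cluster. The job link is
http://10.86.102.41:8900/fileview.html?path=/home/disk1/normandy/maybach/329760/
The corresponding script is attached.
The specific error is as follows: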
r>:+ python27-gcc482/bin/python conf/trainer_config.conf
Fri Aug 18 12:44:11 2017[1,93]:Thread [140666704901888] Forwarding fc_layer_4,
Fri Aug 18 12:44:11 2017[1,93]:*** Aborted at 1503031451 (unix time) try "date -d @1503031451" if you are using GNU date ***
Fri Aug 18 12:44:11 2017[1,93]:PC: @ 0x0 (unknown)
Fri Aug 18 12:44:11 2017[1,93]:*** SIGFPE (@0x7fef7e79ba90) received by PID 2986 (TID 0x7fef84fa3700) from PID 2121906832; stack trace: ***
Fri Aug 18 12:44:11 2017[1,93]: @ 0x7fef84b7a160 (unknown)
Fri Aug 18 12:44:11 2017[1,93]: @ 0x7fef7e79ba90 paddle::CpuMatrix::mul<>()
Fri Aug 18 12:44:11 2017[1,93]: @ 0x7fef7e797ed3 paddle::CpuMatrix::mul()
Fri Aug 18 12:44:11 2017[1,93]: @ 0x7fef7e5ae59a paddle::FullyConnectedLayer::forward()
Fri Aug 18 12:44:11 2017[1,93]: @ 0x7fef7e61c562 paddle::NeuralNetwork::forward()
Fri Aug 18 12:44:11 2017[1,93]: @ 0x7fef7e6104c3 paddle::GradientMachine::forwardBackward()
Fri Aug 18 12:44:12 2017[1,93]: @ 0x7fef7e8381c4 GradientMachine::forwardBackward()
Fri Aug 18 12:44:12 2017[1,93]: @ 0x7fef7e4e51d9 _wrap_GradientMachine_forwardBackward
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4b4cb9 PyEval_EvalFrameEx
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4b6b28 PyEval_EvalCodeEx
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4b5d10 PyEval_EvalFrameEx
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4b6b28 PyEval_EvalCodeEx
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4b5d10 PyEval_EvalFrameEx
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4b6b28 PyEval_EvalCodeEx
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4b5d10 PyEval_EvalFrameEx
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4b6b28 PyEval_EvalCodeEx
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4b5d10 PyEval_EvalFrameEx
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4b6b28 PyEval_EvalCodeEx
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4b6c52 PyEval_EvalCode
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4e1c7d PyRun_FileExFlags
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4e3501 PyRun_SimpleFileExFlags
Fri Aug 18 12:44:12 2017[1,93]: @ 0x4159dd Py_Main
Fri Aug 18 12:44:12 2017[1,93]: @ 0x7fef840d4bd5 __libc_start_main
Fri Aug 18 12:44:12 2017[1,93]: @ 0x414b71 (unknown)
Fri Aug 18 12:44:12 2017[1,93]: @ 0x0 (unknown)
Fri Aug 18 12:44:13 2017[1,93]:./train.sh: line 239: 2986 Floating point exception python27-gcc482/bin/python conf/trainer_config.conf
Fri Aug 18 12:44:13 2017[1,93]:+ '[' 136 -ne 0 ']'
Fri Aug 18 12:44:13 2017[1,93]:+ kill_pserver2_exit
Fri Aug 18 12:44:13 2017[1,93]:+ grep -v grep
Fri Aug 18 12:44:13 2017[1,93]:+ xargs kill -9
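
For reference when reading the log above: the exit status 136 tested by train.sh follows the usual shell convention of 128 plus the signal number, so 136 - 128 = 8, which is SIGFPE on Linux and matches the "Floating point exception" message. The snippet below is only a small sketch of that decoding. The helper name `describe_exit_status` is made up for illustration and is not part of Paddle or train.sh, and it assumes Python 3 (the job itself runs Python 2.7, where `signal.Signals` is not available).

```python
import signal


def describe_exit_status(status):
    """Return the signal name for a shell exit status above 128, else None.

    Hypothetical helper for illustration only; not part of Paddle.
    """
    if status > 128:
        try:
            # Shells report "killed by signal N" as exit status 128 + N.
            return signal.Signals(status - 128).name
        except ValueError:
            # No signal with that number on this platform.
            return None
    return None


# 136 is the status checked by train.sh in the log above.
print(describe_exit_status(136))
```

This only confirms that the trainer process was terminated by SIGFPE inside paddle::CpuMatrix::mul; it does not say which input or layer configuration triggered it.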