Description
keras: 3.0.5
tensorflow: 2.15.0
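For anyone comparing environments, here is a minimal sketch of how the versions above can be read from the running kernel. `report_versions` is a hypothetical helper name, not part of Keras or TensorFlow; it uses only the standard library, so it works even when importing one of the packages fails.

```python
from importlib import metadata


def report_versions(packages=("keras", "tensorflow")):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            # Reads the installed distribution metadata without importing the package.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Usage: print one "name: version" line per package.
for name, version in report_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Reading the metadata instead of `keras.__version__` avoids triggering backend initialization just to report a version string.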
There seems to be a conflict when using Keras 3 on a TPU VM. See Kaggle/docker-python#1370 (comment).
import numpy as np
import tensorflow as tf
import keras

# Resolve the TPU attached to this VM and build the distribution strategy
tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
strategy = tf.distribute.TPUStrategy(tpu)

with strategy.scope():
    # Construct and compile the model inside the strategy scope
    inputs = keras.Input(shape=(32,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Just use `fit` as usual
x = np.random.random((1000, 32))
y = np.random.random((1000, 1))
model.fit(x, y, epochs=3)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1712289536.759567 13 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
Epoch 1/3
---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
Cell In[6], line 11
9 x = np.random.random((1000, 32))
10 y = np.random.random((1000, 1))
---> 11 model.fit(x, y, epochs=3)
File /usr/local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:123, in filter_traceback.<locals>.error_handler(*args, **kwargs)
120 filtered_tb = _process_traceback_frames(e.__traceback__)
121 # To get the full stack trace, call:
122 # `keras.config.disable_traceback_filtering()`
--> 123 raise e.with_traceback(filtered_tb) from None
124 finally:
125 del filtered_tb
File /usr/local/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:53, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
51 try:
52 ctx.ensure_initialized()
---> 53 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
54 inputs, attrs, num_outputs)
55 except core._NotOkStatusException as e:
56 if name is not None:
NotFoundError: Graph execution error:
Detected at node TPUReplicate/_compile/_15189418723048853925/_4 defined at (most recent call last):
<stack traces unavailable>
9 root error(s) found.
(0) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
(1) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
(2) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
(3) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
(4) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
[[cluster_one_step_on_iterator/control_after/_1/_387]]
(5) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
[[cluster_one_step_on_iterator/control_after/_1/_387]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_220]]
(6) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
[[cluster_one_step_on_iterator/control_after/_1/_387]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_220]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_284]]
(7) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
[[cluster_one_step_on_iterator/control_after/_1/_387]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_220]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_284]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_236]]
(8) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
[[cluster_one_step_on_iterator/control_after/_1/_387]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_220]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_284]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_236]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_303]]
0 successful operations.
0 derived errors ignored. [Op:__inference_one_step_on_iterator_2865]
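One thing that may be worth ruling out: the missing resource named in the error (`localhost/tpu_mesh_common_state`) could indicate that the TPU system was never initialized in this process, since the snippet above builds `TPUStrategy` straight from the resolver. Below is a minimal sketch of the explicit setup, assuming the standard TensorFlow calls `tf.config.experimental_connect_to_cluster` and `tf.tpu.experimental.initialize_tpu_system`. `make_tpu_strategy` is a hypothetical helper name, and whether this changes the outcome with Keras 3 on this image is untested.

```python
def make_tpu_strategy(tpu="local"):
    """Connect to the TPU, initialize it, then build a TPUStrategy.

    Assumption: the failure comes from skipped TPU initialization.
    This is a diagnostic sketch, not a confirmed fix.
    """
    # Imported lazily so the helper can be defined on machines without TensorFlow.
    import tensorflow as tf

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu)
    # Make the TPU devices visible to this process.
    tf.config.experimental_connect_to_cluster(resolver)
    # Set up the TPU system state before any program is compiled for it.
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)
```

To try it, replace the two lines that create `tpu` and `strategy` with `strategy = make_tpu_strategy()` and rerun the rest of the snippet unchanged. If the same `NotFoundError` appears, initialization is probably not the cause.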