Dmitry.AI

How to suspend / resume the `model_main.py` based training in TensorFlow?

(Dmitry Fedyuk) #1

Step 1

Stop the model_main.py process by pressing Ctrl+C.

Step 2

Go to model_dir and rename the files of the last checkpoint:

  • model.ckpt-<step>.meta => model.ckpt.meta
  • model.ckpt-<step>.index => model.ckpt.index
  • model.ckpt-<step>.data-00000-of-00001 => model.ckpt.data-00000-of-00001
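The renaming above can also be scripted. The following is only a minimal sketch: the helper names latest_step and rename_latest_checkpoint are made up for this illustration, and it assumes the checkpoint files follow the model.ckpt-<step>.<suffix> naming shown in the list.

```python
import re
from pathlib import Path

# Matches model.ckpt-<step>.meta, .index and .data-XXXXX-of-XXXXX files.
CKPT_RE = re.compile(r"^model\.ckpt-(\d+)\.(meta|index|data-\d+-of-\d+)$")


def latest_step(model_dir):
    """Return the highest <step> found among the checkpoint files in model_dir."""
    steps = set()
    for path in Path(model_dir).iterdir():
        match = CKPT_RE.match(path.name)
        if match:
            steps.add(int(match.group(1)))
    if not steps:
        raise FileNotFoundError(f"no model.ckpt-<step>.* files in {model_dir}")
    return max(steps)


def rename_latest_checkpoint(model_dir):
    """Rename model.ckpt-<step>.* of the latest step to model.ckpt.*.

    Returns the list of new file names. Older checkpoints are left untouched.
    """
    model_dir = Path(model_dir)
    step = latest_step(model_dir)
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(model_dir.iterdir()):
        match = CKPT_RE.match(path.name)
        if match and int(match.group(1)) == step:
            target = model_dir / f"model.ckpt.{match.group(2)}"
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Calling rename_latest_checkpoint("path/to/model_dir") (a placeholder path) renames only the files of the highest step. If you would rather keep the original files, copying them with shutil.copy2 instead of renaming should work just as well.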

Step 3

In the train_config section of your pipeline config, set the fine_tune_checkpoint parameter to the prefix of the renamed checkpoint (e.g., <model_dir>/model.ckpt), not just to the directory that contains it.

from_detection_checkpoint should be set to true.
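In the pipeline config, the two settings might look roughly like the fragment below. The path is a placeholder, and the fragment assumes that fine_tune_checkpoint takes the checkpoint prefix (model.ckpt after the renaming in Step 2); the rest of your train_config stays as it is.

```
train_config {
  # Placeholder path: use the renamed checkpoint prefix in your own model_dir.
  fine_tune_checkpoint: "/path/to/model_dir/model.ckpt"
  from_detection_checkpoint: true
  # ... the rest of your existing train_config, unchanged ...
}
```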

Step 4

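Run the model_main.py process again with a new model_dir, so that training starts from fine_tune_checkpoint rather than from the checkpoints already stored in the old directory.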
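As an illustration, the restart could be launched from Python as sketched below. The function names build_resume_command and resume_training are hypothetical, the default script path assumes you start from the directory that contains object_detection/, and the config path and directory names in the usage note are placeholders. Only the --pipeline_config_path and --model_dir flags of model_main.py are used.

```python
import subprocess
import sys


def build_resume_command(pipeline_config_path, new_model_dir,
                         script="object_detection/model_main.py"):
    """Build the argument list for restarting model_main.py.

    new_model_dir should be a fresh directory, different from the one
    that holds the checkpoint referenced by fine_tune_checkpoint.
    """
    return [
        sys.executable,
        script,
        f"--pipeline_config_path={pipeline_config_path}",
        f"--model_dir={new_model_dir}",
    ]


def resume_training(pipeline_config_path, new_model_dir,
                    script="object_detection/model_main.py"):
    """Run model_main.py and raise CalledProcessError if it exits with an error."""
    command = build_resume_command(pipeline_config_path, new_model_dir, script)
    return subprocess.run(command, check=True)
```

For example, resume_training("pipeline.config", "model_dir_resumed") would start a new run that writes its checkpoints to model_dir_resumed.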