I have been implementing the Stock Price Forecast series with TensorFlow. I write the code on a Mac, sync it with rsync to a PC that has a GPU (TITAN X), and run it there. However, after a GPU program has run about 200 times, the CUDA (7.5) library raises an error and everything after that fails. Until now I restarted the server manually whenever that error occurred, but this time I used Fabric to restart it automatically.
First of all, here is the whole code.
fabfile.py
import time
from fabric.api import run, cd, prefix, env
from fabric.tasks import execute

# Adding -i makes the PATH settings (CUDA etc.) in bashrc take effect when logging in over ssh
env.shell = '/bin/bash -l -i -c'
# Server address
env.hosts = ['10.0.1.17']
# Login user name (no password because private key authentication is used)
env.user = 'akiraak'


# Connect over ssh and execute commands
def run_jp():
    # Move to the project directory
    with cd('~/project/'):
        # Enable the virtualenv settings
        with prefix('. ~/tensorflow-env/bin/activate'):
            # Run the command (warn_only=True continues even if it fails)
            run('time python run_jp.py', warn_only=True)
    # Reboot the server (passwordless sudo must be enabled)
    run('sudo reboot')


if __name__ == "__main__":
    # Repeat command execution and server restart
    while True:
        try:
            # Execute over ssh
            execute(run_jp)
        except Exception as e:
            print(e)
        # Wait a while for the server to restart
        time.sleep(60)
Fabric lets you log in to a server over ssh and run commands from Python code.
Run this code on the Mac to control the server.
run_jp() is the function that executes commands after connecting over ssh. Normally, it would be run with the following command.
$ fab run_jp
This time, I want to repeat the cycle "run the processing" -> "restart the server" -> "run the processing", so instead of using the fab command, I run the Python code directly.
$ python fabfile.py
Here, run_jp() is executed, and then the script waits 60 seconds for the server to restart.
if __name__ == "__main__":
    # Repeat command execution and server restart
    while True:
        try:
            # Execute over ssh
            execute(run_jp)
        except Exception as e:
            print(e)
        # Wait a while for the server to restart
        time.sleep(60)
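The fixed 60-second wait works when the server reliably comes back within a minute. As a possible alternative (not what the code above does), the loop could poll the ssh port until it accepts connections. The following is a minimal sketch using only the standard library and assuming Python 3; the function name wait_for_port and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=300.0, interval=5.0):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means sshd is listening again
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the server is still down, so retry later
            time.sleep(interval)
    return False
```

Right after sudo reboot the port may still be open for a few seconds before the machine goes down, so a short time.sleep() before calling wait_for_port('10.0.1.17') would probably still be needed.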
run_jp(), which does the actual processing, moves to the directory containing the code (~/project/), enables the virtualenv settings (. ~/tensorflow-env/bin/activate), and then runs python run_jp.py.
python run_jp.py launches multiple jobs in separate processes, but as mentioned above, CUDA raises an error partway through, so some of the jobs end in a failed state. That is why sudo reboot is run at the end. Control then returns to `if __name__ == "__main__":`, which waits for the reboot with sleep and then runs the same command again.
# Connect over ssh and execute commands
def run_jp():
    # Move to the project directory
    with cd('~/project/'):
        # Enable the virtualenv settings
        with prefix('. ~/tensorflow-env/bin/activate'):
            # Run the command (warn_only=True continues even if it fails)
            run('time python run_jp.py', warn_only=True)
    # Reboot the server (passwordless sudo must be enabled)
    run('sudo reboot')
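The contents of run_jp.py are not shown here. Purely as an illustration of "run several jobs in separate processes and see which ones failed", a script of that shape might look like the sketch below. It assumes Python 3, and run_jobs, failed_jobs, and the sample commands are all hypothetical stand-ins rather than the actual script.

```python
import subprocess
import sys


def run_jobs(commands):
    """Run each command in its own process; return (command, returncode) pairs."""
    results = []
    for cmd in commands:
        completed = subprocess.run(cmd)
        results.append((cmd, completed.returncode))
    return results


def failed_jobs(results):
    """Return the commands whose process exited with a non-zero status."""
    return [cmd for cmd, code in results if code != 0]


if __name__ == "__main__":
    # Hypothetical jobs: the second one simulates a run that fails
    jobs = [
        [sys.executable, "-c", "print('ok')"],
        [sys.executable, "-c", "import sys; sys.exit(1)"],
    ]
    for cmd in failed_jobs(run_jobs(jobs)):
        print("failed:", cmd)
```

Collecting the failed commands like this would make it possible to rerun only those jobs after the reboot, instead of the whole batch.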