Hi All,
Some of the nodes running Sun Grid Engine went into an error state named E. The state shows up in the output of the command
qstat -f
queuename                 qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
m1.large@machine1.com     BIP   0/0/1          -NA-     lx26-amd64    au
---------------------------------------------------------------------------------
m1.large@machine2.com     BIP   0/0/1          -NA-     lx26-amd64    au
---------------------------------------------------------------------------------
m1.large@machine3.com     BIP   0/0/1          0.05     lx26-amd64    E
---------------------------------------------------------------------------------
m1.large@machine4.com     BIP   0/0/1          -NA-     lx26-amd64    au
---------------------------------------------------------------------------------
As shown above, machine3 is in the E state. According to the Sun Grid Engine documentation, E is an error state.
You can get the exact reason for the error state by issuing the command
qstat -f -explain E
The E state does not clear by itself once the underlying problem is fixed; it stays set until it is cleared explicitly.
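On a larger cluster it can help to pull just the affected queue instances out of the listing. The following is a minimal sketch, assuming the six-column layout of the qstat -f output shown above; the function name error_queues is my own and is not part of Grid Engine.

```shell
# Print the names of queue instances whose states column contains "E".
# Reads `qstat -f` output on stdin. Assumes the column layout shown in
# this post: queue lines contain "@" in the first field and carry the
# state flags in the sixth (last) field.
error_queues() {
  awk 'NF >= 6 && $1 ~ /@/ && $NF ~ /E/ { print $1 }'
}
```

It would be used as: qstat -f | error_queues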
Because of this error, no job will be scheduled on that node, and jobs will stay in the queue wait state.
Once you have resolved the issues on that node, you can issue the command
# qmod -c '*'
This clears the E state (the wildcard '*' matches every queue), and jobs will be scheduled on the node again.
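If you would rather not clear every queue at once, the error state can be cleared per queue instance with qmod -cq. A rough sketch, assuming queue instance names arrive one per line on stdin; the function name clear_queues and the QMOD override variable are hypothetical helpers added here so the loop can be tried out without touching the cluster.

```shell
# Clear the error state on each queue instance named on stdin.
# QMOD is a hypothetical override (not a Grid Engine setting); leave it
# unset to call the real qmod, or point it at a stub for a dry run.
clear_queues() {
  while IFS= read -r q; do
    [ -n "$q" ] || continue      # skip empty lines
    ${QMOD:-qmod} -cq "$q"       # -cq clears the error state of a queue
  done
}
```

For example: echo 'm1.large@machine3.com' | clear_queues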
Cheers
Syamkumar.M