Quest Troubleshooting: Checking the job output file

This page walks through an example of finding problems in a job's output file when a job fails.

When a job fails, the first place to look is the output/error file for your job. Unless you explicitly directed it elsewhere, it will be in the directory from which you submitted your job. Even if you directed output from your script/program to another location, there is still an output file with information about the job itself. By default, the output file is named <jobname>.o<jobID>. If you haven't joined the output and error files, there may also be an error file named <jobname>.e<jobID>. You can use the cat command to print the contents to the terminal, or open the file in your preferred text editor.
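The naming convention above can be checked programmatically. The following is a minimal Python sketch, not part of Quest's tooling: the function name find_job_files and its arguments are hypothetical, and it assumes the default <jobname>.o<jobID> and <jobname>.e<jobID> names described above.

```python
from pathlib import Path


def find_job_files(directory, jobname, jobid):
    """Return the output (.o) and error (.e) files that exist for one job.

    Assumes the default naming convention <jobname>.o<jobID> and
    <jobname>.e<jobID>; files redirected elsewhere will not be found.
    """
    base = Path(directory)
    found = {}
    for kind, letter in (("output", "o"), ("error", "e")):
        candidate = base / f"{jobname}.{letter}{jobid}"
        if candidate.is_file():
            found[kind] = candidate
    return found
```

For example, find_job_files(".", "testjob3", "19907815") looks in the current directory for testjob3.o19907815 and testjob3.e19907815. If the output and error files were joined, only the "output" entry would be present.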

If the job exit value (listed at the bottom of the job information output) is anything other than 0, the scheduler thinks something went wrong with the job.
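As an illustration of checking the exit value without reading the whole file, here is a small Python sketch. The function name parse_exit_value is hypothetical, and the pattern assumes the "Job exit value:" label appears exactly as in the example output on this page; other scheduler configurations may print it differently.

```python
import re

# Matches a line such as "Job exit value:<tabs>1" (label assumed from the
# example output on this page).
EXIT_LINE = re.compile(r"^Job exit value:\s*(-?\d+)\s*$", re.MULTILINE)


def parse_exit_value(text):
    """Return the job exit value as an int, or None if no such line exists."""
    match = EXIT_LINE.search(text)
    return int(match.group(1)) if match else None
```

A return value of 0 means the scheduler saw no failure. Any other integer is worth investigating, and None means the epilogue line was not found, for instance because the job is still running or the file is not a job output file.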

Here is an example of the error/output file for a job where the job submission script referenced a file that wasn't found. The relevant notice is in the second-to-last section: line 20: myprog.do: No such file or directory. The line number refers to a line in the job submission script. A file-not-found error can occur if you don't set or change the working directory to the appropriate location in your script, or if you have a typo in the path or filename. The Job exit value: 1 indicates there was a problem with the content of the job.

[<netid>@quser10 ~]$ cat testjob3.o19907815
----------------------------------------
PBS: Begin PBS Prologue Thu May 11 09:45:49 CDT 2017 1494513949
PBS: Job ID:		19907815.qsched03.quest.it.northwestern.edu
PBS: Username:		<netid>
PBS: Group:		<netid>
PBS: Executing queue:     short
PBS: Job name:		<jobname>
PBS: Account:		<allocationID>
----------------------------------------
   The following variables are not
   guaranteed to be the same in 
   prologue and the job run script  
----------------------------------------
PBS: Temporary Dir($TMPDIR):	/tmp/19907815.qsched03.quest.it.northwestern.edu
PBS: Master Node($PBS_MSHOST):		qnode5056
PBS: node file($PBS_NODEFILE):  /hpc/opt/torque6/nodes/qnode5056/aux//19907815.qsched03.quest.it.northwestern.edu
PBS: PATH (in prologue) : /bin:/usr/bin
PBS: WORKDIR ($PBS_O_WORKDIR) is:  /home/<netid>
----------------------------------------
PBS: End PBS Prologue Thu May 11 09:45:49 CDT 2017 1494513949
/hpc/opt/torque6/nodes/qnode5056/mom_priv/jobs/19907815.qsched03.quest.it.northwestern.edu.SC: 
line 20: myprog.do: No such file or directory
----------------------------------------
PBS job ended
Begin PBS Epilogue Thu May 11 09:45:54 CDT 2017 1494513954
JobID: 19907815.qsched03.quest.it.northwestern.edu
Session ID:			31155
Resources Used:			cput=00:00:00,energy_used=0,mem=0kb,vmem=0kb,walltime=00:00:00
Job exit value:			1
----------------------------------------
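In the output above, the messages produced by the job itself sit between the end of the prologue and the "PBS job ended" line. The following Python sketch pulls out just those lines. The function name extract_job_messages is hypothetical, and the marker strings are assumptions taken from this one example, so they may need adjusting for other output formats.

```python
def extract_job_messages(text):
    """Return the lines printed between the PBS prologue and epilogue.

    Marker strings are taken from the example output on this page and are
    assumptions, not a documented format.
    """
    messages = []
    in_body = False
    for line in text.splitlines():
        if line.startswith("PBS: End PBS Prologue"):
            in_body = True  # job messages start after this line
            continue
        if not in_body:
            continue
        if line.startswith(("PBS job ended", "Begin PBS Epilogue")):
            break  # reached the epilogue; stop collecting
        stripped = line.strip()
        # Skip blank lines and separator lines made only of dashes.
        if stripped and set(stripped) != {"-"}:
            messages.append(stripped)
    return messages
```

Applied to the example file, this would return the path of the job script copy followed by the line reporting that myprog.do was not found, which is the part that explains the failure.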

Keywords: research computing, troubleshooting
Doc ID: 78619
Owner: Research Computing
Group: Northwestern
Created: 2017-12-07 16:21 CST
Updated: 2017-12-07 16:29 CST
Sites: Northwestern