[torqueusers] job dying immediately, 0 byte output file being produced
sabujp at gmail.com
Tue Feb 23 09:28:29 MST 2010
On Tue, Feb 23, 2010 at 10:03 AM, Garrick <garrick at usc.edu> wrote:
> Check syslog on the node?
Nothing there shows any errors, and the drives are not out of space on
the server or the node.
> If you want output, your batch script should print something.
Usually it'll print "this shell has no job control" and some other
stuff, but none of that comes out. Just in case, I did put echo
"PBS_JOBID = $PBS_JOBID" just before the cat /dev/urandom, but I still
get a 0 byte file.
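For reference, a minimal sketch of a test script along those lines, not
the exact script that was submitted. The job name, the -j oe directive
and the bounded read via head are assumptions for illustration (the
original used an unbounded cat /dev/urandom); the queue name csb comes
from the configuration below.

```shell
#!/bin/sh
#PBS -N urandom_test
#PBS -q csb
#PBS -j oe
# Assumed directives: job name, queue, and stdout/stderr merged into one file.

main() {
    # Print the job context first, so the output file should be non-empty
    # even if the payload below fails.
    echo "PBS_JOBID = ${PBS_JOBID:-unset}"
    echo "node = $(uname -n)"

    # Bounded read from /dev/urandom so the job ends by itself.
    bytes=$(head -c 1024 /dev/urandom | wc -c | tr -d ' ')
    echo "bytes read = $bytes"
}

main
```

If even the first echo never reaches the output file, the job is most
likely dying before the script body runs at all.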
Here's the server configuration:
# Create queues and set their attributes.
# Create and define queue csb
create queue csb
set queue csb queue_type = Execution
set queue csb max_running = 400
set queue csb resources_min.mem = 100mb
set queue csb resources_default.mem = 512mb
set queue csb resources_default.neednodes = csb
set queue csb resources_default.walltime = 01:00:00
set queue csb acl_group_enable = False
set queue csb acl_groups = sbio
set queue csb acl_group_sloppy = True
set queue csb enabled = True
set queue csb started = True
# Set server attributes.
set server scheduling = True
set server acl_hosts = pbsserver
set server managers = root@pbsserver
set server operators = root@pbsserver
set server default_queue = csb
set server log_events = 511
set server mail_from = root
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 859614
Btw, jobs that can't run right away are still being queued, because the
jobs that were already running are somehow still running. Any job that
can start is terminated almost immediately.
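Since keep_completed is set to 300, a finished job stays visible to
qstat for five minutes, so its recorded exit status can still be read
after it dies. A rough sketch of how that could be checked; the helper
name inspect_job is hypothetical, and it assumes the standard qstat and
tracejob commands are in PATH on the server (each is skipped if not):

```shell
#!/bin/sh
# Hypothetical helper: show what the server recorded for a job that died.
# Usage: inspect_job <jobid>
inspect_job() {
    jobid=$1
    if [ -z "$jobid" ]; then
        echo "usage: inspect_job <jobid>" >&2
        return 2
    fi

    for tool in qstat tracejob; do
        # Skip tools that are not installed on this host.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "skipping $tool: not found in PATH"
            continue
        fi
        case $tool in
            qstat)
                # Full job record; keep only the fields relevant here.
                qstat -f "$jobid" \
                    | grep -E 'job_state|exit_status|exec_host|Output_Path|Error_Path' \
                    || true
                ;;
            tracejob)
                # Collect the server, scheduler and accounting log entries for the job.
                tracejob "$jobid" || true
                ;;
        esac
    done
    return 0
}
```

For example, inspect_job 859613 run within five minutes of the job
exiting (the job id here is only illustrative).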