[torqueusers] NVIDIA GPUs version error

Steve Crusan scrusan at ur.rochester.edu
Fri Sep 9 10:03:44 MDT 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Al,

	Thanks for the response. 

	I found what you are talking about in src/resmom/mom_server.c.

	Our GPUs are working fine, so I'll wait for the newest release, and then move from there.
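
	In the meantime, here is a rough sketch of the kind of check Al describes below (accept any driver whose major version is 260 or higher). This is not the code from mom_server.c, which is C. The function names and the 260 floor are my own assumptions, and it is written in Python only to illustrate the idea.

```python
import re

# Assumption: 260 is the lowest driver series Torque was tested with,
# per Al's note, so anything at or above it is treated as acceptable.
MIN_SUPPORTED_MAJOR = 260


def driver_major(version):
    """Return the major component of a version such as '275.09.07', or None."""
    match = re.match(r"\s*(\d+)(?:\.\d+)*\s*$", version)
    return int(match.group(1)) if match else None


def driver_supported(version, minimum=MIN_SUPPORTED_MAJOR):
    """True if the version string parses and its major number is >= minimum."""
    major = driver_major(version)
    return major is not None and major >= minimum


if __name__ == "__main__":
    # 275.09.07 is the driver version reported by our nodes.
    print(driver_supported("275.09.07"))
```

	Comparing only the major number keeps the check open-ended, which matches the stated goal of not having to touch the code for every new driver release.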

~Steve

On Sep 7, 2011, at 2:48 PM, Al Taufer wrote:

> We have only tested Torque with the 260 and 270 Nvidia drivers, so driver versions greater than 270 are not yet recognized.  I am testing with the 275 and 280 drivers now and hope to update Torque this week so that it accepts any driver version greater than 260.  There were major changes between the 260 and 270 driver versions, but we should be okay with future driver releases as long as the Nvidia interface does not change.
> 
> Al 
> 
> ----- Original Message -----
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>> 
>> Hi all,
>> 
>> 	I'm getting errors in syslog from the pbs_mom daemons on our GPU nodes:
>> 
>> 	Aug 22 15:55:09 blugpu07 pbs_mom: LOG_ERROR::a system error occured
>> 	(15205) in generate_server_gpustatus_smi, Unknown Nvidia driver
>> 	version
>> 
>> 	Here is the snipped output of pbsnodes blugpu07:
>> 	<SNIPPED>
>> 	gpu_status =
>> 	gpu[1]=gpu_id=0:15:0;,gpu[0]=gpu_id=0:14:0;,driver_ver=275.09.07,timestamp=Mon
>> 	Aug 22 15:56:41 2011
>> 
>> 
>> 	If I log in to the node and check the pbs_mom log files, I see the
>> 	following:
>> 
>> 	08/22/2011 15:57:24;0002;
>> 	pbs_mom;n/a;mom_server_all_update_gpustat;composing gpu status
>> 	update for server
>> 	08/22/2011 15:57:24;0001; pbs_mom;Svr;pbs_mom;LOG_DEBUG::gpus, gpus:
>> 	GPU cmd issued: nvidia-smi -a -x 2>&1
>> 	08/22/2011 15:57:26;0001; pbs_mom;Svr;pbs_mom;LOG_ERROR::a system
>> 	error occured (15205) in generate_server_gpustatus_smi, Unknown
>> 	Nvidia driver version
>> 	08/22/2011 15:57:26;0001; pbs_mom;Svr;pbs_mom;LOG_ERROR::a system
>> 	error occured (15205) in generate_server_gpustatus_smi, Unknown
>> 	Nvidia driver version
>> 	08/22/2011 15:57:26;0002;
>> 	pbs_mom;n/a;mom_server_update_gpustat;mom_server_update_gpustat:
>> 	sending to server "timestamp=Mon Aug 22 15:57:26 2011"
>> 	08/22/2011 15:57:26;0002;
>> 	pbs_mom;n/a;mom_server_update_gpustat;mom_server_update_gpustat:
>> 	sending to server "driver_ver=275.09.07"
>> 	08/22/2011 15:57:26;0002;
>> 	pbs_mom;n/a;mom_server_update_gpustat;mom_server_update_gpustat:
>> 	sending to server "gpuid=0:14:0"
>> 	08/22/2011 15:57:26;0002;
>> 	pbs_mom;n/a;mom_server_update_gpustat;mom_server_update_gpustat:
>> 	sending to server "gpuid=0:15:0"
>> 	08/22/2011 15:57:26;0002;
>> 	pbs_mom;n/a;mom_server_update_gpustat;status update successfully
>> 	sent to bhsn-int
>> 
>> 
>> 	Is the driver version we are running not supported by TORQUE?
>> 
>> 
>> 
>> 	Environment:
>> 	- TORQUE-2.5.6
>> 	- NVIDIA Driver Version : 275.09.07
>> 	- kernel:	2.6.18-238.12.1.el5
>> 
>> 	- TORQUE client was built via:
>> 	This build was configured with: '--prefix=/opt/torque/2.5.6'
>> 	'--exec-prefix=/opt/torque/2.5.6/x86_64'
>> 	'--with-server-home=/var/spool/pbs' '--enable-syslog' '--with-scp'
>> 	'--disable-rpp' '--disable-spool' '--with-pam' '--with-cpusets'
>> 	'--with-geometry-requests' '--disable-gui' '--enable-nvidia-gpus'
>> 	'--enable-docs'
>> 
>> 
>> 
>> ----------------------
>> Steve Crusan
>> System Administrator
>> Center for Research Computing
>> University of Rochester
>> https://www.crc.rochester.edu/
>> 
>> 
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
>> Comment: GPGTools - http://gpgtools.org
>> 
>> iQEcBAEBAgAGBQJOUrawAAoJENS19LGOpgqKwkoIAIQrY8rZn+J+vaSgnTElGxvu
>> KcMYlqkiBBZtix7YBCVMsHTv5PcOPT/4l1qHX4/7/P9ZW6Xc542LNKLJrd46FcLa
>> cmbkixUaGRJ5SDCVSyA6YzZZIBDHBjP3AMrIouDwjyOEhR3A9agI5yYPdFTRdcNQ
>> NoagT372lZnhVfPUYrVLM8oVIbS+KsZZGiYA4HShsbPUB/qqU/YqNroLlg7o8lVX
>> gHBY7C231TpC/YAJx1xZ5qjSSl1/mtzK8PuzqZ5mWBFtoXFvlzXFe+C0uqcCHLh2
>> jjkGeRU09YCkHEuqJy+iQ/KDGgvAFSmyuDgWq3RPJX8c7xw+y7saDLjhH9vPdVg=
>> =zdfO
>> -----END PGP SIGNATURE-----
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
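
P.S. For anyone who wants to pull the driver version out of the pbsnodes gpu_status line quoted above, something like the following should work. It is only a sketch: the function name is made up, and it assumes the field layout shown in my original post (comma-separated key=value items, with per-GPU attributes separated by semicolons). Other Torque versions may format this differently.

```python
def parse_gpu_status(status):
    """Split a gpu_status string into top-level fields and per-GPU attributes.

    Assumes the layout seen in the pbsnodes output quoted in this thread.
    """
    result = {"gpus": {}}
    for item in status.split(","):
        key, _, value = item.strip().partition("=")
        if key.startswith("gpu[") and key.endswith("]"):
            # Per-GPU entry, e.g. gpu[1]=gpu_id=0:15:0;
            index = int(key[4:-1])
            attrs = {}
            for field in value.split(";"):
                if field:
                    name, _, val = field.partition("=")
                    attrs[name] = val
            result["gpus"][index] = attrs
        elif key:
            # Top-level entry, e.g. driver_ver=275.09.07
            result[key] = value
    return result


if __name__ == "__main__":
    sample = (
        "gpu[1]=gpu_id=0:15:0;,gpu[0]=gpu_id=0:14:0;,"
        "driver_ver=275.09.07,timestamp=Mon Aug 22 15:56:41 2011"
    )
    print(parse_gpu_status(sample)["driver_ver"])
```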

 ----------------------
 Steve Crusan
 System Administrator
 Center for Research Computing
 University of Rochester
 https://www.crc.rochester.edu/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJOajjlAAoJENS19LGOpgqKEbsH/2v4C6yglvkpVgvDOPTnr9Ud
DbygLsflOBypKnD/tJ7yenK65eBhq2P5cDr4qMtOyAya+gs7g2NlTVz4x7skgmFF
mGTy0FKgsUf9rk9LcQZIfeFIle+l5T9TpHGN0+fvF0zhrO7hTreQrtzCw1tm7WsB
0KzpF702c1RL8gLX7OVxE9t5NG9i1eadTEVlRpK6/5eNVVGQYP3P6nEmuDieUy2r
L3MYRLG6/v5AbBED1QztcMuuvUs4zBPwh0k4ItmpISwHDht6/YXOquRD2Yr17Q9S
S3qNfNBV6AlKlqaeDfQHYh6RGpfolRtmixdEFRxu4sbZ8fndoB4kBJ78qXVd744=
=mhIq
-----END PGP SIGNATURE-----

