Tasks are finishing prematurely
Author | Message |
---|---|
Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 36 |
After the second job of a new task finished, the task was killed prematurely.
http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=167051
Last part of StarterLog:
04/30/16 07:24:28 Create_Process succeeded, pid=4196
04/30/16 07:39:38 condor_read() failed: recv(fd=8) returned -1, errno = 104 Connection reset by peer, reading 21 bytes from <188.184.187.167:9618>.
04/30/16 07:39:38 IO: Failed to read packet header
04/30/16 07:44:39 condor_write(): Socket closed when trying to write 410 bytes to <188.184.187.167:9618>, fd is 8
04/30/16 07:44:39 Buf::write(): condor_write() failed
04/30/16 07:49:39 condor_write(): Socket closed when trying to write 410 bytes to <188.184.187.167:9618>, fd is 8
04/30/16 07:49:39 Buf::write(): condor_write() failed
04/30/16 07:54:40 condor_write(): Socket closed when trying to write 410 bytes to <188.184.187.167:9618>, fd is 8
04/30/16 07:54:40 Buf::write(): condor_write() failed
04/30/16 07:59:41 condor_write(): Socket closed when trying to write 410 bytes to <188.184.187.167:9618>, fd is 8
04/30/16 07:59:41 Buf::write(): condor_write() failed
04/30/16 08:04:41 condor_write(): Socket closed when trying to write 410 bytes to <188.184.187.167:9618>, fd is 8
04/30/16 08:04:41 Buf::write(): condor_write() failed
04/30/16 08:09:06 Process exited, pid=4196, status=0
04/30/16 08:09:06 About to exec Post script: /var/lib/condor/execute/dir_4192/tarOutput.sh 2016-562440-218
04/30/16 08:09:06 Create_Process succeeded, pid=9698
04/30/16 08:09:06 Process exited, pid=9698, status=0
04/30/16 08:09:06 condor_write(): Socket closed when trying to write 581 bytes to <188.184.187.167:9618>, fd is 8
04/30/16 08:09:06 Buf::write(): condor_write() failed
04/30/16 08:09:06 condor_write(): Socket closed when trying to write 363 bytes to <188.184.187.167:9618>, fd is 8
04/30/16 08:09:06 Buf::write(): condor_write() failed
04/30/16 08:09:06 Failed to send job exit status to shadow
04/30/16 08:09:06 JobExit() failed, waiting for job lease to expire or for a reconnect attempt
04/30/16 08:17:47 Got SIGQUIT. Performing fast shutdown.
04/30/16 08:17:47 ShutdownFast all jobs.
04/30/16 08:17:47 condor_write(): Socket closed when trying to write 363 bytes to <188.184.187.167:9618>, fd is 8
04/30/16 08:17:47 Buf::write(): condor_write() failed
04/30/16 08:17:47 Failed to send job exit status to shadow
04/30/16 08:17:47 JobExit() failed, waiting for job lease to expire or for a reconnect attempt |
Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 36 |
A new task ran for only 7 minutes: http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=167108
2016-04-30 09:45:08 (5544): Guest Log: [INFO] Mounting the shared directory
2016-04-30 09:45:08 (5544): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2016-04-30 09:45:08 (5544): Guest Log: [DEBUG] Probing CVMFS ...
2016-04-30 09:45:08 (5544): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2016-04-30 09:45:08 (5544): Guest Log: Probing /cvmfs/sft.cern.ch... OK
2016-04-30 09:45:18 (5544): Guest Log: 0
2016-04-30 09:45:18 (5544): Guest Log: cms.cern.ch not mounted
2016-04-30 09:45:18 (5544): Guest Log: 1
2016-04-30 09:45:18 (5544): Guest Log: [DEBUG] Finished probing.
2016-04-30 09:45:18 (5544): Guest Log: [INFO] Reading volunteer information
2016-04-30 09:45:18 (5544): Guest Log: [INFO] Volunteer: Crystal Pellet (38) Host: 37
2016-04-30 09:45:18 (5544): Guest Log: [INFO] VMID: 5a65677f-3929-47b8-97cd-9212275cc67f
2016-04-30 09:45:18 (5544): Guest Log: [INFO] Requesting an X509 credential from vLHC@home
2016-04-30 09:45:18 (5544): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2016-04-30 09:45:18 (5544): Guest Log: [INFO] Theory application starting. Check log files.
2016-04-30 09:51:00 (5544): Guest Log: [INFO] Condor exited with 0
2016-04-30 09:51:00 (5544): Guest Log: [INFO] Shutting Down. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
This is typical when no jobs are available. |
Joined: 4 Mar 16 Posts: 31 Credit: 44,320 RAC: 0 |
Just submitted a new bunch. Sorry for the delay. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I had tasks finishing after about 7 minutes between 01:46 and 03:11 UTC this morning (no jobs?). After that, the tasks worked again and kept running. |
Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 36 |
This task was running normally when it was killed. It was busy with a Sherpa job that had spent 3 minutes processing events and was estimated to run for 2-3 hours, when suddenly the VM was idling without any 'nobody' processes. |
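
For anyone who wants to check their own logs against the two cases quoted in this thread, here is a rough Python sketch that pulls the key timestamps out of a StarterLog and a BOINC guest log and prints the intervals between them. The helper names (`starter_timeline`, `guest_runtime`, `first_time`) are made up for illustration, and the substrings it searches for are simply the ones that appear in the excerpts above. Other versions may word these messages differently, so treat it as a starting point rather than a reliable diagnostic.

```python
from datetime import datetime

# Shortened excerpts copied from the two logs quoted in this thread.
STARTER_LOG = """\
04/30/16 07:24:28 Create_Process succeeded, pid=4196
04/30/16 07:39:38 condor_read() failed: recv(fd=8) returned -1, errno = 104 Connection reset by peer, reading 21 bytes from <188.184.187.167:9618>.
04/30/16 08:09:06 Process exited, pid=4196, status=0
04/30/16 08:09:06 Failed to send job exit status to shadow
04/30/16 08:17:47 Got SIGQUIT. Performing fast shutdown.
"""

GUEST_LOG = """\
2016-04-30 09:45:08 (5544): Guest Log: [INFO] Mounting the shared directory
2016-04-30 09:45:18 (5544): Guest Log: [INFO] Theory application starting. Check log files.
2016-04-30 09:51:00 (5544): Guest Log: [INFO] Condor exited with 0
2016-04-30 09:51:00 (5544): Guest Log: [INFO] Shutting Down.
"""


def parse_starter_line(line):
    """Split a StarterLog line ('MM/DD/YY HH:MM:SS message') into (datetime, message)."""
    stamp, message = line[:17], line[18:]
    return datetime.strptime(stamp, "%m/%d/%y %H:%M:%S"), message


def parse_guest_line(line):
    """Split a guest-log line ('YYYY-MM-DD HH:MM:SS (pid): Guest Log: msg') into (datetime, message)."""
    stamp = line[:19]
    message = line.split("Guest Log: ", 1)[1]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), message


def first_time(events, needle):
    """Return the timestamp of the first event whose message contains needle, or None."""
    for when, message in events:
        if needle in message:
            return when
    return None


def starter_timeline(text):
    """Collect the first read failure, first process exit, and SIGQUIT times."""
    events = [parse_starter_line(l) for l in text.splitlines() if l.strip()]
    return {
        "link_lost": first_time(events, "condor_read() failed"),
        "payload_exit": first_time(events, "Process exited"),
        "sigquit": first_time(events, "Got SIGQUIT"),
    }


def guest_runtime(text):
    """Time from the first guest-log line to the 'Condor exited with' line."""
    events = [parse_guest_line(l) for l in text.splitlines() if l.strip()]
    return first_time(events, "Condor exited with") - events[0][0]


timeline = starter_timeline(STARTER_LOG)
print("read failure -> SIGQUIT:", timeline["sigquit"] - timeline["link_lost"])
print("process exit -> SIGQUIT:", timeline["sigquit"] - timeline["payload_exit"])
print("guest log start -> Condor exit:", guest_runtime(GUEST_LOG))
```

On the excerpts above this reports 0:38:09 from the first read failure to the SIGQUIT, 0:08:41 from the process exit to the SIGQUIT, and 0:05:52 between the first guest-log line and the Condor exit. The last figure covers only the quoted part of the guest log, so it comes out a little shorter than the roughly 7 minutes reported for the whole task.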
©2024 CERN