41)
Message boards :
News :
Poll
(Message 1773)
Posted 1 Feb 2016 by Yeti Post: Hm, maybe I was a little too fast. It looks like options 1 - 4 are not clearly enough differentiated. I forgot about the big differences in what your "sub-projects" need. SixTrack doesn't need VirtualBox and VT-x, so it shouldn't be in the same project container as projects that do need VirtualBox and VT-x. For me, it would be best to put all the projects with similar needs (VirtualBox, VT-x, memory hunger) into the same project container. Then, if you get a client working in one sub-project, it will work for all the other sub-projects. And on the question about beta-testing: if you are really willing to do beta tests, then it is fine in the same project container. But for alpha-testing it would be better to have a separate project container |
42)
Message boards :
News :
Poll
(Message 1772)
Posted 1 Feb 2016 by Yeti Post: I voted for option 4 (6 prod/6 dev) simply because I would like to see the individual projects remain separate entities and effort for the individual project be identifiable. Currently if you run a CMS beta task at vLHC the credit gets added to your vLHC credit total and there is nothing to show what was done for each project. Will the other existing CERN projects just get their scores added to the vLHC scores when they get assimilated? Hm, I understand option 4 differently than you. I see one BOINC project (like vLHC) in which you can choose and run up to 6 different sub-projects. You can choose to run all projects or only the ones you mark, and you can choose whether you are willing to run beta tests, again choosing which projects' beta applications you would like to test. I prefer this option 4 |
43)
Message boards :
Number crunching :
issue of the day
(Message 1472)
Posted 16 Nov 2015 by Yeti Post: Yes, a normally running WU spits out so many errors that no normal cruncher has a chance to find the real problem |
44)
Message boards :
Number crunching :
BOINC_USERID is not an integer
(Message 1447)
Posted 12 Nov 2015 by Yeti Post: Yes, looks like something was reset and then the user info became available again. Perhaps a new glide-in starting? Or a reboot of the whole system? |
45)
Message boards :
Number crunching :
BOINC_USERID is not an integer
(Message 1442)
Posted 12 Nov 2015 by Yeti Post: I was talking with David and Rom about it; in the end Rom thought it must have been on your side. Found this log snippet:

14:58:01 +0100 2015-11-12 [INFO] Starting CMS Application - Run 7
14:58:01 +0100 2015-11-12 [INFO] Reading the BOINC volunteer's information
14:58:02 +0100 2015-11-12 [INFO] Volunteer: () Host:
14:58:02 +0100 2015-11-12 [INFO] BOINC_USERID is not an integer
14:58:02 +0100 2015-11-12 [INFO] Going to sleep for 1 hour
16:00:01 +0100 2015-11-12 [INFO] Starting CMS Application - Run 8
16:00:01 +0100 2015-11-12 [INFO] Reading the BOINC volunteer's information
16:00:02 +0100 2015-11-12 [INFO] Volunteer: () Host:
16:00:02 +0100 2015-11-12 [INFO] BOINC_USERID is not an integer
16:00:02 +0100 2015-11-12 [INFO] Going to sleep for 1 hour
17:02:50 +0100 2015-11-12 [INFO] Starting CMS Application - Run 1
17:02:50 +0100 2015-11-12 [INFO] Reading the BOINC volunteer's information
17:02:51 +0100 2015-11-12 [INFO] Volunteer: Yeti (250) Host: 495

Since then it is working again |
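The failure pattern in that log (an empty "Volunteer: () Host:" line followed by "BOINC_USERID is not an integer") is consistent with the wrapper reading an unset or garbled user ID and backing off. A minimal sketch of such a check, assuming a hypothetical helper name of my own (this is not the actual CMS wrapper code):

```python
import os

def read_volunteer_id(env="BOINC_USERID"):
    """Return the volunteer's numeric BOINC user ID, or None if the
    environment variable is missing or not an integer. Hypothetical
    helper mirroring the 'BOINC_USERID is not an integer' log line."""
    raw = os.environ.get(env, "")
    try:
        return int(raw)
    except ValueError:
        # An empty or non-numeric value triggers the back-off seen above
        return None
```

With an ID of "250" this returns 250; with an empty value it returns None, which would explain the hourly sleep-and-retry loop until the user info became readable again.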
46)
Message boards :
Number crunching :
BOINC_USERID is not an integer
(Message 1440)
Posted 12 Nov 2015 by Yeti Post: Don't know if this has to do with changes to the Alpha 7.6.15 Release |
47)
Message boards :
News :
No new jobs
(Message 1361)
Posted 28 Oct 2015 by Yeti Post: What does this mean?

10/28/15 18:27:33 (pid:22064) Create_Process succeeded, pid=22068
10/28/15 19:07:32 (pid:22064) CCBListener: failed to receive message from CCB server lcggwms02.gridpp.rl.ac.uk:9623
10/28/15 19:07:32 (pid:22064) CCBListener: connection to CCB server lcggwms02.gridpp.rl.ac.uk:9623 failed; will try to reconnect in 60 seconds.
10/28/15 19:08:33 (pid:22064) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9623 as ccbid 130.246.180.120:9623#108698
10/28/15 19:12:45 (pid:22064) condor_write(): Socket closed when trying to write 562 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:12:45 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:17:45 (pid:22064) condor_write(): Socket closed when trying to write 562 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:17:45 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:22:46 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:22:46 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:27:46 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:27:46 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:32:46 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:32:46 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:37:47 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:37:47 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:42:47 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:42:47 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:47:48 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:47:48 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:52:48 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:52:48 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:55:52 (pid:22064) Process exited, pid=22068, status=0
10/28/15 19:55:52 (pid:22064) condor_write(): Socket closed when trying to write 617 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:55:52 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:55:53 (pid:22064) condor_write(): Socket closed when trying to write 190 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:55:53 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:55:53 (pid:22064) Failed to send job exit status to shadow
10/28/15 19:55:53 (pid:22064) JobExit() failed, waiting for job lease to expire or for a reconnect attempt
10/28/15 19:55:53 (pid:22064) Returning from CStarter::JobReaper() |
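The CCBListener lines in that log show the standard behaviour of waiting 60 seconds and reconnecting after a dropped connection to the CCB server. As an illustration only (a sketch of that retry pattern, not HTCondor's actual implementation), a reconnect loop of this shape might look like:

```python
import socket
import time

def connect_with_retry(host, port, retry_delay=60, attempts=5):
    """Try to (re)connect to a server; on failure wait retry_delay
    seconds and try again, as the 'will try to reconnect in 60 seconds'
    log lines describe. Illustrative sketch, not HTCondor code."""
    for _ in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=10)
        except OSError:
            time.sleep(retry_delay)
    # Caller must handle a server that stays unreachable
    return None
```

The repeated condor_write()/Buf::write() failures that follow suggest the re-registered connection was accepted but the peer kept closing the socket on write, which is why the final job exit status could not be delivered to the shadow.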
48)
Message boards :
Number crunching :
Proxy Error
(Message 1346)
Posted 27 Oct 2015 by Yeti Post: and one box with ProxyError so I will see if something develops with this ProxyError Meanwhile, this box has recovered from the error and is crunching fine without any intervention on my side. I will bring up the next one and take a close look |
49)
Message boards :
Number crunching :
Expect errors eventually
(Message 1345)
Posted 27 Oct 2015 by Yeti Post: if necessary, you could make it selectable whether a user wants to process only small jobs or larger ones as well |
50)
Message boards :
Number crunching :
Proxy Error
(Message 1336)
Posted 26 Oct 2015 by Yeti Post: Have aborted CMS on 8 machines with ProxyError. Now I have 1 running fine and one box with ProxyError, so I will see if something develops with this ProxyError |
51)
Message boards :
Number crunching :
Proxy Error
(Message 1335)
Posted 26 Oct 2015 by Yeti Post: Looks as if most of my Hosts are blocked by this: |
52)
Message boards :
News :
No new jobs
(Message 1311)
Posted 23 Oct 2015 by Yeti Post: My box is sitting idle here and I just found this in http://localhost:54201/logs/run-1/glide_TZLLYf/StartdLog:

10/23/15 04:40:33 (pid:7593) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 8
10/23/15 04:40:33 (pid:7593) Buf::write(): condor_write() failed
10/23/15 04:50:57 (pid:7593) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 8
10/23/15 04:50:57 (pid:7593) Buf::write(): condor_write() failed
...
10/23/15 07:11:09 (pid:7593) CCBListener: failed to receive message from CCB server lcggwms02.gridpp.rl.ac.uk:9619
10/23/15 07:11:09 (pid:7593) CCBListener: connection to CCB server lcggwms02.gridpp.rl.ac.uk:9619 failed; will try to reconnect in 60 seconds.
10/23/15 07:12:10 (pid:7593) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9619 as ccbid 130.246.180.120:9619#101094
10/23/15 07:16:33 (pid:7593) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 8
10/23/15 07:16:33 (pid:7593) Buf::write(): condor_write() failed
...
10/23/15 12:38:57 (pid:7593) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 8
10/23/15 12:38:57 (pid:7593) Buf::write(): condor_write() failed
10/23/15 12:49:21 (pid:7593) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 8
10/23/15 12:49:21 (pid:7593) Buf::write(): condor_write() failed

So, what's going on? |
53)
Message boards :
Number crunching :
Job retries on same host
(Message 1305)
Posted 23 Oct 2015 by Yeti Post: If we can work out a way to get the host's IP into the VM's hostname (instead of localhost.localdomain) then you could use the procedure I posted earlier tonight to search for your hosts. Better would be the username: I'm running more than 10 hosts and would prefer to search by username |
54)
Message boards :
News :
New jobs available
(Message 1272)
Posted 20 Oct 2015 by Yeti Post: My machine had been sitting here for hours and seemed to loop through something endlessly. Thanks to Microsoft: they published a patch that forced my desktop to reboot, and now I'm back crunching |
55)
Message boards :
News :
New jobs available
(Message 1271)
Posted 20 Oct 2015 by Yeti Post: My machine has been sitting here for hours now and seems to loop through something endlessly. From http://localhost:52386/logs/run-1/glide_ETgbjq/StartdLog:
-----------------------------------------
10/20/15 17:50:38 (pid:7603) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9620 as ccbid 130.246.180.120:9620#102700
10/20/15 18:00:13 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:00:13 (pid:7603) Buf::write(): condor_write() failed
10/20/15 18:11:29 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:11:29 (pid:7603) Buf::write(): condor_write() failed
10/20/15 18:22:45 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:22:45 (pid:7603) Buf::write(): condor_write() failed
10/20/15 18:34:01 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:34:01 (pid:7603) Buf::write(): condor_write() failed
10/20/15 18:45:18 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:45:18 (pid:7603) Buf::write(): condor_write() failed
10/20/15 18:56:34 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:56:34 (pid:7603) Buf::write(): condor_write() failed
10/20/15 19:07:50 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 19:07:50 (pid:7603) Buf::write(): condor_write() failed
10/20/15 19:19:06 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 19:19:06 (pid:7603) Buf::write(): condor_write() failed
10/20/15 19:30:22 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 19:30:22 (pid:7603) Buf::write(): condor_write() failed
----------------------------------------------------------
And this is the last 5 minutes from http://localhost:52386/logs/run-1/glide_ETgbjq/ProcLog:
----------------------------------------------------------
10/20/15 19:31:00 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:00 : gathering usage data for family with root pid 11118
10/20/15 19:31:01 : taking a snapshot...
10/20/15 19:31:01 : ...snapshot complete
10/20/15 19:31:05 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:05 : gathering usage data for family with root pid 11118
10/20/15 19:31:10 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:10 : gathering usage data for family with root pid 11118
10/20/15 19:31:15 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:15 : gathering usage data for family with root pid 11118
10/20/15 19:31:16 : taking a snapshot...
10/20/15 19:31:16 : ...snapshot complete
10/20/15 19:31:20 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:20 : gathering usage data for family with root pid 11118
10/20/15 19:31:25 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:25 : gathering usage data for family with root pid 11118
10/20/15 19:31:30 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:30 : gathering usage data for family with root pid 11118
10/20/15 19:31:31 : taking a snapshot...
10/20/15 19:31:31 : ...snapshot complete
10/20/15 19:31:35 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:35 : gathering usage data for family with root pid 11118
10/20/15 19:31:40 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:40 : gathering usage data for family with root pid 11118
10/20/15 19:31:45 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:45 : gathering usage data for family with root pid 11118
10/20/15 19:31:46 : taking a snapshot...
10/20/15 19:31:46 : ProcAPI: new boottime = 1445350682; old_boottime = 1445350682; /proc/stat boottime = 1445350682; /proc/uptime boottime = 1445350682 10/20/15 19:31:46 : ...snapshot complete 10/20/15 19:31:50 : PROC_FAMILY_GET_USAGE 10/20/15 19:31:50 : gathering usage data for family with root pid 11118 10/20/15 19:31:55 : PROC_FAMILY_GET_USAGE 10/20/15 19:31:55 : gathering usage data for family with root pid 11118 10/20/15 19:32:00 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:00 : gathering usage data for family with root pid 11118 10/20/15 19:32:01 : taking a snapshot... 10/20/15 19:32:01 : ...snapshot complete 10/20/15 19:32:05 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:05 : gathering usage data for family with root pid 11118 10/20/15 19:32:10 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:10 : gathering usage data for family with root pid 11118 10/20/15 19:32:15 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:15 : gathering usage data for family with root pid 11118 10/20/15 19:32:16 : taking a snapshot... 10/20/15 19:32:16 : ...snapshot complete 10/20/15 19:32:20 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:20 : gathering usage data for family with root pid 11118 10/20/15 19:32:25 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:25 : gathering usage data for family with root pid 11118 10/20/15 19:32:30 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:30 : gathering usage data for family with root pid 11118 10/20/15 19:32:31 : taking a snapshot... 10/20/15 19:32:31 : ...snapshot complete 10/20/15 19:32:35 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:35 : gathering usage data for family with root pid 11118 10/20/15 19:32:40 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:40 : gathering usage data for family with root pid 11118 10/20/15 19:32:45 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:45 : gathering usage data for family with root pid 11118 10/20/15 19:32:46 : taking a snapshot... 
10/20/15 19:32:46 : ProcAPI: new boottime = 1445350682; old_boottime = 1445350682; /proc/stat boottime = 1445350682; /proc/uptime boottime = 1445350682 10/20/15 19:32:46 : ...snapshot complete 10/20/15 19:32:50 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:50 : gathering usage data for family with root pid 11118 10/20/15 19:32:55 : PROC_FAMILY_GET_USAGE 10/20/15 19:32:55 : gathering usage data for family with root pid 11118 10/20/15 19:33:00 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:00 : gathering usage data for family with root pid 11118 10/20/15 19:33:01 : taking a snapshot... 10/20/15 19:33:01 : ...snapshot complete 10/20/15 19:33:05 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:05 : gathering usage data for family with root pid 11118 10/20/15 19:33:10 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:10 : gathering usage data for family with root pid 11118 10/20/15 19:33:15 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:15 : gathering usage data for family with root pid 11118 10/20/15 19:33:16 : taking a snapshot... 10/20/15 19:33:16 : ...snapshot complete 10/20/15 19:33:20 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:20 : gathering usage data for family with root pid 11118 10/20/15 19:33:25 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:25 : gathering usage data for family with root pid 11118 10/20/15 19:33:30 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:30 : gathering usage data for family with root pid 11118 10/20/15 19:33:31 : taking a snapshot... 10/20/15 19:33:31 : ...snapshot complete 10/20/15 19:33:35 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:35 : gathering usage data for family with root pid 11118 10/20/15 19:33:40 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:40 : gathering usage data for family with root pid 11118 10/20/15 19:33:45 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:45 : gathering usage data for family with root pid 11118 10/20/15 19:33:46 : taking a snapshot... 
10/20/15 19:33:46 : ProcAPI: new boottime = 1445350682; old_boottime = 1445350682; /proc/stat boottime = 1445350682; /proc/uptime boottime = 1445350682 10/20/15 19:33:46 : ...snapshot complete 10/20/15 19:33:50 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:50 : gathering usage data for family with root pid 11118 10/20/15 19:33:55 : PROC_FAMILY_GET_USAGE 10/20/15 19:33:55 : gathering usage data for family with root pid 11118 10/20/15 19:34:00 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:00 : gathering usage data for family with root pid 11118 10/20/15 19:34:01 : taking a snapshot... 10/20/15 19:34:01 : ...snapshot complete 10/20/15 19:34:05 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:05 : gathering usage data for family with root pid 11118 10/20/15 19:34:10 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:10 : gathering usage data for family with root pid 11118 10/20/15 19:34:15 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:15 : gathering usage data for family with root pid 11118 10/20/15 19:34:16 : taking a snapshot... 10/20/15 19:34:16 : ...snapshot complete 10/20/15 19:34:20 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:20 : gathering usage data for family with root pid 11118 10/20/15 19:34:25 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:25 : gathering usage data for family with root pid 11118 10/20/15 19:34:30 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:30 : gathering usage data for family with root pid 11118 10/20/15 19:34:31 : taking a snapshot... 10/20/15 19:34:31 : ...snapshot complete 10/20/15 19:34:35 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:35 : gathering usage data for family with root pid 11118 10/20/15 19:34:40 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:40 : gathering usage data for family with root pid 11118 10/20/15 19:34:45 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:45 : gathering usage data for family with root pid 11118 10/20/15 19:34:46 : taking a snapshot... 
10/20/15 19:34:46 : ProcAPI: new boottime = 1445350682; old_boottime = 1445350682; /proc/stat boottime = 1445350682; /proc/uptime boottime = 1445350682 10/20/15 19:34:46 : ...snapshot complete 10/20/15 19:34:50 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:50 : gathering usage data for family with root pid 11118 10/20/15 19:34:55 : PROC_FAMILY_GET_USAGE 10/20/15 19:34:55 : gathering usage data for family with root pid 11118 10/20/15 19:35:00 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:00 : gathering usage data for family with root pid 11118 10/20/15 19:35:01 : taking a snapshot... 10/20/15 19:35:01 : ...snapshot complete 10/20/15 19:35:05 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:05 : gathering usage data for family with root pid 11118 10/20/15 19:35:10 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:10 : gathering usage data for family with root pid 11118 10/20/15 19:35:15 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:15 : gathering usage data for family with root pid 11118 10/20/15 19:35:16 : taking a snapshot... 10/20/15 19:35:16 : ...snapshot complete 10/20/15 19:35:20 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:20 : gathering usage data for family with root pid 11118 10/20/15 19:35:25 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:25 : gathering usage data for family with root pid 11118 10/20/15 19:35:31 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:31 : gathering usage data for family with root pid 11118 10/20/15 19:35:31 : taking a snapshot... 10/20/15 19:35:31 : ...snapshot complete 10/20/15 19:35:36 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:36 : gathering usage data for family with root pid 11118 10/20/15 19:35:41 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:41 : gathering usage data for family with root pid 11118 10/20/15 19:35:46 : taking a snapshot... 
10/20/15 19:35:46 : ProcAPI: new boottime = 1445350682; old_boottime = 1445350682; /proc/stat boottime = 1445350682; /proc/uptime boottime = 1445350683 10/20/15 19:35:46 : ...snapshot complete 10/20/15 19:35:46 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:46 : gathering usage data for family with root pid 11118 10/20/15 19:35:51 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:51 : gathering usage data for family with root pid 11118 10/20/15 19:35:56 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:56 : gathering usage data for family with root pid 11118 10/20/15 19:35:56 : PROC_FAMILY_GET_USAGE 10/20/15 19:35:56 : gathering usage data for family with root pid 11118 10/20/15 19:36:01 : taking a snapshot... 10/20/15 19:36:01 : ...snapshot complete 10/20/15 19:36:01 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:01 : gathering usage data for family with root pid 11118 10/20/15 19:36:06 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:06 : gathering usage data for family with root pid 11118 10/20/15 19:36:11 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:11 : gathering usage data for family with root pid 11118 10/20/15 19:36:16 : taking a snapshot... 10/20/15 19:36:16 : ...snapshot complete 10/20/15 19:36:16 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:16 : gathering usage data for family with root pid 11118 10/20/15 19:36:21 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:21 : gathering usage data for family with root pid 11118 10/20/15 19:36:26 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:26 : gathering usage data for family with root pid 11118 10/20/15 19:36:31 : taking a snapshot... 
10/20/15 19:36:31 : ...snapshot complete 10/20/15 19:36:31 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:31 : gathering usage data for family with root pid 11118 10/20/15 19:36:36 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:36 : gathering usage data for family with root pid 11118 10/20/15 19:36:41 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:41 : gathering usage data for family with root pid 11118 10/20/15 19:36:46 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:46 : gathering usage data for family with root pid 11118 10/20/15 19:36:46 : taking a snapshot... 10/20/15 19:36:46 : ProcAPI: new boottime = 1445350682; old_boottime = 1445350682; /proc/stat boottime = 1445350682; /proc/uptime boottime = 1445350683 10/20/15 19:36:46 : ...snapshot complete 10/20/15 19:36:51 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:51 : gathering usage data for family with root pid 11118 10/20/15 19:36:56 : PROC_FAMILY_GET_USAGE 10/20/15 19:36:56 : gathering usage data for family with root pid 11118 10/20/15 19:37:01 : taking a snapshot... 10/20/15 19:37:01 : ...snapshot complete 10/20/15 19:37:01 : PROC_FAMILY_GET_USAGE 10/20/15 19:37:01 : gathering usage data for family with root pid 11118 10/20/15 19:37:06 : PROC_FAMILY_GET_USAGE 10/20/15 19:37:06 : gathering usage data for family with root pid 11118 10/20/15 19:37:11 : PROC_FAMILY_GET_USAGE 10/20/15 19:37:11 : gathering usage data for family with root pid 11118 10/20/15 19:37:16 : taking a snapshot... 10/20/15 19:37:16 : ...snapshot complete 10/20/15 19:37:16 : PROC_FAMILY_GET_USAGE 10/20/15 19:37:16 : gathering usage data for family with root pid 11118 10/20/15 19:37:21 : PROC_FAMILY_GET_USAGE 10/20/15 19:37:21 : gathering usage data for family with root pid 11118 10/20/15 19:37:26 : PROC_FAMILY_GET_USAGE 10/20/15 19:37:26 : gathering usage data for family with root pid 11118 10/20/15 19:37:31 : taking a snapshot... 
10/20/15 19:37:31 : ...snapshot complete 10/20/15 19:37:31 : PROC_FAMILY_GET_USAGE 10/20/15 19:37:31 : gathering usage data for family with root pid 11118 10/20/15 19:37:36 : PROC_FAMILY_GET_USAGE 10/20/15 19:37:36 : gathering usage data for family with root pid 11118 10/20/15 19:37:41 : PROC_FAMILY_GET_USAGE 10/20/15 19:37:41 : gathering usage data for family with root pid 11118 10/20/15 19:37:46 : taking a snapshot... 10/20/15 19:37:46 : ProcAPI: new boottime = 1445350682; old_boottime = 1445350682; /proc/stat boottime = 1445350682; /proc/uptime boottime = 1445350682 10/20/15 19:37:46 : ...snapshot complete 10/20/15 19:37:46 : PROC_FAMILY_GET_USAGE |
56)
Message boards :
News :
New developments
(Message 1262)
Posted 19 Oct 2015 by Yeti Post: Okay, I have set "No New Work" for CMS. Let me know if it makes sense to return to crunching |
57)
Message boards :
News :
New developments
(Message 1258)
Posted 19 Oct 2015 by Yeti Post: So, at the moment, are we running or not? I see this in my logs:

12:18:01 +0200 2015-10-19 [INFO] CMS glidein Run 13 ended
12:19:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 14
12:19:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:19:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:19:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:19:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
subject : /O=Volunteer Computing/O=CERN/CN=Yeti 250/CN=1181170921
issuer : /O=Volunteer Computing/O=CERN/CN=Yeti 250
identity : /O=Volunteer Computing/O=CERN/CN=Yeti 250
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 129:03:58 (5.4 days)
12:19:06 +0200 2015-10-19 [INFO] Downloading glidein
12:19:07 +0200 2015-10-19 [INFO] Running glidein (check logs)
12:25:01 +0200 2015-10-19 [INFO] CMS glidein Run 14 ended
12:26:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 15
12:26:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:26:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:26:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:26:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
subject : /O=Volunteer Computing/O=CERN/CN=Yeti 250/CN=30085940
issuer : /O=Volunteer Computing/O=CERN/CN=Yeti 250
identity : /O=Volunteer Computing/O=CERN/CN=Yeti 250
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 130:00:00 (5.4 days)
12:26:02 +0200 2015-10-19 [INFO] Downloading glidein
12:26:03 +0200 2015-10-19 [INFO] Running glidein (check logs)
12:31:01 +0200 2015-10-19 [INFO] CMS glidein Run 15 ended
12:32:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 16
12:32:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:32:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:32:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:32:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
12:32:03 +0200 2015-10-19 [ERROR] Proxy error
12:32:03 +0200 2015-10-19 [INFO] Going to sleep for 1 hour
12:33:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 17
12:33:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:33:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:33:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:33:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
subject : /O=Volunteer Computing/O=CERN/CN=Yeti 250/CN=30085940
issuer : /O=Volunteer Computing/O=CERN/CN=Yeti 250
identity : /O=Volunteer Computing/O=CERN/CN=Yeti 250
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 129:53:00 (5.4 days)
12:33:02 +0200 2015-10-19 [INFO] Downloading glidein
12:33:03 +0200 2015-10-19 [INFO] Running glidein (check logs)

and in the stderr:

ERROR: Couldn't read proxy from: /tmp/x509up_u500
globus_credential: Error reading proxy credential
globus_credential: Error reading proxy credential: Couldn't read PEM from bio
OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line
Use -debug for further information.
ERROR: Couldn't read proxy from: /tmp/x509up_u500
globus_credential: Error reading proxy credential
globus_credential: Error reading proxy credential: Couldn't read PEM from bio
OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line
Use -debug for further information.
ERROR: Couldn't read proxy from: /tmp/x509up_u500
globus_credential: Error reading proxy credential
globus_credential: Error reading proxy credential: Couldn't read PEM from bio
OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line
Use -debug for further information.
-----------------------------
Should we go on standby or stay online? |
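The stderr excerpt above ("Couldn't read PEM from bio ... no start line") means OpenSSL found no "-----BEGIN" marker in /tmp/x509up_u500, i.e. the proxy file was empty or truncated rather than merely expired. A rough sanity check one could run inside the VM (a hypothetical diagnostic helper, not part of the actual glidein scripts):

```python
def looks_like_pem(path="/tmp/x509up_u500"):
    """Return True if the file contains at least one PEM BEGIN marker.
    OpenSSL's 'no start line' error fires exactly when this marker is
    missing. Hypothetical diagnostic, not part of the CMS glidein."""
    try:
        with open(path, "rb") as f:
            return b"-----BEGIN" in f.read()
    except OSError:
        return False  # an unreadable proxy file counts as broken too
```

If this returns False while the wrapper still reports hours of "timeleft", the cached proxy metadata and the file on disk have diverged, which matches the intermittent Proxy error between otherwise successful runs.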
58)
Message boards :
Number crunching :
issue of the day
(Message 1224)
Posted 12 Oct 2015 by Yeti Post:

Requesting an X509 credential
Proxy error
Going to sleep for 1 hour
...

This seems to repeat every minute |
59)
Message boards :
Number crunching :
issue of the day
(Message 1211)
Posted 9 Oct 2015 by Yeti Post: Back to CMS here, I will send you a PM |
60)
Message boards :
Number crunching :
issue of the day
(Message 1208)
Posted 9 Oct 2015 by Yeti Post: [rant] I wish it was as easy to find individual job details from ATLAS... I could be failing 90% of the jobs for all I know... maybe not...[/rant] It's very easy: each VM job is exactly ONE ATLAS job, so if your ATLAS job is failing you can see it here (I couldn't find user m at ATLAS, so I took my own account): http://atlasathome.cern.ch/results.php?userid=1735&offset=0&show_names=0&state=6&appid= and http://atlasathome.cern.ch/results.php?userid=1735&offset=0&show_names=0&state=5&appid= |
©2024 CERN