41) Message boards : News : Poll (Message 1773)
Posted 1 Feb 2016 by Yeti
Post:
Hm, maybe I was a little too fast. It looks like options 1 - 4 are not clearly enough differentiated.

I forgot about the big differences between the needs of your "sub-projects". SixTrack doesn't need VirtualBox and VT-x, so it shouldn't be in the same project container as projects that do need VirtualBox and VT-x.

For me, it would be best to put all the sub-projects with similar needs (VirtualBox, VT-x, memory hunger) into the same project container.

Then, if you get a client working in one sub-project, it will work for all the other sub-projects in that container.

And on the question of beta testing: if you are really willing to do beta tests, it belongs in the same project container. But for alpha testing it would be better to have a separate project container.
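
Just to illustrate the grouping idea (a sketch only; the requirement flags and the way sub-projects are modelled here are my own assumptions, not an actual BOINC data structure):

# Illustrative sketch: group sub-projects into containers by shared
# requirements, so a client that works for one member works for all.
from dataclasses import dataclass

@dataclass(frozen=True)
class SubProject:
    name: str
    needs_vbox: bool     # requires VirtualBox
    needs_vtx: bool      # requires VT-x hardware virtualization
    memory_hungry: bool  # needs a lot of RAM

SUBPROJECTS = [
    SubProject("SixTrack", needs_vbox=False, needs_vtx=False, memory_hungry=False),
    SubProject("CMS",      needs_vbox=True,  needs_vtx=True,  memory_hungry=True),
    SubProject("ATLAS",    needs_vbox=True,  needs_vtx=True,  memory_hungry=True),
]

def container_key(sp: SubProject) -> tuple:
    """Sub-projects with identical needs share one project container."""
    return (sp.needs_vbox, sp.needs_vtx, sp.memory_hungry)

containers: dict[tuple, list[str]] = {}
for sp in SUBPROJECTS:
    containers.setdefault(container_key(sp), []).append(sp.name)

for needs, members in containers.items():
    print(needs, "->", members)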
42) Message boards : News : Poll (Message 1772)
Posted 1 Feb 2016 by Yeti
Post:
I voted for option 4 (6 prod/6 dev) simply because I would like the individual projects to remain separate entities, with the effort for each project identifiable. Currently, if you run a CMS beta task at vLHC, the credit gets added to your vLHC credit total and there is nothing to show what was done for each project. Will the other existing CERN projects just get their scores added to the vLHC scores when they get assimilated?

Hm, I understand option 4 differently than you do.

I see one BOINC project (like vLHC) in which you can choose among up to 6 different sub-projects. You can choose to run all sub-projects or only the ones you mark, and you can choose whether you are willing to run beta tests, again picking the sub-projects whose beta applications you want to test.

That is the option 4 I prefer.
43) Message boards : Number crunching : issue of the day (Message 1472)
Posted 16 Nov 2015 by Yeti
Post:
Yes, a normally running WU throws so many errors that no ordinary cruncher has a chance to find the real problem.
44) Message boards : Number crunching : BOINC_USERID is not an integer (Message 1447)
Posted 12 Nov 2015 by Yeti
Post:
Yes, it looks like something was reset and then the user info became available again. Perhaps a new glide-in starting?

Or a reboot of the whole system.
45) Message boards : Number crunching : BOINC_USERID is not an integer (Message 1442)
Posted 12 Nov 2015 by Yeti
Post:
I was talking with David and Rom about it; in the end Rom thought it must be something on your side.

I found this log snippet:

14:58:01 +0100 2015-11-12 [INFO] Starting CMS Application - Run 7
14:58:01 +0100 2015-11-12 [INFO] Reading the BOINC volunteer's information
14:58:02 +0100 2015-11-12 [INFO] Volunteer: () Host:
14:58:02 +0100 2015-11-12 [INFO] BOINC_USERID is not an integer
14:58:02 +0100 2015-11-12 [INFO] Going to sleep for 1 hour
16:00:01 +0100 2015-11-12 [INFO] Starting CMS Application - Run 8
16:00:01 +0100 2015-11-12 [INFO] Reading the BOINC volunteer's information
16:00:02 +0100 2015-11-12 [INFO] Volunteer: () Host:
16:00:02 +0100 2015-11-12 [INFO] BOINC_USERID is not an integer
16:00:02 +0100 2015-11-12 [INFO] Going to sleep for 1 hour
17:02:50 +0100 2015-11-12 [INFO] Starting CMS Application - Run 1
17:02:50 +0100 2015-11-12 [INFO] Reading the BOINC volunteer's information
17:02:51 +0100 2015-11-12 [INFO] Volunteer: Yeti (250) Host: 495

Since then it has been working again.
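
For what it's worth, the sleep/retry behaviour in the log could be reproduced with a simple integer check; a minimal sketch, assuming the volunteer id arrives in an environment variable called BOINC_USERID (the variable name mirrors the log message and is an assumption):

# Minimal sketch of the integer check implied by the log message above.
# Assumes the volunteer id is delivered via the BOINC_USERID environment
# variable; that name mirrors the log text and is an assumption.
import os
import time

def read_userid():
    raw = os.environ.get("BOINC_USERID", "")
    try:
        return int(raw)
    except ValueError:
        return None  # empty or non-numeric, as in the failing runs above

while (userid := read_userid()) is None:
    print("[INFO] BOINC_USERID is not an integer")
    print("[INFO] Going to sleep for 1 hour")
    time.sleep(3600)

print(f"[INFO] Volunteer id: {userid}")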
46) Message boards : Number crunching : BOINC_USERID is not an integer (Message 1440)
Posted 12 Nov 2015 by Yeti
Post:
I don't know whether this has anything to do with the changes in the 7.6.15 alpha release.
47) Message boards : News : No new jobs (Message 1361)
Posted 28 Oct 2015 by Yeti
Post:
What does this mean?

10/28/15 18:27:33 (pid:22064) Create_Process succeeded, pid=22068
10/28/15 19:07:32 (pid:22064) CCBListener: failed to receive message from CCB server lcggwms02.gridpp.rl.ac.uk:9623
10/28/15 19:07:32 (pid:22064) CCBListener: connection to CCB server lcggwms02.gridpp.rl.ac.uk:9623 failed; will try to reconnect in 60 seconds.
10/28/15 19:08:33 (pid:22064) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9623 as ccbid 130.246.180.120:9623#108698
10/28/15 19:12:45 (pid:22064) condor_write(): Socket closed when trying to write 562 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:12:45 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:17:45 (pid:22064) condor_write(): Socket closed when trying to write 562 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:17:45 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:22:46 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:22:46 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:27:46 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:27:46 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:32:46 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:32:46 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:37:47 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:37:47 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:42:47 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:42:47 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:47:48 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:47:48 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:52:48 (pid:22064) condor_write(): Socket closed when trying to write 563 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:52:48 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:55:52 (pid:22064) Process exited, pid=22068, status=0
10/28/15 19:55:52 (pid:22064) condor_write(): Socket closed when trying to write 617 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:55:52 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:55:53 (pid:22064) condor_write(): Socket closed when trying to write 190 bytes to <130.246.180.120:9818>, fd is 11
10/28/15 19:55:53 (pid:22064) Buf::write(): condor_write() failed
10/28/15 19:55:53 (pid:22064) Failed to send job exit status to shadow
10/28/15 19:55:53 (pid:22064) JobExit() failed, waiting for job lease to expire or for a reconnect attempt
10/28/15 19:55:53 (pid:22064) Returning from CStarter::JobReaper()
48) Message boards : Number crunching : Proxy Error (Message 1346)
Posted 27 Oct 2015 by Yeti
Post:
and one box with the proxy error, so I will see whether anything develops with it.

Meanwhile this box has recovered from the error and is crunching fine without any intervention on my side.

I will bring up the next one and take a close look.
49) Message boards : Number crunching : Expect errors eventually (Message 1345)
Posted 27 Oct 2015 by Yeti
Post:
If necessary, you could make it selectable for the user whether to process only small jobs or larger ones as well.
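
A minimal sketch of what such a preference could look like (the preference values and the events-based size measure are hypothetical, purely for illustration):

# Hypothetical job-size preference; the names and the events-based size
# measure are made up for illustration, not an actual CMS@Home setting.
MAX_EVENTS_SMALL = 25  # assumed cut-off between "small" and "large" jobs

def wants_job(user_pref: str, job_events: int) -> bool:
    """Return True if a job of this size matches the user's preference."""
    if user_pref == "small_only":
        return job_events <= MAX_EVENTS_SMALL
    return True  # "any": accept small and large jobs alike

print(wants_job("small_only", 10))   # True
print(wants_job("small_only", 100))  # False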
50) Message boards : Number crunching : Proxy Error (Message 1336)
Posted 26 Oct 2015 by Yeti
Post:
I have aborted CMS on 8 machines with the proxy error.

Now I have one box running fine and one box with the proxy error, so I will see whether anything develops with it.
51) Message boards : Number crunching : Proxy Error (Message 1335)
Posted 26 Oct 2015 by Yeti
Post:
It looks as if most of my hosts are blocked by this:

52) Message boards : News : No new jobs (Message 1311)
Posted 23 Oct 2015 by Yeti
Post:
My box is sitting idle here, and I just found this in http://localhost:54201/logs/run-1/glide_TZLLYf/StartdLog:

10/23/15 04:40:33 (pid:7593) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 8
10/23/15 04:40:33 (pid:7593) Buf::write(): condor_write() failed
10/23/15 04:50:57 (pid:7593) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 8
10/23/15 04:50:57 (pid:7593) Buf::write(): condor_write() failed

...

10/23/15 07:11:09 (pid:7593) CCBListener: failed to receive message from CCB server lcggwms02.gridpp.rl.ac.uk:9619
10/23/15 07:11:09 (pid:7593) CCBListener: connection to CCB server lcggwms02.gridpp.rl.ac.uk:9619 failed; will try to reconnect in 60 seconds.
10/23/15 07:12:10 (pid:7593) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9619 as ccbid 130.246.180.120:9619#101094
10/23/15 07:16:33 (pid:7593) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 8
10/23/15 07:16:33 (pid:7593) Buf::write(): condor_write() failed

...

10/23/15 12:38:57 (pid:7593) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 8
10/23/15 12:38:57 (pid:7593) Buf::write(): condor_write() failed
10/23/15 12:49:21 (pid:7593) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 8
10/23/15 12:49:21 (pid:7593) Buf::write(): condor_write() failed


So, what's going on?
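
One way to narrow this down from the volunteer side would be a plain TCP probe against the collector; a diagnostic sketch (host and port are taken from the log above):

# Diagnostic sketch: test whether a TCP connection to the Condor collector
# from the log above can be opened at all.
import socket

HOST, PORT = "lcggwms02.gridpp.rl.ac.uk", 9619

try:
    with socket.create_connection((HOST, PORT), timeout=10):
        print(f"TCP connect to {HOST}:{PORT} succeeded")
except OSError as exc:
    print(f"TCP connect to {HOST}:{PORT} failed: {exc}")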
53) Message boards : Number crunching : Job retries on same host (Message 1305)
Posted 23 Oct 2015 by Yeti
Post:
If we can work out a way to get the host's IP into the VM's hostname (instead of localhost.localdomain), then you could use the procedure I posted earlier tonight to search for your hosts.

It might be better to use the host ID; people might not want any details of their machines made public.

The username would be better.

I'm running more than 10 hosts and would prefer to search by username.
54) Message boards : News : New jobs available (Message 1272)
Posted 20 Oct 2015 by Yeti
Post:
My machine had been sitting here for hours and seemed to be looping through something endlessly.

Thanks to Microsoft: they published a patch that forced my desktop to reboot, and now I'm back to crunching.
55) Message boards : News : New jobs available (Message 1271)
Posted 20 Oct 2015 by Yeti
Post:
My machine has been sitting here for hours now and seems to be looping through something endlessly:

from http://localhost:52386/logs/run-1/glide_ETgbjq/StartdLog

-----------------------------------------
10/20/15 17:50:38 (pid:7603) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9620 as ccbid 130.246.180.120:9620#102700
10/20/15 18:00:13 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:00:13 (pid:7603) Buf::write(): condor_write() failed
10/20/15 18:11:29 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:11:29 (pid:7603) Buf::write(): condor_write() failed
10/20/15 18:22:45 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:22:45 (pid:7603) Buf::write(): condor_write() failed
10/20/15 18:34:01 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:34:01 (pid:7603) Buf::write(): condor_write() failed
10/20/15 18:45:18 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:45:18 (pid:7603) Buf::write(): condor_write() failed
10/20/15 18:56:34 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 18:56:34 (pid:7603) Buf::write(): condor_write() failed
10/20/15 19:07:50 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 19:07:50 (pid:7603) Buf::write(): condor_write() failed
10/20/15 19:19:06 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 19:19:06 (pid:7603) Buf::write(): condor_write() failed
10/20/15 19:30:22 (pid:7603) condor_write(): Socket closed when trying to write 4096 bytes to collector lcggwms02.gridpp.rl.ac.uk:9620, fd is 8
10/20/15 19:30:22 (pid:7603) Buf::write(): condor_write() failed
----------------------------------------------------------

And this is the last 5 minutes from http://localhost:52386/logs/run-1/glide_ETgbjq/ProcLog:
----------------------------------------------------------
10/20/15 19:31:00 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:00 : gathering usage data for family with root pid 11118
10/20/15 19:31:01 : taking a snapshot...
10/20/15 19:31:01 : ...snapshot complete
10/20/15 19:31:05 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:05 : gathering usage data for family with root pid 11118
10/20/15 19:31:10 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:10 : gathering usage data for family with root pid 11118
10/20/15 19:31:15 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:15 : gathering usage data for family with root pid 11118
10/20/15 19:31:16 : taking a snapshot...
10/20/15 19:31:16 : ...snapshot complete
10/20/15 19:31:20 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:20 : gathering usage data for family with root pid 11118
10/20/15 19:31:25 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:25 : gathering usage data for family with root pid 11118
10/20/15 19:31:30 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:30 : gathering usage data for family with root pid 11118
10/20/15 19:31:31 : taking a snapshot...
10/20/15 19:31:31 : ...snapshot complete
10/20/15 19:31:35 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:35 : gathering usage data for family with root pid 11118
10/20/15 19:31:40 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:40 : gathering usage data for family with root pid 11118
10/20/15 19:31:45 : PROC_FAMILY_GET_USAGE
10/20/15 19:31:45 : gathering usage data for family with root pid 11118
10/20/15 19:31:46 : taking a snapshot...
10/20/15 19:31:46 : ProcAPI: new boottime = 1445350682; old_boottime = 1445350682; /proc/stat boottime = 1445350682; /proc/uptime boottime = 1445350682
10/20/15 19:31:46 : ...snapshot complete
...
10/20/15 19:36:46 : taking a snapshot...
10/20/15 19:36:46 : ProcAPI: new boottime = 1445350682; old_boottime = 1445350682; /proc/stat boottime = 1445350682; /proc/uptime boottime = 1445350683
10/20/15 19:36:46 : ...snapshot complete
10/20/15 19:36:51 : PROC_FAMILY_GET_USAGE
10/20/15 19:36:51 : gathering usage data for family with root pid 11118
10/20/15 19:36:56 : PROC_FAMILY_GET_USAGE
10/20/15 19:36:56 : gathering usage data for family with root pid 11118
10/20/15 19:37:01 : taking a snapshot...
10/20/15 19:37:01 : ...snapshot complete
10/20/15 19:37:01 : PROC_FAMILY_GET_USAGE
10/20/15 19:37:01 : gathering usage data for family with root pid 11118
10/20/15 19:37:06 : PROC_FAMILY_GET_USAGE
10/20/15 19:37:06 : gathering usage data for family with root pid 11118
10/20/15 19:37:11 : PROC_FAMILY_GET_USAGE
10/20/15 19:37:11 : gathering usage data for family with root pid 11118
10/20/15 19:37:16 : taking a snapshot...
10/20/15 19:37:16 : ...snapshot complete
10/20/15 19:37:16 : PROC_FAMILY_GET_USAGE
10/20/15 19:37:16 : gathering usage data for family with root pid 11118
10/20/15 19:37:21 : PROC_FAMILY_GET_USAGE
10/20/15 19:37:21 : gathering usage data for family with root pid 11118
10/20/15 19:37:26 : PROC_FAMILY_GET_USAGE
10/20/15 19:37:26 : gathering usage data for family with root pid 11118
10/20/15 19:37:31 : taking a snapshot...
10/20/15 19:37:31 : ...snapshot complete
10/20/15 19:37:31 : PROC_FAMILY_GET_USAGE
10/20/15 19:37:31 : gathering usage data for family with root pid 11118
10/20/15 19:37:36 : PROC_FAMILY_GET_USAGE
10/20/15 19:37:36 : gathering usage data for family with root pid 11118
10/20/15 19:37:41 : PROC_FAMILY_GET_USAGE
10/20/15 19:37:41 : gathering usage data for family with root pid 11118
10/20/15 19:37:46 : taking a snapshot...
10/20/15 19:37:46 : ProcAPI: new boottime = 1445350682; old_boottime = 1445350682; /proc/stat boottime = 1445350682; /proc/uptime boottime = 1445350682
10/20/15 19:37:46 : ...snapshot complete
10/20/15 19:37:46 : PROC_FAMILY_GET_USAGE
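
A crude way to detect this looping state automatically would be to scan the tail of the StartdLog for runs of failed writes; a sketch (the URL is the one from this post, the threshold is an arbitrary assumption):

# Sketch: flag the glidein as stuck when the StartdLog tail is dominated by
# condor_write() failures. URL is from this post; the threshold is arbitrary.
from collections import deque
from urllib.request import urlopen

URL = "http://localhost:52386/logs/run-1/glide_ETgbjq/StartdLog"
THRESHOLD = 8  # consecutive failed writes before we call it stuck

tail = deque(urlopen(URL).read().decode(errors="replace").splitlines(), maxlen=50)
streak = max_streak = 0
for line in tail:
    if "condor_write(): Socket closed" in line:
        streak += 1
        max_streak = max(max_streak, streak)
    elif "Buf::write(): condor_write() failed" not in line:
        streak = 0  # any other activity breaks the failure run

print("glidein looks stuck" if max_streak >= THRESHOLD else "glidein looks OK")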
56) Message boards : News : New developments (Message 1262)
Posted 19 Oct 2015 by Yeti
Post:
Okay, I have set "No New Work" for CMS.

Let me know when it makes sense to return to crunching.
57) Message boards : News : New developments (Message 1258)
Posted 19 Oct 2015 by Yeti
Post:
So, at the moment, are we running or not?

I see this in my logs:

12:18:01 +0200 2015-10-19 [INFO] CMS glidein Run 13 ended
12:19:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 14
12:19:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:19:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:19:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:19:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
subject : /O=Volunteer Computing/O=CERN/CN=Yeti 250/CN=1181170921
issuer : /O=Volunteer Computing/O=CERN/CN=Yeti 250
identity : /O=Volunteer Computing/O=CERN/CN=Yeti 250
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 129:03:58 (5.4 days)
12:19:06 +0200 2015-10-19 [INFO] Downloading glidein
12:19:07 +0200 2015-10-19 [INFO] Running glidein (check logs)
12:25:01 +0200 2015-10-19 [INFO] CMS glidein Run 14 ended
12:26:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 15
12:26:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:26:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:26:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:26:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
subject : /O=Volunteer Computing/O=CERN/CN=Yeti 250/CN=30085940
issuer : /O=Volunteer Computing/O=CERN/CN=Yeti 250
identity : /O=Volunteer Computing/O=CERN/CN=Yeti 250
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 130:00:00 (5.4 days)
12:26:02 +0200 2015-10-19 [INFO] Downloading glidein
12:26:03 +0200 2015-10-19 [INFO] Running glidein (check logs)
12:31:01 +0200 2015-10-19 [INFO] CMS glidein Run 15 ended
12:32:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 16
12:32:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:32:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:32:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:32:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
12:32:03 +0200 2015-10-19 [ERROR] Proxy error
12:32:03 +0200 2015-10-19 [INFO] Going to sleep for 1 hour
12:33:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 17
12:33:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:33:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:33:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:33:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
subject : /O=Volunteer Computing/O=CERN/CN=Yeti 250/CN=30085940
issuer : /O=Volunteer Computing/O=CERN/CN=Yeti 250
identity : /O=Volunteer Computing/O=CERN/CN=Yeti 250
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 129:53:00 (5.4 days)
12:33:02 +0200 2015-10-19 [INFO] Downloading glidein
12:33:03 +0200 2015-10-19 [INFO] Running glidein (check logs)

and in the stderr:

ERROR: Couldn't read proxy from: /tmp/x509up_u500
globus_credential: Error reading proxy credential
globus_credential: Error reading proxy credential: Couldn't read PEM from bio
OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line

Use -debug for further information.

ERROR: Couldn't read proxy from: /tmp/x509up_u500
globus_credential: Error reading proxy credential
globus_credential: Error reading proxy credential: Couldn't read PEM from bio
OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line

Use -debug for further information.

ERROR: Couldn't read proxy from: /tmp/x509up_u500
globus_credential: Error reading proxy credential
globus_credential: Error reading proxy credential: Couldn't read PEM from bio
OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line

Use -debug for further information.

-----------------------------

Should we go to standby or stay online?
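
The stderr at least explains the proxy error: at that moment /tmp/x509up_u500 contains no PEM block at all ("no start line"). A quick sanity check along those lines, as a sketch (the path is the one from the stderr above):

# Sketch: check whether the proxy file actually contains a PEM block before
# using it; the path is the one from the stderr above.
from pathlib import Path

PROXY = Path("/tmp/x509up_u500")

def proxy_looks_valid(path: Path) -> bool:
    try:
        text = path.read_text(errors="replace")
    except OSError:
        return False  # missing or unreadable file
    # OpenSSL's "no start line" means exactly this marker is absent.
    return "-----BEGIN" in text

print("proxy OK" if proxy_looks_valid(PROXY) else "proxy unreadable or not PEM")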
58) Message boards : Number crunching : issue of the day (Message 1224)
Posted 12 Oct 2015 by Yeti
Post:
Requesting an X509 credential
Proxy error
Going to sleep for 1 hour
...
This seems to repeat every minute.
59) Message boards : Number crunching : issue of the day (Message 1211)
Posted 9 Oct 2015 by Yeti
Post:
Back to CMS here; I will send you a PM.
60) Message boards : Number crunching : issue of the day (Message 1208)
Posted 9 Oct 2015 by Yeti
Post:
[rant] I wish it was as easy to find individual job details from ATLAS... I could be failing 90% of the jobs for all I know... maybe not...[/rant]

It's very easy: each VM job runs exactly ONE ATLAS job, so if your ATLAS job is failing you will see it here (I couldn't find user m at ATLAS, so I used my own account):

http://atlasathome.cern.ch/results.php?userid=1735&offset=0&show_names=0&state=6&appid=
and

http://atlasathome.cern.ch/results.php?userid=1735&offset=0&show_names=0&state=5&appid=
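
The state parameter in those links selects the result status (state=5 and state=6 are the two used above); a small helper to build the same URLs for any userid, as a sketch:

# Sketch: build the ATLAS@Home results URLs shown above for a given userid.
# state=5 and state=6 are simply the two values used in the links above.
from urllib.parse import urlencode

BASE = "http://atlasathome.cern.ch/results.php"

def results_url(userid: int, state: int) -> str:
    query = urlencode({"userid": userid, "offset": 0,
                       "show_names": 0, "state": state, "appid": ""})
    return f"{BASE}?{query}"

for state in (5, 6):
    print(results_url(1735, state))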

