Info | Message |
---|---|
1) Message boards : Cafe : Lego Update
Message 3408 Posted 19 May 2016 by Phil |
I notice a recent update to this thread (in spite of weasels): There has been talk of rewards once we really get off the ground, perhaps as far as a guided tour of CERN and CMS. For the moment, though, that's as close as SETI@Home's mythical toaster. here. |
2) Message boards : CMS Application : Credentials
Message 3304 Posted 10 May 2016 by Phil |
Credentials not working. Just saw that myself and reported it. It's across all projects. |
3) Message boards : Theory Application : New version with app_config.xml
Message 3198 Posted 3 May 2016 by Phil |
Laurence: BOINC Manager is still saying vLHCathome-dev: Notice from BOINC Your app_config.xml file refers to an unknown application 'ALICE'. Known applications: 'CMS', 'Theory', 'LHCb', 'ATLAS' 03/05/2016 17:25:00 |
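A notice like the one above appears when app_config.xml names an application the project does not offer. The check can be sketched in a few lines, assuming the usual layout where application names sit in `<app><name>` and `<app_version><app_name>`; the sample XML and the helper name `unknown_apps` are made up for illustration, and the known list is copied from the notice.

```python
import xml.etree.ElementTree as ET

# Application names copied from the BOINC notice quoted in the post.
KNOWN_APPS = {"CMS", "Theory", "LHCb", "ATLAS"}

def unknown_apps(xml_text, known=KNOWN_APPS):
    """Return the app names in an app_config.xml string that are not in `known`."""
    root = ET.fromstring(xml_text)
    # Assumption: names appear as <app><name> and <app_version><app_name>.
    found = [e.text.strip() for e in root.findall("./app/name") if e.text]
    found += [e.text.strip() for e in root.findall("./app_version/app_name") if e.text]
    return sorted(set(found) - set(known))

# Hypothetical file contents, not taken from the thread.
SAMPLE = """<app_config>
  <app><name>ALICE</name><max_concurrent>1</max_concurrent></app>
  <app><name>CMS</name><max_concurrent>2</max_concurrent></app>
</app_config>"""

print(unknown_apps(SAMPLE))
```

Removing or renaming whatever entry this reports should make the notice go away once the client re-reads its config files.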
4) Message boards : CMS Application : Busy for a bit...
Message 3197 Posted 3 May 2016 by Phil |
Update: just got back from CERN -- at 1 AM local! (Lots of woes on the way back: plane 90 mins late; then another 10 minutes or so on tarmac at LHR waiting for ground power to be connected 'cos the onboard generator was borked so the engines had to be kept running for electrickery; then 30 minutes to slowly snake through the non-EU Immigration queue. Usually it's the opposite - wait an hour in departure lounge because they can't start the plane (sorry, the AUX generator won't go, we are waiting for a mobile supply); of course they have a dozen mobile generators but they're all for Airbus and this is the only Boeing flight the airline has, and the only generator with a Boeing plug on it is at another terminal 4 miles away. The 2nd-last 350 bus of the night was about to leave when I got to the stand at 0011; then a 14 minute wait for the last U5 bus at West Drayton -- the last U3 was right behind it, both get me to my nearest stop.) I usually arrive West Drayton late on a train, and stand around wondering if all the Last Buses have gone. In the lonely darkness can be seen an advert for a taxi company, and usually while contemplating dialing, The Last Bus arrives. Recently though, it has been supplemented with a big LED Display that flashes CALL NOW FOR TAXI followed by a 12-digit number that's visible for about 250ms and impossible to remember - weirdly it's totally different from the phone number on the painted sign. While wondering if this is an update, the sign then flashes MERRY XMAS so I'm not sure if this new sign is 4 months out of date or 8 months into the future. Sometimes I have the opposite problem - arrive at Uxbridge and look for a bus toward West Drayton. There will be 20-30 buses stood around at Uxbridge, all with hopeful-looking numbers on the front. But of course, they are all just queued to be put into the garage for the night. There's usually around a dozen Drunks&Weirdos who keep clambering aboard each one in turn, shouting, vomiting and worse.
Aah, well. I'll look at issues raised when I surface later today, must watch the BBC news on catch-up now to see what they say about "The Weasel that Killed the LHC"! Seems The Weasel Surge has arrived here, I've just discovered a RAM board is faulty, so that's off for Lifetime Warranty replacement (Sorry but this has No Life Left, so it isn't under warranty). |
5) Message boards : Cafe : The Do Nothing Award
Message 2940 Posted 22 Apr 2016 by Phil |
I look forward to doing nothing, and getting an award! [edit] Aah, no good. I do contribute to the project, which is not doing nothing... I have not (yet) created a screenable profile, so I am not eligible..., Damn[/edit] |
6) Message boards : CMS Application : New Refactored Version (47.01)
Message 2938 Posted 22 Apr 2016 by Phil |
Am investigating ... It is a surprise to wake up and discover that 100+ jobs have gone by... |
7) Message boards : ATLAS Application : New Experimental ATLAS Application
Message 2873 Posted 20 Apr 2016 by Phil |
A new version (v0.2) is available with the memory set to 2241MB. 2 completed fine so far. |
8) Message boards : News : Project Configuration Update
Message 2853 Posted 19 Apr 2016 by Phil |
That's probably due to the Fibre internet connection he has. I've seen it actually. On the roof of the Howell Centre you can just see a Heinz Soup can, with a long piece of string that runs to one of those tiny top-floor windows of Tower D. |
9) Message boards : News : Project Configuration Update
Message 2849 Posted 19 Apr 2016 by Phil |
Have put the limit to 5 tasks in progress. It seems to have a terrible performance: State: All (66) · In progress (0) · Validation pending (0) · Validation inconclusive (0) · Valid (16) · Invalid (0) · Error (50) |
10) Message boards : ATLAS Application : New Experimental ATLAS Application
Message 2846 Posted 19 Apr 2016 by Phil |
The Condor Server fell over last night due to a full disk. It is up and running again now. Yep I grabbed some and they're running jobs. Looks like someone fixed apache to show the logs now, still need to up the mem allocation to avoid paging out tho. |
11) Message boards : News : Project Configuration Update
Message 2845 Posted 19 Apr 2016 by Phil |
Have put the limit to 5 tasks in progress. Yep, that's about the limit for a 16GB host. |
12) Message boards : News : Project Configuration Update
Message 2826 Posted 18 Apr 2016 by Phil |
That's a great idea. If it causes Furious Confusion amongst the community, I'm sure I/we could cobble up an FAQ showing How To Get What You Want. One thing that worries me though - you'll have to replace Ivan's cute logo with something more generically CERN! |
13) Message boards : ATLAS Application : New Experimental ATLAS Application
Message 2824 Posted 18 Apr 2016 by Phil |
Why 13:14:43 in the logs? They may be stale messages from the original image build. The application was not released until 13:48:41 UTC. CONDOR seems to like Pacific rather than UTC! 70mins into the job now, the Swap is at 200M. [edit] My first run ended, but now the log says:
04/18/16 13:16:44 Running job as user nobody
04/18/16 13:16:44 Create_Process succeeded, pid=4668
04/18/16 14:53:28 Process exited, pid=4668, status=0
04/18/16 14:54:02 condor_write(): Socket closed when trying to write 65536 bytes to daemon at <188.184.187.167:9618>, fd is 14
04/18/16 14:54:02 ReliSock::put_bytes_nobuffer: Send failed.
04/18/16 14:54:02 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
04/18/16 14:54:02 DoUpload: STARTER at 10.0.2.15 failed to send file(s) to <188.184.187.167:9618>: error sending /var/lib/condor/execute/dir_4661/EVNT.06480895._029668.pool.root.1
04/18/16 14:54:02 File transfer failed, forcing disconnect.
04/18/16 14:54:02 Returning from CStarter::JobReaper()
04/18/16 14:55:03 PERMISSION DENIED to submit-side@matchsession from host 188.184.187.167 for command 1200 (CA_CMD), access level WRITE: reason: WRITE authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 188.184.187.167,alicondorce01.cern.ch, hostname size = 1, original ip address = 188.184.187.167
04/18/16 14:55:11 PERMISSION DENIED to submit-side@matchsession from host 188.184.187.167 for command 1200 (CA_CMD), access level WRITE: reason: cached result for WRITE; see first case for the full reason
04/18/16 14:55:28 PERMISSION DENIED to submit-side@matchsession from host 188.184.187.167 for command 1200 (CA_CMD), access level WRITE: reason: cached result for WRITE; see first case for the full reason
04/18/16 14:56:00 PERMISSION DENIED to submit-side@matchsession from host 188.184.187.167 for command 1200 (CA_CMD), access level WRITE: reason: cached result for WRITE; see first case for the full reason
[/edit] |
14) Message boards : ATLAS Application : New Experimental ATLAS Application
Message 2821 Posted 18 Apr 2016 by Phil |
Show graphics-no output. Have a look at localhost:port/logs. Mine started doing this every 2 minutes:
04/18/16 13:14:43 ** condor_starter (CONDOR_STARTER) STARTING UP
04/18/16 13:14:43 ** /usr/sbin/condor_starter
04/18/16 13:14:43 ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
04/18/16 13:14:43 ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
04/18/16 13:14:43 ** $CondorVersion: 8.0.6 Feb 01 2014 BuildID: 225363 $
04/18/16 13:14:43 ** $CondorPlatform: x86_64_RedHat6 $
04/18/16 13:14:43 ** PID = 31388
04/18/16 13:14:43 ** Log last touched 4/18 13:14:42
04/18/16 13:14:43 ******************************************************
04/18/16 13:14:43 Using config source: /etc/condor/condor_config
04/18/16 13:14:43 Using local config sources:
04/18/16 13:14:43 /etc/condor/config.d/10_security.config
04/18/16 13:14:43 /etc/condor/config.d/14_network.config
04/18/16 13:14:43 /etc/condor/config.d/20_workernode.config
04/18/16 13:14:43 /etc/condor/config.d/30_lease.config
04/18/16 13:14:43 /etc/condor/config.d/35_atlas.config
04/18/16 13:14:43 /etc/condor/config.d/40_ccb.config
04/18/16 13:14:43 /etc/condor/condor_config.local
04/18/16 13:14:43 Daemon Log is logging: D_ALWAYS D_ERROR
04/18/16 13:14:43 DaemonCore: command socket at <10.0.2.15:33345?noUDP>
04/18/16 13:14:43 DaemonCore: private command socket at <10.0.2.15:33345>
04/18/16 13:14:43 ERROR: Could not open canonicalization file '/etc/condor/certificate_mapfile' (No such file or directory)
04/18/16 13:14:44 CCBListener: heartbeat disabled because interval is configured to be 0
04/18/16 13:14:44 CCBListener: registered with CCB server alicondor01.cern.ch as ccbid 188.184.129.127:9618?addrs=188.184.129.127-9618&noUDP&sock=collector#9181
04/18/16 13:14:44 Communicating with shadow <188.184.187.167:9618?addrs=188.184.187.167-9618&noUDP&sock=6941_4ff3_269122>
04/18/16 13:14:44 Submitting machine is "alicondorce01.cern.ch"
04/18/16 13:14:45 setting the orig job name in starter
04/18/16 13:14:45 setting the orig job iwd in starter
04/18/16 13:14:45 Job has WantIOProxy=true
04/18/16 13:14:45 Initialized IO Proxy.
04/18/16 13:14:45 Done setting resource limits
04/18/16 13:14:45 condor_write(): Socket closed when trying to write 53 bytes to daemon at <10.0.2.15:54469>, fd is 14
04/18/16 13:14:45 Buf::write(): condor_write() failed
04/18/16 13:14:45 ChildAliveMsg: failed to send DC_CHILDALIVE to parent daemon at <10.0.2.15:54469> (try 1 of 3): CEDAR:6002:failed to send EOM
04/18/16 13:14:45 File transfer completed successfully.
04/18/16 13:14:46 Job 268269.0 set to execute immediately
04/18/16 13:14:46 Starting a VANILLA universe job with ID: 268269.0
04/18/16 13:14:46 IWD: /var/lib/condor/execute/dir_31388
04/18/16 13:14:46 Output file: /var/lib/condor/execute/dir_31388/_condor_stdout
04/18/16 13:14:46 Error file: /var/lib/condor/execute/dir_31388/_condor_stderr
04/18/16 13:14:46 Renice expr "10" evaluated to 10
04/18/16 13:14:46 Using wrapper /usr/local/bin/job-wrapper to exec /var/lib/condor/execute/dir_31388/condor_exec.exe
04/18/16 13:14:46 Setting job's virtual memory rlimit to 0 megabytes
04/18/16 13:14:46 Running job as user nobody
04/18/16 13:14:46 Create_Process succeeded, pid=31395
04/18/16 13:16:31 Process exited, pid=31395, status=0
04/18/16 13:16:39 Got SIGQUIT. Performing fast shutdown.
04/18/16 13:16:39 ShutdownFast all jobs.
04/18/16 13:16:39 **** condor_starter (condor_STARTER) pid 31388 EXITING WITH STATUS 0
But now it's picked up a job that's been running 20 mins so far. It seems to be close on available memory, pagefile at 125M and climbing throughout the run. |
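Starter logs copied out of the VM's web page tend to lose their line breaks when pasted into a post. A rough sketch for splitting such text back into entries and pulling out the worrying ones follows; it assumes every entry starts with an `MM/DD/YY HH:MM:SS` stamp like those above, needs Python 3.7+ for splitting on an empty match, and the keyword list is just a guess at what is worth flagging, not anything Condor defines.

```python
import re

# Zero-width match just before each "04/18/16 13:14:43 " style timestamp.
STAMP = re.compile(r"(?=\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} )")

# Illustrative keywords only; adjust to taste.
KEYWORDS = ("ERROR", "failed", "PERMISSION DENIED")

def split_entries(flat_log):
    """Split a flattened log string into one string per timestamped entry."""
    return [part.strip() for part in STAMP.split(flat_log) if part.strip()]

def problem_entries(entries):
    """Keep only the entries that contain one of the keywords."""
    return [e for e in entries if any(k in e for k in KEYWORDS)]

# Three entries taken from the log in the post, run together on one line.
FLAT = ("04/18/16 13:14:45 Buf::write(): condor_write() failed "
        "04/18/16 13:14:45 File transfer completed successfully. "
        "04/18/16 13:16:31 Process exited, pid=31395, status=0")

entries = split_entries(FLAT)
for line in problem_entries(entries):
    print(line)
```

The short `4/18 13:14:42` stamp inside the "Log last touched" entry does not match the pattern, so that entry stays in one piece.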
15) Message boards : Theory Application : The Theory Application
Message 2753 Posted 14 Apr 2016 by Phil |
Back to getting failure to start work due to... I just had one of those for a BOINC job that started with Sherpa. I have also seen a number of EXT4 inode addressing errors (not recorded on the logs, alas), I'll get a new VDI. |
16) Message boards : News : Change of project name
Message 2294 Posted 9 Mar 2016 by Phil |
There is a bad redirect. I will try to fix it as soon as I can. Aah, easily done. Got a job now, running normally. |
17) Message boards : News : Change of project name
Message 2290 Posted 9 Mar 2016 by Phil |
Hmm, did you do an update for the project? For me it downloaded ok and started running. I did the same, detaching and joining at the new address, and get the same checksum errors. [edit] This is because [in Windows] the files vboxwrapper_26184_windows_x86_64.exe, vboxwrapper_26184_windows_x86_64.pdb and CMS_2016_03_03.VDI are all HTML documents saved under those names, not the correct files.[/edit] |
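A bad redirect can leave an HTML page sitting where a binary was expected, which would explain checksum errors like these. A quick way to spot that is to look at the first bytes of each download. This is only a sketch: the function names are invented here, and the test only catches files that open with `<!DOCTYPE html` or `<html`.

```python
def looks_like_html(head):
    """True if the leading bytes look like the start of an HTML document."""
    # Skip an optional UTF-8 byte-order mark and leading whitespace.
    start = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    return start.startswith((b"<!doctype html", b"<html"))

def file_is_html(path):
    """Read the first 512 bytes of a file and apply the same check."""
    with open(path, "rb") as fh:
        return looks_like_html(fh.read(512))

# A Windows .exe starts with the bytes "MZ"; an error page starts with markup.
print(looks_like_html(b"MZ\x90\x00"))                   # False
print(looks_like_html(b"<!DOCTYPE html>\n<html>..."))   # True
```

Running `file_is_html` over the files in the project directory would show which ones need to be downloaded again.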
18) Message boards : News : New jobs available
Message 1331 Posted 26 Oct 2015 by Phil |
Yep, done 2 already. |
19) Message boards : News : No new jobs
Message 1235 Posted 12 Oct 2015 by Phil |
Through the magic of admin privileges, I have identified the user... Whew! Glad it's not me.... |
20) Message boards : News : No new jobs
Message 1176 Posted 3 Oct 2015 by Phil |
I did ask if I could break it... This is all down to mind-set, isn't it? I once was working for a European manufacturer who had released a completely re-designed product and I was asked to test it. One of its innovations was a whole row of shiny buttons to switch through its operations. So I happily pushed all the buttons and after a few minutes everything stopped. It took a while to duplicate the error and then a day or so to find a fix for the controller so this type of error was removed. I sent my findings to the lab and got this returned: Problem: Your control problem. Diagnosis: Your problem is not a Problem. Nobody would consider operating the controls in the manner you have suggested. |