Posts by Phil

1) Message boards : Cafe : Lego Update (Message 3408) Posted 19 May 2016 by Phil Post: I notice a recent update to this thread (in spite of weasels): There has been talk of rewards once we really get off the ground, perhaps as far as a guided tour of CERN and CMS. For the moment, though, that's as close as SETI@Home's mythical toaster. There is another BOINC project that has handed out Lego kits for special challenges - after a bit of research I discovered Lego is suitable for collider simulations! Indeed!
2) Message boards : CMS Application : Credentials (Message 3304) Posted 10 May 2016 by Phil Post: Credentials not working. Thanks to tasks failing, quota ruined. Just saw that myself and reported it. It's across all projects.
3) Message boards : Theory Application : New version with app_config.xml (Message 3198) Posted 3 May 2016 by Phil Post: Laurence: BOINC Manager is still saying: vLHCathome-dev: Notice from BOINC: Your app_config.xml file refers to an unknown application 'ALICE'. Known applications: 'CMS', 'Theory', 'LHCb', 'ATLAS' (03/05/2016 17:25:00)
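The notice means the local app_config.xml still contains an `<app>` entry named ALICE, which the project no longer lists. A minimal Python sketch of the same check, assuming the usual app_config.xml layout of `<app>` elements each holding a `<name>` (plus options such as `<max_concurrent>`). The `unknown_apps` helper and the sample file are hypothetical illustrations, not BOINC's own code; the known-application list is copied from the notice.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Application names reported as known in the BOINC notice above.
KNOWN_APPS = {"CMS", "Theory", "LHCb", "ATLAS"}


def unknown_apps(app_config_xml: str, known: Iterable[str] = KNOWN_APPS) -> List[str]:
    """Return every <app><name> in the config that the project does not offer."""
    known_set = set(known)
    root = ET.fromstring(app_config_xml)
    names = [el.text.strip() for el in root.findall("./app/name") if el.text]
    return [name for name in names if name not in known_set]


# Hypothetical app_config.xml left over from when ALICE was still offered.
SAMPLE_CONFIG = """<app_config>
  <app>
    <name>Theory</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <app>
    <name>ALICE</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>
"""

print(unknown_apps(SAMPLE_CONFIG))  # ['ALICE']
```

If the file is a local one, deleting the ALICE `<app>` block and re-reading the config files should presumably silence the notice.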
4) Message boards : CMS Application : Busy for a bit... (Message 3197) Posted 3 May 2016 by Phil Post: Update: just got back from CERN -- at 1 AM local! (Lots of woes on the way back: plane 90 mins late; then another 10 minutes or so on the tarmac at LHR waiting for ground power to be connected 'cos the onboard generator was borked so the engines had to be kept running for electrickery; then 30 minutes to slowly snake through the non-EU Immigration queue. Usually it's the opposite - wait an hour in the departure lounge because they can't start the plane (sorry, the AUX generator won't go, we are waiting for a mobile supply); of course they have a dozen mobile generators, but they're all for Airbus and this is the only Boeing flight the airline has, and the only generator with a Boeing plug on it is at another terminal 4 miles away. The 2nd-last 350 bus of the night was about to leave when I got to the stand at 0011; then a 14-minute wait for the last U5 bus at West Drayton -- the last U3 was right behind it, both get me to my nearest stop.) I usually arrive at West Drayton late on a train, and stand around wondering if all the Last Buses have gone. In the lonely darkness can be seen an advert for a taxi company, and usually, while I'm contemplating dialing, The Last Bus arrives. Recently, though, it has been supplemented with a big LED display that flashes CALL NOW FOR TAXI followed by a 12-digit number that's visible for about 250 ms and impossible to remember - weirdly, it's totally different from the phone number on the painted sign. While I'm wondering if this is an update, the sign then flashes MERRY XMAS, so I'm not sure if this new sign is 4 months out of date or 8 months into the future. Sometimes I have the opposite problem - I arrive at Uxbridge and look for a bus toward West Drayton. There will be 20-30 buses stood around at Uxbridge, all with hopeful-looking numbers on the front. But of course, they are all just queued to be put into the garage for the night.
There are usually around a dozen Drunks&Weirdos who keep clambering aboard each one in turn, shouting, vomiting and worse. Aah, well. I'll look at issues raised when I surface later today; must watch the BBC news on catch-up now to see what they say about "The Weasel that Killed the LHC"! Seems The Weasel Surge has arrived here: I've just discovered a RAM board is faulty, so that's off for Lifetime Warranty replacement (Sorry, but this has No Life Left, so it isn't under warranty).
5) Message boards : Cafe : The Do Nothing Award (Message 2940) Posted 22 Apr 2016 by Phil Post: I look forward to doing nothing, and getting an award! [edit] Aah, no good. I do contribute to the project, which is not doing nothing... I have not (yet) created a screenable profile, so I am not eligible... Damn.[/edit]
6) Message boards : CMS Application : New Refactored Version (47.01) (Message 2938) Posted 22 Apr 2016 by Phil Post: Am investigating ... It is a surprise to wake up and discover that 100+ jobs have gone by...
7) Message boards : ATLAS Application : New Experimental ATLAS Application (Message 2873) Posted 20 Apr 2016 by Phil Post: A new version (v0.2) is available with the memory set to 2241MB. Two completed fine so far.
8) Message boards : News : Project Configuration Update (Message 2853) Posted 19 Apr 2016 by Phil Post: That's probably due to the Fibre internet connection he has. Shame the Fibre in question is a piece of damp string ;-) I've seen it actually. On the roof of the Howell Centre you can just see a Heinz Soup can, with a long piece of string that runs to one of those tiny top-floor windows of Tower D.
9) Message boards : News : Project Configuration Update (Message 2849) Posted 19 Apr 2016 by Phil Post: Have put the limit to 5 tasks in progress. Yep, that's about the limit for a 16GB host. Aww, but what about my 128 GB host (with 20 cores...)? It seems to be performing terribly: State: All (66) · In progress (0) · Validation pending (0) · Validation inconclusive (0) · Valid (16) · Invalid (0) · Error (50)
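As a rough check on the 5-task figure, here is a back-of-envelope memory budget in Python. It assumes each task's VM needs about the 2241 MB quoted for ATLAS v0.2 in post 7, that roughly 4 GB is kept back for the host OS and BOINC, and that each task occupies one core; all three are assumptions rather than project settings, and `max_concurrent_tasks` is a hypothetical helper.

```python
from typing import Optional


def max_concurrent_tasks(host_ram_mb: int, per_task_mb: int,
                         reserve_mb: int = 4096,
                         cores: Optional[int] = None) -> int:
    """Estimate how many VM tasks fit in RAM without paging.

    reserve_mb is an assumed allowance for the host OS and BOINC client;
    cores, if given, caps the result at one task per core (also assumed).
    """
    usable_mb = max(host_ram_mb - reserve_mb, 0)
    by_memory = usable_mb // per_task_mb  # only whole tasks count
    if cores is not None:
        return min(by_memory, cores)
    return by_memory


print(max_concurrent_tasks(16 * 1024, 2241))             # 16 GB host -> 5
print(max_concurrent_tasks(128 * 1024, 2241, cores=20))  # 128 GB, 20 cores -> 20
```

On these assumptions a 16 GB host tops out at 5 tasks, while a 128 GB host with 20 cores is bounded by its core count rather than by memory.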
10) Message boards : ATLAS Application : New Experimental ATLAS Application (Message 2846) Posted 19 Apr 2016 by Phil Post: The Condor Server fell over last night due to a full disk. It is up and running again now. Yep, I grabbed some and they're running jobs. Looks like someone fixed Apache to show the logs now; still need to up the memory allocation to avoid paging out, though.
11) Message boards : News : Project Configuration Update (Message 2845) Posted 19 Apr 2016 by Phil Post: Have put the limit to 5 tasks in progress. Yep, that's about the limit for a 16GB host.
12) Message boards : News : Project Configuration Update (Message 2826) Posted 18 Apr 2016 by Phil Post: That's a great idea. If it causes Furious Confusion amongst the community, I'm sure I/we could cobble up an FAQ showing How To Get What You Want. One thing that worries me though - you'll have to replace Ivan's cute logo with something more generically CERN!
13) Message boards : ATLAS Application : New Experimental ATLAS Application (Message 2824) Posted 18 Apr 2016 by Phil Post: Why 13:14:43 in the logs? They may be stale messages from the original image build. The application was not released until 13:48:41 UTC. CONDOR seems to like Pacific rather than UTC! 70 mins into the job now, and the swap is at 200M. [edit] My first run ended, but now the log says: 04/18/16 13:16:44 Running job as user nobody 04/18/16 13:16:44 Create_Process succeeded, pid=4668 04/18/16 14:53:28 Process exited, pid=4668, status=0 04/18/16 14:54:02 condor_write(): Socket closed when trying to write 65536 bytes to daemon at <188.184.187.167:9618>, fd is 14 04/18/16 14:54:02 ReliSock::put_bytes_nobuffer: Send failed. 04/18/16 14:54:02 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1) 04/18/16 14:54:02 DoUpload: STARTER at 10.0.2.15 failed to send file(s) to <188.184.187.167:9618>: error sending /var/lib/condor/execute/dir_4661/EVNT.06480895._029668.pool.root.1 04/18/16 14:54:02 File transfer failed, forcing disconnect.
04/18/16 14:54:02 Returning from CStarter::JobReaper() 04/18/16 14:55:03 PERMISSION DENIED to submit-side@matchsession from host 188.184.187.167 for command 1200 (CA_CMD), access level WRITE: reason: WRITE authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 188.184.187.167,alicondorce01.cern.ch, hostname size = 1, original ip address = 188.184.187.167 04/18/16 14:55:11 PERMISSION DENIED to submit-side@matchsession from host 188.184.187.167 for command 1200 (CA_CMD), access level WRITE: reason: cached result for WRITE; see first case for the full reason 04/18/16 14:55:28 PERMISSION DENIED to submit-side@matchsession from host 188.184.187.167 for command 1200 (CA_CMD), access level WRITE: reason: cached result for WRITE; see first case for the full reason 04/18/16 14:56:00 PERMISSION DENIED to submit-side@matchsession from host 188.184.187.167 for command 1200 (CA_CMD), access level WRITE: reason: cached result for WRITE; see first case for the full reason [/edit]
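The log above is flattened onto one line here; in the VM's starter log each entry normally sits on its own line, starting with a `MM/DD/YY HH:MM:SS` stamp. Assuming that layout, a small Python sketch can pull out the job's run time and flag the failed upload and the PERMISSION DENIED replies. `parse_starter_log` and `summarize` are hypothetical helpers, not HTCondor tools; the sample lines are copied from the excerpt above.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Each log entry starts with a "MM/DD/YY HH:MM:SS" timestamp.
ENTRY_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")


def parse_starter_log(text: str) -> List[Tuple[datetime, str]]:
    """Split log text (one entry per line) into (timestamp, message) pairs."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
            entries.append((stamp, match.group(2)))
    return entries


def summarize(entries: List[Tuple[datetime, str]]) -> Dict[str, object]:
    """Report job run time, whether the upload failed, and denied requests."""
    started: Optional[datetime] = None
    exited: Optional[datetime] = None
    upload_failed = False
    denied = 0
    for stamp, message in entries:
        if message.startswith("Create_Process succeeded"):
            started = stamp
        elif message.startswith("Process exited"):
            exited = stamp
        elif message.startswith("File transfer failed"):
            upload_failed = True
        elif message.startswith("PERMISSION DENIED"):
            denied += 1
    runtime_s = None
    if started is not None and exited is not None:
        runtime_s = (exited - started).total_seconds()
    return {"runtime_s": runtime_s, "upload_failed": upload_failed,
            "permission_denied": denied}


SAMPLE_LOG = """\
04/18/16 13:16:44 Create_Process succeeded, pid=4668
04/18/16 14:53:28 Process exited, pid=4668, status=0
04/18/16 14:54:02 File transfer failed, forcing disconnect.
04/18/16 14:55:11 PERMISSION DENIED to submit-side@matchsession from host 188.184.187.167 for command 1200 (CA_CMD), access level WRITE: reason: cached result for WRITE; see first case for the full reason
04/18/16 14:55:28 PERMISSION DENIED to submit-side@matchsession from host 188.184.187.167 for command 1200 (CA_CMD), access level WRITE: reason: cached result for WRITE; see first case for the full reason
"""

# runtime_s is 5804.0, i.e. 1 h 36 min 44 s between start and exit.
print(summarize(parse_starter_log(SAMPLE_LOG)))
```

Read this way, the job itself exited cleanly (status=0); it was the output upload back to the submit host that failed.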
14) Message boards : ATLAS Application : New Experimental ATLAS Application (Message 2821) Posted 18 Apr 2016 by Phil Post: Show graphics-no output. Have a look at localhost:port/logs Mine started doing this every 2 minutes: 04/18/16 13:14:43 condor_starter (CONDOR_STARTER) STARTING UP 04/18/16 13:14:43 /usr/sbin/condor_starter 04/18/16 13:14:43 SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1) 04/18/16 13:14:43 Configuration: subsystem:STARTER local:<NONE> class:DAEMON 04/18/16 13:14:43 $CondorVersion: 8.0.6 Feb 01 2014 BuildID: 225363 $ 04/18/16 13:14:43 $CondorPlatform: x86_64_RedHat6 $ 04/18/16 13:14:43 PID = 31388 04/18/16 13:14:43 Log last touched 4/18 13:14:42 04/18/16 13:14:43 **************************************************** 04/18/16 13:14:43 Using config source: /etc/condor/condor_config 04/18/16 13:14:43 Using local config sources: 04/18/16 13:14:43 /etc/condor/config.d/10_security.config 04/18/16 13:14:43 /etc/condor/config.d/14_network.config 04/18/16 13:14:43 /etc/condor/config.d/20_workernode.config 04/18/16 13:14:43 /etc/condor/config.d/30_lease.config 04/18/16 13:14:43 /etc/condor/config.d/35_atlas.config 04/18/16 13:14:43 /etc/condor/config.d/40_ccb.config 04/18/16 13:14:43 /etc/condor/condor_config.local 04/18/16 13:14:43 Daemon Log is logging: D_ALWAYS D_ERROR 04/18/16 13:14:43 DaemonCore: command socket at <10.0.2.15:33345?noUDP> 04/18/16 13:14:43 DaemonCore: private command socket at <10.0.2.15:33345> 04/18/16 13:14:43 ERROR: Could not open canonicalization file '/etc/condor/certificate_mapfile' (No such file or directory) 04/18/16 13:14:44 CCBListener: heartbeat disabled because interval is configured to be 0 04/18/16 13:14:44 CCBListener: registered with CCB server alicondor01.cern.ch as ccbid 188.184.129.127:9618?addrs=188.184.129.127-9618&noUDP&sock=collector#9181 04/18/16 13:14:44 Communicating with shadow <188.184.187.167:9618?addrs=188.184.187.167-9618&noUDP&sock=6941_4ff3_269122> 04/18/16 13:14:44 Submitting machine is 
"alicondorce01.cern.ch" 04/18/16 13:14:45 setting the orig job name in starter 04/18/16 13:14:45 setting the orig job iwd in starter 04/18/16 13:14:45 Job has WantIOProxy=true 04/18/16 13:14:45 Initialized IO Proxy. 04/18/16 13:14:45 Done setting resource limits 04/18/16 13:14:45 condor_write(): Socket closed when trying to write 53 bytes to daemon at <10.0.2.15:54469>, fd is 14 04/18/16 13:14:45 Buf::write(): condor_write() failed 04/18/16 13:14:45 ChildAliveMsg: failed to send DC_CHILDALIVE to parent daemon at <10.0.2.15:54469> (try 1 of 3): CEDAR:6002:failed to send EOM 04/18/16 13:14:45 File transfer completed successfully. 04/18/16 13:14:46 Job 268269.0 set to execute immediately 04/18/16 13:14:46 Starting a VANILLA universe job with ID: 268269.0 04/18/16 13:14:46 IWD: /var/lib/condor/execute/dir_31388 04/18/16 13:14:46 Output file: /var/lib/condor/execute/dir_31388/_condor_stdout 04/18/16 13:14:46 Error file: /var/lib/condor/execute/dir_31388/_condor_stderr 04/18/16 13:14:46 Renice expr "10" evaluated to 10 04/18/16 13:14:46 Using wrapper /usr/local/bin/job-wrapper to exec /var/lib/condor/execute/dir_31388/condor_exec.exe 04/18/16 13:14:46 Setting job's virtual memory rlimit to 0 megabytes 04/18/16 13:14:46 Running job as user nobody 04/18/16 13:14:46 Create_Process succeeded, pid=31395 04/18/16 13:16:31 Process exited, pid=31395, status=0 04/18/16 13:16:39 Got SIGQUIT. Performing fast shutdown. 04/18/16 13:16:39 ShutdownFast all jobs. 04/18/16 13:16:39 ** condor_starter (condor_STARTER) pid 31388 EXITING WITH STATUS 0 But now it's picked up a job that's been running 20 mins so far. It seems to be close on available memory, with the pagefile at 125M and climbing throughout the run.
15) Message boards : Theory Application : The Theory Application (Message 2753) Posted 14 Apr 2016 by Phil Post: Back to getting failure to start work due to... 2016-04-14 12:17:09 (16392): Guest Log: [INFO] Theory application starting. Check log files. 2016-04-14 12:23:10 (16392): Guest Log: [ERROR] App is not supported. Shutting down! I just had one of those for a BOINC job that started with Sherpa. I have also seen a number of EXT4 inode addressing errors (not recorded in the logs, alas); I'll get a new VDI.
16) Message boards : News : Change of project name (Message 2294) Posted 9 Mar 2016 by Phil Post: There is a bad redirect. I will try to fix it as soon as I can. Aah, easily done. Got a job now, running normally.
17) Message boards : News : Change of project name (Message 2290) Posted 9 Mar 2016 by Phil Post: Hmm, did you do an update for the project? For me it downloaded ok and started running. But there is the credential issue for the VM that we should soon have sorted out with Laurence. I detached and reattached, but I've been following Dave Colling's meeting so not entirely concentrating. I'll try again. I did the same, detaching and joining at the new address, and get the same checksum errors. [edit] This is because [in Windows] the files vboxwrapper_26184_windows_x86_64.exe, vboxwrapper_26184_windows_x86_64.pdb and CMS_2016_03_03.VDI are all HTML documents saved under those names, not the correct files.[/edit]
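That explains the checksum errors: an HTML page (presumably served by the bad redirect) saved under a binary file's name can never match the expected checksum. A minimal Python sketch for spotting this case, assuming the expected MD5 of each file is known (BOINC file descriptions normally carry an MD5 checksum). The helper names are hypothetical, and the HTML test only looks at the first bytes of the file.

```python
import hashlib
from pathlib import Path

# Byte prefixes that mark an HTML page (compared after lower-casing).
HTML_PREFIXES = (b"<!doctype html", b"<html")


def looks_like_html(data: bytes) -> bool:
    """True if the payload starts like an HTML document."""
    head = data[:512].lstrip().lower()
    return head.startswith(HTML_PREFIXES)


def check_bytes(data: bytes, expected_md5: str) -> str:
    """Classify a downloaded payload against its expected MD5 checksum."""
    if looks_like_html(data):
        return "html-instead-of-file"
    if hashlib.md5(data).hexdigest() != expected_md5.lower():
        return "checksum-mismatch"
    return "ok"


def check_download(path: str, expected_md5: str) -> str:
    """Read a file from disk and classify it with check_bytes()."""
    return check_bytes(Path(path).read_bytes(), expected_md5)


# d41d8cd9... is the MD5 of empty input, used here only as a known value.
print(check_bytes(b"<!DOCTYPE html>\n<html></html>", "d41d8cd98f00b204e9800998ecf8427e"))
print(check_bytes(b"", "d41d8cd98f00b204e9800998ecf8427e"))
```

Distinguishing "HTML instead of file" from a plain mismatch points straight at the server redirect rather than at a corrupted transfer.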
18) Message boards : News : New jobs available (Message 1331) Posted 26 Oct 2015 by Phil Post: Yep, done 2 already.
19) Message boards : News : No new jobs (Message 1235) Posted 12 Oct 2015 by Phil Post: Through the magic of admin privileges, I have identified the user... Whew! Glad it's not me...
20) Message boards : News : No new jobs (Message 1176) Posted 3 Oct 2015 by Phil Post: I did ask if I could break it... Internal Server Error The server encountered an internal error or misconfiguration and was unable to complete your request. I was just pressing things without knowing whether what I was doing was valid or not! Well done that volunteer! I'd encourage you to make that report; it looks like an internal (if highly obscure) bug that needs to be squashed [if only by restricting public access...]. I understand that one testing technique in vogue these days is to throw random input at a process; it tests robustness against incorrect input and may show up vulnerabilities before the black-hats find them. This is all down to mindset, isn't it? I was once working for a European manufacturer who had released a completely redesigned product, and I was asked to test it. One of its innovations was a whole row of shiny buttons to switch through its operations. So I happily pushed all the buttons, and after a few minutes everything stopped. It took a while to reproduce the error and then a day or so to find a fix for the controller so that this type of error was removed. I sent my findings to the lab and got this back: Problem: Your control problem Diagnosis: Your problem is not a Problem. Nobody would consider operating the controls in the manner you have suggested.
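Throwing random input at a process is usually called fuzz testing. A minimal Python sketch of the idea: `parse_limit` and `parse_setting_buggy` are hypothetical functions standing in for the code under test, and the fuzzer treats the documented `ValueError` as a clean rejection while recording any other exception as a robustness bug. A fixed seed makes every failure replayable.

```python
import random
import string
from typing import Callable, List, Tuple


def parse_limit(text: str) -> int:
    """Hypothetical parser for a 'max tasks' setting.

    Contract: return a positive int, or raise ValueError for bad input.
    """
    value = int(text.strip())  # int() raises ValueError on non-numeric text
    if value <= 0:
        raise ValueError("limit must be positive")
    return value


def parse_setting_buggy(text: str) -> int:
    """Hypothetical buggy parser for 'name=value': when '=' is missing it
    crashes with IndexError instead of rejecting the input."""
    return int(text.split("=")[1])


def fuzz(func: Callable[[str], object], trials: int = 1000,
         seed: int = 0, max_len: int = 8) -> List[Tuple[str, str]]:
    """Call func on random printable strings and collect every input that
    raises something other than the documented ValueError."""
    rng = random.Random(seed)  # fixed seed so failures can be replayed
    failures = []
    for _ in range(trials):
        length = rng.randint(0, max_len)
        text = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            func(text)
        except ValueError:
            pass  # rejected cleanly: the expected behaviour
        except Exception as exc:  # anything else is a robustness bug
            failures.append((text, type(exc).__name__))
    return failures


print(len(fuzz(parse_limit)), "unexpected crashes in parse_limit")
print(len(fuzz(parse_setting_buggy)), "unexpected crashes in parse_setting_buggy")
```

Like pressing every shiny button in turn, the fuzzer does not care whether anyone "would consider" a given input; it only checks that the code fails in the documented way.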

Development for LHC@home