Message boards : News : No new jobs
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,896
RAC: 266
Message 1166 - Posted: 2 Oct 2015, 22:53:00 UTC - in response to Message 1163.  
Last modified: 2 Oct 2015, 23:24:37 UTC

So why does the graph not even show a dip, when we know, there was no work running for several hours?

Well, for one, there were still slow jobs from the previous batch trickling in; for another, Dashboard didn't necessarily know immediately that I'd canned the jobs. Dashboard is intrinsically always behind the times and is a cause of lots of confusion and bewilderment in the CMS community -- for example, the second graph seems to overstate the number of failed jobs because some of these are later realised to have succeeded but the data aren't adjusted a posteriori. So really the graphs should be taken with a grain of salt -- "objects in the rear-view mirror may be closer than they appear".
[Edit] I'll add a link to a graphic showing the command and data flow of CMS@Home when I get the chance, tho' it currently doesn't show how and where Dashboard hooks into it. It's not going to be easy to do from home so I might leave it for Monday afternoon or so; I need to do some gardening for the first time in about five years this weekend... :-( [/Edit]
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,896
RAC: 266
Message 1167 - Posted: 2 Oct 2015, 23:01:21 UTC - in response to Message 1165.  

I did ask if I could break it...

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, dashboard-alarms@cern.ch and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.
Apache/2.2.15 (Red Hat) Server at dashb-cms-job.cern.ch Port 80


I was just pressing things without knowing whether what I was doing was valid or not !

Well done that volunteer! I'd encourage you to make that report; it looks like an internal (if highly obscure) bug that needs to be squashed [if only by restricting public access...]. I understand that one testing technique in vogue these days is to throw random input at a process; it tests robustness against incorrect input and may show up vulnerabilities before the black-hats find them. Of course it also points out the need for a more-intuitive user interface; I'll admit Dashboard isn't the easiest thing to drive.
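
As an illustration of the random-input technique mentioned above, here is a minimal sketch in Python. It is not Dashboard code: `parse_query` is a hypothetical stand-in for whatever is being tested, and the harness name `fuzz` and its parameters are likewise assumptions made for the example. The harness feeds random strings to the target and records any input that fails with something other than the expected `ValueError`.

```python
import random
import string


def parse_query(text: str) -> dict:
    """Hypothetical function under test: parses 'key=value&key=value'."""
    result = {}
    for pair in text.split("&"):
        key, value = pair.split("=")  # ValueError if the pair is malformed
        result[key] = int(value)      # ValueError if the value is not numeric
    return result


def fuzz(target, runs: int = 1000, seed: int = 0) -> list:
    """Throw random input at `target`; return the inputs that crash it unexpectedly."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits + "=&%"
    crashes = []
    for _ in range(runs):
        length = rng.randint(0, 20)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            target(text)
        except ValueError:
            pass  # a clean rejection of bad input is the expected behaviour
        except Exception as exc:  # anything else counts as a robustness bug
            crashes.append((text, repr(exc)))
    return crashes


if __name__ == "__main__":
    print(len(fuzz(parse_query)), "unexpected failures")
```

A target that rejects bad input cleanly produces an empty list; every entry that does appear is a candidate for the kind of report encouraged above, ideally reduced to the shortest input that still reproduces it.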
Profile PDW

Joined: 20 May 15
Posts: 217
Credit: 5,646,681
RAC: 15,851
Message 1168 - Posted: 2 Oct 2015, 23:06:33 UTC - in response to Message 1167.  

I've moved onto ..

The server failed to serve your request, please retry. In case of new failure contact the admin.

I'll have another go tomorrow and try to reduce it to the fewest clicks needed to reproduce it before I report it.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,896
RAC: 266
Message 1169 - Posted: 2 Oct 2015, 23:10:58 UTC - in response to Message 1168.  

Cheers!
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1170 - Posted: 2 Oct 2015, 23:22:25 UTC

Just to confirm.

Site T3s

T3_CH_Volunteers

That is us, users working through the BOINC CMS-dev project?
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,896
RAC: 266
Message 1171 - Posted: 2 Oct 2015, 23:31:53 UTC - in response to Message 1170.  

Just to confirm.

Site T3s

T3_CH_Volunteers

That is us, users working through the BOINC CMS-dev project?

Yes, we are now officially CMS site T3_CH_Volunteer (no "s"). T3 means Tier 3: Tier 0 is CERN; Tier 1 sites are large (usually national) centres (RAL in the UK, INFN in Italy, FNAL in the US...); Tier 2 sites are the regionals (London, Scotgrid, etc.); and Tier 3 sites are not expected to host data, just to allow use by "local" users, so they are usually at a departmental scale. CH is Switzerland of course, and Volunteer is mainly self-explanatory.
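
The naming scheme described above can be sketched in a few lines of Python. The `T<tier>_<country>_<label>` pattern is inferred from the single example in this post, so real CMS site names may follow further rules; `Site`, `parse_site` and `TIER_ROLES` are names invented for the illustration.

```python
from typing import NamedTuple

# Tier descriptions paraphrased from the post above.
TIER_ROLES = {
    0: "CERN",
    1: "large (usually national) centre",
    2: "regional centre",
    3: "site not expected to host data",
}


class Site(NamedTuple):
    tier: int
    country: str
    label: str


def parse_site(site: str) -> Site:
    """Split a name such as 'T3_CH_Volunteer' into tier, country and label."""
    tier_field, country, label = site.split("_", 2)
    if not tier_field.startswith("T") or not tier_field[1:].isdigit():
        raise ValueError(f"unexpected tier field: {tier_field!r}")
    return Site(int(tier_field[1:]), country, label)


if __name__ == "__main__":
    site = parse_site("T3_CH_Volunteer")
    print(site, "-", TIER_ROLES[site.tier])
```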
rbpeake

Joined: 15 Apr 15
Posts: 38
Credit: 227,251
RAC: 0
Message 1172 - Posted: 2 Oct 2015, 23:40:43 UTC - in response to Message 1171.  

So we are officially contributing results to the larger simulation database?
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,896
RAC: 266
Message 1173 - Posted: 2 Oct 2015, 23:52:40 UTC - in response to Message 1172.  
Last modified: 3 Oct 2015, 0:07:30 UTC

So we are officially contributing results to the larger simulation database?

No, not yet. Apart from anything else we need "validation" before our results are trusted. I'll remind you of the "-dev" suffix in our name...
We hope to make a large step towards that recognition when I give a presentation at CERN on the 15th (Major panic today when I was booking my flights; after my bank accepted my credit card but before BA could confirm my booking, ba.com went belly-up! Luckily I eventually got a confirmation e-mail from them an hour or two later.)
What I hope to do by then is transfer the result files accumulated so far from where they now land (the so-called "data bridge") to the Storage Element at my Uni, so I can aggregate and analyse all the output files, and compare them to similar results produced by conventional means (i.e. normal GRID/CRAB3 jobs). This requires a lot of behind-the-scenes setting-up of trusted credentials. Luckily a colleague at Imperial College has the means and the (rusty) expertise to help in this; I hope to $DEITY it's completed early next week or I may not have the time for a full analysis.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1174 - Posted: 3 Oct 2015, 0:00:15 UTC - in response to Message 1173.  

What is the difference between CMS-dev and the other CERN BOINC projects (ATLAS, LHC)?
I know, it is late, so, whenever you have the time...
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,896
RAC: 266
Message 1175 - Posted: 3 Oct 2015, 0:23:46 UTC - in response to Message 1174.  

What is the difference between CMS-dev and the other CERN BOINC projects (ATLAS, LHC)?
I know, it is late, so, whenever you have the time...

Mainly, that we haven't been verified by the Collaboration, and that some people in the Collaboration haven't even realised that we exist. And that we're not yet in an automatic production state -- I'm doing all the job creation and submission by myself, for example. Hopefully that will change soon; progress is being made on a few fronts. (Note that my role has expanded considerably from what I thought I'd signed up for...)
So, if all goes well by the 15th I'll have concrete results to set before the Collaboration, and job submission will have moved to a more-sustainable model while result files will be able to be transferred under strict security control from the data-bridge to GRID storage.
When that comes together, I'd hope to be able to declare a beta phase, and the project will move to a more unified framework being set up to encompass all LHC volunteer computing effort. Then we should be able to move quickly to set up CMS@Home proper, and invite volunteers from around the world to participate. I'd sort of like that to happen before I reach retirement age next year...

Now, if you'll excuse me, I haven't started on this weekend's Guardian Prize crossword yet.
Phil

Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 1176 - Posted: 3 Oct 2015, 11:36:33 UTC - in response to Message 1167.  

I did ask if I could break it...

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.


I was just pressing things without knowing whether what I was doing was valid or not !

Well done that volunteer! I'd encourage you to make that report, looks like an internal (if highly obscure) bug that needs to be squashed [if only by restricting public access...]. I understand that one testing technique in vogue these days is to throw random input at a process; it tests robustness against incorrect input and may show up vulnerabilities before the black-hats find them.

This is all down to mind-set, isn't it?
I once was working for a European manufacturer who had released a completely re-designed product and I was asked to test it. One of its innovations was a whole row of shiny buttons to switch through its operations. So I happily pushed all the buttons and after a few minutes everything stopped. It took a while to duplicate the error and then a day or so to find a fix for the controller so this type of error was removed. I sent my findings to the lab and got this returned:

Problem: Your control problem

Diagnosis: Your problem is not a Problem. Nobody would consider operating the controls in the manner you have suggested.
Profile tullio

Joined: 17 Aug 15
Posts: 62
Credit: 296,695
RAC: 0
Message 1177 - Posted: 3 Oct 2015, 12:48:26 UTC

I was working for Honeywell Information Systems Italy when it launched its first UNIX computer. We sent one to a technical journalist so that he could test it. The next day he phoned: "Your computer does not work." "But what have you done?" we asked him. "I typed 'diskformat'," he said. Out of more than 300 UNIX commands he had chosen the only one that erased the disk. We eliminated that command.
Tullio
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,896
RAC: 266
Message 1178 - Posted: 3 Oct 2015, 14:43:24 UTC - in response to Message 1177.  

I was working for Honeywell Information Systems Italy when it launched its first UNIX computer. We sent one to a technical journalist so that he could test it. The next day he phoned: "Your computer does not work." "But what have you done?" we asked him. "I typed 'diskformat'," he said. Out of more than 300 UNIX commands he had chosen the only one that erased the disk. We eliminated that command.

Well, obviously that's better than questioning his Italian machismo by popping up a confirmation, "Are you sure?"
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 1182 - Posted: 5 Oct 2015, 15:41:15 UTC - in response to Message 1147.  

How long is this time-out ? ... Two hours.


Ouch. Way too short for normal BOINC users.

As long as this is an alpha project this will be okay, but later on you should rethink this.

I can change it. At the moment it doesn't seem a big problem; if/when we start growing then I can do a statistical analysis on how many jobs time out as a function of LeaseTime but at the moment it's in the noise.


Here is an excerpt from the log of a normal BOINC session on my desktop:

05/10/2015 16:43:20 | | Suspending computation - an exclusive app is running
05/10/2015 16:43:20 | | Suspending network activity - an exclusive app is running
05/10/2015 16:51:47 | ATLAS@home | update requested by user
05/10/2015 16:51:48 | ATLAS@home | Sending scheduler request: Requested by user.
05/10/2015 16:51:48 | ATLAS@home | Requesting new tasks for CPU
05/10/2015 16:51:50 | ATLAS@home | Scheduler request completed: got 0 new tasks
05/10/2015 16:51:50 | ATLAS@home | No tasks sent
05/10/2015 17:11:47 | ATLAS@home | update requested by user
05/10/2015 17:11:48 | ATLAS@home | Sending scheduler request: Requested by user.
05/10/2015 17:11:48 | ATLAS@home | Requesting new tasks for CPU
05/10/2015 17:11:50 | ATLAS@home | Scheduler request completed: got 0 new tasks
05/10/2015 17:11:50 | ATLAS@home | No tasks sent
05/10/2015 17:21:17 | | Resuming network activity
05/10/2015 17:22:23 | | Resuming computation
05/10/2015 17:22:26 | CMS-dev | [checkpoint] result CMS_8172_1427806845.384120_0 checkpointed
05/10/2015 17:22:26 | ATLAS@home | [checkpoint] result xnUMDmQVB3mnDDn7oo6G73TpABFKDmABFKDmoCNKDmABFKDmcAurXn_0 checkpointed
05/10/2015 17:22:26 | ATLAS@home | [checkpoint] result 6lhLDmiVD3mnDDn7oo6G73TpABFKDmABFKDmiSHKDmABFKDmE6a95n_0 checkpointed
05/10/2015 17:22:33 | | Suspending computation - CPU is busy
05/10/2015 17:23:06 | | Resuming computation
05/10/2015 17:23:27 | ATLAS@home | Sending scheduler request: To fetch work.
05/10/2015 17:23:27 | ATLAS@home | Requesting new tasks for CPU
05/10/2015 17:23:31 | ATLAS@home | Scheduler request completed: got 0 new tasks
05/10/2015 17:23:31 | ATLAS@home | No tasks sent
05/10/2015 17:26:21 | | Suspending computation - CPU is busy
05/10/2015 17:26:32 | | Resuming computation
05/10/2015 17:26:53 | | Suspending computation - CPU is busy
05/10/2015 17:27:03 | | Resuming computation
05/10/2015 17:27:25 | | Suspending computation - CPU is busy
05/10/2015 17:27:35 | | Resuming computation

As you can see, BOINC does a lot to keep my workstation running well as a normal desktop. These pauses are normal for BOINC users whose PCs are not dedicated to crunching.

So I think this conflicts with your two-hour timeout; no normal cruncher is aware of it.
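
To put a number on the pauses, a log like the one above can be totalled with a short script. This is only a sketch: it assumes the `date time | project | message` layout seen in the excerpt, day-first dates, and that every "Suspending computation" line is eventually followed by a "Resuming computation" line. The function name `suspended_time` is made up for the example, and `LOG` holds four lines copied from the excerpt.

```python
from datetime import datetime, timedelta

# Four lines copied from the excerpt above.
LOG = """\
05/10/2015 16:43:20 | | Suspending computation - an exclusive app is running
05/10/2015 17:22:23 | | Resuming computation
05/10/2015 17:22:33 | | Suspending computation - CPU is busy
05/10/2015 17:23:06 | | Resuming computation
"""


def suspended_time(log: str) -> timedelta:
    """Sum the intervals between 'Suspending computation' and 'Resuming computation'."""
    total = timedelta()
    started = None
    for line in log.splitlines():
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not a 'time | project | message' line
        stamp, _project, message = parts
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        if message.startswith("Suspending computation"):
            if started is None:
                started = when
        elif message.startswith("Resuming computation") and started is not None:
            total += when - started
            started = None
    return total


if __name__ == "__main__":
    paused = suspended_time(LOG)
    print(paused, "paused; exceeds two hours:", paused > timedelta(hours=2))
```

For these four lines the pauses add up to 39 minutes 36 seconds; running the same function over a full day's log would show how often the total approaches the two-hour limit.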

What happens if I exceed your timeout? Will my client continue crunching the timed-out WU, which has already been given to another cruncher, and waste crunching time, or will it be aborted by the Condor (?) queue?

You should consider adjusting the timeout to 12/24 hours.

Can jobs like these become the "unknown error" jobs?
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,896
RAC: 266
Message 1183 - Posted: 5 Oct 2015, 18:38:49 UTC - in response to Message 1182.  

What happens if I exceed your timeout? Will my client continue crunching the timed-out WU, which has already been given to another cruncher, and waste crunching time, or will it be aborted by the Condor (?) queue?
It should move to the end of the queue and be re-sent when its time comes, unless it has been sent three times already.

You should consider adjusting the timeout to 12/24 hours
OK, tell me if you see a difference from now on.

Can jobs like these become the "unknown error" jobs?
It's possible, but I don't think so; as far as I know Condor tells Dashboard when it re-schedules a job.
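
The requeue behaviour described in the first answer above (back to the end of the queue on timeout, unless already sent three times) can be modelled with a small simulation. This is a toy sketch, not how HTCondor implements it: `run_queue` and the `completes_on_time` callback are hypothetical, and "sent three times" is read as a cap of three sends per job.

```python
from collections import deque

MAX_SENDS = 3  # from the post: a job is not re-sent after three attempts


def run_queue(job_ids, completes_on_time):
    """Simulate dispatch; `completes_on_time(job, attempt)` is a hypothetical callback
    returning True if the job finishes within its lease on that attempt."""
    queue = deque((job, 0) for job in job_ids)
    done, abandoned = [], []
    while queue:
        job, sends = queue.popleft()
        sends += 1
        if completes_on_time(job, sends):
            done.append(job)
        elif sends < MAX_SENDS:
            queue.append((job, sends))  # lease expired: back of the queue
        else:
            abandoned.append(job)       # already sent three times
    return done, abandoned


if __name__ == "__main__":
    # Attempt number on which each job would finish in time (99 = never).
    needed = {"a": 1, "b": 2, "c": 99}
    print(run_queue(["a", "b", "c"], lambda job, attempt: attempt >= needed[job]))
```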
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 751
Credit: 11,610,376
RAC: 1,406
Message 1184 - Posted: 6 Oct 2015, 5:11:44 UTC

HTCondor 8.4.0 DAGMan
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,896
RAC: 266
Message 1185 - Posted: 6 Oct 2015, 9:11:39 UTC - in response to Message 1166.  

[Edit] I'll add a link to a graphic showing the command and data flow of CMS@Home when I get the chance, tho' it currently doesn't show how and where Dashboard hooks into it. It's not going to be easy to do from home so I might leave it for Monday afternoon or so.[/Edit]


Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 751
Credit: 11,610,376
RAC: 1,406
Message 1187 - Posted: 6 Oct 2015, 19:01:52 UTC - in response to Message 1185.  

DataBridge
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 1194 - Posted: 8 Oct 2015, 6:00:34 UTC

Are we out of jobs?
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,896
RAC: 266
Message 1196 - Posted: 8 Oct 2015, 14:06:41 UTC - in response to Message 1194.  

Are we out of jobs?

No: http://boincai05.cern.ch/CMS-dev/cms_job.php -- around 2,000 to go in this batch. It's going to run past the seven-day default proxy-certificate lifetime at about 1200 UTC tomorrow, but I've just worked out how to put in a new one; I'll try that tomorrow morning in case I do something wrong.


©2024 CERN