Message boards : News : No new jobs

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1146 - Posted: 1 Oct 2015, 20:24:49 UTC - in response to Message 1140.  

automagically


I like that word, Ivan. It is very descriptive.
ID: 1146 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,870,629
RAC: 576
Message 1147 - Posted: 1 Oct 2015, 22:03:52 UTC - in response to Message 1144.  

How long is this time-out ? ... Two hours.


Ouch. Way too short for normal BOINC users.

As long as this is an alpha project this will be okay, but later on you should rethink this.

I can change it. At the moment it doesn't seem a big problem; if/when we start growing then I can do a statistical analysis on how many jobs time out as a function of LeaseTime but at the moment it's in the noise.
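A minimal sketch of the kind of analysis mentioned above, assuming a job counts as timed out whenever its run time exceeds the lease time. The function names and the sample run times are hypothetical, not taken from the project. Jobs that really were cut off at the lease would show censored run times, so this would only be a first look.

```python
from typing import Dict, Iterable, List


def timeout_fraction(runtimes_min: Iterable[float], lease_min: float) -> float:
    """Fraction of jobs whose run time exceeds the lease time (assumed rule)."""
    runtimes: List[float] = list(runtimes_min)
    if not runtimes:
        return 0.0
    timed_out = sum(1 for r in runtimes if r > lease_min)
    return timed_out / len(runtimes)


def scan_lease_times(
    runtimes_min: Iterable[float], candidates_min: Iterable[float]
) -> Dict[float, float]:
    """Timeout fraction for each candidate lease time."""
    runtimes = list(runtimes_min)
    return {lease: timeout_fraction(runtimes, lease) for lease in candidates_min}


# Made-up run times in minutes, for illustration only.
sample_runtimes = [18, 35, 50, 75, 90, 110, 125, 140]
table = scan_lease_times(sample_runtimes, [60, 120, 180])
for lease, frac in table.items():
    print(f"lease {lease:>3} min: {frac:.0%} of jobs would time out")
```

With real data, the run times would come from the job logs rather than a hand-written list.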
ID: 1147 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,870,629
RAC: 576
Message 1148 - Posted: 1 Oct 2015, 22:12:13 UTC - in response to Message 1146.  
Last modified: 1 Oct 2015, 22:16:10 UTC

automagically


I like that word, Ivan. It is very descriptive.
I'm afraid to say, I believe I got it from Jerry Pournelle, with whom I've had my differences...

But, it worked! Jobs are joining the queue without my intervention. Props to Marco and Hassen at CERN and Andrew at RAL who worked overtime to discover the problem, find a cure, and apply it. I'm not sure whether we lost jobs or not, the job numbers are in the 8,000s (out of 10,000) but maybe it'll cycle around back to the low numbers again. Bottom line is I think we've got enough queued to last the night, I can sweat the details in the morning.
Thanks everyone for your patience, and remember that these are a different class of job to recent ones, so your run-times, etc., will be different.
ID: 1148 · Rating: 0
Richard Haselgrove

Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 1149 - Posted: 1 Oct 2015, 23:44:01 UTC - in response to Message 1148.  

automagically


I like that word, Ivan. It is very descriptive.

I'm afraid to say, I believe I got it from Jerry Pournelle, with whom I've had my differences...

It's been in use at SETI since at least August 2004, which was the test phase of their conversion to BOINC. So you may have caught it from someone else...
ID: 1149 · Rating: 0
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 738
Credit: 11,558,539
RAC: 1,940
Message 1150 - Posted: 2 Oct 2015, 3:58:55 UTC

(I only run these on 5 hosts but they always have CMS-dev work)
Mad Scientist For Life
ID: 1150 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,870,629
RAC: 576
Message 1151 - Posted: 2 Oct 2015, 7:47:12 UTC - in response to Message 1149.  

automagically


I like that word, Ivan. It is very descriptive.

I'm afraid to say, I believe I got it from Jerry Pournelle, with whom I've had my differences...

It's been in use at SETI since at least August 2004, which was the test phase of their conversion to BOINC. So you may have caught it from someone else...

Well, this was more like 1984, when he was writing a column in BYTE.
ID: 1151 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,870,629
RAC: 576
Message 1152 - Posted: 2 Oct 2015, 8:55:43 UTC - in response to Message 1148.  

Unfortunately the jobs were all returning errors so I removed them and submitted a new batch. These haven't found their way into the Condor queue yet.
ID: 1152 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,870,629
RAC: 576
Message 1153 - Posted: 2 Oct 2015, 12:06:13 UTC - in response to Message 1152.  

Unfortunately the jobs were all returning errors so I removed them and submitted a new batch. These haven't found their way into the Condor queue yet.

OK, a new batch is up and filling the queue. Thanks for your patience.
ID: 1153 · Rating: 0
Profile PDW

Joined: 20 May 15
Posts: 217
Credit: 5,211,392
RAC: 8,905
Message 1154 - Posted: 2 Oct 2015, 12:36:09 UTC - in response to Message 1153.  

I might be wrong, but I don't think these new ones are any good either. They are going extremely fast, will look at logs to see what is going on...
ID: 1154 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 1155 - Posted: 2 Oct 2015, 12:57:55 UTC - in response to Message 1153.  


OK, a new batch is up and filling the queue. Thanks for your patience.

Running well. 25 events / job
ID: 1155 · Rating: 0
Profile PDW

Joined: 20 May 15
Posts: 217
Credit: 5,211,392
RAC: 8,905
Message 1156 - Posted: 2 Oct 2015, 13:01:59 UTC - in response to Message 1155.  

Have latched on to job 40, but only at 13:10 BST, some time after they appeared in the queue and you posted.
ID: 1156 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,870,629
RAC: 576
Message 1157 - Posted: 2 Oct 2015, 15:42:31 UTC - in response to Message 1154.  

I might be wrong, but I don't think these new ones are any good either. They are going extremely fast, will look at logs to see what is going on...

They are taking between 18 minutes and two hours according to Dashboard.
ID: 1157 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,870,629
RAC: 576
Message 1158 - Posted: 2 Oct 2015, 15:52:53 UTC - in response to Message 1156.  
Last modified: 2 Oct 2015, 15:53:17 UTC

Have latched on to job 40, but only at 13:10 BST, some time after they appeared in the queue and you posted.

I posted at 13:06 BST; the submission was at 13:03, job 40 is listed as running from 13:04 to 14:18. Mixing BST and UTC can get confusing, roll on 25/10/2015!
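The BST/UTC mix-up above can be checked mechanically. A small sketch, assuming the forum's timestamp layout ("2 Oct 2015, 12:06:13") and an English locale for month names; the helper name `utc_to_bst` is made up for illustration. It uses a fixed UTC+1 offset for BST, so it does not know when BST ends.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: BST is UTC+1. A fixed offset does not track the
# switch back to GMT at the end of October.
UTC = timezone.utc
BST = timezone(timedelta(hours=1), "BST")


def utc_to_bst(stamp: str) -> datetime:
    """Parse a forum-style UTC timestamp and return it expressed in BST."""
    parsed = datetime.strptime(stamp, "%d %b %Y, %H:%M:%S").replace(tzinfo=UTC)
    return parsed.astimezone(BST)


# Message 1153 is stamped 12:06:13 UTC on the board.
posted = utc_to_bst("2 Oct 2015, 12:06:13")
print(posted.strftime("%H:%M %Z"))  # prints 13:06 BST
```

This matches the "13:06 BST" quoted for the same post, which carries a 12:06 UTC stamp on the board.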
ID: 1158 · Rating: 0
Profile PDW

Joined: 20 May 15
Posts: 217
Credit: 5,211,392
RAC: 8,905
Message 1159 - Posted: 2 Oct 2015, 16:08:27 UTC - in response to Message 1158.  

Have latched on to job 40, but only at 13:10 BST, some time after they appeared in the queue and you posted.

I posted at 13:06 BST; the submission was at 13:03, job 40 is listed as running from 13:04 to 14:18. Mixing BST and UTC can get confusing, roll on 25/10/2015!

It was easier when I was sitting on a Pacific island!
ID: 1159 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1160 - Posted: 2 Oct 2015, 16:08:45 UTC
Last modified: 2 Oct 2015, 16:11:38 UTC

I have 30 run directories. When the new lot was issued, it started to mix the results into the run-1 and run-2 directories.
It would be easier to continue the run directory numbering, rather than reset it and mix the files into existing directories.

And I suggest keeping everything in UTC (no BST or CEST), as all other worldwide projects do.
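The run-directory suggestion above could look something like the following sketch. It assumes directories are named `run-N`, as in the run-1 and run-2 examples. How the project actually creates them is not stated in this thread, and `next_run_dir` is a hypothetical helper.

```python
import re
from typing import Iterable

# Assumed naming scheme: "run-" followed by a positive integer.
RUN_DIR = re.compile(r"^run-(\d+)$")


def next_run_dir(existing: Iterable[str]) -> str:
    """Return the next unused run-N name, continuing from the highest N seen."""
    numbers = [int(m.group(1)) for name in existing if (m := RUN_DIR.match(name))]
    return f"run-{max(numbers, default=0) + 1}"


# The 30 existing directories mentioned above.
existing_dirs = [f"run-{i}" for i in range(1, 31)]
print(next_run_dir(existing_dirs))  # prints run-31
```

Continuing from the highest existing number means a new batch never writes into a directory left over from an earlier one.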
ID: 1160 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1161 - Posted: 2 Oct 2015, 19:25:30 UTC

The "running jobs" graphics are not really showing running jobs.
They are showing computers that are running tasks, but not necessarily processing jobs.
I think this should be labeled more appropriately.
ID: 1161 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,870,629
RAC: 576
Message 1162 - Posted: 2 Oct 2015, 21:43:43 UTC - in response to Message 1161.  
Last modified: 2 Oct 2015, 21:54:23 UTC

The "running jobs" graphics are not really showing running jobs.
They are showing computers that are running tasks, but not necessarily processing jobs.
I think this should be labeled more appropriately.

No, that's not quite right. The statistics at the top come from the Condor queue; in full the summary line looks like:
1073 jobs; 0 completed, 0 removed, 1002 idle, 64 running, 7 held, 0 suspended

The graphs come from Dashboard -- http://dashboard.cern.ch/cms/ -- which knows nothing about our BOINC tasks, only what Condor reports back to it about job status. (Most of the links on that page require CMS credentials; I am told that you can get through to my jobs on the "CRAB3 User Summary" link, but, if you do, make sure you adjust "filters" to a longer time period than one day.) The plot is somehow averaging the Condor running-jobs statistic over one hour. The plots come from the "Historical view" link but I'm not sure if that's publicly accessible.
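For anyone wanting to track these numbers, the Condor summary line quoted above is easy to split into counts. A minimal sketch, assuming the line always has the "number label" shape shown; `parse_condor_summary` is a made-up helper, not part of Condor.

```python
import re
from typing import Dict


def parse_condor_summary(line: str) -> Dict[str, int]:
    """Turn '<n> jobs; <n> completed, ...' into a dict of label -> count."""
    return {label: int(num) for num, label in re.findall(r"(\d+)\s+([a-z]+)", line)}


summary = parse_condor_summary(
    "1073 jobs; 0 completed, 0 removed, 1002 idle, 64 running, 7 held, 0 suspended"
)
# The per-state counts should add up to the total job count.
states_total = sum(count for label, count in summary.items() if label != "jobs")
print(summary["running"], states_total == summary["jobs"])  # prints 64 True
```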
ID: 1162 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1163 - Posted: 2 Oct 2015, 22:02:37 UTC - in response to Message 1162.  

So why does the graph not even show a dip, when we know there was no work running for several hours?
ID: 1163 · Rating: 0
Profile PDW

Joined: 20 May 15
Posts: 217
Credit: 5,211,392
RAC: 8,905
Message 1164 - Posted: 2 Oct 2015, 22:35:14 UTC - in response to Message 1162.  


The graphs come from Dashboard -- http://dashboard.cern.ch/cms/ -- which knows nothing about our BOINC tasks, only what Condor reports back to it about job status. (Most of the links on that page require CMS credentials; I am told that you can get through to my jobs on the "CRAB3 User Summary" link, but, if you do, make sure you adjust "filters" to a longer time period than one day.) The plot is somehow averaging the Condor running-jobs statistic over one hour. The plots come from the "Historical view" link but I'm not sure if that's publicly accessible.

Historical View is accessible and you can see those charts if you select T3s, then select T3_CH_Volunteer (click Done) then click on 'Completed, Submitted, Pending, Running Jobs' button.

The 'NEW: Historical View' menu item lower down in the list just times out for me.
ID: 1164 · Rating: 0
Profile PDW

Joined: 20 May 15
Posts: 217
Credit: 5,211,392
RAC: 8,905
Message 1165 - Posted: 2 Oct 2015, 22:49:13 UTC - in response to Message 1164.  

I did ask if I could break it...

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, dashboard-alarms@cern.ch and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.
Apache/2.2.15 (Red Hat) Server at dashb-cms-job.cern.ch Port 80


I was just pressing things without knowing whether what I was doing was valid or not!
ID: 1165 · Rating: 0

©2024 CERN