Message boards : Number crunching : Current issues
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,116
Message 1647 - Posted: 27 Jan 2016, 10:57:13 UTC - in response to Message 1646.  

A current task should pick up a new job once its one-hour pause is over. That's the theory, anyway...

Thanks, Laurence.

In practice too!

The first job after the restart was returned successfully: jobNumber=385
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1778 - Posted: 1 Feb 2016, 13:32:45 UTC

Another "runaway".
10+ fails from the same IP address and still continuing.
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,871,767
RAC: 16,520
Message 1841 - Posted: 3 Feb 2016, 22:21:39 UTC

Yesterday's task that started at 5:30pm was reported at 20:26 today, not a problem.

However I now have 7 tasks listed as having been sent to the computer (472) all at 20:26:27...

Task   Work unit  Computer  Sent                      Time reported or deadline  Status       Run time  CPU time  Credit  Application
77329  68196      472       3 Feb 2016, 20:26:27 UTC  10 Feb 2016, 20:26:27 UTC  In progress  ---       ---       ---     CMS Simulation v46.22 (vbox64)
77355  68758      472       3 Feb 2016, 20:26:27 UTC  10 Feb 2016, 20:26:27 UTC  In progress  ---       ---       ---     CMS Simulation v46.22 (vbox64)
77356  68888      472       3 Feb 2016, 20:26:27 UTC  10 Feb 2016, 20:26:27 UTC  In progress  ---       ---       ---     CMS Simulation v46.22 (vbox64)
77365  68749      472       3 Feb 2016, 20:26:27 UTC  10 Feb 2016, 20:26:27 UTC  In progress  ---       ---       ---     CMS Simulation v46.22 (vbox64)
77366  68824      472       3 Feb 2016, 20:26:27 UTC  10 Feb 2016, 20:26:27 UTC  In progress  ---       ---       ---     CMS Simulation v46.22 (vbox64)
77377  68431      472       3 Feb 2016, 20:26:27 UTC  10 Feb 2016, 20:26:27 UTC  In progress  ---       ---       ---     CMS Simulation v46.22 (vbox64)
77277  68012      472       3 Feb 2016, 20:26:27 UTC  10 Feb 2016, 20:26:27 UTC  In progress  ---       ---       ---     CMS Simulation v46.22 (vbox64)

On the computer itself there is only 1 task listed, and it is using 1 full core (on a Helix job). It is the bottom one in the list above (name: CMS_14774_1427806996.975027_1).

Nothing changed at my end!
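
The mismatch described here (seven tasks listed on the website, one on the machine) can be checked mechanically by comparing the two lists. A minimal Python sketch, assuming the task IDs are copied by hand from the website and that the single local task is the bottom one in the list (77277); the function name `find_ghost_tasks` is hypothetical and not part of BOINC:

```python
def find_ghost_tasks(server_tasks, local_tasks):
    """Return task IDs the server lists as in progress but the client does not hold."""
    return sorted(set(server_tasks) - set(local_tasks))


# Task IDs listed on the website for computer 472 (copied from the post).
server_tasks = [77329, 77355, 77356, 77365, 77366, 77377, 77277]
# The only task actually present on the machine (the bottom one in the list).
local_tasks = [77277]

ghosts = find_ghost_tasks(server_tasks, local_tasks)
print(f"{len(ghosts)} of {len(server_tasks)} tasks are not on the host: {ghosts}")
```

Any ID in the result is a task the server believes was sent but that never arrived on the client.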
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1842 - Posted: 3 Feb 2016, 22:59:29 UTC

Would it be possible to allow a second task on one computer or, even better, allow a setting to specify the number of cores to be used?
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 755
Credit: 11,756,660
RAC: 6,664
Message 1843 - Posted: 3 Feb 2016, 23:18:03 UTC - in response to Message 1841.  

Yesterday's task that started at 5:30pm was reported at 20:26 today, not a problem.

However I now have 7 tasks listed as having been sent to the computer (472) all at 20:26:27...


Looking on the computer there is only 1 task listed, it is using 1 full core (on a Helix job), the one it has is the bottom one in the list above (name: CMS_14774_1427806996.975027_1)

Nothing changed my end !


I sort of had the same thing
http://boincai05.cern.ch/CMS-dev/results.php?userid=192

I sent one in and got one back to do, but when I look here at my *Tasks* it shows 2 in progress, and one of them has the same number as the one just sent back.

(I am only testing these on one host now)
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,871,767
RAC: 16,520
Message 1844 - Posted: 3 Feb 2016, 23:28:10 UTC - in response to Message 1843.  
Last modified: 3 Feb 2016, 23:37:05 UTC

I sort of had the same thing
http://boincai05.cern.ch/CMS-dev/results.php?userid=192

I sent in one and got one back to do but when I look here at my *Tasks* it has 2 in progress and one of them is the same number as the one just sent back.

(I am only testing these on one host now)

I can see you have two sent out, but the numbers don't match the one returned (unless the task name is the same, which I can't see; actually I can, I was just too lazy to click!)...

Task   Work unit  Sent                      Time reported or deadline  Status                   Run time  CPU time  Credit  Application
77354  68905      3 Feb 2016, 19:08:02 UTC  10 Feb 2016, 19:08:02 UTC  In progress              ---       ---       ---     CMS Simulation v46.22 (vbox64)
77211  66415      3 Feb 2016, 19:08:02 UTC  10 Feb 2016, 19:08:02 UTC  In progress              ---       ---       ---     CMS Simulation v46.22 (vbox64)
77171  66820      2 Feb 2016, 10:18:38 UTC  3 Feb 2016, 14:32:27 UTC   Completed and validated

At least you have 8 cores for them to run on. My old computer only has 4 HT cores, so I have no idea why it thinks it should have 7 tasks to do!
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 755
Credit: 11,756,660
RAC: 6,664
Message 1845 - Posted: 4 Feb 2016, 1:02:16 UTC - in response to Message 1844.  

Well, the thing is that those *2* new tasks it says are in progress are NOT even on this host.

Just one of them. (77354)

That other one is not on the host.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,116
Message 1847 - Posted: 4 Feb 2016, 14:32:33 UTC
Last modified: 4 Feb 2016, 14:34:57 UTC

I got 1 task, but within a few minutes 16 were assigned. Only task 65665 is on my system.

Task   Work unit  Sent                      Time reported or deadline  Status
77583  56057      4 Feb 2016, 13:02:33 UTC  11 Feb 2016, 13:02:33 UTC  In progress
77098  65665      4 Feb 2016, 13:02:33 UTC  11 Feb 2016, 13:02:33 UTC  In progress
77777  63563      4 Feb 2016, 13:00:20 UTC  11 Feb 2016, 13:00:20 UTC  In progress
77778  58580      4 Feb 2016, 13:00:20 UTC  11 Feb 2016, 13:00:20 UTC  In progress
77819  68313      4 Feb 2016, 13:00:20 UTC  11 Feb 2016, 13:00:20 UTC  In progress
77820  68006      4 Feb 2016, 13:00:20 UTC  11 Feb 2016, 13:00:20 UTC  In progress
77802  67138      4 Feb 2016, 13:00:08 UTC  11 Feb 2016, 13:00:08 UTC  In progress
77803  66530      4 Feb 2016, 13:00:08 UTC  11 Feb 2016, 13:00:08 UTC  In progress
77804  65222      4 Feb 2016, 13:00:08 UTC  11 Feb 2016, 13:00:08 UTC  In progress
77805  66706      4 Feb 2016, 13:00:08 UTC  11 Feb 2016, 13:00:08 UTC  In progress
77405  69140      4 Feb 2016, 12:59:23 UTC  11 Feb 2016, 12:59:23 UTC  In progress
77438  69057      4 Feb 2016, 12:59:23 UTC  11 Feb 2016, 12:59:23 UTC  In progress
77440  69783      4 Feb 2016, 12:59:23 UTC  11 Feb 2016, 12:59:23 UTC  In progress
77798  64754      4 Feb 2016, 12:57:49 UTC  11 Feb 2016, 12:57:49 UTC  In progress
77807  62932      4 Feb 2016, 12:57:49 UTC  11 Feb 2016, 12:57:49 UTC  In progress
77808  67308      4 Feb 2016, 12:57:49 UTC  11 Feb 2016, 12:57:49 UTC  In progress

No wonder the project status shows 536 tasks in progress.
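
The 16 assignments listed in this post share only a handful of sent times, which is consistent with them having been handed out in batches. A small Python sketch that counts tasks per timestamp, using the task IDs and sent times from the list (the helper name `bursts` is made up for illustration):

```python
from collections import Counter

# (task ID, sent time in UTC on 4 Feb 2016) pairs copied from the list.
assigned = [
    (77583, "13:02:33"), (77098, "13:02:33"),
    (77777, "13:00:20"), (77778, "13:00:20"), (77819, "13:00:20"), (77820, "13:00:20"),
    (77802, "13:00:08"), (77803, "13:00:08"), (77804, "13:00:08"), (77805, "13:00:08"),
    (77405, "12:59:23"), (77438, "12:59:23"), (77440, "12:59:23"),
    (77798, "12:57:49"), (77807, "12:57:49"), (77808, "12:57:49"),
]


def bursts(tasks):
    """Count how many tasks share each 'sent' timestamp."""
    return Counter(sent for _, sent in tasks)


per_timestamp = bursts(assigned)
for sent, count in sorted(per_timestamp.items()):
    print(f"{sent} UTC: {count} tasks")
```

Sixteen tasks spread over five timestamps means each batch contained two to four tasks.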

Although I had 31 GB of free disk space available for BOINC, I got this a few times:

04-Feb-2016 13:57:46 [CMS-dev] Sending scheduler request: To fetch work.
04-Feb-2016 13:57:46 [CMS-dev] Requesting new tasks for CPU
04-Feb-2016 13:57:48 [CMS-dev] Scheduler request completed: got 0 new tasks
04-Feb-2016 13:57:48 [CMS-dev] No tasks sent
04-Feb-2016 13:57:48 [CMS-dev] CMS Simulation needs 5895.94MB more disk space. You currently have 3640.80 MB available and it needs 9536.74 MB.


Probably related, but I did not get any tasks at that moment.
After restarting the BOINC client, I got task 65665.
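
The figures in the scheduler message are internally consistent, even though they disagree with the 31 GB reported as free on the client. A quick Python check of the arithmetic, parsing the message as quoted in the log; converting MB to bytes with a factor of 2**20 is an assumption about how the value is formatted:

```python
import re

message = ("CMS Simulation needs 5895.94MB more disk space. "
           "You currently have 3640.80 MB available and it needs 9536.74 MB.")

# Pull the three figures (in MB) out of the message, in order of appearance.
shortfall, available, required = (
    float(x) for x in re.findall(r"(\d+\.\d+)\s*MB", message)
)

# The reported shortfall should equal required minus available.
difference = required - available
print(f"required - available = {difference:.2f} MB (reported: {shortfall} MB)")

# Assumption: 1 MB here means 2**20 bytes.
required_bytes = required * 2**20
print(f"required is about {required_bytes:.3e} bytes")
```

Under that assumption the requirement works out to roughly 10**10 bytes.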
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1848 - Posted: 4 Feb 2016, 14:37:55 UTC

Because of that, the "Tasks ready to send" count is falling like a rock.
Please keep an eye on that.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,116
Message 1849 - Posted: 4 Feb 2016, 16:08:49 UTC - in response to Message 1848.  

Because of that the "Task ready to send" are falling like a rock.
Please keep an eye on that.

We're just surviving, because the last ~thousand tasks are all resends of workunits originally created in March 2015.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,116
Message 1850 - Posted: 4 Feb 2016, 16:47:06 UTC
Last modified: 4 Feb 2016, 16:50:15 UTC

PDW
Joined: 20 May 15
Posts: 217
Credit: 5,871,767
RAC: 16,520
Message 1851 - Posted: 4 Feb 2016, 16:51:49 UTC - in response to Message 1850.  
Last modified: 4 Feb 2016, 16:52:25 UTC

I was going to congratulate you on winning the trophy for most jobs, puts me down to 3rd now :-(

No wonder all the jobs are disappearing from the queue !

Edit: Down to 4th now !
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,116
Message 1852 - Posted: 4 Feb 2016, 16:55:28 UTC - in response to Message 1851.  

I was going to congratulate you on winning the trophy for most jobs, puts me down to 3rd now :-(

Those are not my hosts; I have 'only' 16 virtual tasks and 1 real task in progress.
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,871,767
RAC: 16,520
Message 1853 - Posted: 4 Feb 2016, 17:07:09 UTC - in response to Message 1852.  

That are not mine hosts, I've 'only' 16 virtual tasks and 1 real task in progress.

I know, that's why you are 3rd and I'm now 4th.
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 755
Credit: 11,756,660
RAC: 6,664
Message 1856 - Posted: 4 Feb 2016, 20:09:12 UTC

THIS has to be an *Issue*




Mad Scientist For Life
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1857 - Posted: 4 Feb 2016, 20:19:06 UTC - in response to Message 1856.  

I know, the invitation code is a problem........
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,945,813
RAC: 2,949
Message 1858 - Posted: 4 Feb 2016, 20:46:54 UTC - in response to Message 1849.  

Because of that the "Task ready to send" are falling like a rock.
Please keep an eye on that.

We're just surviving, cause all last ~thousand tasks are resends of workunits original created in March 2015.

March 2015 was probably the last time I created tasks (and the first time; Daniele had done them before that)... I'm stumped. Dashboard, as unreliable as it is, is showing recent progress, though there is a huge spike in failures on the jobs graph. Let's leave it overnight (I have to dig out the recipe for creating more tasks) and see what happens to the progress charts and tables.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 87
Message 1859 - Posted: 4 Feb 2016, 22:09:57 UTC - in response to Message 1847.  

Any ideas what is causing this behaviour?
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1860 - Posted: 4 Feb 2016, 22:48:54 UTC - in response to Message 1859.  

I have not seen this on my computer, but we need to know:

a) When did it start?
b) Is it also present at vLHC (with cms-simulation tasks)?
c) Do the computers it happened to have something in common* that makes them different from the other ones?

Any other suggestions/comments?

* (like OS, BOINC version, VirtualBox version, etc.)
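
Question c) amounts to intersecting the attributes of the affected hosts and then discarding anything the unaffected hosts also share. A rough Python sketch of that comparison; the host records are hypothetical placeholders, not data from this thread, and `common_attributes` is an illustrative helper name:

```python
def common_attributes(hosts):
    """Return the attribute/value pairs shared by every host in the list."""
    if not hosts:
        return {}
    shared = set(hosts[0].items())
    for host in hosts[1:]:
        shared &= set(host.items())
    return dict(shared)


# Hypothetical placeholder records (not real hosts from the thread).
affected = [
    {"os": "OS-A", "boinc": "X.Y", "vbox": "P.Q"},
    {"os": "OS-B", "boinc": "X.Y", "vbox": "P.Q"},
]
unaffected = [
    {"os": "OS-A", "boinc": "X.Y", "vbox": "R.S"},
]

shared = common_attributes(affected)
# Keep only shared values that no unaffected host also has.
suspects = {key: value for key, value in shared.items()
            if all(host.get(key) != value for host in unaffected)}
print(suspects)
```

With real host details filled in, an empty result would suggest that the cause lies on the server rather than on the clients.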
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,871,767
RAC: 16,520
Message 1861 - Posted: 4 Feb 2016, 23:00:41 UTC - in response to Message 1860.  

I would think the fact that I got 7 tasks all sent out at the exact same time means it started at the server end rather than the client end.

You haven't upgraded anything ?