Message boards :
Number crunching :
Current issues
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
But our computers request a certain number of tasks. They were sent out at the same time because they are essentially zero length.
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 594
They would still have to have removed the limit at the server end restricting tasks to one. If that were removed, I assume it would fall back to the cache settings?
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
Correction: the clients request a certain number of seconds, and the server works out how many tasks to send.
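A rough sketch of that seconds-to-tasks split, assuming a simple scheduler model (the function, the per-host cap, and the runtime estimate are illustrative, not the actual CMS-dev server code):

```python
def tasks_to_send(req_seconds, est_task_runtime, per_host_limit, in_progress):
    """Hypothetical model of a BOINC-style scheduler turning a client's
    requested CPU seconds into a task count.  The real scheduler also
    weighs deadlines, app versions and project shares."""
    if in_progress >= per_host_limit:
        return 0  # host is already at its task limit
    # enough tasks to cover the requested seconds, rounded up
    wanted = -(-req_seconds // est_task_runtime)
    return min(wanted, per_host_limit - in_progress)
```

Under this model a server-side limit of one task per host, as discussed above, caps the reply at a single task no matter how many seconds the client asks for.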
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 46
Any ideas what is causing this behaviour? As far as I can tell, we're still sending out jobs. Which implies we're sending out tasks, unless there are a lot of tasks still within their 24+ hour limit. I'll look up the recipe and send more tasks to CMS-dev.
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
Does this also happen with CMS-simulation tasks at vLHC?
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 9
Does this also happen with CMS-simulation tasks at vLHC? We don't know, because VirtualLHC@home hasn't had CMS tasks during the last 2-3 days. If it did, it should also affect vLHC tasks, and that doesn't happen. So it's a CMS-dev server issue.
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 9
As far as I can tell, we're still sending out jobs. Which implies we're sending out tasks, unless there are a lot of tasks still within their 24+ hour limit. I'll look up the recipe and send more tasks to CMS-dev. The very few tasks sent out are resends of tasks that were aborted, had a compute error, or timed out after the 7-day deadline. The 24-hour limit has changed to 36 hours. The graceful shutdown after a run has finished after 24+ hours doesn't work.
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 594
I just got told I needed more space too; the event log shows:

05/02/2016 11:16:26 | CMS-dev | Message from server: CMS Simulation needs 5788.37MB more disk space. You currently have 3748.37 MB available and it needs 9536.74 MB.
05/02/2016 11:16:28 | CMS-dev | Started download of CMS_2016_01_28.xml
05/02/2016 11:16:28 | CMS-dev | [file_xfer] URL: http://boincai05.cern.ch/CMS-dev/download/CMS_2016_01_28.xml
05/02/2016 11:16:28 | | [http_xfer] [ID#2091] HTTP: wrote 514 bytes
05/02/2016 11:16:29 | CMS-dev | [file_xfer] http op done; retval 0 (Success)
05/02/2016 11:16:29 | CMS-dev | [file_xfer] file transfer status 0 (Success)
05/02/2016 11:16:29 | CMS-dev | Finished download of CMS_2016_01_28.xml
05/02/2016 11:16:29 | CMS-dev | [file_xfer] Throughput 3807 bytes/sec

BOINC itself is correctly reporting that there is over 30 GB of space to use.
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
I got two (vLHC tasks) yesterday. If I request work here, I get a low-disk warning, "needs 9000 MB" or so. If I request it at vLHC, I do not get that message. Both have 0 tasks to send, but only CMS is giving that message. (Disk space is not low.)
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 594
I still haven't received a task to run on my laptop, but the server shows I have 16, so these new tasks will quickly disappear again :-(
Edit: Every time I click Update I get more phantoms on my list, but nothing is received.
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 46
Any ideas what is causing this behaviour? Well, create_work + make_work is still chugging away on boincai05, but tasks-ready-to-send is now showing 5001. :-) I'm not convinced it will make a difference, tho'-but...
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 594
I tried turning some logging options on to see if anything else useful might be there...

05/02/2016 12:13:54 | CMS-dev | [work_fetch] set_request() for CPU: ninst 4 nused_total 0.00 nidle_now 2.00 fetch share 1.00 req_inst 4.00 req_secs 1406952.06
05/02/2016 12:13:54 | CMS-dev | [sched_op] Starting scheduler request
05/02/2016 12:13:54 | CMS-dev | [work_fetch] request: CPU (1406952.06 sec, 4.00 inst)
05/02/2016 12:13:55 | CMS-dev | Sending scheduler request: Requested by user.
05/02/2016 12:13:55 | CMS-dev | Requesting new tasks for CPU
05/02/2016 12:13:55 | CMS-dev | [sched_op] CPU work request: 1406952.06 seconds; 4.00 devices
05/02/2016 12:13:55 | | [http_xfer] [ID#0] HTTP: wrote 1180 bytes
05/02/2016 12:13:55 | | [http_xfer] [ID#0] HTTP: wrote 4198 bytes
05/02/2016 12:13:55 | | [http_xfer] [ID#1] HTTP: wrote 1298 bytes
05/02/2016 12:13:55 | | [http_xfer] [ID#1] HTTP: wrote 5183 bytes
05/02/2016 12:13:56 | CMS-dev | Scheduler request completed: got 0 new tasks
05/02/2016 12:13:56 | CMS-dev | [sched_op] Server version 707
05/02/2016 12:13:56 | CMS-dev | No tasks sent
05/02/2016 12:13:56 | CMS-dev | Message from server: CMS Simulation needs 5787.37MB more disk space. You currently have 3749.37 MB available and it needs 9536.74 MB.

No tasks received, but the list on the server grows longer.
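Those [work_fetch] numbers fit a simple fill-the-buffer model. A minimal sketch, assuming the client just asks for enough seconds to fill each CPU instance's cache (the real BOINC client logic is more involved, with resource shares and hysteresis; the names here are illustrative):

```python
def work_request(queued_secs_per_inst, buffer_days):
    """Hypothetical work-fetch request: seconds needed to fill each
    instance's buffer, counting one wanted instance per completely
    idle (empty-queue) instance."""
    buffer_secs = buffer_days * 86400.0
    # total shortfall across all instances
    req_secs = sum(max(0.0, buffer_secs - q) for q in queued_secs_per_inst)
    # instances with nothing queued at all
    req_inst = sum(1 for q in queued_secs_per_inst if q == 0.0)
    return req_secs, req_inst
```

With 4 empty instances and a cache of roughly 4 days this lands in the same ballpark as the req_secs 1406952.06 in the log (1406952 / 4 / 86400 is about 4.07 days per instance).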
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
Tasks in progress has gone up from 906 to 1136 in about 15 minutes, but "Tasks ready to send" remains at 5001???
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 9
Detached and re-attached CMS-dev, but it was fruitless. As hoped, the old ghosts are gone (abandoned), but I now have 32 new ghosts and not a single one on my computer. Because of the irrelevant disk space messages, I tried increasing the disk allowance, which normally makes no sense (31 GB free for BOINC), with these messages as a result:

05 Feb 13:34:46 max disk usage: 40.00GB
05 Feb 13:35:58 Message from server: CMS Simulation needs 5218.37MB more disk space. You currently have 4318.37 MB available and it needs 9536.74 MB.
05 Feb 13:42:03 max disk usage: 100.00GB
05 Feb 13:42:42 Message from server: CMS Simulation needs 2406.63MB more disk space. You currently have 7130.11 MB available and it needs 9536.74 MB.
05 Feb 13:44:14 max disk usage: 200.00GB
05 Feb 13:45:00 Message from server: CMS Simulation needs 4923.73MB more disk space. You currently have 4613.02 MB available and it needs 9536.74 MB.

After the new attach to the project and the first MB-needs message, 3 tasks were added to my list; after the second request, 9; and after the 3rd request, 20 tasks were added.

Edit:
05 Feb 14:37:58 [disk_usage] allowed 40960.00MB used 9476.97MB
19830 CMS-dev 05 Feb 14:37:58 [disk_usage] usage 1377.75MB share 1553.12MB
19861 CMS-dev 05 Feb 14:37:58 Sending scheduler request: Requested by user.
19862 CMS-dev 05 Feb 14:37:58 Requesting new tasks for CPU
19863 CMS-dev 05 Feb 14:38:00 Scheduler request completed: got 0 new tasks
19864 CMS-dev 05 Feb 14:38:00 No tasks sent
19865 CMS-dev 05 Feb 14:38:00 Message from server: CMS Simulation needs 6663.95MB more disk space. You currently have 2872.80 MB available and it needs 9536.74 MB.
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 46
Yes, I'm getting the same message.
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 9
Yes, I'm getting the same message. If I weren't so innocent, I would think CMS-dev has found a way to push all of us over to VirtualLHC@home to crunch CMS tasks there :P
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 594
I tried to get a CMS task at vLHC and couldn't; that was after more jobs had been released and were appearing in the queue (and being taken by others)!
Joined: 15 Apr 15 Posts: 38 Credit: 227,251 RAC: 0
There is CMS work now at vLHC, and it is working properly.
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 9
Those are not my hosts; I've 'only' 16 virtual tasks and 1 real task in progress. Since new tasks were loaded, we have a new competition and maybe a new leader: a host with 381 tasks in progress.
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
I think I know why the Jobs graph is so screwed up: it is caused by the WMAgent backfill jobs, which are ALL failing. Any news on fixing the server?
©2024 CERN