Message boards : Number crunching : Current issues

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1863 - Posted: 4 Feb 2016, 23:13:04 UTC - in response to Message 1861.  

But our computers request a certain number of tasks.
They were sent out at the same time, because they are essentially zero-length.
PDW

Joined: 20 May 15
Posts: 217
Credit: 6,125,114
RAC: 8,980
Message 1864 - Posted: 4 Feb 2016, 23:16:00 UTC - in response to Message 1863.  

They would still have to have removed the server-side limit that restricts tasks to one. If that were removed, I assume it would fall back to the cache settings?
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1866 - Posted: 4 Feb 2016, 23:22:40 UTC - in response to Message 1864.  

Correction: the clients request a certain number of seconds of work, and the server works out how many tasks to send.
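To picture this correction, here is a minimal Python sketch of a scheduler turning a requested number of seconds into a task count. The function name, the runtime estimate and the per-host cap (`max_tasks_per_host`, standing in for the server-side limit PDW mentions) are hypothetical; the real BOINC scheduler weighs many more factors.

```python
import math
from typing import Optional


def tasks_to_send(req_secs: float, est_task_secs: float,
                  max_tasks_per_host: Optional[int] = None,
                  tasks_in_progress: int = 0) -> int:
    """Hypothetical: how many tasks a scheduler might send for one work request.

    req_secs           -- seconds of work the client asks for (driven by its cache settings)
    est_task_secs      -- the server's runtime estimate for one task
    max_tasks_per_host -- optional server-side cap on tasks per host (None = no cap)
    tasks_in_progress  -- tasks this host already holds
    """
    if req_secs <= 0 or est_task_secs <= 0:
        return 0
    # Enough whole tasks to cover the requested seconds.
    wanted = math.ceil(req_secs / est_task_secs)
    if max_tasks_per_host is not None:
        # The cap includes tasks already in progress on the host.
        wanted = min(wanted, max(0, max_tasks_per_host - tasks_in_progress))
    return wanted


print(tasks_to_send(4 * 86400, 86400))                        # no cap: 4
print(tasks_to_send(4 * 86400, 86400, max_tasks_per_host=1))  # capped: 1
```

With the cap removed, the count simply follows the requested seconds, i.e. the client's cache settings.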
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Jan 15
Posts: 1139
Credit: 8,182,521
RAC: 2,043
Message 1876 - Posted: 5 Feb 2016, 10:34:29 UTC - in response to Message 1859.  

> Any ideas what is causing this behaviour?

As far as I can tell, we're still sending out jobs. Which implies we're sending out tasks, unless there are a lot of tasks still within their 24+ hour limit. I'll look up the recipe and send more tasks to CMS-dev.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1877 - Posted: 5 Feb 2016, 10:52:35 UTC

Does this also happen with CMS-simulation tasks at vLHC?
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 20
Message 1878 - Posted: 5 Feb 2016, 11:14:17 UTC - in response to Message 1877.  
Last modified: 5 Feb 2016, 11:22:12 UTC

> Does this also happen with CMS-simulation tasks at vLHC?

We don't know, because VirtualLHC@Home hasn't had CMS tasks for the last 2-3 days.
If it did, it would also affect vLHC tasks, and that isn't happening.
So it's a CMS-dev server issue.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 20
Message 1879 - Posted: 5 Feb 2016, 11:16:52 UTC - in response to Message 1876.  
Last modified: 5 Feb 2016, 11:18:37 UTC

> As far as I can tell, we're still sending out jobs. Which implies we're sending out tasks, unless there are a lot of tasks still within their 24+ hour limit. I'll look up the recipe and send more tasks to CMS-dev.

The very few tasks being sent out are resends of tasks that were aborted, ended with a compute error, or timed out after the 7-day deadline.
The 24-hour limit has changed to 36 hours. The graceful shutdown when a run finishes after 24+ hours doesn't work.
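The graceful shutdown mentioned here can be pictured as a loop that keeps starting jobs until a time limit has passed, then lets the current job finish and stops. This is a minimal Python sketch; `run_until_limit`, `run_job` and the 2.5-hour job length are hypothetical and not taken from the actual CMS-dev VM scripts. It also shows why a task can overrun its nominal limit by up to one job length.

```python
from typing import Callable, List


def run_until_limit(job_durations: List[float], limit_hours: float,
                    run_job: Callable[[float], None] = lambda d: None) -> float:
    """Hypothetical sketch: start jobs back to back, but never start a new one
    once `limit_hours` has elapsed; the job already running may finish.

    Returns the elapsed time in hours at which the task would shut down.
    """
    elapsed = 0.0
    for duration in job_durations:
        if elapsed >= limit_hours:
            break  # graceful shutdown: don't start another job
        run_job(duration)  # placeholder for actually running the job
        elapsed += duration
    return elapsed


print(run_until_limit([2.5] * 20, 24))  # 25.0
print(run_until_limit([2.5] * 20, 36))  # 37.5
```

With 2.5-hour jobs, a 24-hour limit ends the task at 25 hours and a 36-hour limit at 37.5 hours.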
PDW

Joined: 20 May 15
Posts: 217
Credit: 6,125,114
RAC: 8,980
Message 1881 - Posted: 5 Feb 2016, 11:22:42 UTC - in response to Message 1879.  

I was just told I need more disk space too; the event log shows:

05/02/2016 11:16:26 | CMS-dev | Message from server: CMS Simulation needs 5788.37MB more disk space. You currently have 3748.37 MB available and it needs 9536.74 MB.
05/02/2016 11:16:28 | CMS-dev | Started download of CMS_2016_01_28.xml
05/02/2016 11:16:28 | CMS-dev | [file_xfer] URL: http://boincai05.cern.ch/CMS-dev/download/CMS_2016_01_28.xml
05/02/2016 11:16:28 | | [http_xfer] [ID#2091] HTTP: wrote 514 bytes
05/02/2016 11:16:29 | CMS-dev | [file_xfer] http op done; retval 0 (Success)
05/02/2016 11:16:29 | CMS-dev | [file_xfer] file transfer status 0 (Success)
05/02/2016 11:16:29 | CMS-dev | Finished download of CMS_2016_01_28.xml
05/02/2016 11:16:29 | CMS-dev | [file_xfer] Throughput 3807 bytes/sec

BOINC itself correctly reports that there is over 30 GB of space available.
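The figures in the server message are internally consistent: the shortfall is the requirement minus what the server thinks is available, and 9536.74 MB is exactly 10^10 bytes expressed in units of 2^20 bytes. That suggests (an assumption) that the requirement is configured as 10 GB, and that the mismatch with BOINC's 30 GB lies in the "available" figure rather than in the subtraction. A minimal Python sketch; the regular expression and `check_disk_message` are hypothetical helpers written for the exact wording in the log above.

```python
import re

# Written for the exact wording of the CMS-dev server message in the log above.
MSG_RE = re.compile(
    r"needs (?P<short>[\d.]+)\s*MB more disk space\. "
    r"You currently have (?P<avail>[\d.]+)\s*MB available "
    r"and it needs (?P<need>[\d.]+)\s*MB"
)


def check_disk_message(msg: str) -> dict:
    """Parse the message and recompute the shortfall as requirement - available."""
    m = MSG_RE.search(msg)
    if m is None:
        raise ValueError("message not in the expected format")
    short, avail, need = (float(m.group(k)) for k in ("short", "avail", "need"))
    return {"reported": short, "recomputed": round(need - avail, 2)}


msg = ("CMS Simulation needs 5788.37MB more disk space. "
       "You currently have 3748.37 MB available and it needs 9536.74 MB.")
print(check_disk_message(msg))  # {'reported': 5788.37, 'recomputed': 5788.37}

# Assumption: 'MB' here means MiB (2**20 bytes); 10**10 bytes is then exactly:
print(10**10 / 2**20)           # 9536.7431640625
```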
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1882 - Posted: 5 Feb 2016, 11:24:05 UTC - in response to Message 1878.  
Last modified: 5 Feb 2016, 11:28:43 UTC

I got two (vLHC tasks) yesterday.
If I request work here, I get a low-disk warning ("needs 9000 MB" or so).
If I request it at vLHC, I do not get that message.
Both have 0 tasks to send, but only CMS-dev gives that message.
(Disk space is not low.)
PDW

Joined: 20 May 15
Posts: 217
Credit: 6,125,114
RAC: 8,980
Message 1883 - Posted: 5 Feb 2016, 11:39:00 UTC - in response to Message 1882.  
Last modified: 5 Feb 2016, 11:41:13 UTC

I still haven't received a task to run on my laptop, but the server shows I have 16, so these new tasks will quickly disappear again :-(

Edit: Every time I click Update, I get more phantoms on my list, but nothing is received.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Jan 15
Posts: 1139
Credit: 8,182,521
RAC: 2,043
Message 1884 - Posted: 5 Feb 2016, 12:21:58 UTC - in response to Message 1876.  

>> Any ideas what is causing this behaviour?
>
> As far as I can tell, we're still sending out jobs. Which implies we're sending out tasks, unless there are a lot of tasks still within their 24+ hour limit. I'll look up the recipe and send more tasks to CMS-dev.

Well, create_work + make_work is still chugging away on boincai05, but tasks-ready-to-send is now showing 5001. :-) I'm not convinced it will make a difference, though...
PDW

Joined: 20 May 15
Posts: 217
Credit: 6,125,114
RAC: 8,980
Message 1885 - Posted: 5 Feb 2016, 12:23:11 UTC - in response to Message 1883.  

I tried turning some logging options on to see if anything else useful might be there...

05/02/2016 12:13:54 | CMS-dev | [work_fetch] set_request() for CPU: ninst 4 nused_total 0.00 nidle_now 2.00 fetch share 1.00 req_inst 4.00 req_secs 1406952.06
05/02/2016 12:13:54 | CMS-dev | [sched_op] Starting scheduler request
05/02/2016 12:13:54 | CMS-dev | [work_fetch] request: CPU (1406952.06 sec, 4.00 inst)
05/02/2016 12:13:55 | CMS-dev | Sending scheduler request: Requested by user.
05/02/2016 12:13:55 | CMS-dev | Requesting new tasks for CPU
05/02/2016 12:13:55 | CMS-dev | [sched_op] CPU work request: 1406952.06 seconds; 4.00 devices
05/02/2016 12:13:55 | | [http_xfer] [ID#0] HTTP: wrote 1180 bytes
05/02/2016 12:13:55 | | [http_xfer] [ID#0] HTTP: wrote 4198 bytes
05/02/2016 12:13:55 | | [http_xfer] [ID#1] HTTP: wrote 1298 bytes
05/02/2016 12:13:55 | | [http_xfer] [ID#1] HTTP: wrote 5183 bytes
05/02/2016 12:13:56 | CMS-dev | Scheduler request completed: got 0 new tasks
05/02/2016 12:13:56 | CMS-dev | [sched_op] Server version 707
05/02/2016 12:13:56 | CMS-dev | No tasks sent
05/02/2016 12:13:56 | CMS-dev | Message from server: CMS Simulation needs 5787.37MB more disk space. You currently have 3749.37 MB available and it needs 9536.74 MB.

No tasks received but list on the server grows longer.
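For scale, the request in this log is large. Assuming `req_secs` is the total over all CPU instances (the log lists `req_inst 4.00` separately), it works out to roughly 4.07 days of work per core, yet the reply was "got 0 new tasks". A quick check in Python; `per_instance_days` is a hypothetical helper.

```python
def per_instance_days(req_secs: float, req_inst: float) -> float:
    """Days of work per CPU instance, assuming req_secs is summed over all instances."""
    return req_secs / req_inst / 86400


# Values from the [work_fetch] line: req_secs 1406952.06, req_inst 4.00
days = per_instance_days(1406952.06, 4.0)
print(f"{days:.2f} days per CPU instance")  # prints "4.07 days per CPU instance"
```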
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1886 - Posted: 5 Feb 2016, 12:32:40 UTC

Tasks in progress has gone up from 906 to 1136 in about 15 minutes.
But "Tasks ready to send" remains at 5001?!
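This pattern fits an unsent queue that is topped back up to a fixed level: ivan mentions create_work + make_work still running, and BOINC's make_work daemon exists to keep a supply of unsent work available. Whether the CMS-dev server is set up to hold exactly this level is an assumption. Below is a toy Python simulation (the `simulate` function and the per-step split are hypothetical) in which every task handed out is immediately replaced.

```python
from typing import List, Tuple


def simulate(cushion: int, taken_per_step: List[int],
             in_progress: int = 0) -> List[Tuple[int, int]]:
    """Toy model: each step hosts take some unsent tasks, then a work generator
    refills the unsent queue back up to `cushion`.
    Returns (unsent, in_progress) after each step."""
    unsent = cushion
    history = []
    for taken in taken_per_step:
        taken = min(taken, unsent)
        unsent -= taken
        in_progress += taken
        unsent = max(unsent, cushion)  # refill: generate new tasks up to the cushion
        history.append((unsent, in_progress))
    return history


# Hypothetical split of the observed +230 tasks (906 -> 1136) into three 5-minute steps.
for unsent, busy in simulate(5001, [76, 77, 77], in_progress=906):
    print(unsent, busy)
```

The unsent count stays at 5001 after every step while tasks in progress climb to 1136.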
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 20
Message 1887 - Posted: 5 Feb 2016, 13:08:47 UTC
Last modified: 5 Feb 2016, 13:40:44 UTC

I detached from and re-attached to CMS-dev, but it was fruitless.
As I hoped, the old ghosts are gone (abandoned), but I now have 32 new ghosts and not a single one of them on my computer.

Because of the irrelevant disk space messages, I tried increasing the maximum disk usage, which normally makes no sense (BOINC already has 31 GB free), and got the following messages:

05 Feb 13:34:46 max disk usage: 40.00GB
05 Feb 13:35:58 Message from server: CMS Simulation needs 5218.37MB more disk space. You currently have 4318.37 MB available and it needs 9536.74 MB.
05 Feb 13:42:03 max disk usage: 100.00GB
05 Feb 13:42:42 Message from server: CMS Simulation needs 2406.63MB more disk space. You currently have 7130.11 MB available and it needs 9536.74 MB.
05 Feb 13:44:14 max disk usage: 200.00GB
05 Feb 13:45:00 Message from server: CMS Simulation needs 4923.73MB more disk space. You currently have 4613.02 MB available and it needs 9536.74 MB.


After re-attaching to the project, 3 tasks were added to my list with the first "MB needed" message, 9 after the second request and 20 after the third.
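These counts add up: 3 + 9 + 20 = 32, the number of new ghosts reported above. One way such ghosts can arise is sketched below. It is purely an illustrative assumption, not a diagnosis of the CMS-dev server: the scheduler records tasks as assigned to the host before a later check (here, the disk-space check) empties the reply. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HostRecord:
    """Hypothetical server-side view of one host."""
    in_progress: List[int] = field(default_factory=list)


def handle_request(host: HostRecord, unsent: List[int], n_wanted: int,
                   disk_ok: bool) -> List[int]:
    """Hypothetical flawed scheduler: the assignment is committed before the
    disk check, so a failed check leaves server-side 'ghosts'."""
    n = min(n_wanted, len(unsent))
    picked = unsent[:n]
    del unsent[:n]
    host.in_progress.extend(picked)  # recorded as sent to this host...
    if not disk_ok:
        return []                    # ...but nothing goes back to the client
    return picked


host = HostRecord()
queue = list(range(1000))
for n in (3, 9, 20):
    got = handle_request(host, queue, n, disk_ok=False)
    print(f"received {len(got)}, server list now {len(host.in_progress)}")
```

Each request returns nothing to the client while the server-side list grows to 3, 12 and finally 32.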

Edit:

05 Feb 14:37:58 [disk_usage] allowed 40960.00MB used 9476.97MB
19830 CMS-dev 05 Feb 14:37:58 [disk_usage] usage 1377.75MB share 1553.12MB
19861 CMS-dev 05 Feb 14:37:58 Sending scheduler request: Requested by user.
19862 CMS-dev 05 Feb 14:37:58 Requesting new tasks for CPU
19863 CMS-dev 05 Feb 14:38:00 Scheduler request completed: got 0 new tasks
19864 CMS-dev 05 Feb 14:38:00 No tasks sent
19865 CMS-dev 05 Feb 14:38:00 Message from server: CMS Simulation needs 6663.95MB more disk space. You currently have 2872.80 MB available and it needs 9536.74 MB.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Jan 15
Posts: 1139
Credit: 8,182,521
RAC: 2,043
Message 1888 - Posted: 5 Feb 2016, 15:51:14 UTC - in response to Message 1887.  

Yes, I'm getting the same message.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 20
Message 1889 - Posted: 5 Feb 2016, 15:56:39 UTC - in response to Message 1888.  

> Yes, I'm getting the same message.

If I weren't so innocent, I would think CMS-dev has found a way to push all of us over to VirtualLHC@home to crunch CMS tasks there :P
PDW

Joined: 20 May 15
Posts: 217
Credit: 6,125,114
RAC: 8,980
Message 1890 - Posted: 5 Feb 2016, 16:15:51 UTC - in response to Message 1889.  

I tried to get a CMS task at vLHC and couldn't, and that was after more jobs had been released and were appearing in the queue (and being taken by others)!
rbpeake

Joined: 15 Apr 15
Posts: 38
Credit: 227,251
RAC: 0
Message 1891 - Posted: 5 Feb 2016, 16:40:08 UTC

There is CMS work now at vLHC, and it is working properly.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 20
Message 1893 - Posted: 6 Feb 2016, 1:23:29 UTC - in response to Message 1853.  

> Those are not my hosts; I've 'only' 16 virtual tasks and 1 real task in progress.

I know, that's why you are 3rd and I'm now 4th.

Since new tasks were loaded, we have a new competition and maybe a new leader: a host with 381 tasks in progress.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1898 - Posted: 6 Feb 2016, 16:08:23 UTC

I think I know why the Jobs graph is so screwed up.
It is caused by the WMAgent backfill jobs, which are ALL failing.

Any news on fixing the server?



©2024 CERN