Message boards : Theory Application : New Native App - Linux Only
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
A new version is available. The issue was a missing file in the latest batch of jobs. The code has been updated to be more robust in such situations, and in addition the file system is now cleaned up after the jobs, as I noticed that we were accumulating slot directories. The results should be visible in MCPlots under the vLHCdev project.
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0
OK now: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752887
Don't know how often MCPlots is updated, but they aren't there yet. Slots seem OK now, too... just got to go and clean up the previous left-overs.
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2
Don't know how often MCPlots is updated, but they aren't there yet.
Your host 1497:
#date_d      ngood  nbad  total
2016-10-21       2     0      2
2019-02-19       5     0      5
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1
The results should be visible in MCPlots under the vLHCdev project.
Only 1 of my currently active hosts is included in the list (3406), but without recent work. Host 3718 does not appear. http://mcplots-dev.cern.ch/production.php?view=user&system=2&userid=408
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
The results should be visible in MCPlots under the vLHCdev project.
There is a bit of a delay between when a result is returned and when MCPlots is updated.
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
A new version is available. The issue was a missing file in the latest batch of jobs. The code has been updated to be more robust in such situations, and in addition the file system is now cleaned up after the jobs, as I noticed that we were accumulating slot directories. The results should be visible in MCPlots under the vLHCdev project.
Everything seems fine, so let me know if there are any issues. CP pointed out that suspend/resume is not working as it should. Containers support this feature, so I need to investigate. Next I will attempt the Windows version.
Joined: 10 Mar 17 Posts: 40 Credit: 108,345 RAC: 0
So far 13/13 tasks with version 4.14 have worked fine. The host is also shown as active in MCPlots (9 jobs at the moment with 0% failure). Regarding the suspend feature, I have experienced the same behaviour as other users have already mentioned.
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2
Everything seems fine, so let me know if there are any issues. CP pointed out that suspend/resume is not working as it should. Containers support this feature, so I need to investigate. Next I will attempt the Windows version.
The first computer for Windows testing will be from...... MAGIC!
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
I have changed some limits on the server to try to reduce the number of tasks taken by blackhole hosts. Please let me know if you are unable to get new tasks.
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1
Please let me know if you are unable to get new tasks.
Can't get more than 2 tasks, although the host simulates 3 cores and the web preferences are set to max #tasks = 4. I was able to get 4 tasks earlier today on the same host (configured to simulate 4 cores).
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
Please let me know if you are unable to get new tasks.
I set max_wus_in_progress to 2. From the documentation:
<max_wus_in_progress> N </max_wus_in_progress>
Limit the number of jobs in progress on a given host (and thus limit average turnaround time). Starting with 6.8, the BOINC client reports the resources used by in-progress jobs; in this case, the max CPU jobs in progress is N*NCPUS and the max GPU jobs in progress is M*NGPUS. Otherwise, the overall maximum is N*NCPUS + M*NGPUS. See the following section for a more powerful way of expressing limits on in-progress jobs.
This should scale with the number of cores, and I was hoping that it would leave no more than one job waiting per core.
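For reference, these limits live in the scheduler section of the project's config.xml on the server. A minimal sketch with a purely illustrative value, using the element name quoted from the documentation above:

    <boinc>
      <config>
        <!-- Cap on tasks a host may hold at once. Per the documentation quoted
             above this should scale as N*NCPUS on 6.8+ clients, although the
             posts below observe it behaving as a flat per-host limit. -->
        <max_wus_in_progress>2</max_wus_in_progress>
      </config>
    </boinc>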
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1
Strange runtime/CPU-time values.
If I run a single Theory task, runtime and CPU-time are exactly the same. See:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752866
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752582
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752892
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752583
If I run more than one single-core task concurrently on the same host, runtime and CPU-time show significant differences. See:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752980
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752982
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752976
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752977
Does anybody know why?
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2
If it was your purpose to set a maximum of 2 tasks per core, you would have to set <max_wus_to_send>2</max_wus_to_send> for that, I suppose.
<max_wus_in_progress> N </max_wus_in_progress>
Limit the number of jobs in progress on a given host (and thus limit average turnaround time). Starting with 6.8, the BOINC client reports the resources used by in-progress jobs; in this case, the max CPU jobs in progress is N*NCPUS and the max GPU jobs in progress is M*NGPUS. Otherwise, the overall maximum is N*NCPUS + M*NGPUS. See the following section for a more powerful way of expressing limits on in-progress jobs.
Your above setting is the max per host.
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
If it was your purpose to set a maximum of 2 tasks per core, you would have to set
From the perspective of the project, In Progress means handed out to the client. Therefore I should have set the maximum a client can take to 2 x N, where N is the number of cores. If one task is run per core and all cores are used, there should be one task running and one waiting to run. I want to avoid bad hosts bunkering tasks. The max_wus_to_send is per RPC (update request).
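To make that intended arithmetic concrete (illustrative figures, assuming the N*NCPUS scaling from the documentation quoted above):

    NCPUS = 4, max_wus_in_progress = 2
    in-progress cap = 2 * 4 = 8 tasks  (4 running + 4 waiting to run)

The reports below, however, suggest the scheduler is applying the value as a flat limit of 2 tasks per host.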
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2
If one task is run per core and all cores are used, there should be one task running and one waiting to run. I want to avoid bad hosts bunkering tasks. The max_wus_to_send is per RPC (update request).
The current setting is 2 WUs per host. I have a quad-core host with 2 tasks running, and when I request more work, I get:
di 19 feb 2019 21:08:45 CET | lhcathome-dev | Requesting new tasks for CPU
di 19 feb 2019 21:08:47 CET | lhcathome-dev | Scheduler request completed: got 0 new tasks
di 19 feb 2019 21:08:47 CET | lhcathome-dev | No tasks sent
di 19 feb 2019 21:08:47 CET | lhcathome-dev | No tasks are available for Theory Simulation
di 19 feb 2019 21:08:47 CET | lhcathome-dev | This computer has reached a limit on tasks in progress
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
If one task is run per core and all cores are used, there should be one task running and one waiting to run. I want to avoid bad hosts bunkering tasks. The max_wus_to_send is per RPC (update request). Maybe NCPUs in the documentation is physical CPU (sockets) rather than logical (cores). Does anyone know? |
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2
Maybe NCPUs in the documentation means physical CPUs (sockets) rather than logical CPUs (cores). Does anyone know?
I agree the documentation is confusing. When you read "Maximum jobs returned per scheduler RPC" you would expect it to be a maximum per request, but in fact it sets the limit of WUs per core. <max_wus_to_send>1</max_wus_to_send> would send 1 task for each core. <max_wus_to_send>2</max_wus_to_send> would send 2 tasks for each core, so 1 'Ready to Start' for each core. When a host requests more work, it sends the number of cores and how many tasks of the project are already loaded on the host/client. I suggest giving the above-mentioned config a try and, of course, removing or increasing <max_wus_in_progress>. You could start with <max_wus_to_send>1</max_wus_to_send>.
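A minimal config.xml sketch of that suggestion, with illustrative values (whether max_wus_in_progress should be removed entirely or just raised is left open above):

    <config>
      <!-- per the reading above: 1 task per core per scheduler request -->
      <max_wus_to_send>1</max_wus_to_send>
      <!-- raised so it no longer caps the host at 2 tasks in progress -->
      <max_wus_in_progress>4</max_wus_in_progress>
    </config>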
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2
I suggest giving the above-mentioned config a try and, of course, removing or increasing <max_wus_in_progress>.
I have set my test-VM to "no new work" and will increase my number of CPUs one by one, starting with 1, to see how it works.
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
I will increase one value at a time. First, max_wus_in_progress has been increased to 4.
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2
I will increase one value at a time. First, max_wus_in_progress has been increased to 4.
I have reduced my ncpus to 2 and now have 2 tasks running and 2 'Ready to start'. On the next work request: This computer has reached a limit on tasks in progress.