Message boards : Theory Application : New Native App - Linux Only

Previous · 1 . . . 3 · 4 · 5 · 6 · 7 · 8 · 9 . . . 10 · Next

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5940 - Posted: 18 Feb 2019, 20:24:31 UTC - in response to Message 5938.  


Strange, nothing much changed. Will look into it a bit later this evening.


Am investigating...


A new version is available. The issue was a missing file in the latest batch of jobs. The code has been updated to be more robust in such situations, and in addition the file system is cleaned after the jobs, as I noticed that we were collecting slot directories. The results should be visible in MCPlots under the vLHCdev project.
ID: 5940
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 101
Message 5941 - Posted: 18 Feb 2019, 20:58:08 UTC - in response to Message 5940.  
Last modified: 18 Feb 2019, 21:33:25 UTC


A new version is available. The issue was a missing file in the latest batch of jobs. The code has been updated to be more robust in such situations, and in addition the file system is cleaned after the jobs, as I noticed that we were collecting slot directories.

OK now.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752887
Don't know how often MCPlots is updated, but they aren't there yet.
Slots seem OK now, too... just got to go and clean up the previous left-overs.
ID: 5941
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5942 - Posted: 19 Feb 2019, 6:35:02 UTC - in response to Message 5941.  

Don't know how often MCPlots is updated, but they aren't there yet.
Your host 1497
#date_d 	ngood 	nbad 	total
2016-10-21	2	0	2
2019-02-19	5	0	5
ID: 5942
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 478
Credit: 394,720
RAC: 261
Message 5943 - Posted: 19 Feb 2019, 7:33:40 UTC - in response to Message 5940.  

The results should be visible in MCPlots under the vLHCdev project.

Only 1 of my currently active hosts is included in the list (3406), but without recent work.
Host 3718 does not appear.

http://mcplots-dev.cern.ch/production.php?view=user&system=2&userid=408
ID: 5943
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5944 - Posted: 19 Feb 2019, 8:33:07 UTC - in response to Message 5943.  

The results should be visible in MCPlots under the vLHCdev project.

Only 1 of my currently active hosts is included in the list (3406), but without recent work.
Host 3718 does not appear.

http://mcplots-dev.cern.ch/production.php?view=user&system=2&userid=408


There is a bit of a delay between when a result is returned and when MCPlots is updated.
ID: 5944
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5945 - Posted: 19 Feb 2019, 8:35:05 UTC - in response to Message 5940.  

A new version is available. The issue was a missing file in the latest batch of jobs. The code has been updated to be more robust in such situations, and in addition the file system is cleaned after the jobs, as I noticed that we were collecting slot directories. The results should be visible in MCPlots under the vLHCdev project.


Everything seems fine, so let me know if there are any issues. CP pointed out that suspend/resume is not working as it should; containers support this feature, so I need to investigate. First, though, I will attempt the Windows version.
ID: 5945
gyllic

Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 5946 - Posted: 19 Feb 2019, 8:51:12 UTC - in response to Message 5945.  
Last modified: 19 Feb 2019, 8:53:09 UTC

So far 13/13 tasks with version 4.14 have worked fine. The host is also shown as active in MCPlots (showing 9 jobs at the moment with 0% failure).
Regarding the suspend feature, I have experienced the same behaviour as other users have already mentioned.
ID: 5946
maeax

Joined: 22 Apr 16
Posts: 672
Credit: 1,901,432
RAC: 5,043
Message 5947 - Posted: 19 Feb 2019, 9:19:24 UTC - in response to Message 5945.  

Everything seems fine so let me know if there are any issues. CP pointed out that suspend/resume is not working as it should. Containers support this feature so I need to investigate. First I will now attempt the Windows version.


The first Computer for Windows testing will be from...... MAGIC!
ID: 5947
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5948 - Posted: 19 Feb 2019, 12:23:07 UTC - in response to Message 5945.  

I have changed some limits on the server to try to reduce the number of tasks taken by blackhole hosts. Please let me know if you are unable to get new tasks.
ID: 5948
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 478
Credit: 394,720
RAC: 261
Message 5949 - Posted: 19 Feb 2019, 12:49:28 UTC - in response to Message 5948.  

Please let me know if you are unable to get new tasks.

Can't get more than 2 tasks although the host simulates 3 cores and the web preferences are set to max #tasks = 4.
Was able to get 4 tasks earlier today on the same host (configured to simulate 4 cores).
ID: 5949
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5950 - Posted: 19 Feb 2019, 13:05:56 UTC - in response to Message 5949.  
Last modified: 19 Feb 2019, 13:06:48 UTC

Please let me know if you are unable to get new tasks.

Can't get more than 2 tasks although the host simulates 3 cores and the web preferences are set to max #tasks = 4.
Was able to get 4 tasks earlier today on the same host (configured to simulate 4 cores).


I set max_wus_in_progress to 2.

<max_wus_in_progress> N </max_wus_in_progress>
    Limit the number of jobs in progress on a given host (and thus limit average turnaround time). Starting with 6.8, the BOINC client reports the resources used by in-progress jobs; in this case, the max CPU jobs in progress is N*NCPUS and the max GPU jobs in progress is M*NGPUS. Otherwise, the overall maximum is N*NCPUS + M*NGPUS.

    See the following section for a more powerful way of expressing limits on in-progress jobs.


This should scale with the number of cores, and I was hoping it would leave no more than one job waiting per core.
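For reference, the element quoted above belongs in the project's server configuration. A minimal fragment might look roughly like this (a sketch: the max_wus_in_progress element name comes from the quoted BOINC documentation, while the surrounding wrapper elements are assumed here):

```xml
<!-- Sketch of a BOINC project config.xml fragment.
     Only max_wus_in_progress is taken from the documentation
     quoted above; the wrapper elements are assumed. -->
<boinc>
  <config>
    <max_wus_in_progress>2</max_wus_in_progress>
  </config>
</boinc>
```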
ID: 5950
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 478
Credit: 394,720
RAC: 261
Message 5951 - Posted: 19 Feb 2019, 13:32:34 UTC

ID: 5951
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5952 - Posted: 19 Feb 2019, 14:09:41 UTC - in response to Message 5950.  
Last modified: 19 Feb 2019, 14:13:51 UTC

<max_wus_in_progress> N </max_wus_in_progress>
    Limit the number of jobs in progress on a given host (and thus limit average turnaround time). Starting with 6.8, the BOINC client reports the resources used by in-progress jobs; in this case, the max CPU jobs in progress is N*NCPUS and the max GPU jobs in progress is M*NGPUS. Otherwise, the overall maximum is N*NCPUS + M*NGPUS.

    See the following section for a more powerful way of expressing limits on in-progress jobs.


This should scale with the number of cores, and I was hoping it would leave no more than one job waiting per core.
If it was your purpose to set a maximum of 2 tasks per core, you had to set <max_wus_to_send>2</max_wus_to_send> for that, I suppose.
Your above setting is the max per host.
ID: 5952
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5953 - Posted: 19 Feb 2019, 18:34:40 UTC - in response to Message 5952.  

If it was your purpose to set a maximum of 2 tasks per core, you had to set <max_wus_to_send>2</max_wus_to_send> for that, I suppose.
Your above setting is the max per host.


From the perspective of the project, In Progress means handed out to the client. Therefore I should have set the maximum a client can take to 2 x N, where N is the number of cores. If one task is run per core and all cores are used, there should be one task running and one waiting to run. I want to avoid bad hosts bunkering tasks. The max_wus_to_send is per RPC (update request).
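The arithmetic described above can be sketched as follows. This is an illustrative helper, not actual BOINC server code; the function names are invented, and note that later posts in this thread suggest the server actually applied the limit per host rather than per core.

```python
def max_in_progress(ncpus: int, limit_per_core: int = 2) -> int:
    """Intended cap on tasks handed out to a host: with a
    per-core limit of 2, there is one task running and one
    waiting per core (the 2 x N rule described above)."""
    return limit_per_core * ncpus

def scheduler_would_send(in_progress: int, ncpus: int) -> bool:
    """True while the host is still below its in-progress cap;
    once it is reached, the scheduler reports 'This computer
    has reached a limit on tasks in progress'."""
    return in_progress < max_in_progress(ncpus)
```

Under this reading, a quad-core host would be capped at 8 in-progress tasks; the thread's observations (2 tasks total) instead match a flat per-host cap.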
ID: 5953
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5954 - Posted: 19 Feb 2019, 20:09:55 UTC - in response to Message 5953.  

If one task is run per core and all cores are used, there should be one task running and one waiting to run. I want to avoid bad hosts bunkering tasks. The max_wus_to_send is per RPC (update request).

The current setting is 2 WUs per host.
I've a quad-core host, 2 tasks running and when I request more work, I get:
Tue 19 Feb 2019 21:08:45 CET | lhcathome-dev | Requesting new tasks for CPU
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | Scheduler request completed: got 0 new tasks
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | No tasks sent
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | No tasks are available for Theory Simulation
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | This computer has reached a limit on tasks in progress
ID: 5954
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5955 - Posted: 19 Feb 2019, 20:29:33 UTC - in response to Message 5954.  

If one task is run per core and all cores are used, there should be one task running and one waiting to run. I want to avoid bad hosts bunkering tasks. The max_wus_to_send is per RPC (update request).

The current setting is 2 WUs per host.
I've a quad-core host, 2 tasks running and when I request more work, I get:
Tue 19 Feb 2019 21:08:45 CET | lhcathome-dev | Requesting new tasks for CPU
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | Scheduler request completed: got 0 new tasks
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | No tasks sent
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | No tasks are available for Theory Simulation
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | This computer has reached a limit on tasks in progress


Maybe NCPUs in the documentation means physical CPUs (sockets) rather than logical cores. Does anyone know?
ID: 5955
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5956 - Posted: 20 Feb 2019, 8:13:19 UTC - in response to Message 5955.  

Maybe NCPUs in the documentation means physical CPUs (sockets) rather than logical cores. Does anyone know?
I agree the documentation is confusing: when you read "Maximum jobs returned per scheduler RPC", you would expect it to be a maximum per request,
but in fact it sets the limit of WUs per core.
<max_wus_to_send>1</max_wus_to_send> would send 1 task for each core.
<max_wus_to_send>2</max_wus_to_send> would send 2 tasks for each core, so 1 'Ready to Start' for each core.

When a host is requesting more work, it will send the number of cores and how many tasks of the project are already loaded on the host/client.

I suggest giving the above-mentioned config a try and, of course, removing or increasing <max_wus_in_progress>.
You could start with <max_wus_to_send>1</max_wus_to_send>.
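The suggestion above, expressed as a config fragment (a sketch: the element names are the ones discussed in this thread, the wrapper elements and the value 4 for the raised in-progress cap are assumptions for illustration):

```xml
<!-- Sketch: per-RPC/per-core send limit plus a raised
     per-host in-progress cap, as suggested above.
     Wrapper elements and the value 4 are assumed. -->
<boinc>
  <config>
    <max_wus_to_send>1</max_wus_to_send>
    <max_wus_in_progress>4</max_wus_in_progress>
  </config>
</boinc>
```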
ID: 5956
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5957 - Posted: 20 Feb 2019, 8:16:33 UTC - in response to Message 5956.  
Last modified: 20 Feb 2019, 8:17:26 UTC

I suggest giving the above-mentioned config a try and, of course, removing or increasing <max_wus_in_progress>.
You could start with <max_wus_to_send>1</max_wus_to_send>.

I have set my test VM to "no new work" and will increase my number of CPUs one by one, starting with 1, to see how it works.
ID: 5957
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5958 - Posted: 20 Feb 2019, 9:04:38 UTC - in response to Message 5957.  

I will increase one value at a time. Firstly max_wus_in_progress has been increased to 4.
ID: 5958
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5960 - Posted: 20 Feb 2019, 9:16:09 UTC - in response to Message 5958.  

I will increase one value at a time. Firstly max_wus_in_progress has been increased to 4.
I have reduced my ncpus to 2 and now have 2 running and 2 "Ready to start".
On the next work request: This computer has reached a limit on tasks in progress.
ID: 5960


©2024 CERN