Message boards : Theory Application : New Native App - Linux Only

Previous · 1 . . . 3 · 4 · 5 · 6 · 7 · 8 · 9 . . . 10 · Next

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5940 - Posted: 18 Feb 2019, 20:24:31 UTC - in response to Message 5938.  


Strange, nothing much changed. Will look into it a bit later this evening.


Am investigating...


A new version is available. The issue was a missing file in the latest batch of jobs. The code has been updated to be more robust in such situations, and in addition the file system is cleaned after the jobs, as I noticed that we were collecting slot directories. The results should be visible in MCPlots under the vLHCdev project.
ID: 5940
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 101
Message 5941 - Posted: 18 Feb 2019, 20:58:08 UTC - in response to Message 5940.  
Last modified: 18 Feb 2019, 21:33:25 UTC


A new version is available. The issue was a missing file in the latest batch of jobs. The code has been updated to be more robust in such situations, and in addition the file system is cleaned after the jobs, as I noticed that we were collecting slot directories.

OK now.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752887
Don't know how often MCPlots is updated, but they aren't there yet.
Slots seem OK now, too... just got to go and clean up the previous left-overs.
ID: 5941
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5942 - Posted: 19 Feb 2019, 6:35:02 UTC - in response to Message 5941.  

Don't know how often MCPlots is updated, but they aren't there yet.
Your host 1497
#date_d 	ngood 	nbad 	total
2016-10-21	2	0	2
2019-02-19	5	0	5
ID: 5942
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 478
Credit: 394,720
RAC: 261
Message 5943 - Posted: 19 Feb 2019, 7:33:40 UTC - in response to Message 5940.  

The results should be visible in MCPlots under the vLHCdev project.

Only 1 of my currently active hosts is included in the list (3406), but without recent work.
Host 3718 does not appear.

http://mcplots-dev.cern.ch/production.php?view=user&system=2&userid=408
ID: 5943
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5944 - Posted: 19 Feb 2019, 8:33:07 UTC - in response to Message 5943.  

The results should be visible in MCPlots under the vLHCdev project.

Only 1 of my currently active hosts is included in the list (3406), but without recent work.
Host 3718 does not appear.

http://mcplots-dev.cern.ch/production.php?view=user&system=2&userid=408


There is a bit of a delay between when a result is returned and when MCPlots is updated.
ID: 5944
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5945 - Posted: 19 Feb 2019, 8:35:05 UTC - in response to Message 5940.  

A new version is available. The issue was a missing file in the latest batch of jobs. The code has been updated to be more robust in such situations, and in addition the file system is cleaned after the jobs, as I noticed that we were collecting slot directories. The results should be visible in MCPlots under the vLHCdev project.


Everything seems fine, so let me know if there are any issues. CP pointed out that suspend/resume is not working as it should; containers support this feature, so I need to investigate. First, though, I will attempt the Windows version.
ID: 5945
gyllic

Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 5946 - Posted: 19 Feb 2019, 8:51:12 UTC - in response to Message 5945.  
Last modified: 19 Feb 2019, 8:53:09 UTC

So far 13/13 tasks with version 4.14 have worked fine. The host is also shown as active in MCPlots (showing 9 jobs at the moment with 0% failure).
Regarding the suspend feature, I have experienced the same behaviour as other users have already mentioned.
ID: 5946
maeax

Joined: 22 Apr 16
Posts: 672
Credit: 1,901,432
RAC: 5,043
Message 5947 - Posted: 19 Feb 2019, 9:19:24 UTC - in response to Message 5945.  

Everything seems fine so let me know if there are any issues. CP pointed out that suspend/resume is not working as it should. Containers support this feature so I need to investigate. First I will now attempt the Windows version.


The first Computer for Windows testing will be from...... MAGIC!
ID: 5947
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5948 - Posted: 19 Feb 2019, 12:23:07 UTC - in response to Message 5945.  

I have changed some limits on the server to try to reduce the number of tasks taken by blackhole hosts. Please let me know if you are unable to get new tasks.
ID: 5948
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 478
Credit: 394,720
RAC: 261
Message 5949 - Posted: 19 Feb 2019, 12:49:28 UTC - in response to Message 5948.  

Please let me know if you are unable to get new tasks.

Can't get more than 2 tasks although the host simulates 3 cores and the web preferences are set to max #tasks = 4.
Was able to get 4 tasks earlier today on the same host (configured to simulate 4 cores).
ID: 5949
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5950 - Posted: 19 Feb 2019, 13:05:56 UTC - in response to Message 5949.  
Last modified: 19 Feb 2019, 13:06:48 UTC

Please let me know if you are unable to get new tasks.

Can't get more than 2 tasks although the host simulates 3 cores and the web preferences are set to max #tasks = 4.
Was able to get 4 tasks earlier today on the same host (configured to simulate 4 cores).


I set max_wus_in_progress to 2.

<max_wus_in_progress> N </max_wus_in_progress>
    Limit the number of jobs in progress on a given host (and thus limit average turnaround time). Starting with 6.8, the BOINC client reports the resources used by in-progress jobs; in this case, the max CPU jobs in progress is N*NCPUS and the max GPU jobs in progress is M*NGPUS. Otherwise, the overall maximum is N*NCPUS + M*NGPUS.

    See the following section for a more powerful way of expressing limits on in-progress jobs.


This should scale with the number of cores, and I was hoping it would leave no more than one job waiting per core.
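For reference, the element quoted above belongs in the project's server configuration. A minimal fragment might look roughly like this (a sketch: the max_wus_in_progress element name comes from the quoted BOINC documentation, while the surrounding wrapper elements are assumed here):

```xml
<!-- Sketch of a BOINC project config.xml fragment.
     Only max_wus_in_progress is taken from the documentation
     quoted above; the wrapper elements are assumed. -->
<boinc>
  <config>
    <max_wus_in_progress>2</max_wus_in_progress>
  </config>
</boinc>
```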
ID: 5950
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 478
Credit: 394,720
RAC: 261
Message 5951 - Posted: 19 Feb 2019, 13:32:34 UTC

ID: 5951
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5952 - Posted: 19 Feb 2019, 14:09:41 UTC - in response to Message 5950.  
Last modified: 19 Feb 2019, 14:13:51 UTC

<max_wus_in_progress> N </max_wus_in_progress>
    Limit the number of jobs in progress on a given host (and thus limit average turnaround time). Starting with 6.8, the BOINC client reports the resources used by in-progress jobs; in this case, the max CPU jobs in progress is N*NCPUS and the max GPU jobs in progress is M*NGPUS. Otherwise, the overall maximum is N*NCPUS + M*NGPUS.

    See the following section for a more powerful way of expressing limits on in-progress jobs.


This should scale with the number of cores, and I was hoping it would leave no more than one job waiting per core.
If it was your purpose to set a maximum of 2 tasks per core, you had to set <max_wus_to_send>2</max_wus_to_send> for that, I suppose.
Your above setting is the max per host.
ID: 5952
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5953 - Posted: 19 Feb 2019, 18:34:40 UTC - in response to Message 5952.  

If it was your purpose to set a maximum of 2 tasks per core, you had to set <max_wus_to_send>2</max_wus_to_send> for that, I suppose.
Your above setting is the max per host.


From the perspective of the project, In Progress means handed out to the client. Therefore I should have set the maximum a client can take to 2 x N, where N is the number of cores. If one task is run per core and all cores are used, there should be one task running and one waiting to run. I want to avoid bad hosts bunkering tasks. The max_wus_to_send is per RPC (update request).
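The arithmetic described above can be sketched as follows. This is an illustrative helper, not actual BOINC server code; the function names are invented, and note that later posts in this thread suggest the server actually applied the limit per host rather than per core.

```python
def max_in_progress(ncpus: int, limit_per_core: int = 2) -> int:
    """Intended cap on tasks handed out to a host: with a
    per-core limit of 2, there is one task running and one
    waiting per core (the 2 x N rule described above)."""
    return limit_per_core * ncpus

def scheduler_would_send(in_progress: int, ncpus: int) -> bool:
    """True while the host is still below its in-progress cap;
    once it is reached, the scheduler reports 'This computer
    has reached a limit on tasks in progress'."""
    return in_progress < max_in_progress(ncpus)
```

Under this reading, a quad-core host would be capped at 8 in-progress tasks; the thread's observations (2 tasks total) instead match a flat per-host cap.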
ID: 5953
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5954 - Posted: 19 Feb 2019, 20:09:55 UTC - in response to Message 5953.  

If one task is run per core and all cores are used, there should be one task running and one waiting to run. I want to avoid bad hosts bunkering tasks. The max_wus_to_send is per RPC (update request).

The current setting is 2 WUs per host.
I've a quad-core host, 2 tasks running and when I request more work, I get:
Tue 19 Feb 2019 21:08:45 CET | lhcathome-dev | Requesting new tasks for CPU
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | Scheduler request completed: got 0 new tasks
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | No tasks sent
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | No tasks are available for Theory Simulation
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | This computer has reached a limit on tasks in progress
ID: 5954
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5955 - Posted: 19 Feb 2019, 20:29:33 UTC - in response to Message 5954.  

If one task is run per core and all cores are used, there should be one task running and one waiting to run. I want to avoid bad hosts bunkering tasks. The max_wus_to_send is per RPC (update request).

The current setting is 2 WUs per host.
I've a quad-core host, 2 tasks running and when I request more work, I get:
Tue 19 Feb 2019 21:08:45 CET | lhcathome-dev | Requesting new tasks for CPU
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | Scheduler request completed: got 0 new tasks
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | No tasks sent
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | No tasks are available for Theory Simulation
Tue 19 Feb 2019 21:08:47 CET | lhcathome-dev | This computer has reached a limit on tasks in progress


Maybe NCPUs in the documentation means physical CPUs (sockets) rather than logical cores. Does anyone know?
ID: 5955
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5956 - Posted: 20 Feb 2019, 8:13:19 UTC - in response to Message 5955.  

Maybe NCPUs in the documentation means physical CPUs (sockets) rather than logical cores. Does anyone know?
I agree the documentation is confusing: when you read "Maximum jobs returned per scheduler RPC", you would expect it to be a maximum per request,
but in fact it sets the limit of WUs per core.
<max_wus_to_send>1</max_wus_to_send> would send 1 task for each core.
<max_wus_to_send>2</max_wus_to_send> would send 2 tasks for each core, so 1 'Ready to Start' for each core.

When a host is requesting more work, it will send the number of cores and how many tasks of the project are already loaded on the host/client.

I suggest giving the above-mentioned config a try and, of course, removing or increasing <max_wus_in_progress>.
You could start with <max_wus_to_send>1</max_wus_to_send>.
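The suggestion above, expressed as a config fragment (a sketch: the element names are the ones discussed in this thread, the wrapper elements and the value 4 for the raised in-progress cap are assumptions for illustration):

```xml
<!-- Sketch: per-RPC/per-core send limit plus a raised
     per-host in-progress cap, as suggested above.
     Wrapper elements and the value 4 are assumed. -->
<boinc>
  <config>
    <max_wus_to_send>1</max_wus_to_send>
    <max_wus_in_progress>4</max_wus_in_progress>
  </config>
</boinc>
```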
ID: 5956
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5957 - Posted: 20 Feb 2019, 8:16:33 UTC - in response to Message 5956.  
Last modified: 20 Feb 2019, 8:17:26 UTC

I suggest giving the above-mentioned config a try and, of course, removing or increasing <max_wus_in_progress>.
You could start with <max_wus_to_send>1</max_wus_to_send>.

I have set my test VM to "no new work" and will increase my number of CPUs one by one, starting with 1, to see how it works.
ID: 5957
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5958 - Posted: 20 Feb 2019, 9:04:38 UTC - in response to Message 5957.  

I will increase one value at a time. Firstly max_wus_in_progress has been increased to 4.
ID: 5958
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5960 - Posted: 20 Feb 2019, 9:16:09 UTC - in response to Message 5958.  

I will increase one value at a time. Firstly max_wus_in_progress has been increased to 4.
I have reduced my ncpus to 2 and now have 2 running and 2 "Ready to start".
On the next work request: This computer has reached a limit on tasks in progress.
ID: 5960


©2024 CERN