Message boards : News : Task and CPU limiter

Laurence (Project administrator, developer, tester)
Message 3902 - Posted: 31 Jul 2016, 7:48:14 UTC
Last modified: 31 Jul 2016, 7:57:12 UTC

The server has just been updated to add a feature that limits tasks and CPUs per user. These limits can be controlled in the project preferences.

Together with my changes to the scheduler, per-project limits on jobs in progress and #CPUs should now work. But I haven't actually tested this. Laurence, please try it and tell me if it doesn't work.
-- David
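
For reference: limits like these normally travel to the scheduler as project-specific preferences sent with each request. The element names below are an assumption for illustration only, not taken from this project's code; the values mirror the example settings discussed further down in the thread.

<project_preferences>
    <max_jobs>2</max_jobs>    <!-- assumed element name for "Max # jobs" -->
    <max_cpus>3</max_cpus>    <!-- assumed element name for "Max # CPUs" -->
</project_preferences>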


Please post any feedback in this thread.

Rasputin42 (Volunteer tester)
Message 3903 - Posted: 31 Jul 2016, 8:00:36 UTC - in response to Message 3902.  

Max JOBs...

Do you mean BOINC tasks or jobs?

Crystal Pellet (Volunteer tester)
Message 3904 - Posted: 31 Jul 2016, 8:56:48 UTC - in response to Message 3902.  

Please post any feedback in this thread.

My settings:
Max # jobs 2
Max # CPUs 3
and I have 4 tasks (dual-core VMs) in progress:

vLHCathome-dev Theory_13879_1469828750.132079_0 2.02 Theory Simulation (vbox64_mt_mcore) 02:56:22 (05:24:34) 16,246 15:08:25 06 Aug 16:42:00 Running 100,0 [65] 00:00:00 134.68 MB 476.84 MB
vLHCathome-dev Theory_13882_1469828750.195681_0 2.02 Theory Simulation (vbox64_mt_mcore) 02:55:54 (05:25:45) 16,202 15:08:55 06 Aug 16:42:00 Running 100,0 [62] 00:00:00 133.99 MB 476.84 MB
vLHCathome-dev Theory_13881_1469828750.174532_0 2.02 Theory Simulation (vbox64_mt_mcore) 02:54:58 (05:24:06) 16,120 15:09:52 06 Aug 16:42:01 Running 100,0 [62] 00:03:19 134.74 MB 476.84 MB
vLHCathome-dev Theory_13875_1469828750.054634_0 2.02 Theory Simulation (vbox64_mt_mcore) 02:53:44 (05:22:15) 16,012 15:11:10 06 Aug 16:42:01 Running 100,0 [60] 00:00:00 133.90 MB 476.84 MB

I updated the project after changing the settings, then asked for 1 day of work. I expected
"This computer has reached a limit on tasks in progress", but got 4 new tasks:

709 vLHCathome-dev 31 Jul 10:45:47 update requested by user
710 vLHCathome-dev 31 Jul 10:45:49 Sending scheduler request: Requested by user.
711 vLHCathome-dev 31 Jul 10:45:49 Not requesting tasks: "no new tasks" requested via Manager
712 vLHCathome-dev 31 Jul 10:45:50 Scheduler request completed
716 vLHCathome-dev 31 Jul 10:46:33 work fetch resumed by user
720 vLHCathome-dev 31 Jul 10:47:00 update requested by user
721 vLHCathome-dev 31 Jul 10:47:01 Sending scheduler request: Requested by user.
722 vLHCathome-dev 31 Jul 10:47:01 Requesting new tasks for CPU
723 vLHCathome-dev 31 Jul 10:47:03 Scheduler request completed: got 4 new tasks

Rasputin42 (Volunteer tester)
Message 3906 - Posted: 31 Jul 2016, 9:24:46 UTC

My settings:
Max # jobs 1
Max # CPUs 4
and I have 1 task (a 4-core VM) in progress.

It downloaded 4 tasks and is running one of them.
No app_config.
(Work-buffer set to 0.5d)

ivan (Volunteer moderator; Project administrator, developer, tester, scientist)
Message 3907 - Posted: 31 Jul 2016, 11:40:40 UTC

Hmm, I set my home machine (Mint Linux) to "home" location, and set my "home" preferences to Theory only, one job, 1 CPU. I had the "standard" app_config.xml:

<app_config>
<project_max_concurrent>1</project_max_concurrent>
<app>
<name>ATLAS</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>ALICE</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>CMS</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>LHCb</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>Theory</name>
<max_concurrent>1</max_concurrent>
</app>
</app_config>


When I set the project to "allow new tasks" it started downloading the vdi (took 18 mins altogether) and gradually downloaded tasks as well, until it had fetched 8 tasks and was getting the "computer has reached limits" message. I set NNT and waited for the vdi to finish. When it was all there, it started up a Theory task -- using 8 CPUs! Not what I expected...

BTW, some jobs errored out early with a rather disturbing message:

make: Entering directory `/var/lib/condor/execute/dir_4392/rivetvm'
g++ yoda2flat-split.cc -o yoda2flat-split.exe -Wfatal-errors -Wl,-rpath /cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/yoda/1.5.5/x86_64-slc6-gcc47-opt/lib `/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/yoda/1.5.5/x86_64-slc6-gcc47-opt/bin/yoda-config --cppflags --libs`
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.

See
<http://gcc.gnu.org/bugs.html>
for instructions.
make: *** [yoda2flat-split.exe] Error 4
make: Leaving directory `/var/lib/condor/execute/dir_4392/rivetvm'
./runRivet.sh: line 479: /var/lib/condor/execute/dir_4392/rivetvm/yoda2flat-split.exe: No such file or directory
ERROR: missing file name
ERROR: failed to unpack data histograms


Laurence (Project administrator, developer, tester)
Message 3908 - Posted: 31 Jul 2016, 12:10:24 UTC - in response to Message 3907.  
Last modified: 31 Jul 2016, 12:10:36 UTC

I have just looked at David's commits and can't see anything related to the scheduler. This suggests that the values may not yet be taken into consideration.

ivan (Volunteer moderator; Project administrator, developer, tester, scientist)
Message 3909 - Posted: 31 Jul 2016, 14:36:20 UTC - in response to Message 3908.  
Last modified: 31 Jul 2016, 14:54:57 UTC

Yes. It got worse. I added the app_version section for one core to app_config.xml and did "read config files", expecting the change to take effect when the current task died. Instead, it immediately backed off, allowing S@H jobs to start up again. When I looked in the top console, it showed cvmfs2 taking nearly 600% CPU. Eventually, things settled down a bit but no jobs were consuming a significant amount of CPU time. As well, the system clock was running significantly ahead of local time.
I aborted it, expecting a new 1-CPU task to start, but instead it started another 8-CPU one. At that point I paused the app and aborted all its tasks.
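
For anyone wanting to try the same override: a one-core app_version section, modeled on the multi-core example further down this thread (message 3938), would look roughly like the sketch below. Only the core count differs from that example, and a --memory_size_mb line can be added in the same way if needed.

<app_version>
    <app_name>Theory</app_name>
    <plan_class>vbox64_mt_mcore</plan_class>
    <avg_ncpus>1.000000</avg_ncpus>
    <cmdline>--nthreads 1.000000</cmdline>
</app_version>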

[Added] I then resumed the app and allowed it to download a task. This is now running as a 1-CPU task[*]. So, it looks like BOINC's behaviour is consistent -- it's the server that determines what features a task will run with, such as S@H's sending tasks designated as GPU or CPU despite the fact that the file to be analysed doesn't determine which analysis programme runs over it. In my case, it thought I could run 8-core jobs, so it sent me 8 jobs thus designated and the local client could not override that.

[*] Well, no, actually. It's taking up one CPU of my PC, but the top console shows it running a Pythia-8 and a rivetvm job simultaneously. This is getting harder to understand with each passing second! Oh, looking at the logs I see only one job which seems to be running both executables, so I guess that's the expected behaviour but I wasn't aware of it. Theoretical physicists are getting harder to understand by the second...

Crystal Pellet (Volunteer tester)
Message 3910 - Posted: 31 Jul 2016, 15:36:51 UTC - in response to Message 3909.  

Hi Ivan,

It may be confusing, but it's not the server that determines those VM settings when you use an app_config.
After changing and re-reading your app_config, the number of free cores is recalculated immediately, although a running VM task keeps the settings it was started with.
Tasks that have not yet started will create VMs with your new settings, but BOINC Manager may still show the old number of CPUs, probably because those tasks are already recorded in client_state.xml. That is only cosmetic, not a real problem.
The first point can lead to overcommitting your machine, but the other BOINC processes run at the lowest priority and will simply slow down until your VM task is finished.
The opposite can also happen:
increasing the number of CPUs in app_config while a VM is running will set some other BOINC tasks to 'waiting', even though that VM will not use more CPU.
This causes temporary under-committing.
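
A side note for anyone testing this from a shell rather than the Manager: re-reading the config files can also be triggered with boinccmd, assuming the client is running locally with default RPC settings; on reasonably recent clients this reloads app_config.xml as well as cc_config.xml:

boinccmd --read_cc_config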

ivan (Volunteer moderator; Project administrator, developer, tester, scientist)
Message 3911 - Posted: 31 Jul 2016, 15:53:31 UTC - in response to Message 3909.  
Last modified: 31 Jul 2016, 15:55:13 UTC

As well, the system clock was running significantly ahead of local time.

OK, now that I've got a stable 1-CPU task, it looks like the VM's clock is set to CEST rather than BST. I hadn't noticed that before.

Toby Broom
Message 3912 - Posted: 31 Jul 2016, 18:21:21 UTC

For me the CPU limiter seems to work fine. I set one PC to 2 cores and the other to 4 in the default and home locations, and these were respected on the client with no app_config.

The task limiter doesn't seem to work as expected: on the 4-core PC just one task was running, while on the other PC I think 9 tasks loaded up.

Laurence (Project administrator, developer, tester)
Message 3913 - Posted: 31 Jul 2016, 19:02:30 UTC - in response to Message 3912.  

Received a message from Rom that the scheduler has been updated; see this commit.

Crystal Pellet (Volunteer tester)
Message 3914 - Posted: 31 Jul 2016, 20:05:35 UTC

Strange things are happening without using an app_config:
With Max # jobs 2 and Max # CPUs 3 in my preferences, and the number of cores to use limited to 2 in BOINC Manager, I got 1 task, from which one VM with 2 processors was created.
I then set the number of cores to use in BOINC Manager to 4, asked for new tasks and got 3 new tasks, all set up to start a VM with 4 cores.

The --cmdline entry in client_state for the RAM size in MB has disappeared, and the 2-core VM is running with RAM for only 1 job.
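
For comparison, when the memory command line is present, the app_version carries it as a separate --cmdline entry, as in the app_config example further down this thread (message 3938). A sketch with illustrative values follows; the exact layout in client_state.xml may differ in detail:

<app_version>
    <app_name>Theory</app_name>
    <plan_class>vbox64_mt_mcore</plan_class>
    <avg_ncpus>2.000000</avg_ncpus>
    <cmdline>--nthreads 2.000000</cmdline>
    <cmdline>--memory_size_mb 2048</cmdline>    <!-- illustrative value -->
</app_version>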

Toby Broom
Message 3916 - Posted: 1 Aug 2016, 8:38:02 UTC

Interesting: this morning the PC that was set to 1/4 jobs/CPUs is running an 8-core task with no app_config.

The other, which is set to 1/2 with an app_config of max 2, is running one 2-core task and BOINC is running nothing else.

Magic Quantum Mechanic
Message 3917 - Posted: 1 Aug 2016, 9:20:21 UTC

I keep getting 8-core tasks after setting the limit to 2 cores.

I tried a couple of times and aborted, so I guess I will try a reset again... except it now tells me I reached my daily limit after the aborts, so I will try to download again tomorrow.
Mad Scientist For Life

Thund3rb1rd
Message 3921 - Posted: 1 Aug 2016, 22:05:02 UTC

I've been successfully tweaking a combination of app_config and BOINC local settings to tailor my machines to what I want.

All of this really only affects three projects and the tasks don't take that long to run, so frankly, I'm not sure what problem is being addressed here.

As far as how much work is being downloaded, is that really an issue? Personally, I take what I can get and don't throw a fit.

Or am I so far off-base here that this posting is simply ludicrous?

Laurence (Project administrator, developer, tester)
Message 3928 - Posted: 2 Aug 2016, 12:00:54 UTC - in response to Message 3921.  

I have just done my own tests and it looks like it isn't working. Starting from a clean install on a 4-core machine, setting 1 task and no limit on CPUs results in 4 single-core CMS tasks. If I also select 1 CPU, I still get 4 single-core tasks.

With the Theory app (multi-core), setting 1 task and no limit resulted in two 4-core tasks, with one waiting. If I also select 1 CPU, I got 1 task with 4 cores.

ivan (Volunteer moderator; Project administrator, developer, tester, scientist)
Message 3938 - Posted: 2 Aug 2016, 15:13:45 UTC - in response to Message 3928.  

On a 20-core machine I have this:
<app_config>
<project_max_concurrent>5</project_max_concurrent>
<app>
<name>ATLAS</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>ALICE</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>CMS</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>LHCb</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>Theory</name>
<max_concurrent>3</max_concurrent>
</app>
<app_version>
  <app_name>Theory</app_name>
  <plan_class>vbox64_mt_mcore</plan_class>
  <avg_ncpus>8.000000</avg_ncpus>
  <cmdline>--nthreads 8.000000</cmdline>
  <cmdline>--memory_size_mb 10240</cmdline>
 </app_version>
</app_config>

and I've currently got one (single-core) CMS task running and two 8-core Theory tasks. top says:
top - 16:08:07 up 67 days,  5:36,  1 user,  load average: 4.11, 4.17, 4.11
Tasks: 515 total,   3 running, 511 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us, 76.2%sy, 11.5%ni, 12.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132113440k total, 126006772k used,  6106668k free,   420436k buffers
Swap: 33554428k total,    33804k used, 33520624k free, 112829736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
21007 eesridr   39  19 4764m 2.7g 2.6g S 659.9  2.1   1656:43 VBoxHeadless      
21294 eesridr   39  19 4670m 2.6g 2.5g S 656.9  2.0   1676:02 VBoxHeadless      
23475 eesridr   39  19 3766m 2.1g 2.0g S 100.9  1.6 245:44.50 VBoxHeadless      
26122 eesridr   39  19 3760m 2.1g 2.0g S 100.9  1.6 244:37.70 VBoxHeadless      
 4763 eesridr   39  19 64240  56m 2696 R 100.2  0.0  11:46.12 setiathome_8.00   
24241 eesridr   39  19 88816  66m 2652 R 100.2  0.1  24:46.54 setiathome_8.00   

(the other VBoxHeadless is a CMS task running under the production project).
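
As a rough arithmetic check on the configuration above, assuming each Theory task really takes the 8 CPUs from the app_version block and the other apps stay single-core:

worst case:  3 × 8 (Theory) + 2 × 1 (other apps) = 26 threads  > 20 cores
right now:   2 × 8 (Theory) + 1 × 1 (CMS)        = 17 threads  < 20 cores

So project_max_concurrent = 5 alone does not rule out over-committing the box, but the currently running mix fits comfortably.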

Rasputin42 (Volunteer tester)
Message 3939 - Posted: 2 Aug 2016, 15:53:22 UTC
Last modified: 2 Aug 2016, 16:52:59 UTC

Tasks: 515 total, 3 running, 511 sleeping, 0 stopped, 1 zombie


Well, you have a zombie and only 3 tasks running.
There should be at least 6.

From a static snapshot it is hard to tell which ones are running.

Crystal Pellet (Volunteer tester)
Message 3940 - Posted: 2 Aug 2016, 16:47:16 UTC

I suppose Ivan showed the top output from the host, not from a guest VM.

Rasputin42 (Volunteer tester)
Message 3941 - Posted: 2 Aug 2016, 16:55:46 UTC

I suppose, Ivan showed the top command from the host and not from a guest-VM.


Still, strange.
And all processes are at the lowest possible priority, which I find strange, but I do not know Linux very well.