Message boards : News : Task and CPU limiter

Laurence (Project administrator, developer, tester)
Message 3902 - Posted: 31 Jul 2016, 7:48:14 UTC
Last modified: 31 Jul 2016, 7:57:12 UTC

The server has just been updated to add a feature that limits tasks and CPUs per user. These limits can be controlled in the project preferences.

Together with my changes to the scheduler, per-project limits on jobs in progress and #CPUs should now work. But I haven't actually tested this. Laurence, please try it and tell me if it doesn't work.
-- David
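
For reference: limits like these normally travel to the scheduler as project-specific preferences sent with each request. The element names below are an assumption for illustration only, not taken from this project's code; the values mirror the example settings discussed further down in the thread.

<project_preferences>
    <max_jobs>2</max_jobs>    <!-- assumed element name for "Max # jobs" -->
    <max_cpus>3</max_cpus>    <!-- assumed element name for "Max # CPUs" -->
</project_preferences>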


Please post any feedback in this thread.

Rasputin42 (Volunteer tester)
Message 3903 - Posted: 31 Jul 2016, 8:00:36 UTC - in response to Message 3902.  

Max JOBs...

Do you mean BOINC tasks or jobs?

Crystal Pellet (Volunteer tester)
Message 3904 - Posted: 31 Jul 2016, 8:56:48 UTC - in response to Message 3902.  

Please post any feedback in this thread.

My settings:
Max # jobs 2
Max # CPUs 3
and I have 4 tasks (dual-core VMs) in progress:

vLHCathome-dev Theory_13879_1469828750.132079_0 2.02 Theory Simulation (vbox64_mt_mcore) 02:56:22 (05:24:34) 16,246 15:08:25 06 Aug 16:42:00 Running 100,0 [65] 00:00:00 134.68 MB 476.84 MB
vLHCathome-dev Theory_13882_1469828750.195681_0 2.02 Theory Simulation (vbox64_mt_mcore) 02:55:54 (05:25:45) 16,202 15:08:55 06 Aug 16:42:00 Running 100,0 [62] 00:00:00 133.99 MB 476.84 MB
vLHCathome-dev Theory_13881_1469828750.174532_0 2.02 Theory Simulation (vbox64_mt_mcore) 02:54:58 (05:24:06) 16,120 15:09:52 06 Aug 16:42:01 Running 100,0 [62] 00:03:19 134.74 MB 476.84 MB
vLHCathome-dev Theory_13875_1469828750.054634_0 2.02 Theory Simulation (vbox64_mt_mcore) 02:53:44 (05:22:15) 16,012 15:11:10 06 Aug 16:42:01 Running 100,0 [60] 00:00:00 133.90 MB 476.84 MB

I updated the project after changing the settings, then asked for 1 day of work. I expected
"This computer has reached a limit on tasks in progress", but got 4 new tasks:

709 vLHCathome-dev 31 Jul 10:45:47 update requested by user
710 vLHCathome-dev 31 Jul 10:45:49 Sending scheduler request: Requested by user.
711 vLHCathome-dev 31 Jul 10:45:49 Not requesting tasks: "no new tasks" requested via Manager
712 vLHCathome-dev 31 Jul 10:45:50 Scheduler request completed
716 vLHCathome-dev 31 Jul 10:46:33 work fetch resumed by user
720 vLHCathome-dev 31 Jul 10:47:00 update requested by user
721 vLHCathome-dev 31 Jul 10:47:01 Sending scheduler request: Requested by user.
722 vLHCathome-dev 31 Jul 10:47:01 Requesting new tasks for CPU
723 vLHCathome-dev 31 Jul 10:47:03 Scheduler request completed: got 4 new tasks

Rasputin42 (Volunteer tester)
Message 3906 - Posted: 31 Jul 2016, 9:24:46 UTC

My settings:
Max # jobs 1
Max # CPUs 4
and I have 1 task (a 4-core VM) in progress.

It downloaded 4 tasks and is running one of them.
No app_config.
(Work-buffer set to 0.5d)

ivan (Volunteer moderator; Project administrator, developer, tester, scientist)
Message 3907 - Posted: 31 Jul 2016, 11:40:40 UTC

Hmm, I set my home machine (Mint Linux) to "home" location, and set my "home" preferences to Theory only, one job, 1 CPU. I had the "standard" app_config.xml:

<app_config>
<project_max_concurrent>1</project_max_concurrent>
<app>
<name>ATLAS</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>ALICE</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>CMS</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>LHCb</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>Theory</name>
<max_concurrent>1</max_concurrent>
</app>
</app_config>


When I set the project to "allow new tasks" it started downloading the vdi (took 18 mins altogether) and gradually downloaded tasks as well, until it had fetched 8 tasks and was getting the "computer has reached limits" message. I set NNT and waited for the vdi to finish. When it was all there, it started up a Theory task -- using 8 CPUs! Not what I expected...

BTW, some jobs errored out early with a rather disturbing message:

make: Entering directory `/var/lib/condor/execute/dir_4392/rivetvm'
g++ yoda2flat-split.cc -o yoda2flat-split.exe -Wfatal-errors -Wl,-rpath /cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/yoda/1.5.5/x86_64-slc6-gcc47-opt/lib `/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/yoda/1.5.5/x86_64-slc6-gcc47-opt/bin/yoda-config --cppflags --libs`
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.

See
<http://gcc.gnu.org/bugs.html>
for instructions.
make: *** [yoda2flat-split.exe] Error 4
make: Leaving directory `/var/lib/condor/execute/dir_4392/rivetvm'
./runRivet.sh: line 479: /var/lib/condor/execute/dir_4392/rivetvm/yoda2flat-split.exe: No such file or directory
ERROR: missing file name
ERROR: failed to unpack data histograms


Laurence (Project administrator, developer, tester)
Message 3908 - Posted: 31 Jul 2016, 12:10:24 UTC - in response to Message 3907.  
Last modified: 31 Jul 2016, 12:10:36 UTC

I have just looked at David's commits and can't see anything related to the scheduler. This suggests that the values may not yet be taken into consideration.

ivan (Volunteer moderator; Project administrator, developer, tester, scientist)
Message 3909 - Posted: 31 Jul 2016, 14:36:20 UTC - in response to Message 3908.  
Last modified: 31 Jul 2016, 14:54:57 UTC

Yes. It got worse. I added the app_version section for one core to app_config.xml and did "read config files", expecting the change to take effect when the current task died. Instead, it immediately backed off, allowing S@H jobs to start up again. When I looked in the top console, it showed cvmfs2 taking nearly 600% CPU. Eventually, things settled down a bit but no jobs were consuming a significant amount of CPU time. As well, the system clock was running significantly ahead of local time.
I aborted it, expecting a new 1-CPU task to start, but instead it started another 8-CPU one. At that point I paused the app and aborted all its tasks.
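
For anyone wanting to try the same override: a one-core app_version section, modeled on the multi-core example further down this thread (message 3938), would look roughly like the sketch below. Only the core count differs from that example, and a --memory_size_mb line can be added in the same way if needed.

<app_version>
    <app_name>Theory</app_name>
    <plan_class>vbox64_mt_mcore</plan_class>
    <avg_ncpus>1.000000</avg_ncpus>
    <cmdline>--nthreads 1.000000</cmdline>
</app_version>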

[Added] I then resumed the app and allowed it to download a task. This is now running as a 1-CPU task[*]. So, it looks like BOINC's behaviour is consistent -- it's the server that determines what features a task will run with, such as S@H's sending tasks designated as GPU or CPU despite the fact that the file to be analysed doesn't determine which analysis programme runs over it. In my case, it thought I could run 8-core jobs, so it sent me 8 jobs thus designated and the local client could not override that.

[*] Well, no, actually. It's taking up one CPU of my PC, but the top console shows it running a Pythia-8 and a rivetvm job simultaneously. This is getting harder to understand with each passing second! Oh, looking at the logs I see only one job which seems to be running both executables, so I guess that's the expected behaviour but I wasn't aware of it. Theoretical physicists are getting harder to understand by the second...

Crystal Pellet (Volunteer tester)
Message 3910 - Posted: 31 Jul 2016, 15:36:51 UTC - in response to Message 3909.  

Hi Ivan,

It may be confusing, but it's not the server that determines those VM settings when you use an app_config.
After changing and re-reading your app_config, the number of free cores is recalculated immediately, although a running VM task keeps the settings it was started with.
Tasks that have not yet started will create VMs with your new settings, but BOINC Manager may still show the old number of CPUs, probably because those tasks are already recorded in client_state.xml. That is only cosmetic, not a real problem.
The first point can lead to overcommitting your machine, but the other BOINC processes run at the lowest priority and will simply slow down until your VM task is finished.
The opposite can also happen:
increasing the number of CPUs in app_config while a VM is running will set some other BOINC tasks to 'waiting', even though that VM will not use more CPU.
This causes temporary under-committing.
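
A side note for anyone testing this from a shell rather than the Manager: re-reading the config files can also be triggered with boinccmd, assuming the client is running locally with default RPC settings; on reasonably recent clients this reloads app_config.xml as well as cc_config.xml:

boinccmd --read_cc_config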

ivan (Volunteer moderator; Project administrator, developer, tester, scientist)
Message 3911 - Posted: 31 Jul 2016, 15:53:31 UTC - in response to Message 3909.  
Last modified: 31 Jul 2016, 15:55:13 UTC

As well, the system clock was running significantly ahead of local time.

OK, now that I've got a stable 1-CPU task, it looks like the VM's clock is set to CEST rather than BST. I hadn't noticed that before.

Toby Broom
Message 3912 - Posted: 31 Jul 2016, 18:21:21 UTC

For me the CPU limiter seems to work fine. I set one PC to 2 cores and the other to 4 in the default and home locations, and these were respected on the client with no app_config.

The task limiter doesn't seem to work as expected: on the 4-core PC just one task was running, while on the other PC I think 9 tasks loaded up.

Laurence (Project administrator, developer, tester)
Message 3913 - Posted: 31 Jul 2016, 19:02:30 UTC - in response to Message 3912.  

Received a message from Rom that the scheduler has been updated; see this commit.

Crystal Pellet (Volunteer tester)
Message 3914 - Posted: 31 Jul 2016, 20:05:35 UTC

Strange things are happening without using an app_config:
With Max # jobs 2 and Max # CPUs 3 in my preferences, and the number of cores to use limited to 2 in BOINC Manager, I got 1 task, from which one VM with 2 processors was created.
I then set the number of cores to use in BOINC Manager to 4, asked for new tasks and got 3 new tasks, all set up to start a VM with 4 cores.

The --cmdline entry in client_state for the RAM size in MB has disappeared, and the 2-core VM is running with RAM for only 1 job.
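
For comparison, when the memory command line is present, the app_version carries it as a separate --cmdline entry, as in the app_config example further down this thread (message 3938). A sketch with illustrative values follows; the exact layout in client_state.xml may differ in detail:

<app_version>
    <app_name>Theory</app_name>
    <plan_class>vbox64_mt_mcore</plan_class>
    <avg_ncpus>2.000000</avg_ncpus>
    <cmdline>--nthreads 2.000000</cmdline>
    <cmdline>--memory_size_mb 2048</cmdline>    <!-- illustrative value -->
</app_version>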

Toby Broom
Message 3916 - Posted: 1 Aug 2016, 8:38:02 UTC

Interesting: this morning the PC that was set to 1/4 jobs/CPUs is running an 8-core task with no app_config.

The other, which is set to 1/2 with an app_config of max 2, is running one 2-core task and BOINC is running nothing else.

Magic Quantum Mechanic
Message 3917 - Posted: 1 Aug 2016, 9:20:21 UTC

I keep getting 8-core tasks after setting the limit to 2 cores.

I tried a couple of times and aborted, so I guess I will try a reset again... except it now tells me I reached my daily limit after the aborts, so I will try to download again tomorrow.
Mad Scientist For Life

Thund3rb1rd
Message 3921 - Posted: 1 Aug 2016, 22:05:02 UTC

I've been successfully tweaking a combination of app_config and BOINC local settings to tailor my machines to what I want.

All of this really only affects three projects and the tasks don't take that long to run, so frankly, I'm not sure what problem is being addressed here.

As far as how much work is being downloaded, is that really an issue? Personally, I take what I can get and don't throw a fit.

Or am I so far off-base here that this posting is simply ludicrous?

Laurence (Project administrator, developer, tester)
Message 3928 - Posted: 2 Aug 2016, 12:00:54 UTC - in response to Message 3921.  

I have just done my own tests and it looks like it isn't working. Starting from a clean install on a 4-core machine, setting 1 task and no limit on CPUs results in 4 single-core CMS tasks. If I also select 1 CPU, I still get 4 single-core tasks.

With the Theory app (multi-core), setting 1 task and no limit resulted in two 4-core tasks, with one waiting. If I also select 1 CPU, I got 1 task with 4 cores.

ivan (Volunteer moderator; Project administrator, developer, tester, scientist)
Message 3938 - Posted: 2 Aug 2016, 15:13:45 UTC - in response to Message 3928.  

On a 20-core machine I have this:
<app_config>
<project_max_concurrent>5</project_max_concurrent>
<app>
<name>ATLAS</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>ALICE</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>CMS</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>LHCb</name>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>Theory</name>
<max_concurrent>3</max_concurrent>
</app>
<app_version>
  <app_name>Theory</app_name>
  <plan_class>vbox64_mt_mcore</plan_class>
  <avg_ncpus>8.000000</avg_ncpus>
  <cmdline>--nthreads 8.000000</cmdline>
  <cmdline>--memory_size_mb 10240</cmdline>
 </app_version>
</app_config>

and I've currently got one (single-core) CMS task running and two 8-core Theory tasks. top says:
top - 16:08:07 up 67 days,  5:36,  1 user,  load average: 4.11, 4.17, 4.11
Tasks: 515 total,   3 running, 511 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us, 76.2%sy, 11.5%ni, 12.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132113440k total, 126006772k used,  6106668k free,   420436k buffers
Swap: 33554428k total,    33804k used, 33520624k free, 112829736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
21007 eesridr   39  19 4764m 2.7g 2.6g S 659.9  2.1   1656:43 VBoxHeadless      
21294 eesridr   39  19 4670m 2.6g 2.5g S 656.9  2.0   1676:02 VBoxHeadless      
23475 eesridr   39  19 3766m 2.1g 2.0g S 100.9  1.6 245:44.50 VBoxHeadless      
26122 eesridr   39  19 3760m 2.1g 2.0g S 100.9  1.6 244:37.70 VBoxHeadless      
 4763 eesridr   39  19 64240  56m 2696 R 100.2  0.0  11:46.12 setiathome_8.00   
24241 eesridr   39  19 88816  66m 2652 R 100.2  0.1  24:46.54 setiathome_8.00   

(the other VBoxHeadless is a CMS task running under the production project).
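
As a rough arithmetic check on the configuration above, assuming each Theory task really takes the 8 CPUs from the app_version block and the other apps stay single-core:

worst case:  3 × 8 (Theory) + 2 × 1 (other apps) = 26 threads  > 20 cores
right now:   2 × 8 (Theory) + 1 × 1 (CMS)        = 17 threads  < 20 cores

So project_max_concurrent = 5 alone does not rule out over-committing the box, but the currently running mix fits comfortably.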

Rasputin42 (Volunteer tester)
Message 3939 - Posted: 2 Aug 2016, 15:53:22 UTC
Last modified: 2 Aug 2016, 16:52:59 UTC

Tasks: 515 total, 3 running, 511 sleeping, 0 stopped, 1 zombie


Well, you have a zombie and only 3 tasks running.
There should be at least 6.

From a static snapshot it is hard to tell which ones are running.

Crystal Pellet (Volunteer tester)
Message 3940 - Posted: 2 Aug 2016, 16:47:16 UTC

I suppose Ivan showed the top output from the host, not from a guest VM.

Rasputin42 (Volunteer tester)
Message 3941 - Posted: 2 Aug 2016, 16:55:46 UTC

I suppose, Ivan showed the top command from the host and not from a guest-VM.


Still, strange.
And all processes are at the lowest possible priority, which I find strange, but I do not know Linux very well.