Thread 'New Multi-core version V1.9'

Author	Message
Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 3663 - Posted: 13 Jul 2016, 12:57:01 UTC Last modified: 13 Jul 2016, 12:58:07 UTC This is a test for the multi-core version of the Theory app. Expect things to break. If you don't like things breaking, please crunch on the production project.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 3664 - Posted: 13 Jul 2016, 14:05:41 UTC - in response to Message 3663. Is this working for anyone? I am having difficulty. Two tasks were downloaded, and both have 1.5 CPUs :( One is running and one is waiting to start. What do I need to do to get one task that has 3 CPUs?

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 3665 - Posted: 13 Jul 2016, 14:15:40 UTC Last modified: 13 Jul 2016, 14:22:29 UTC This is working for me after I stopped the VM following a first try with the zombie Java process. Since then 2 cores are allocated (I set the system to 2 cores) and 2 science processes are running, but only one is displayed in the running log:
16:02:28 +0200 2016-07-13 [INFO] New Job Starting
16:02:29 +0200 2016-07-13 [INFO] Condor JobID: 1178883
16:02:31 +0200 2016-07-13 [INFO] New Job Starting
16:02:31 +0200 2016-07-13 [INFO] Condor JobID: 1178881
16:02:34 +0200 2016-07-13 [INFO] MCPlots JobID: 31917765
16:02:36 +0200 2016-07-13 [INFO] MCPlots JobID: 31917806
I also increased the memory for the VM to 1024 MB (512 MB for each of the 2 cores). You will have to think about using containers if you want to run more than 1 job in a VM.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 3666 - Posted: 13 Jul 2016, 14:24:57 UTC - in response to Message 3665. I decreased the number of tasks given out by the project to 1, and I now get one task with 1.5 CPUs. My machine has four cores and 3 are available for BOINC. Condor, which runs inside the VM, will dynamically create job slots to match the number of CPUs and split the memory evenly between them. So for Theory we just need to end up with about 600 MB per core, but I am a little confused about how to do this. The other thing we need to do is sort out the multiple job logs ...
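The even memory split described above reduces to simple arithmetic. A minimal sketch, assuming Condor divides the VM memory into equal integer-MB shares per slot and using the ~600 MB per core target from this post; the function names are hypothetical and this is not Condor's actual slot code.

```python
def per_slot_memory_mb(vm_memory_mb: int, n_cores: int) -> int:
    """Memory each job slot gets if the VM memory is split evenly (integer MB)."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return vm_memory_mb // n_cores


def vm_memory_for(n_cores: int, per_core_mb: int = 600) -> int:
    """VM memory needed so that each of n_cores slots gets per_core_mb."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return n_cores * per_core_mb


if __name__ == "__main__":
    # 3 cores at the ~600 MB per core target
    print(vm_memory_for(3))             # 1800
    # The 2-core VM with 1024 MB reported in Message 3665
    print(per_slot_memory_mb(1024, 2))  # 512
```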

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3667 - Posted: 13 Jul 2016, 15:22:22 UTC - in response to Message 3666. I have been using multicore for quite a while now. I simply allocated more cores (1.3) to a task in the app_config.xml. Works fine.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 3669 - Posted: 13 Jul 2016, 16:17:03 UTC - in response to Message 3666. Last modified: 13 Jul 2016, 16:29:09 UTC The other thing we need to do is sort out the multiple job logs ...
2016-07-13 18:08:39 (6424): Guest Log: [INFO] Theory application starting. Check log files.
2016-07-13 18:10:09 (6424): Guest Log: [[IINNFFOO]] NNeeww JJobo bS tSatratritnigng
2016-07-13 18:10:09 (6424): Guest Log: [[IINNFFOO]] NNeeww JJobo bS tSatratritnigng
2016-07-13 18:10:09 (6424): Guest Log: [INFO] Co[ndIoNr FJOo]b ICDo:n d o1r 1J801o0b6I
2016-07-13 18:10:09 (6424): Guest Log: 1180105
2016-07-13 18:10:20 (6424): Guest Log: [INFO] MCPlots JobID: 31919035
2016-07-13 18:10:20 (6424): Guest Log: [INFO] MCPlots JobID: 31918954
... and how to decide when it's time for a graceful shutdown without losing CPU time, because 1 core is idle and the other one is still busy with its last job. ... or 1 core is idle because no jobs are available and the other core is still busy.
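The interleaved characters in the log above are what happens when several jobs write to one log at the same time. One possible remedy, sketched here under stated assumptions and not what the project implemented, is to build each line completely and emit it in a single write guarded by a lock. A `threading.Lock` only serializes writers inside one process; separate job processes in the VM would need per-slot log files or an OS-level file lock instead. `SharedJobLog` and `run_job` are hypothetical names.

```python
import io
import threading


class SharedJobLog:
    """Shared log where each line is written whole under a lock.

    Sketch only: threading.Lock serializes writers within one process.
    Separate processes would need per-slot files or a file lock instead.
    """

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def write_line(self, slot: int, message: str) -> None:
        # Format the complete line first, then emit it in one locked write,
        # so characters from different slots cannot interleave.
        line = f"[slot{slot}] {message}\n"
        with self._lock:
            self._stream.write(line)
            self._stream.flush()


def run_job(log: SharedJobLog, slot: int, n_events: int) -> None:
    """Hypothetical stand-in for one job slot emitting log events."""
    for i in range(n_events):
        log.write_line(slot, f"[INFO] event {i}")


if __name__ == "__main__":
    buf = io.StringIO()
    log = SharedJobLog(buf)
    workers = [threading.Thread(target=run_job, args=(log, s, 100)) for s in (1, 2, 3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(buf.getvalue().splitlines()))  # 300
```

Tagging each line with its slot number also lets the combined log be split back into per-job logs afterwards.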

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3670 - Posted: 13 Jul 2016, 16:37:52 UTC Last modified: 13 Jul 2016, 16:40:32 UTC What is going to be achieved here?
1. Have one task with one job at a time running on multiple cores
2. Run multiple jobs simultaneously in one task (VM), with one or more cores per job
In my opinion, running multiple jobs in parallel in one VM is asking for trouble and makes no sense.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 3671 - Posted: 13 Jul 2016, 18:45:30 UTC - in response to Message 3670. The goal is to run multiple jobs in parallel within the same VM. This approach reduces the disk and memory requirements at the cost of some idle CPU. That cost is half the run time of one job multiplied by the number of cores; divide half the run time of one job by the time the VM was up and you get the fraction of CPU time lost (the efficiency is one minus that). It may not be for everyone, but for power volunteers it is an option. The good thing about multicore is that if the number of cores is one, it is exactly what we have now.
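The rule of thumb above can be written out as a short derivation. The symbols N (cores), T_job (average job run time) and T_VM (VM lifetime) are introduced here for illustration, and the derivation assumes each slot sits idle for about half a job on average while the last jobs drain:

```latex
\begin{aligned}
\text{idle core-time} &\approx N \cdot \frac{T_{\text{job}}}{2},\\
\text{idle fraction} &\approx \frac{N \, T_{\text{job}}/2}{N \, T_{\text{VM}}} = \frac{T_{\text{job}}}{2\,T_{\text{VM}}},\\
\text{efficiency} &\approx 1 - \frac{T_{\text{job}}}{2\,T_{\text{VM}}}.
\end{aligned}
```

With T_job = 90 min and T_VM = 12 h = 720 min this gives 45/720 = 6.25%, and because N cancels, the fraction does not depend on the core count; doubling T_VM halves it.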

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 3674 - Posted: 13 Jul 2016, 19:48:19 UTC - in response to Message 3664. What do I need to do to get one task that has 3 CPUs?
For testing I have an mt_mcore task running with 3 processors dedicated in the VM, using the app_config.xml with this app_version part:
[pre]<app_version>
  <app_name>Theory</app_name>
  <plan_class>vbox64_mt_mcore</plan_class>
  <avg_ncpus>3.000000</avg_ncpus>
  <cmdline>--nthreads 3.000000 --memory_size_mb 1536</cmdline>
</app_version>[/pre]
Three Theory jobs have started, consuming about 1.01 GB RAM, with 0.44 GB free and no swap used at all so far. Due to missing logs I'm not sure, but 2 pythia8's and probably 1 pythia6 are running at the moment.
21:19:12 +0200 2016-07-13 [INFO] New Job Starting
21:19:12 +0200 2016-07-13 [INFO] New Job Starting
21:19:12 +0200 2016-07-13 [INFO] Condor JobID: 1181998
21:19:12 +0200 2016-07-13 [INFO] New Job Starting
21:19:12 +0200 2016-07-13 [INFO] Condor JobID: 1181997
21:19:12 +0200 2016-07-13 [INFO] Condor JobID: 1181996
21:19:17 +0200 2016-07-13 [INFO] MCPlots JobID: 31920864
21:19:17 +0200 2016-07-13 [INFO] MCPlots JobID: 31920920
21:19:18 +0200 2016-07-13 [INFO] MCPlots JobID: 31920834
I'll let it run overnight and see how it ends after 12 or 18 hours.
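To try other core counts, the app_version part can be generated rather than edited by hand. A minimal sketch, assuming 512 MB per core (matching the 1536 MB used for 3 cores here) and assuming the BOINC client keeps only one `<cmdline>` value per app_version, so both vboxwrapper options go into a single element. `app_version_fragment` is a hypothetical helper; the element names are the ones used in this post, and the result belongs inside the file's `<app_config>` root element.

```python
def app_version_fragment(n_cores: int, memory_per_core_mb: int = 512,
                         app_name: str = "Theory",
                         plan_class: str = "vbox64_mt_mcore") -> str:
    """Return an <app_version> block for app_config.xml sized for n_cores."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    memory_mb = n_cores * memory_per_core_mb
    # Assumption: only one <cmdline> per app_version is kept, so both
    # options are combined into a single element.
    return (
        "<app_version>\n"
        f"  <app_name>{app_name}</app_name>\n"
        f"  <plan_class>{plan_class}</plan_class>\n"
        f"  <avg_ncpus>{n_cores}</avg_ncpus>\n"
        f"  <cmdline>--nthreads {n_cores} --memory_size_mb {memory_mb}</cmdline>\n"
        "</app_version>"
    )


if __name__ == "__main__":
    # A complete app_config.xml for a 3-core VM with 1536 MB
    print("<app_config>\n" + app_version_fragment(3) + "\n</app_config>")
```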

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3675 - Posted: 13 Jul 2016, 23:02:48 UTC I am seeing the slot number added to the job wrapper output. However, that does not help, as all jobs are running in the same task/slot.

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3676 - Posted: 14 Jul 2016, 0:41:31 UTC - in response to Message 3675. Apparently, these slot numbers are VM-internal, not BOINC slots. Never mind.

zombie67 [MM] Joined: 26 Feb 15 Posts: 26 Credit: 5,331,144 RAC: 0	Message 3677 - Posted: 14 Jul 2016, 2:50:48 UTC How will the BOINC client know to run only one of these tasks, which will use all CPU threads, and no other CPU tasks of any kind from any other project? Or even tasks from other applications on the same project? Also, assuming a single task using 4 CPU threads within the VM, how will BOINC know to credit 4 threads' worth of CPU time? Here is the thing: without using a VM, BOINC can already do all this natively. Several projects already use this method. Are we trying to re-create something already available, just so that it can be in a VM?
Reno, NV
Team: SETI.USA

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 3678 - Posted: 14 Jul 2016, 6:30:28 UTC - in response to Message 3675. Last modified: 14 Jul 2016, 6:50:10 UTC I am seeing the slot number added to the job wrapper output.
With my 3-processor VM I see this now:
running-slot1.log 14-Jul-2016 08:02 24K
running-slot2.log 14-Jul-2016 08:03 61K
running-slot3.log 14-Jul-2016 08:03 31K
running.log 14-Jul-2016 08:02 24K
In my case the log of 1 job is written twice: running-slot1.log and running.log have the same contents. Edit: Since about 01:20 CEST the finished_XX.logs are all complete and no longer cut off somewhere when a new job starts in another directory. stdout.log provides a bit more information. At the moment 1 sherpa, 1 herwig++ and 1 pythia6 are running, and the VM uses 1.3 GB of memory. The swap file was hardly used overnight: 3584k.

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3680 - Posted: 14 Jul 2016, 7:40:29 UTC The big question is how a task is ended.
1. When the first job ends after the 12h mark: all remaining jobs would be abandoned.
2. When the longest-running job slot ends: this would mean that any other job slot finishing after the 12h mark must not get a new job, otherwise the 18h cutoff would eventually be reached and some job(s) would be abandoned.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 3683 - Posted: 14 Jul 2016, 7:52:02 UTC - in response to Message 3680. It should be 2. If a job runs on average for 90 mins, all slots should be empty on average within 45 mins after the 12h mark. The number of cores in use will decrease as a step function. The trade-off in this case would be about 6.25% idle time (45 of 720 minutes) to save 8 GB of disk space per core and reduce some memory usage.
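Option 2 can be simulated to see how the step-wise drain turns into idle core-time. A minimal sketch under simplifying assumptions: each slot pulls jobs of known length back to back, no slot starts a new job once the cutoff (the 12h mark) has passed, and the VM shuts down when the last running job finishes. `simulate_drain` and the job lengths in the example are hypothetical.

```python
from typing import List, Tuple


def simulate_drain(job_lengths_per_slot: List[List[float]],
                   cutoff: float) -> Tuple[float, float]:
    """Option 2: no slot starts a new job after `cutoff`; the VM shuts
    down once the last running job finishes.

    job_lengths_per_slot[i] is the queue of job lengths (minutes) that
    slot i would pull. An empty or short queue models "no jobs available".
    Returns (shutdown_time, idle_core_minutes).
    """
    finish_times = []
    for queue in job_lengths_per_slot:
        t = 0.0
        for length in queue:
            if t >= cutoff:  # past the cutoff: take no new job
                break
            t += length
        finish_times.append(t)
    shutdown = max(finish_times)
    # Each slot idles from its own last finish until the VM shuts down.
    idle = sum(shutdown - f for f in finish_times)
    return shutdown, idle


if __name__ == "__main__":
    slots = [[90.0] * 20, [100.0] * 20, [60.0] * 20]
    print(simulate_drain(slots, cutoff=720.0))  # (800.0, 160.0)
```

Here the 100-minute slot starts its last job at 700 min and runs to 800, while the other two slots sit idle for 80 minutes each.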

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 3684 - Posted: 14 Jul 2016, 8:01:07 UTC - in response to Message 3677. Last modified: 14 Jul 2016, 9:01:23 UTC Hi zombie67, I am new to multi-core with BOINC so I can't answer all your questions, but AFAIK BOINC does support multi-core VMs, as this is what we are experimenting with here. We need to run in a VM because the software we use is not available for Windows, which 80% of the volunteers use. Our tasks are embarrassingly parallel, so we can either run one job per VM and run many VMs, or run many jobs in one VM. There is always a trade-off, and here it is some idle CPU in exchange for reduced disk space and network usage. If we can support both approaches, then it is up to you which one you prefer to run.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 3685 - Posted: 14 Jul 2016, 10:30:34 UTC I returned the first task that ran the whole sequence: http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=221315
At 09:15 the 12 hours of elapsed time were over.
At 10:53 the job in slot 2 finished and no new job was started.
At 11:30 the job in slot 1 finished and no new job was started.
At 12:00 the job in slot 3 finished (started at 09:06) and the VM got its shutdown signal.
After last night's change, the Condor and MCPlots JobIDs were no longer added to BOINC's stderr.txt. In the newly started task they are.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 3688 - Posted: 14 Jul 2016, 12:37:55 UTC - in response to Message 3685. Last modified: 14 Jul 2016, 12:41:29 UTC So at the end of the task we have 97 minutes of idle time and 2378 minutes of productive time, giving a 4% idle-time overhead in exchange for reducing the overhead of multiple VMs. Has anyone got a spare 8 cores to play with? The maximum we can do at the moment is 10 cores, but I can easily increase this. EDIT: We could halve that overhead by doubling the lifetime of the VM.
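The 97 idle minutes follow directly from the slot finish times in Message 3685: each slot is idle from when its last job finished until the 12:00 shutdown. A minimal sketch of that tally; the helper names are hypothetical, and the 2378 productive minutes are taken as quoted rather than recomputed.

```python
from datetime import datetime
from typing import List


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both given as same-day 'HH:MM'."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def idle_minutes(slot_finish_times: List[str], shutdown: str) -> int:
    """Sum over slots of the time each slot sat idle before VM shutdown."""
    return sum(minutes_between(t, shutdown) for t in slot_finish_times)


if __name__ == "__main__":
    # Slot 2, slot 1 and slot 3 finish times; the VM shut down at 12:00.
    idle = idle_minutes(["10:53", "11:30", "12:00"], "12:00")
    productive = 2378  # minutes, as quoted in the thread
    print(idle)                                   # 97
    print(f"{idle / (idle + productive):.1%}")    # 3.9%
```

Measured against total core-time the overhead is about 3.9%; against productive time alone it is about 4.1%. Either way it rounds to the 4% quoted.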

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 3689 - Posted: 14 Jul 2016, 12:49:28 UTC - in response to Message 3688. Last modified: 14 Jul 2016, 12:52:45 UTC Has anyone got a spare 8 cores to play with? The maximum we can do at the moment is 10 cores, but I can easily increase this.
I'll set up my i7 with 8 threads (4 cores, hyperthreaded) to run a VM with 8 processors, but I will set VBoxHeadless.exe to priority 'below normal' and set BOINC's CPU usage to 90% (execution cap 90) to keep my machine responsive. I'll start it later today so that I can watch the processes end tomorrow morning.

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 298	Message 3692 - Posted: 14 Jul 2016, 15:08:18 UTC - in response to Message 3688. So at the end of the task we have 97 minutes of idle time and 2378 minutes of productive time giving a 4% idle time overhead for reducing the overhead of multiple VMs. Has anyone got a spare 8 cores to play with? The maximum we can do at the moment is 10 cores but I can easily increase this. EDIT: We could halve that overhead by doubling the lifetime of the VM.
I've got a 20-core (128 GB) and a 12-core (64 GB) that I could try. :-)

Development for LHC@home