Message boards : Theory Application : New Multi-core version V1.9

Laurence CERN
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1150
Credit: 342,328
RAC: 1
Message 3663 - Posted: 13 Jul 2016, 12:57:01 UTC
Last modified: 13 Jul 2016, 12:58:07 UTC

This is a test for the multi-core version of the theory app. Expect things to break. If you don't like things breaking, please crunch on the production project.
Laurence CERN
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1150
Credit: 342,328
RAC: 1
Message 3664 - Posted: 13 Jul 2016, 14:05:41 UTC - in response to Message 3663.  

Is this working for anyone? I am having difficulty. Two tasks are downloaded, and both have 1.5 CPUs :( One is running and one is waiting to start. What do I need to do to get one task that has 3 CPUs?
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1252
Credit: 996,550
RAC: 72
Message 3665 - Posted: 13 Jul 2016, 14:15:40 UTC
Last modified: 13 Jul 2016, 14:22:29 UTC

This is working for me after I stopped the VM after a first try with the zombie java-process.
Thereafter 2 cores are allocated (I set the system to 2 cores) and 2 science processes are running, but only one is displayed in the running-log.

16:02:28 +0200 2016-07-13 [INFO] New Job Starting
16:02:29 +0200 2016-07-13 [INFO] Condor JobID: 1178883
16:02:31 +0200 2016-07-13 [INFO] New Job Starting
16:02:31 +0200 2016-07-13 [INFO] Condor JobID: 1178881
16:02:34 +0200 2016-07-13 [INFO] MCPlots JobID: 31917765
16:02:36 +0200 2016-07-13 [INFO] MCPlots JobID: 31917806

I also increased the memory for the VM to 1024 MB (512 MB for each of the 2 cores).

You have to think about using containers if you want to run more than 1 job in a VM.
Laurence CERN
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1150
Credit: 342,328
RAC: 1
Message 3666 - Posted: 13 Jul 2016, 14:24:57 UTC - in response to Message 3665.  

I decreased the number of tasks given out by the project to 1 and I now get one task with 1.5 CPUs. My machine has four cores and 3 are available for BOINC.

Condor, which runs inside the VM, will dynamically create a number of job slots to match the number of CPUs and split the memory evenly between them.

So for Theory we just need to end up with about 600 MB per core, but I am a little confused about how to do this.
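
A rough sketch of the sizing arithmetic described above, in Python. It assumes, as stated in the post, one Condor slot per CPU and an even split of the VM memory. The function names and the 600 MB default are illustrative and are not taken from the project code.

```python
def memory_per_slot_mb(vm_memory_mb: float, ncpus: int) -> float:
    """Memory each slot gets if the VM memory is split evenly, one slot per CPU."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return vm_memory_mb / ncpus


def vm_memory_needed_mb(ncpus: int, per_core_mb: int = 600) -> int:
    """VM memory to request so that every slot ends up with about per_core_mb."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return ncpus * per_core_mb


if __name__ == "__main__":
    # Print the VM memory that would be requested for a few core counts.
    for n in (1, 2, 3, 4):
        print(n, "core(s):", vm_memory_needed_mb(n), "MB")
```

Under these assumptions the VM memory simply scales linearly with the number of cores, so a single per-core figure is the only setting that would need tuning.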

The other thing we need to do is sort out the multiple job logs ...
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,215,383
RAC: 2
Message 3667 - Posted: 13 Jul 2016, 15:22:22 UTC - in response to Message 3666.  

I have been using multicore for quite a while now.
I simply allocated more cores (1.3) to a task in the app_config.xml.
Works fine.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1252
Credit: 996,550
RAC: 72
Message 3669 - Posted: 13 Jul 2016, 16:17:03 UTC - in response to Message 3666.  
Last modified: 13 Jul 2016, 16:29:09 UTC

The other thing we need to do is sort out the multiple job logs ...


2016-07-13 18:08:39 (6424): Guest Log: [INFO] Theory application starting. Check log files.
2016-07-13 18:10:09 (6424): Guest Log: [[IINNFFOO]] NNeeww JJobo bS tSatratritnigng
2016-07-13 18:10:09 (6424): Guest Log: [[IINNFFOO]] NNeeww JJobo bS tSatratritnigng
2016-07-13 18:10:09 (6424): Guest Log: [INFO] Co[ndIoNr FJOo]b ICDo:n d o1r 1J801o0b6I
2016-07-13 18:10:09 (6424): Guest Log: 1180105
2016-07-13 18:10:20 (6424): Guest Log: [INFO] MCPlots JobID: 31919035
2016-07-13 18:10:20 (6424): Guest Log: [INFO] MCPlots JobID: 31918954

... and how to decide when it's time for a graceful shutdown without losing CPU time, because 1 core is idle while the other one is still busy with its last job.
... or 1 core is idle because no jobs are available while the other core is still busy.
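
The garbled lines in the log excerpt above look like two writers sending characters to the same stream at the same time. One possible way to keep each line whole is to tag every line with its slot number and write it in a single locked call. The Python sketch below illustrates that idea with threads and hypothetical names. It is not how the Theory job wrapper is implemented, and the jobs in the VM are separate processes rather than threads.

```python
import io
import threading


class SlotLogger:
    """Writes whole lines to a shared stream, each tagged with its slot number."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def info(self, slot, message):
        line = f"[INFO] [slot{slot}] {message}\n"
        with self._lock:  # only one writer at a time, so lines cannot interleave
            self._stream.write(line)


def run_demo(n_slots=3):
    """Start one writer per slot and return the resulting log lines."""
    stream = io.StringIO()
    logger = SlotLogger(stream)

    def job(slot):
        logger.info(slot, "New Job Starting")
        logger.info(slot, "Job finished")

    threads = [threading.Thread(target=job, args=(s,)) for s in range(1, n_slots + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The order of the lines can still vary between runs, but each line stays intact and can be traced back to its slot.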
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,215,383
RAC: 2
Message 3670 - Posted: 13 Jul 2016, 16:37:52 UTC
Last modified: 13 Jul 2016, 16:40:32 UTC

What is going to be achieved here?
1. Have one task with one job at a time running on multiple cores, or
2. Run multiple jobs simultaneously in one task (VM), with one or more cores per job?

In my opinion, running multiple jobs in one VM in parallel is asking for trouble and makes no sense.
Laurence CERN
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1150
Credit: 342,328
RAC: 1
Message 3671 - Posted: 13 Jul 2016, 18:45:30 UTC - in response to Message 3670.  

The goal is to run multiple jobs in parallel within the same VM. This approach reduces the disk and memory requirements at the cost of some idle CPU at the end of the task. On average that cost is half the run time of one job multiplied by the number of cores. Equivalently, dividing half the run time of one job by the time the VM was up gives the fraction of CPU time lost to idling.
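
The estimate can be written out as a small Python sketch. It assumes that each core idles for about half a job's run time at the end of the task, as described above. The function names are made up for illustration, and the 90-minute job length and 12-hour VM lifetime in the usage lines are example values only.

```python
def idle_core_minutes(mean_job_min: float, ncores: int) -> float:
    """Estimated idle time summed over all cores: half a job per core."""
    if mean_job_min < 0 or ncores < 1:
        raise ValueError("need mean_job_min >= 0 and ncores >= 1")
    return ncores * mean_job_min / 2


def idle_fraction(mean_job_min: float, vm_uptime_min: float) -> float:
    """Estimated share of CPU time lost to idling.

    The core count cancels out:
    (ncores * mean_job / 2) / (ncores * uptime) == (mean_job / 2) / uptime
    """
    if mean_job_min < 0 or vm_uptime_min <= 0:
        raise ValueError("need mean_job_min >= 0 and vm_uptime_min > 0")
    return (mean_job_min / 2) / vm_uptime_min


if __name__ == "__main__":
    # Example values: 90-minute jobs, VM up for 12 hours, 3 cores.
    print(idle_core_minutes(90, 3))
    print(idle_fraction(90, 12 * 60))
```

Because the uptime is in the denominator, doubling the VM lifetime halves the estimated idle fraction.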

It may not be for everyone but for the power volunteers it is an option. The good thing about multicore is that if the number of cores is one, it is what we have now.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1252
Credit: 996,550
RAC: 72
Message 3674 - Posted: 13 Jul 2016, 19:48:19 UTC - in response to Message 3664.  

What do I need to do to get one task that has 3 CPUs?
For testing, I have an mt_mcore task running with 3 processors dedicated to the VM, using an app_config.xml with this app_version part:

<app_version>
  <app_name>Theory</app_name>
  <plan_class>vbox64_mt_mcore</plan_class>
  <avg_ncpus>3.000000</avg_ncpus>
  <cmdline>--nthreads 3.000000</cmdline>
  <cmdline>--memory_size_mb 1536</cmdline>
</app_version>
Three Theory jobs have started, consuming about 1.01 GB of RAM, with 0.44 GB free and no swap used at all so far.
Because of the missing logs I can't be sure, but 2 pythia8 jobs and probably 1 pythia6 job are running at the moment.
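
As a sanity check on a fragment like the one above, a short Python sketch can parse it and report whether the thread count matches avg_ncpus and how much memory each processor gets. It uses only the standard library and reads the fragment exactly as posted. The helper name and the checks are illustrative and are not part of BOINC.

```python
import xml.etree.ElementTree as ET

# The app_version fragment as posted in this thread.
FRAGMENT = """<app_version>
  <app_name>Theory</app_name>
  <plan_class>vbox64_mt_mcore</plan_class>
  <avg_ncpus>3.000000</avg_ncpus>
  <cmdline>--nthreads 3.000000</cmdline>
  <cmdline>--memory_size_mb 1536</cmdline>
</app_version>"""


def summarize(fragment: str) -> dict:
    """Extract the CPU count and memory settings from an app_version fragment."""
    root = ET.fromstring(fragment)
    # Collect every "--option value" pair found in any <cmdline> element.
    options = {}
    for cmd in root.findall("cmdline"):
        tokens = (cmd.text or "").split()
        for name, value in zip(tokens[0::2], tokens[1::2]):
            options[name] = float(value)
    ncpus = float(root.findtext("avg_ncpus"))
    return {
        "ncpus": ncpus,
        "nthreads_matches": options.get("--nthreads") == ncpus,
        "memory_per_cpu_mb": options["--memory_size_mb"] / ncpus,
    }


if __name__ == "__main__":
    print(summarize(FRAGMENT))
```

For the fragment above this reports 3 processors and 512 MB per processor.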

21:19:12 +0200 2016-07-13 [INFO] New Job Starting
21:19:12 +0200 2016-07-13 [INFO] New Job Starting
21:19:12 +0200 2016-07-13 [INFO] Condor JobID: 1181998
21:19:12 +0200 2016-07-13 [INFO] New Job Starting
21:19:12 +0200 2016-07-13 [INFO] Condor JobID: 1181997
21:19:12 +0200 2016-07-13 [INFO] Condor JobID: 1181996
21:19:17 +0200 2016-07-13 [INFO] MCPlots JobID: 31920864
21:19:17 +0200 2016-07-13 [INFO] MCPlots JobID: 31920920
21:19:18 +0200 2016-07-13 [INFO] MCPlots JobID: 31920834


I'll let it run overnight and see how it ends after 12 or 18 hours.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,215,383
RAC: 2
Message 3675 - Posted: 13 Jul 2016, 23:02:48 UTC

I am seeing the slot number added to the job wrapper output.
However, that does not help, as all jobs are running in the same Task/slot.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,215,383
RAC: 2
Message 3676 - Posted: 14 Jul 2016, 0:41:31 UTC - in response to Message 3675.  

Apparently, these slot numbers are internal to the VM, not BOINC slots.
Never mind.
zombie67 [MM]
Joined: 26 Feb 15
Posts: 26
Credit: 5,331,144
RAC: 3,464
Message 3677 - Posted: 14 Jul 2016, 2:50:48 UTC

How will the BOINC client be able to know to run only one of these tasks, which will use all CPU threads, and no other CPU tasks of any kind from any other project? Or even tasks from other applications on the same project?

Also, assuming a single task using 4 CPU threads within the VM, how will BOINC know to credit 4 threads' worth of CPU time?

Here is the thing: Without using a VM, BOINC can already do all this natively. Several projects already use this method. Are we trying to re-create something already available, just so that it can be in a VM?
Reno, NV
Team: SETI.USA
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1252
Credit: 996,550
RAC: 72
Message 3678 - Posted: 14 Jul 2016, 6:30:28 UTC - in response to Message 3675.  
Last modified: 14 Jul 2016, 6:50:10 UTC

I am seeing the slot number added to the job wrapper output.

With my 3 processor-VM I see this now:

running-slot1.log 14-Jul-2016 08:02 24K
running-slot2.log 14-Jul-2016 08:03 61K
running-slot3.log 14-Jul-2016 08:03 31K
running.log 14-Jul-2016 08:02 24K

In my case the log of one job is written twice: running-slot1.log and running.log have the same contents.

Edit: Since about 01:20 CEST the finished_XX.log files are all complete and no longer cut off when a new job starts in another directory.
stdout.log provides a bit more information.
At the moment, with 1 sherpa, 1 herwig++ and 1 pythia6 running, the VM uses 1.3 GB of memory.
Swapfile hardly used overnight: 3584k.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,215,383
RAC: 2
Message 3680 - Posted: 14 Jul 2016, 7:40:29 UTC

The big question is how a task is ended.

1. When the first job ends after the 12h mark. All remaining jobs would be abandoned.
2. When the longest-running job slot ends. This would mean that any job slot finishing after the 12h mark must not get a new job; otherwise the 18h cutoff time would eventually be reached and some job(s) would be abandoned.
Laurence CERN
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1150
Credit: 342,328
RAC: 1
Message 3683 - Posted: 14 Jul 2016, 7:52:02 UTC - in response to Message 3680.  

It should be 2.

If a job runs for 90 minutes on average, all slots should on average be empty within 45 minutes of the 12h mark. The number of cores in use will fall as a step function. The trade-off in this case would be about 6.25% idle time (45 minutes out of 12 hours) in exchange for saving 8 GB of disk space per core and reducing memory usage somewhat.
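
Option 2 can be sketched in Python as two small rules: stop handing out jobs once the 12-hour mark has passed, and shut down when the last slot becomes free. The function names are hypothetical rather than taken from the Theory wrapper, and any finish times passed in are whatever the caller supplies.

```python
SOFT_LIMIT_MIN = 12 * 60  # no new jobs are started after this much elapsed time


def may_start_new_job(elapsed_min: float) -> bool:
    """A slot that becomes free may take a new job only before the soft limit."""
    return elapsed_min < SOFT_LIMIT_MIN


def drain_steps(last_finish_min):
    """Busy-slot count over time once the soft limit has been reached.

    last_finish_min: elapsed minute at which each slot's final job ends
    (every value is expected to be at or after SOFT_LIMIT_MIN).
    Returns (time, busy_slots) pairs; the last pair is the shutdown point.
    """
    busy = len(last_finish_min)
    steps = [(SOFT_LIMIT_MIN, busy)]
    for t in sorted(last_finish_min):
        busy -= 1  # one more slot has gone idle
        steps.append((t, busy))
    return steps


if __name__ == "__main__":
    # Invented finish times for three slots, in minutes since the task started.
    for time_min, busy in drain_steps([750, 735, 790]):
        print(time_min, busy)
```

The returned pairs are the step function mentioned above: the busy count drops by one each time a slot finishes, and the VM can shut down when it reaches zero.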
Laurence CERN
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1150
Credit: 342,328
RAC: 1
Message 3684 - Posted: 14 Jul 2016, 8:01:07 UTC - in response to Message 3677.  
Last modified: 14 Jul 2016, 9:01:23 UTC

Hi zombie67,

I am new to multi-core with BOINC so I can't answer all your questions, but AFAIK BOINC does support multi-core VMs, as this is what we are experimenting with here.

We need to run in a VM because the software we use is not available for Windows, which 80% of the volunteers use. Our tasks are embarrassingly parallel, so we can either run one job per VM and run many VMs, or run many jobs in one VM. There is always a trade-off, and here it is some idle CPU in exchange for reduced disk space and network usage.

If we can support both approaches, then it is up to you which you would prefer to run.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1252
Credit: 996,550
RAC: 72
Message 3685 - Posted: 14 Jul 2016, 10:30:34 UTC

I returned the first task running the whole sequence: http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=221315

At 09:15 the 12 hours of elapsed time were up.
At 10:53 job in slot 2 finished and no new job was started
At 11:30 job in slot 1 finished and no new job was started
At 12:00 job in slot 3 finished (started at 09:06) and the VM got its shutdown signal.

The Condor and MCPlots JobIDs were no longer added to BOINC's stderr.txt after last night's change.
In the newly started task they are.
Laurence CERN
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1150
Credit: 342,328
RAC: 1
Message 3688 - Posted: 14 Jul 2016, 12:37:55 UTC - in response to Message 3685.  
Last modified: 14 Jul 2016, 12:41:29 UTC

So at the end of the task we have 97 minutes of idle time and 2378 minutes of productive time, giving an idle-time overhead of about 4% in exchange for avoiding the overhead of multiple VMs.
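
The 97 minutes can be reproduced from the slot finish times reported in the previous post. A short Python sketch, assuming a slot counts as idle from the moment its last job ends until the VM shuts down; the 2378 productive minutes are taken from this post as given, and the function name is illustrative.

```python
from datetime import datetime


def idle_minutes(finish_times, shutdown, fmt="%H:%M"):
    """Sum, over all slots, of the minutes between a slot's last job and shutdown."""
    end = datetime.strptime(shutdown, fmt)
    total = 0
    for t in finish_times:
        delta = end - datetime.strptime(t, fmt)
        total += int(delta.total_seconds() // 60)
    return total


# Slots 2, 1 and 3 finished at 10:53, 11:30 and 12:00; the VM shut down at 12:00.
idle = idle_minutes(["10:53", "11:30", "12:00"], "12:00")
productive = 2378  # minutes, as stated in the post
overhead = idle / productive

if __name__ == "__main__":
    print(idle, "idle minutes,", f"{overhead:.1%}", "overhead")
```

Slot 2 contributes 67 minutes and slot 1 contributes 30, while slot 3 ran until shutdown, which gives the 97 minutes and an overhead of roughly 4%.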

Has anyone got a spare 8 cores to play with? The maximum we can do at the moment is 10 cores but I can easily increase this.

EDIT: We could halve that overhead by doubling the lifetime of the VM.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1252
Credit: 996,550
RAC: 72
Message 3689 - Posted: 14 Jul 2016, 12:49:28 UTC - in response to Message 3688.  
Last modified: 14 Jul 2016, 12:52:45 UTC

Has anyone got a spare 8 cores to play with? The maximum we can do at the moment is 10 cores but I can easily increase this.

I'll set up my i7 with 8 threads (4 cores, hyperthreaded) to run a VM with 8 processors,
but I will set VBoxHeadless.exe to priority 'below normal' and limit CPU usage in BOINC to 90% (Execution Cap 90) to keep my machine responsive.
I'll start it later today so that I can watch the processes end tomorrow morning.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1152
Credit: 8,310,612
RAC: 0
Message 3692 - Posted: 14 Jul 2016, 15:08:18 UTC - in response to Message 3688.  

So at the end of the task we have 97 minutes of idle time and 2378 minutes of productive time giving a 4% idle time overhead for reducing the overhead of multiple VMs.

Has anyone got a spare 8 cores to play with? The maximum we can do at the moment is 10 cores but I can easily increase this.

EDIT: We could halve that overhead by doubling the lifetime of the VM.

I've got a 20-core (128 GB) and a 12-core (64 GB) that I could try. :-)