Message boards : Number crunching : Respect My Limits!

Michael H.W. Weber
Joined: 28 Jul 16
Posts: 7
Credit: 1,349
RAC: 0
Message 3930 - Posted: 2 Aug 2016, 12:14:17 UTC
Last modified: 2 Aug 2016, 12:25:26 UTC

OK, now I got work.

I set CPUs to unlimited, as in the (dangerous) default setting, allowed 1 job and checked all apps except for the classic LHC Sixtrack. On the i5 (quad-core, no hyperthreading), where I reserved one CPU core for driving the Hawaii GPU, I got three tasks: 2x ALICE, 1x LHCb. So I guess my assumptions above were correct.

If I leave these preferences as they are, the Linux quad-core with hyperthreading (and hence 8 possible tasks) would certainly crash: it would retrieve 7 tasks to run simultaneously (again, one CPU core is reserved for the GTX770 GPU) and would run out of memory. Of course, the BOINC Manager might step in with its memory management, but I wouldn't trust it.

So, two options for me:

(1) Set up an app_config.xml to restrict the number of CPU cores (not CPUs!) to, say, FOUR.
(2) Design another (additional) preference set and assign it to this Linux box. Problem: I do not know the maximum amount of RAM each of the apps requires...
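For option (1), a minimal sketch in Python that writes such a file, assuming the standard BOINC app_config.xml format with only the project-wide <project_max_concurrent> element set. Note that this element caps how many tasks from the project run at once, not cores, so a multicore task can still occupy more than one core. The file would go into this project's folder in the BOINC data directory, followed by Options → Read config files in the BOINC Manager (or a client restart). Per-app limits via <app>/<max_concurrent> would additionally need the exact app names, which are not given here.

```python
import xml.etree.ElementTree as ET


def build_app_config(project_max_concurrent: int) -> str:
    """Return the text of a minimal BOINC app_config.xml.

    Only <project_max_concurrent> is set; it caps how many tasks from
    this project the BOINC client runs at the same time (tasks, not cores).
    """
    if project_max_concurrent < 1:
        raise ValueError("project_max_concurrent must be at least 1")
    root = ET.Element("app_config")
    limit = ET.SubElement(root, "project_max_concurrent")
    limit.text = str(project_max_concurrent)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Cap the project at four simultaneous tasks, as in option (1).
    with open("app_config.xml", "w", encoding="utf-8") as fh:
        fh.write(build_app_config(4) + "\n")
```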

Michael.

P.S.: Suggestion: You could rename the preference sheets from "home, school, work, etc." to "8 GB RAM / 4 CPU-Cores, 8 GB RAM / 8 CPU-Cores, 16 GB RAM / 4 CPU-Cores, 16 GB RAM / 8 CPU-Cores, etc." and preset these sheets so that your apps never exceed the physical RAM. This would spare me an app_config.xml and allow even less experienced users to set things up in an optimized way.
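The preset idea can be sketched as a small sizing rule: the number of simultaneous tasks is the smaller of the core count and the number of per-task RAM footprints that fit after an OS reserve. The 2 GB per-task figure (close to the 2.07 GB LHCb figure reported later in this thread) and the 2 GB OS reserve are illustrative assumptions, not project numbers, and max_jobs is a hypothetical helper.

```python
import math


def max_jobs(ram_gb: float, cores: int, per_task_gb: float, os_reserve_gb: float) -> int:
    """Largest number of simultaneous tasks that fits both the cores and the RAM."""
    usable_gb = ram_gb - os_reserve_gb  # RAM left after the (assumed) OS reserve
    if usable_gb <= 0:
        return 0
    by_ram = math.floor(usable_gb / per_task_gb)
    return max(0, min(cores, by_ram))


# Hypothetical preset sheets named after the hardware, as suggested above.
PRESETS = [(8, 4), (8, 8), (16, 4), (16, 8)]  # (RAM in GB, CPU cores)

if __name__ == "__main__":
    for ram, cores in PRESETS:
        jobs = max_jobs(ram, cores, per_task_gb=2.0, os_reserve_gb=2.0)
        print(f"{ram} GB RAM / {cores} CPU-Cores: at most {jobs} tasks")
```

With these assumptions the four sheets come out at 3, 3, 4 and 7 simultaneous tasks, so the 8 GB boxes are limited by RAM regardless of core count.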
President of Rechenkraft.net
ID: 3930
Michael H.W. Weber
Joined: 28 Jul 16
Posts: 7
Credit: 1,349
RAC: 0
Message 3931 - Posted: 2 Aug 2016, 12:20:33 UTC - in response to Message 3929.  
Last modified: 2 Aug 2016, 12:23:10 UTC


The first thing I noticed was the way the preferences page is organized:

(1) From what I see, one can check each of the (many) application types and then set the associated number of WUs, which means PER CHECKED APPLICATION. Is that correct?

No. The limits are global. By default a new user should have the Theory app selected and be limited to 1 task with 1 core.

Yes, it SHOULD have. But it hasn't: the default is "unlimited".

(3) What does # of CPUS mean? CPUs or CORES? Please specify precisely in the settings page. What is this setting good for?

For me it is cores but David A. chose the naming.

Mwahahaha, like "results" for the tasks sent out by the server? Just teasing...

(4) After the name of each app, please indicate in red the maximum amount of RAM in GB that the app will require/reserve FOR A SINGLE task. I find it important to make this crystal clear to whoever thinks he/she can modify these settings.


We can do this, but we should first get the mechanics working.

Ah, Laurence, please put it in. One more thing done... ;-)


Michael.
President of Rechenkraft.net
ID: 3931
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3932 - Posted: 2 Aug 2016, 12:22:53 UTC - in response to Message 3930.  

The default settings need to be fail-safe.
One task, 1 core and a minimum of memory.
ID: 3932
Michael H.W. Weber
Joined: 28 Jul 16
Posts: 7
Credit: 1,349
RAC: 0
Message 3933 - Posted: 2 Aug 2016, 12:23:44 UTC - in response to Message 3932.  

The default settings need to be fail-safe.
One task, 1 core and a minimum of memory.

Yes.

Michael.
President of Rechenkraft.net
ID: 3933
Michael H.W. Weber
Joined: 28 Jul 16
Posts: 7
Credit: 1,349
RAC: 0
Message 3934 - Posted: 2 Aug 2016, 12:33:55 UTC
Last modified: 2 Aug 2016, 12:35:13 UTC

Do ALICE tasks differ in length?
My i5 machine estimates a 6-hour runtime, but then this happens (and credits are granted, so the task is OK):

02.08.2016 13:39:09 | vLHCathome-dev | Starting task ALICE_16562_1470076953.252092_0

02.08.2016 14:19:30 | vLHCathome-dev | Computation for task ALICE_16562_1470076953.252092_0 finished

Another ALICE task is already at 50 min but shows barely any CPU activity.
The ALT+F4 console shows "job finished with unknown exit code".
ALT+F3 indicates 0.3% CPU load...

Michael.
President of Rechenkraft.net
ID: 3934
Michael H.W. Weber
Joined: 28 Jul 16
Posts: 7
Credit: 1,349
RAC: 0
Message 3935 - Posted: 2 Aug 2016, 12:39:19 UTC

Two more ALICE tasks uploaded, one after around 50 min and the other after around 15 min of runtime. Both showed numbers in the error console window. At least they terminate properly. I won't report more about this, because it is probably off-topic in this thread. If you need screenshots in the future, please let me know...

Michael.
President of Rechenkraft.net
ID: 3935
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3936 - Posted: 2 Aug 2016, 12:39:25 UTC - in response to Message 3934.  

A lot of these tasks are idle.
They do not do any real work.

As this is a testing project, there are a number of things that do not work.
I tried a few times to address this, with little success.

You just have to get used to it, or not.
ID: 3936
Michael H.W. Weber
Joined: 28 Jul 16
Posts: 7
Credit: 1,349
RAC: 0
Message 3937 - Posted: 2 Aug 2016, 12:42:27 UTC

OK, incoming multicore tasks (Theory, 3 CPUs) cause other vLHC-dev tasks in progress to be suspended. Not good. Have the priority settings been checked?

Michael.
President of Rechenkraft.net
ID: 3937
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 481
Credit: 394,720
RAC: 3
Message 4050 - Posted: 11 Aug 2016, 7:22:21 UTC

This is my experience attaching a new host:

1. I set max_jobs to 3, max_cpus to 2 and checked only the benchmark application
2. I attached the host and while it downloaded the .vdi it asked for 3 WUs
3. After the .vdi download had finished my host started all 3 WUs
4. During this startup phase my host used several GB of swap space and became unresponsive due to very high IO load
5. It took more than 20 minutes to recover and one of the WUs lost the connection to the BOINC client

6. After the first WUs had finished (or been aborted), I changed max_jobs to 1 and checked the CMS application
7. With this setting (2 CPUs) the CMS WUs failed after a few minutes with EXIT_NO_SUB_TASKS
8. I changed max_cpus to 1 and the next CMS WU finished successfully
ID: 4050
tito
Joined: 11 Jun 16
Posts: 1
Credit: 250,843
RAC: 0
Message 4186 - Posted: 14 Oct 2016, 10:59:18 UTC

For a week now, my host hasn't been able to use more than one core.
Before that, everything was at its defaults, except that in BM (BOINC Manager) I limited usage to 75% of the cores, and all MT applications ran normally.
But suddenly only one core is in use, whatever settings I try on my account page.
Any hints?
ID: 4186
Tern
Joined: 21 Sep 15
Posts: 89
Credit: 383,017
RAC: 0
Message 4291 - Posted: 6 Nov 2016, 14:23:53 UTC
Last modified: 6 Nov 2016, 14:40:28 UTC

My cycle brought me back to this project, and I turned on the new applications. No problems (other than those I've expressed before, design- or VBOX-related) with CMS, THEORY, or BENCHMARK. I have not yet got an ALICE task to complete successfully; I will do that next, as I have concentrated on LHCb.

Problem: The estimated time on download shows approx. 50 minutes. When the task starts running, this quickly climbs to 1:10:30:00 or more (over one day, almost one and a half), which causes BOINC scheduler problems: I don't get work I need for other projects until the last minute. Then, to make matters worse, the job completes successfully anywhere from 3.4 to 6.8 hours later (although the estimate never drops accordingly).
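To put the estimate error in numbers, a small Python helper that converts the runtime strings into hours, assuming the d:hh:mm:ss reading of 1:10:30:00 given above (boinc_duration_hours is a hypothetical name, not a BOINC function).

```python
def boinc_duration_hours(text: str) -> float:
    """Convert a 'd:hh:mm:ss' (or 'hh:mm:ss') runtime string to hours."""
    parts = [int(p) for p in text.split(":")]
    if len(parts) == 3:  # no day field: treat it as 0 days
        parts = [0] + parts
    if len(parts) != 4:
        raise ValueError(f"unexpected duration format: {text!r}")
    days, hours, minutes, seconds = parts
    return days * 24 + hours + minutes / 60 + seconds / 3600


if __name__ == "__main__":
    initial = boinc_duration_hours("0:00:50:00")  # estimate shown at download
    running = boinc_duration_hours("1:10:30:00")  # estimate once the task runs
    for actual in (3.4, 6.8):                     # observed runtimes in hours
        print(f"actual {actual} h: running estimate {running / actual:.1f}x too high, "
              f"initial estimate {actual / initial:.1f}x too low")
```

1:10:30:00 comes out at 34.5 hours, so against actual runtimes of 3.4 to 6.8 hours the running estimate is roughly 5 to 10 times too high, while the initial 50-minute estimate is roughly 4 to 8 times too low.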

Problem: The disk space required is absurd. It very quickly climbs to at least 2GB and will eventually pass 4.5GB! I had to raise the BOINC allocation just to get these to run (I found out quickly not to run more than one dev app at a time), and I think some crashes may be due to running out of available disk space.

MAJOR problem: Does not abide by BOINC memory utilization settings. Requires 2.07GB RAM. With 4GB present and the limit set at "50% when in use", not only will LHCb not suspend with "Waiting on Memory", but other projects will not suspend either. (It appears they don't know how much LHCb is taking; they do suspend if LHCb is not present.) Trying to run two LHCb tasks (which should leave one running and one waiting) instead runs both and crashes.
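The arithmetic behind this can be checked with a few lines of Python, using only the figures above (4 GB installed, a 50% limit, 2.07 GB per LHCb task). memory_check is a hypothetical helper, and it is a simplification: it ignores memory held by other processes and says nothing about what VirtualBox actually reports to the BOINC client.

```python
def memory_check(total_gb: float, fraction: float, per_task_gb: float, tasks: int) -> dict:
    """Compare the RAM needed by `tasks` VM tasks with BOINC's limit and physical RAM."""
    limit_gb = total_gb * fraction  # e.g. "50% when in use"
    needed_gb = per_task_gb * tasks
    return {
        "limit_gb": limit_gb,
        "needed_gb": needed_gb,
        "within_boinc_limit": needed_gb <= limit_gb,
        "within_physical_ram": needed_gb <= total_gb,
    }


if __name__ == "__main__":
    for n in (1, 2):
        print(n, memory_check(total_gb=4.0, fraction=0.5, per_task_gb=2.07, tasks=n))
```

On these figures even one LHCb task (2.07 GB) is slightly above the 2 GB allowed by the 50% setting, and two tasks (4.14 GB) exceed the 4 GB of installed RAM outright, which fits the crash described above.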
ID: 4291
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 70
Message 4293 - Posted: 6 Nov 2016, 21:56:53 UTC - in response to Message 4291.  

My cycle brought me back to this project, and I turned on the new applications. No problems (other than those I've expressed before, design- or VBOX-related) with CMS, THEORY, or BENCHMARK. I have not yet got an ALICE task to complete successfully; I will do that next, as I have concentrated on LHCb.


Welcome back! I hope you have seen that we are moving towards one production project LHC@home. Your last cycle made a contribution towards moving in this direction. :)


Problem: The estimated time on download shows approx. 50 minutes. When the task starts running, this quickly climbs to 1:10:30:00 or more (over one day, almost one and a half), which causes BOINC scheduler problems: I don't get work I need for other projects until the last minute. Then, to make matters worse, the job completes successfully anywhere from 3.4 to 6.8 hours later (although the estimate never drops accordingly).


We can look into this. Will add it to the task tracker (see top left).


Problem: The disk space required is absurd. It very quickly climbs to at least 2GB and will eventually pass 4.5GB! I had to raise the BOINC allocation just to get these to run (I found out quickly not to run more than one dev app at a time), and I think some crashes may be due to running out of available disk space.

This is not unusual for VM based applications. The disk space is the disk size for the VM including OS, application code and data. 2-4GB is pretty compact for what it is.


MAJOR problem: Does not abide by BOINC memory utilization settings. Requires 2.07GB RAM. With 4GB present and the limit set at "50% when in use", not only will LHCb not suspend with "Waiting on Memory", but other projects will not suspend either. (It appears they don't know how much LHCb is taking; they do suspend if LHCb is not present.) Trying to run two LHCb tasks (which should leave one running and one waiting) instead runs both and crashes.

This is a known issue for VM applications. VirtualBox does not accurately report its memory usage. This is causing many problems and needs to be resolved, but it is non-trivial.
ID: 4293
Tern
Joined: 21 Sep 15
Posts: 89
Credit: 383,017
RAC: 0
Message 4305 - Posted: 9 Nov 2016, 7:32:41 UTC - in response to Message 4293.  

Simple fix for memory and disk space issues. PUT IT ON THE PREFERENCES PAGE!

If I'd KNOWN that each task required 2GB RAM, I would have known to only allow one at a time. If I'd KNOWN that each application I ran would require 5GB disk, I would have selected only one application at a time to run. COMMUNICATION!!!! :-)

(Would be nice to put up there that 'does not abide by BOINC memory limitations', too. But then the home page for the project still doesn't even mention that VirtualBox is required - volunteers don't find that out until after joining...)
ID: 4305


©2024 CERN