Thread 'Respect My Limits!'

Author	Message
Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1140 Credit: 339,231 RAC: 0	Message 3838 - Posted: 28 Jul 2016, 12:43:14 UTC Please post here any issues relating to the BOINC client assigning too much or too little work based on your preferences. ID: 3838 · Rating: 0

Bryan Joined: 31 May 16 Posts: 4 Credit: 80,532 RAC: 0	Message 3862 - Posted: 29 Jul 2016, 18:21:10 UTC - in response to Message 3838. Last modified: 29 Jul 2016, 18:25:22 UTC I have 9 boxes that are dedicated "crunching" machines. This is my hobby. I find it extremely frustrating to have two 72-thread machines with 128 GB of memory and be able to use a maximum of 8 threads. I have 3 other Xeon machines with 24 threads and 48 GB of memory, and of course they have the same problem. The main project is even worse: it allows me 2 WU total. The BOINC scheduler is not the most brilliant piece of code ever written. If I am running VLHC I usually run another project so I don't have threads sitting idle. To keep things operating correctly and requesting work when needed, I have to set up an app_config file on the 2nd project to make sure that it doesn't encroach on VLHC and that work gets requested when needed. There is a lot of crunching power you have access to if you remove the current limits. Set up a check box in the "project preference section" where people can limit what gets sent to their machines. BTW, I do NOT run my 204 threads on the main VLHC project because it isn't worth my time and effort for 2 WU. This really applies here as well. I came back when I saw you had introduced MT tasks, only to find the 8-thread limit. I'm not intending to be nasty, I just want to point out that there is potential for far more computing power if you give flexibility to your volunteers so they can get the amount of work they desire, whether in small or large amounts. ID: 3862 · Rating: 0

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1145 Credit: 8,310,612 RAC: 0	Message 3863 - Posted: 29 Jul 2016, 19:00:43 UTC - in response to Message 3862. Bryan, I haven't been following the multi-core argument closely, but I was caught out myself when I tried to run Theory on a 20-core machine and found it only ran eight in reality. Now, I don't want to run Theory in perpetuity, but I do think we need to run machinery to its limits when warranted, to gain experience. So I guess I missed it, but can someone provide a link or explanation as to why Theory is (allegedly [BBC_mode/]) limited to eight cores? ID: 3863 · Rating: 0

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 966 Credit: 1,211,852 RAC: 0	Message 3864 - Posted: 29 Jul 2016, 19:25:07 UTC - in response to Message 3863. Slightly off topic: How do 20+ core machines overcome the memory bandwidth problem? It has to slow down a lot if all cores are accessing the memory. ID: 3864 · Rating: 0

Ben Segal Volunteer moderator Volunteer developer Volunteer tester Joined: 12 Sep 14 Posts: 65 Credit: 544 RAC: 0	Message 3865 - Posted: 29 Jul 2016, 20:08:57 UTC - in response to Message 3862. Hi Bryan, it's only very recently that Theory has been able to use multiple cores sensibly and we are slowly gaining experience with that. The code itself is not multi-threaded to any serious extent so one is forced to use either multiple BOINC tasks with a VM in each (ugly), or multiple VMs per task, or multiple jobs per VM (which may be tricky), or a mixture of all that. If the code were designed for multicore and/or multiple threads life would be simpler (like ATLAS, I believe). In any case, whether going over about 8 threads at a time pays off is application dependent, and in general high energy physics code doesn't lend itself to that sort of thing. In the old days we tried and failed to use vector hardware on supercomputers like the Cray and so on. Sorry not to be able to ace out your super installation! Ben ID: 3865 · Rating: 0

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1140 Credit: 339,231 RAC: 0	Message 3866 - Posted: 29 Jul 2016, 20:56:53 UTC - in response to Message 3862. Last modified: 29 Jul 2016, 22:39:40 UTC Hi Bryan, Your feedback is very valuable and is what this thread is for. I don't understand the 8 core limit. From what I understand both VirtualBox and BOINC will support up to 32 cores and we haven't put any limits on this. We also need to understand the performance of multi-core in more detail. I will try to add a benchmarking app next week so people can experiment with their setups. ID: 3866 · Rating: 0

Toby Broom Joined: 19 Aug 15 Posts: 70 Credit: 3,642,628 RAC: 201	Message 3868 - Posted: 29 Jul 2016, 21:14:50 UTC Is there a way to control how many cores the multi-core tasks use? Currently the task limit is the number of cores. ID: 3868 · Rating: 0

Toby Broom Joined: 19 Aug 15 Posts: 70 Credit: 3,642,628 RAC: 201	Message 3869 - Posted: 29 Jul 2016, 21:18:04 UTC Bryan, for my 40-core machine, I ran 16 instances of BOINC with 2 tasks per instance. It wasn't too hard to set up. ID: 3869 · Rating: 0

Bryan Joined: 31 May 16 Posts: 4 Credit: 80,532 RAC: 0	Message 3870 - Posted: 29 Jul 2016, 22:11:58 UTC - in response to Message 3866. Last modified: 29 Jul 2016, 22:18:02 UTC Wow, all I can say is you guys are responsive :) As I stated earlier, I really am not wishing to be negative ... you asked for feedback on what some of us would like to see and that is what I'm stating. Nothing says that my requests are even feasible ... I can certainly accept that! BOINC, on a normal project, will support all 72 threads (Linux). VBox only supports 36 threads in a single VM. That really isn't a limitation, however. VBox takes a huge, 30%, performance hit when working across multiple processors. For example, it is far better on a 24-thread machine to run 2 VMs w/ 12 threads rather than a single VM w/ 24 threads. Windows on the other hand only supports 64 threads due to their NUMA implementation. BOINC will let you start 72 threads but the reality is NUMA is going to run those 72 WU using 64 threads. When it 1st came out I tried running the new MT WU. I don't have a problem with it only using 8 threads (although I do have five 12-thread machines). What I objected to was the fact the project would allow me to chew on 1 WU at a time while there were 24 more downloaded and waiting to run. Now part of this is due to the way I prefer to run my machines. I very seldom have multiple projects running on a single machine. When I move onto a project I prefer to hammer it with all the machine has, and that way I'm not having to babysit different projects and the BOINC scheduler. To give an example from the regular VLHC project: it downloads 2 WU with no spares. Assume I'm running a 2nd project on the remaining threads and that project doesn't checkpoint (e.g. prime search). When a VLHC WU finishes it must upload and then download the new WU. In the meantime the BOINC scheduler sees there is a free thread so it starts a new prime WU that can't be interrupted. 
So when the new VLHC WU downloads it sits there waiting until one of the threads crunching a prime becomes available. BTW, setting a high priority on VLHC vs the prime program buys nothing since BOINC sees an idle thread with only work available from the prime program. In order to solve this problem I'm required to go into the prime project's folder and create an app_config file that limits it to a maximum of 70 WU. That way 2 threads are left free for VLHC to accommodate upload/download times. So all I'm trying to do is give you some perspective of what it is like on this side of the fence :) BTW, when I was still working ... quite a few years ago, I made multiple trips to your facility. I worked for HP Test & Measurement, which became Agilent and now is called Keysight. Fantastic experience every time I visited. NOTE: I just started up the MT app and it is only running 1 WU. However, BOINC is showing it is using 64 cores (threads). So if the WU is actually only using 8 threads then that is why it won't start more WU. BOINC thinks there are no threads available. ID: 3870 · Rating: 0
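The workaround described in the post above (cap the second project so that a couple of threads stay free for VLHC) can be sketched as a small generator for the second project's app_config.xml. This is only an illustration, assuming the standard `project_max_concurrent` option and single-threaded tasks on the second project; the function name `build_app_config` and the thread counts are hypothetical, not something from the thread.

```python
import xml.etree.ElementTree as ET


def build_app_config(total_threads: int, reserved_threads: int) -> str:
    """Return app_config.xml text that caps a project at total - reserved tasks.

    Hypothetical helper: assumes each task of the capped project uses one thread.
    """
    if not 0 <= reserved_threads < total_threads:
        raise ValueError("reserved_threads must be >= 0 and < total_threads")
    root = ET.Element("app_config")
    limit = ET.SubElement(root, "project_max_concurrent")
    limit.text = str(total_threads - reserved_threads)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 72 threads with 2 kept free for VLHC: the second project is capped at 70
    print(build_app_config(72, 2))
```

The resulting text would go into the second project's folder as app_config.xml, and the client would need to re-read its config files (or be restarted) before the cap takes effect.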

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1140 Credit: 339,231 RAC: 0	Message 3871 - Posted: 29 Jul 2016, 22:39:25 UTC - in response to Message 3870. Hi Bryan, The main focus of this thread is to collect issues related to BOINC overcommitting the resources. Your posts seem to cover under-utilization, performance and optimization. All very interesting and valuable, and I would suggest starting new threads on the specific topics. In this project it should be possible to run 10 tasks. If this is not the case then some project settings are wrong. We can play around with things in this project to arrive at good settings and then apply that to the production project. Work is in progress on a task limiter for new volunteers and as soon as this is ready we can relax the task limits. As you seem to be interested in multi-core performance, please take a look at this post if you have missed it. Multi-core is very experimental at the moment and it will take some playing around to understand its behaviour. ID: 3871 · Rating: 0

Bryan Joined: 31 May 16 Posts: 4 Credit: 80,532 RAC: 0	Message 3872 - Posted: 30 Jul 2016, 0:29:50 UTC - in response to Message 3838. "Please post here any issues relating to the BOINC client assigning too much or too little work based on your preferences." I apologize for getting off topic, Laurence. Your original post said "too much" or "too little" so I assumed you were interested in the latter. I'll butt out :) ID: 3872 · Rating: 0

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1236 Credit: 956,977 RAC: 509	Message 3873 - Posted: 30 Jul 2016, 7:22:13 UTC - in response to Message 3868.
"Is there a way to control how many cores the multi core task use? Currently the task limit is number of cores"
Hi Toby, With an app_version part in the app_config.xml you can lower the number of cores from the current maximum of 8 to whatever you wish. My app_config.xml now:
<app_config>
  <project_max_concurrent>1</project_max_concurrent>
  <app>
    <name>ALICE</name>
    <max_concurrent>1</max_concurrent>
    <fraction_done_exact/>
  </app>
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
    <fraction_done_exact/>
  </app>
  <app>
    <name>CMS</name>
    <max_concurrent>1</max_concurrent>
    <fraction_done_exact/>
  </app>
  <app>
    <name>LHCb</name>
    <max_concurrent>1</max_concurrent>
    <fraction_done_exact/>
  </app>
  <app>
    <name>Theory</name>
    <max_concurrent>1</max_concurrent>
    <fraction_done_exact/>
  </app>
  <app_version>
    <app_name>Theory</app_name>
    <plan_class>vbox64_mt_mcore</plan_class>
    <avg_ncpus>4.000000</avg_ncpus>
    <cmdline>--nthreads 4.000000</cmdline>
    <cmdline>--memory_size_mb 2048</cmdline>
  </app_version>
</app_config>
ID: 3873 · Rating: 0
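As a rough way to see why the `avg_ncpus` and `max_concurrent` values in a config like the one above matter, here is a deliberately simplified sketch of how many multi-core tasks could start on a host. It assumes the client simply budgets `avg_ncpus` threads per task and honours a concurrency cap; the real BOINC client also weighs memory, priorities and other projects, so treat this as an illustration only. The function name `runnable_tasks` is hypothetical.

```python
import math
from typing import Optional


def runnable_tasks(host_threads: int, avg_ncpus: float,
                   max_concurrent: Optional[int] = None) -> int:
    """Simplified model: tasks start while their budgeted threads fit on the host."""
    if host_threads <= 0 or avg_ncpus <= 0:
        raise ValueError("host_threads and avg_ncpus must be positive")
    by_cpu = math.floor(host_threads / avg_ncpus)
    if max_concurrent is not None:
        # A concurrency cap wins whenever it is lower than the CPU-based count
        return min(by_cpu, max_concurrent)
    return by_cpu


if __name__ == "__main__":
    print(runnable_tasks(72, 64))                   # one task budgeted at 64 threads
    print(runnable_tasks(72, 8))                    # tasks budgeted at 8 threads each
    print(runnable_tasks(20, 4, max_concurrent=1))  # cap of 1 as in the config above
```

Under this model a 72-thread host where one task is budgeted at 64 threads has no room for a second task, which matches the behaviour reported earlier in the thread, while budgeting 8 threads per task would leave room for nine.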

tullio Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0	Message 3875 - Posted: 30 Jul 2016, 8:03:22 UTC All multicore tasks fail on my Linux box with a two-core Opteron 1210, which is also running SETI@home and SETI Beta tasks, as well as GPU tasks. All others perform regularly. Tullio ID: 3875 · Rating: 0

Toby Broom Joined: 19 Aug 15 Posts: 70 Credit: 3,642,628 RAC: 201	Message 3877 - Posted: 30 Jul 2016, 8:57:14 UTC - in response to Message 3873. Thanks CP, I found this last night too and throttled back my 20-core machine. ID: 3877 · Rating: 0

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1145 Credit: 8,310,612 RAC: 0	Message 3878 - Posted: 30 Jul 2016, 10:56:51 UTC - in response to Message 3864. "Slightly off topic: How do 20+core machines overcome the memory bandwidth problem? It has to slow down a lot, if all cores accessing the memory." A lot of it with on-core cache, I guess, but that only buys you so much. On my dual-socket Xeons each processor has a memory controller handling half the RAM and I understand there is a fast interconnect between the processors to allow access to the other half. On four-socket machines there are a number of approaches (and price points, from an article I read earlier this week). You can connect to two neighbours, needing a second hop to get to the fourth, or cross-connect to all three, and so on. ID: 3878 · Rating: 0

maeax Joined: 22 Apr 16 Posts: 743 Credit: 2,547,388 RAC: 13,527	Message 3881 - Posted: 30 Jul 2016, 11:46:37 UTC Have only AMD and IOMMU is a system-device, installed to eliminate wrong attach of Memory: https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit ID: 3881 · Rating: 0

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1140 Credit: 339,231 RAC: 0	Message 3882 - Posted: 30 Jul 2016, 11:48:46 UTC - in response to Message 3872. Hi Bryan, It is me that should apologize, as I didn't remember I had also included under-commitment in the original post. The intention is to find issues with the local management of the resources by the BOINC client. The limit you are hitting is on the server and we are taking steps to address it. As everyone seems keen on talking about performance and tuning, I have created a new thread on the topic. ID: 3882 · Rating: 0

Bryan Joined: 31 May 16 Posts: 4 Credit: 80,532 RAC: 0	Message 3899 - Posted: 31 Jul 2016, 3:43:07 UTC - in response to Message 3882. Not a problem Laurence. Under commitment is an annoyance but over commitment is unacceptable :) You have to beat down the big problems 1st then you can worry about finessing the rest! ID: 3899 · Rating: 0

Michael H.W. Weber Joined: 28 Jul 16 Posts: 7 Credit: 1,349 RAC: 0	Message 3927 - Posted: 2 Aug 2016, 11:36:58 UTC Last modified: 2 Aug 2016, 12:03:57 UTC OK, I just signed up to help a bit. First thing I encountered was the style of how the preference page is organized: (1) From what I see, it seems that one can check each of the (many) application types and then set the number of WUs associated which means PER CHECKED APPLICATION - is that correct? Meaning, I check just CMS and allow for 4 tasks, so I do get 4 tasks of CMS on each machine using that preferences profile (e.g. default). However, if I check two apps such as CMS AND Theory while keeping those 4 tasks per app, I would receive 4 tasks for EACH of the apps, i.e. 8 tasks in toto. Correct? If so, then - well - I don't like it. ;-) A better solution from my point of view would probably be to check each app and set (maybe directly behind that app) the corresponding number of tasks allowed. This allows for other combinations which are not possible at present (e.g. ONE task of CMS combined with TWO tasks of Theory, and so on). (2) The default settings as found now will crash almost any home computer. So, please limit the number of tasks to only ONE and allow only the Theory app (which I find well-established). (3) What does # of CPUS mean? CPUs or CORES? Please specify precisely in the settings page. What is this setting good for? (4) Behind the name of each app, please indicate in red writing the amount of RAM in GB which that particular app will require/reserve at maximum FOR A SINGLE task. I find it important to make this crystal clear to whomever thinks he/she can modify these settings. Just last week, even a few of my boxes ceased operation with a nice blue screen, after the number of vLHC tasks had been increased (which is at the core of what we like to optimize here, as far as I understand). And finally, I don't get any tasks from this test project at present. So how is testing supposed to work? 
:) Michael. P.S.: For a start, I hooked up the following two machines for testing: (1) Windows 7 Ultimate 64 Bit with an Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz [Family 6 Model 42 Stepping 7] and an MSI Lightning AMD R9 290X (Hawaii) GPU (probably not relevant here, but I use that card to compute in parallel to these tests). This box has 16 GB of RAM. (2) Ubuntu Linux 16.04 LTS 64 Bit with an Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz [Family 6 Model 30 Stepping 5] and a Gainward Phantom GTX770 GPU. This box has 16 GB of RAM. To my "surprise" BAM! reports this OS as 32 Bit, which is FALSE. President of Rechenkraft.net ID: 3927 · Rating: 0

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1140 Credit: 339,231 RAC: 0	Message 3929 - Posted: 2 Aug 2016, 12:12:45 UTC - in response to Message 3927.
"First thing I encountered was the style of how the preference page is organized: (1) From what I see, it seems that one can check each of the (many) application types and then set the number of WUs associated which means PER CHECKED APPLICATION - is that correct?"
No. The limits are global. By default a new user should have the Theory app selected and be limited to 1 task with 1 core. Once a certain level of expertise has been obtained, i.e. the local and project preferences have been discovered, tasks and CPUs should be set to no limit and control should be done by the BOINC client alone.
"(2) The default settings as found now will crash almost any home computer. So, please limit number of tasks to only ONE and allow only the Theory app (which I find well-established)."
At the moment it is not working. David A. has been informed.
"(3) What does # of CPUS mean? CPUs or CORES? Please specify precisely in the settings page. What is this setting good for?"
For me it is cores but David A. chose the naming.
"(4) Behind the name of each app, please indicate in red writing the amount of RAM in GB which that particular app will require/reserve at maximum FOR A SINGLE task. I find it important to make this crystal clear to whomever thinks he/she can modify these settings."
We can do this but first should get the mechanics working.
"And finally, I don't get any tasks from this test project at present. So how is testing supposed to work?"
For which app? There was a problem with Theory this morning but I have already fixed it.
ID: 3929 · Rating: 0

Development for LHC@home