CMS multi-core
Author | Message |
---|---|
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 25 |
4-core running as single core . . . |
Send message Joined: 8 Apr 15 Posts: 781 Credit: 12,422,653 RAC: 3,337 |
Same here. |
Send message Joined: 15 Mar 23 Posts: 2 Credit: 84,904 RAC: 8 |
My old R7-1700 had four tasks of which one failed right at the beginning. The other three finished successfully with the CPU time being 4x as high as the runtime. It uses Windows 10, BOINC manager 7.24.1 and VirtualBox 7.0.6. I am guessing if those units don't work on your machine the culprit could be the VBox version... |
Send message Joined: 8 Apr 15 Posts: 781 Credit: 12,422,653 RAC: 3,337 |
Yes, the ones you ran were the actual multi-core version, but they stopped running again, so we will just wait for a new batch to try when it arrives here. Your last one was April 4th, and that is the last day we had any of them, so it isn't the version of VB. |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Your Ryzen 7 1700 has 8 physical cores and 16 logical cores. Your first log shows that you configured a 15-core VM (you have since switched to 4-core VMs): https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3314981 2024-04-02 11:41:10 (6376): Setting CPU Count for VM. (15) A 15-core VM should not be configured on a computer with 8 physical cores; no VM should be given more cores than the host has physical cores. See a detailed comment about that here: https://forums.virtualbox.org/viewtopic.php?t=77413 I suggest respecting that limit so that tests here are not affected by issues that have nothing to do with CERN. |
Send message Joined: 22 Aug 22 Posts: 22 Credit: 63,680 RAC: 0 |
My 4-core, 8-thread Ryzen 3 3100 can run an 8-thread VM doing a CPU stress test on VMware without making the host system lag, while VirtualBox makes the host system choke even with 4 threads. Something is wrong with the VirtualBox code. |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Unlike VirtualBox, VMware is out of scope for CMS multi-core. Hence, discussing VirtualBox settings may be useful, while a discussion about VMware vs. VirtualBox is not. It just shifts the focus away from the topic. |
Send message Joined: 8 Apr 15 Posts: 781 Credit: 12,422,653 RAC: 3,337 |
Woo Hoo....ran my usual pair of 4-core and one was the typical one that just uses one core BUT the other that started at the same time was an actual 4-core. (as usual the credits are not much different) https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3319894 |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 25 |
"Woo Hoo....ran my usual pair of 4-core and one was the typical one that just uses one core BUT the other that started at the same time was an actual 4-core. (as usual the credits are not much different)" Since your task stopped just after 18 hours wall time, it very likely ended because it exceeded the job duration limit, so it is unclear whether the four jobs inside had finished and were returned successfully to CERN. |
Send message Joined: 8 Apr 15 Posts: 781 Credit: 12,422,653 RAC: 3,337 |
Same host, running the same way I do daily, just sent in one of those: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3320257 This one has 17 hrs run time and 2 days 11 hours 11 min 59 sec CPU time. For that other one I was just looking at the run time / CPU time and that it was at least Valid (I was busy doing things, so I didn't actually read the entire stderr). But I just looked, and I see that the other one finished (in case anyone wants to compare) without the VM Completion Message: glidein exited with return value 0, rather than finishing the proper way. And as I was typing all of this, the other one finished and it turned out as it should too: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?userid=192 (I will keep running them here and now over at prod.) |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
CMS Simulation v60.70 (vbox64_mt_mcore_cms) windows_x86_64 Five tasks in parallel on ONE computer! Computer ID 4639 Run time 14 hours 56 min 19 sec CPU time 2 days 8 hours 14 min 43 sec Validation status Valid Credit 260.69 |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Some computers running CMS tasks only run empty envelopes. Unfortunately, their users have not made them visible to others, so they can't be informed directly. Please check the following: recent CMS tasks require a computer reporting at least 4 cores; computers reporting fewer than 4 cores (e.g. just 1) should not run recent CMS tasks. Examples: https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4980 https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4800 https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4803 https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4810 |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 25 |
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3325378 Run time 17 hours 41 min 35 sec CPU time 2 days 16 hours 59 min 52 sec The task did only 1 cycle of 4 jobs. It seems it could finish just in time before the abort after 18 hours wall clock time. I really don't like these looooong jobs. On another (faster) machine I now have five 4-core tasks running, and the jobs inside seem to be much shorter. Within three hours of task run time, the VMs have each done three cycles of 4 subjobs and are busy with the fourth cycle. That's looking much better. |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Ivan (at prod) wrote: "At present we have two workflows running. One is set to run 503,000 events/job (as was the template it was derived from) and takes about 5-6 hours wall-time. The other is set to 50,000 events/job and runs about one hour clock time. If we run out of jobs before the weekend, I'll submit a batch with 100,000 events/job, to match the 2-hour average our previous tasks took. These jobs generate considerably less output per CPU-hour than our previous ones." This might be a result of the different batches with different event counts. Unfortunately, we still can't look into the job details to see what kind of job a VM is currently running. |
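
Several posts above judge whether a task really ran multi-core by comparing its CPU time with its run time, and one post advises keeping a VM's core count at or below the host's physical cores. A minimal Python sketch of both checks follows. The function names are hypothetical and not part of BOINC or VirtualBox, and the duration parser only assumes the "N days N hours N min N sec" wording used on the result pages quoted in this thread.

```python
import re

# Seconds per unit, keyed by the unit stems that appear in the quoted durations.
_UNIT_SECONDS = {"day": 86400, "hour": 3600, "min": 60, "sec": 1}


def parse_duration(text):
    """Convert a string such as '2 days 16 hours 59 min 52 sec' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(day|hour|min|sec)", text):
        total += int(value) * _UNIT_SECONDS[unit]
    return total


def effective_cores(run_time, cpu_time):
    """Average number of busy cores: CPU time divided by run (wall) time."""
    return parse_duration(cpu_time) / parse_duration(run_time)


def vm_cores_ok(vm_cores, physical_cores):
    """True if the VM is not given more cores than the host physically has."""
    return vm_cores <= physical_cores


if __name__ == "__main__":
    # Figures quoted in the thread for result 3325378.
    ratio = effective_cores("17 hours 41 min 35 sec",
                            "2 days 16 hours 59 min 52 sec")
    print(f"effective cores: {ratio:.2f}")
    # The 15-core VM on a host with 8 physical cores mentioned above.
    print("15-core VM on 8 physical cores ok:", vm_cores_ok(15, 8))
```

A ratio close to 4 suggests a 4-core task kept all of its cores busy, while a ratio near 1 matches the "4-core running as single core" case reported at the top of the thread. The ratio says nothing about whether the jobs inside the VM were returned successfully.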
©2024 CERN