Message boards : CMS Application : CMS multi-core

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 25
Message 8406 - Posted: 11 Apr 2024, 7:56:25 UTC

4-core task running on a single core...

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 781
Credit: 12,422,653
RAC: 3,337
Message 8407 - Posted: 11 Apr 2024, 19:59:50 UTC

Same here.

Drago75
Joined: 15 Mar 23
Posts: 2
Credit: 84,904
RAC: 8
Message 8408 - Posted: 15 Apr 2024, 21:22:42 UTC

My old Ryzen 7 1700 got four tasks, one of which failed right at the beginning. The other three finished successfully, with the CPU time about 4x the run time. The machine runs Windows 10, BOINC Manager 7.24.1 and VirtualBox 7.0.6. My guess is that if these tasks don't work on your machine, the culprit could be the VirtualBox version...

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 781
Credit: 12,422,653
RAC: 3,337
Message 8409 - Posted: 16 Apr 2024, 9:07:27 UTC - in response to Message 8408.  

Yes, the ones you ran were the actual multi-core version, but they stopped running again, so we will just wait for a new batch to try when it arrives here.

Your last one was on April 4th, and that is the last day we had any of them, so it isn't the VirtualBox version.

computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 0
Message 8410 - Posted: 16 Apr 2024, 10:21:05 UTC - in response to Message 8408.  

Your Ryzen 7 1700 has 8 physical cores and 16 logical cores.
Your first log shows that you configured a 15-core VM (you have since switched to 4-core VMs):
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3314981
2024-04-02 11:41:10 (6376): Setting CPU Count for VM. (15)


A 15-core VM should not be configured on a computer with 8 physical cores.
As a rule, no VM should be given more cores than the host has physical cores.

See a detailed comment about that here:
https://forums.virtualbox.org/viewtopic.php?t=77413

I suggest respecting that limit, to avoid introducing issues into the tests here that have nothing to do with CERN.
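A minimal Python sketch of that check, assuming you have a task's stderr text at hand and know the host's physical core count. Python's standard library only reports logical CPUs (os.cpu_count()), so the physical count is passed in by hand here. The regular expression is written against the single vboxwrapper line quoted above and is an assumption about that line's format; the function names are hypothetical.

```python
import re

# Matches the vboxwrapper line quoted above (format assumed from that one line), e.g.
# "2024-04-02 11:41:10 (6376): Setting CPU Count for VM. (15)"
CPU_COUNT_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")


def vm_cpu_counts(stderr_text: str) -> list[int]:
    """Return every VM CPU count found in a task's stderr output."""
    return [int(m.group(1)) for m in CPU_COUNT_RE.finditer(stderr_text)]


def exceeds_physical_cores(vm_cores: int, physical_cores: int) -> bool:
    """True if a single VM is given more cores than the host has physically."""
    return vm_cores > physical_cores


if __name__ == "__main__":
    log = "2024-04-02 11:41:10 (6376): Setting CPU Count for VM. (15)"
    physical = 8  # Ryzen 7 1700: 8 physical cores, 16 logical
    for cores in vm_cpu_counts(log):
        status = "too many" if exceeds_physical_cores(cores, physical) else "OK"
        print(f"VM with {cores} cores on {physical} physical cores: {status}")
```

For the log line above this prints "VM with 15 cores on 8 physical cores: too many"; a 4-core VM on the same host would report "OK".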

kotenok2000
Joined: 22 Aug 22
Posts: 22
Credit: 63,680
RAC: 0
Message 8411 - Posted: 16 Apr 2024, 10:41:40 UTC

My 4-core/8-thread Ryzen 3 3100 can run an 8-thread VM with a CPU stress test on VMware without making the host system lag.

VirtualBox, on the other hand, makes the host system choke even with 4 threads.

Something is wrong with the VirtualBox code.

computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 0
Message 8412 - Posted: 16 Apr 2024, 12:09:49 UTC - in response to Message 8411.  

Unlike VirtualBox, VMware is out of scope for CMS multi-core.
Hence, discussing VirtualBox settings may be useful, while a discussion about VMware vs. VirtualBox is not.
It just shifts the focus away.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 781
Credit: 12,422,653
RAC: 3,337
Message 8413 - Posted: 17 Apr 2024, 1:40:43 UTC
Last modified: 17 Apr 2024, 1:41:53 UTC

Woo hoo... I ran my usual pair of 4-core tasks, and one was the typical kind that uses just one core, BUT the other, which started at the same time, was an actual 4-core task (as usual, the credits are not much different).

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3319894

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 25
Message 8414 - Posted: 17 Apr 2024, 7:55:32 UTC - in response to Message 8413.  

Magic Quantum Mechanic wrote:
Woo Hoo....ran my usual pair of 4-core and one was the typical one that just uses one core BUT the other that started at the same time was an actual 4-core. (as usual the credits are not much different)

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3319894
Since your task stopped just after 18 hours of wall time, it very likely ended because it exceeded the maximum job duration, so it is not certain whether the four jobs inside were finished and returned successfully to CERN.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 781
Credit: 12,422,653
RAC: 3,337
Message 8415 - Posted: 18 Apr 2024, 0:04:33 UTC - in response to Message 8414.  

The same host, running the same setup I use daily, just sent one of those in.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3320257
This one had 17 hours of run time and 2 days 11 hours 11 min 59 sec of CPU time.

With that other one, I was only looking at the run time / CPU time and that it was at least valid.
(I was busy, so I didn't actually read the entire stderr.)

But I just looked, and I can see how that other one finished (in case anyone wants to compare):
without the line "VM Completion Message: glidein exited with return value 0."


instead of the proper way, like this:


And as I was typing all of this, the other one finished, and it turned out as it should, too.
https://lhcathomedev.cern.ch/lhcathome-dev/results.php?userid=192
(I will keep running them here, and now over at prod as well.)
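To spot tasks like the first one without reading the whole stderr, a sketch like the following could search saved stderr text for the completion message quoted above. The marker string comes from this post; matching on that substring alone, the helper name, and the sample texts are assumptions for illustration, not part of any project tool.

```python
# Marker text quoted in the post above; matching on this substring alone is an assumption.
COMPLETION_MARKER = "glidein exited with return value 0"


def finished_cleanly(stderr_text: str) -> bool:
    """Return True if a task's stderr contains the glidein completion message."""
    return COMPLETION_MARKER in stderr_text


if __name__ == "__main__":
    # Hypothetical stderr excerpts, for illustration only.
    samples = {
        "task_a": "VM Completion Message: glidein exited with return value 0.\n",
        "task_b": "(stderr without the completion message)\n",
    }
    for name, text in samples.items():
        print(name, "OK" if finished_cleanly(text) else "no completion message")
```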

maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 0
Message 8429 - Posted: 27 Apr 2024, 0:54:01 UTC - in response to Message 8415.  

CMS Simulation v60.70 (vbox64_mt_mcore_cms) windows_x86_64
Five tasks in parallel on ONE computer!
Computer ID 4639
Run time 14 hours 56 min 19 sec
CPU time 2 days 8 hours 14 min 43 sec
Validate state Valid
Credit 260.69

computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 0
Message 8435 - Posted: 28 Apr 2024, 10:13:45 UTC

Some computers running CMS tasks only run empty envelopes.
Unfortunately, their owners did not make them visible to others, so they can't be informed directly.

Please check the following settings:
- recent CMS tasks require a computer reporting at least 4 cores
- computers reporting fewer than 4 cores (e.g. just 1) should not run recent CMS tasks

Examples:
https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4980
https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4800
https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4803
https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4810
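The rule above can be sketched as a simple filter over reported core counts. The minimum of 4 comes from this post; the host names and core counts in the example are hypothetical, and how you obtain each host's reported core count is left open.

```python
MIN_CORES_FOR_CMS = 4  # minimum stated in the post above


def hosts_below_minimum(reported_cores: dict[str, int],
                        minimum: int = MIN_CORES_FOR_CMS) -> list[str]:
    """Return, sorted by name, the hosts whose reported core count is too low."""
    return sorted(host for host, cores in reported_cores.items() if cores < minimum)


if __name__ == "__main__":
    # Hypothetical hosts and core counts, for illustration only.
    hosts = {"host-a": 1, "host-b": 4, "host-c": 16, "host-d": 2}
    print(hosts_below_minimum(hosts))  # ['host-a', 'host-d']
```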

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 25
Message 8436 - Posted: 28 Apr 2024, 10:14:58 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3325378
Run time 17 hours 41 min 35 sec
CPU time 2 days 16 hours 59 min 52 sec

The task completed only one cycle of 4 jobs. It seems it just managed to finish before the abort at 18 hours of wall-clock time.
I really don't like these looooong jobs.
On another (faster) machine I'm now running five 4-core tasks, and the jobs inside seem to be much shorter.
Within three hours of task run time, the VMs have each completed three cycles of 4 subjobs and are busy with the fourth cycle. That's looking much better.
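The figures above can be checked with a little arithmetic: CPU time divided by run time gives the average number of cores kept busy, and the run time can be compared with the 18-hour wall-clock limit. A minimal sketch, assuming durations in the English format shown on the task pages ("2 days 16 hours 59 min 52 sec") and treating the 18-hour limit as exact; the parser and function names are hypothetical.

```python
import re

# Seconds per unit, using the unit words shown on the task pages in this thread.
UNIT_SECONDS = {"day": 86400, "days": 86400, "hour": 3600, "hours": 3600,
                "min": 60, "sec": 1}
# Each match is a (number, unit) pair, e.g. ("16", "hours").
TOKEN_RE = re.compile(r"(\d+)\s*(days?|hours?|min|sec)")

# 18-hour wall-clock limit mentioned in this thread (assumed to be exact here).
WALL_LIMIT_S = 18 * 3600


def parse_duration(text: str) -> int:
    """Convert a string like '17 hours 41 min 35 sec' to seconds."""
    return sum(int(n) * UNIT_SECONDS[unit] for n, unit in TOKEN_RE.findall(text))


def summarize(run_time: str, cpu_time: str) -> tuple[float, int]:
    """Return (average cores kept busy, seconds left before the wall-clock limit)."""
    run_s = parse_duration(run_time)
    cpu_s = parse_duration(cpu_time)
    return cpu_s / run_s, WALL_LIMIT_S - run_s


if __name__ == "__main__":
    busy, left = summarize("17 hours 41 min 35 sec", "2 days 16 hours 59 min 52 sec")
    print(f"{busy:.2f} cores busy on average, {left} s before the 18 h limit")
```

For this task it prints "3.67 cores busy on average, 1105 s before the 18 h limit": close to a real 4-core run, with only about 18 minutes to spare.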

computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 0
Message 8437 - Posted: 28 Apr 2024, 10:22:25 UTC - in response to Message 8436.  

Ivan (at prod) wrote:
At present we have two workflows running. One is set to run 503,000 events/job (as was the template it was derived from) and takes about 5-6 hours wall-time. The other is set to 50,000 events/job and runs about one hour clock time. If we run out of jobs before the weekend, I'll submit a batch with 100,000 events/job, to match the 2-hour average our previous tasks took. These jobs generate considerably less output per CPU-hour than our previous ones.

This might be a result of the different batches with different event counts.
Unfortunately, we still can't look into the job details to see what kind of job a VM is currently running.
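A rough way to see why the number of job cycles per task varies so much: divide the wall-clock limit by the job lengths Ivan quotes. This sketch assumes the same 18-hour limit mentioned earlier in this thread applies, uses the upper end (6 h) of the 5-6 hour range, and ignores VM start-up time and any rule about not starting a new job close to the limit; the function name is hypothetical.

```python
import math

WALL_LIMIT_H = 18.0  # wall-clock limit mentioned earlier in this thread (assumed)


def full_cycles(job_hours: float, limit_h: float = WALL_LIMIT_H) -> int:
    """Complete job cycles that fit before the limit, ignoring VM overhead."""
    return math.floor(limit_h / job_hours)


if __name__ == "__main__":
    # Job lengths taken from Ivan's post (upper end of the 5-6 h range).
    for label, hours in [("50,000 events/job", 1.0), ("100,000 events/job", 2.0),
                         ("503,000 events/job", 6.0)]:
        print(f"{label}: ~{hours:g} h/job -> {full_cycles(hours)} full cycles in 18 h")
```

Under these assumptions the three batches allow 18, 9 and 3 full cycles per task, respectively.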

©2024 CERN