1) Message boards : CMS Application : Problem with upgrade of BOINC server (Message 8468)
Posted 5 Jun 2024 by Crystal Pellet
Post:
It seems all jobs are exactly the same. Is this on purpose for testing or is that a failure?

In which way the same? They are Monte Carlo simulations, so while the config file might be the same, they should give different results due to the pseudorandom-number generators.
As I wrote, it only seems so (to me), because the package files had the same byte sizes.
In the distant past (before Grafana), we could see exactly which sub-tasks a/my machine had done or was working on.
2) Message boards : CMS Application : Problem with upgrade of BOINC server (Message 8460)
Posted 4 Jun 2024 by Crystal Pellet
Post:
So much or so many ;-) Credit.

Yeah, the credit calculation is a mystery:
Task	Work unit	Sent	Time reported	Status	Run time (sec)	CPU time (sec)	Credit
3335618	2417753	4 Jun 2024, 6:47:30 UTC	4 Jun 2024, 14:18:30 UTC	Completed and validated	23,549.03	89,871.33	1,171.84
3335619	2417754	4 Jun 2024, 6:47:30 UTC	4 Jun 2024, 13:48:08 UTC	Completed and validated	22,026.96	84,156.92	1,096.09
3335621	2417756	4 Jun 2024, 6:47:30 UTC	4 Jun 2024, 13:47:02 UTC	Completed and validated	21,387.70	79,706.69	1,064.28
3335630	2417765	4 Jun 2024, 6:47:30 UTC	4 Jun 2024, 14:25:13 UTC	Completed and validated	21,868.83	80,232.25	41.84
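To make the mystery concrete, here is a minimal sketch that turns the four rows above into credit per CPU hour. It assumes the last three numeric columns are run time (seconds), CPU time (seconds) and credit, as in the usual BOINC task list; the helper name `credit_per_cpu_hour` is made up for illustration.

```python
# Rows copied from the post: (task id, run time in s, CPU time in s, credit).
# The column meaning is an assumption based on the usual BOINC task list layout.
rows = [
    (3335618, 23549.03, 89871.33, 1171.84),
    (3335619, 22026.96, 84156.92, 1096.09),
    (3335621, 21387.70, 79706.69, 1064.28),
    (3335630, 21868.83, 80232.25, 41.84),
]


def credit_per_cpu_hour(cpu_seconds, credit):
    """Credit granted per hour of CPU time (hypothetical helper)."""
    return credit / (cpu_seconds / 3600.0)


# Map each task id to its credit rate and print one line per task.
rates = {task: credit_per_cpu_hour(cpu, credit) for task, _run, cpu, credit in rows}
for task, rate in rates.items():
    print(f"{task}: {rate:.2f} credit per CPU hour")
```

The first three tasks come out at a similar rate, while the last one gets only a small fraction of it for roughly the same amount of CPU time.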
3) Message boards : CMS Application : Problem with upgrade of BOINC server (Message 8459)
Posted 4 Jun 2024 by Crystal Pellet
Post:
Apparently there is a problem with the BOINC server after an OS upgrade to RHEL9. The server status display shows zero CMS tasks available even though there are jobs pending. This is affecting creation of new tasks, even though we do have some jobs being run.
We are working on a fix.

@Ivan: The jobs you created yesterday afternoon are coming through now.
It seems all jobs are exactly the same. Is this on purpose for testing or is that a failure?
4) Message boards : CMS Application : CMS multi-core (Message 8436)
Posted 28 Apr 2024 by Crystal Pellet
Post:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3325378
Run time 17 hours 41 min 35 sec
CPU time 2 days 16 hours 59 min 52 sec

The task did only 1 cycle of 4 jobs. It seems it could finish just in time before the abort after 18 hours wall clock time.
I really don't like these looooong jobs.
On another (faster) machine I'm now running five 4-core tasks and the jobs inside seem to be much shorter.
Within three hours of task runtime, the VMs have each done three cycles of 4 subjobs and are busy with the fourth cycle. That's looking much better.
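As a rough check on how well the 4-core VM was used, the two times quoted above can be divided: CPU time over run time gives the average number of busy cores. This is only a sketch; the function name `average_cores_used` is invented for the example.

```python
from datetime import timedelta

# Values quoted in the post for result 3325378.
run_time = timedelta(hours=17, minutes=41, seconds=35)
cpu_time = timedelta(days=2, hours=16, minutes=59, seconds=52)


def average_cores_used(cpu, wall):
    """Average number of cores kept busy over the whole run (hypothetical helper)."""
    return cpu.total_seconds() / wall.total_seconds()


avg_cores = average_cores_used(cpu_time, run_time)
print(f"average cores busy: {avg_cores:.2f} of 4")
```

A value close to 4 means the cores were kept busy for almost the whole run, so the long wall clock time comes from the length of the jobs themselves and not from idling.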
5) Message boards : CMS Application : New version v61.00 (Message 8432)
Posted 27 Apr 2024 by Crystal Pellet
Post:
Yes, running well since this night.
Have 8 CMS with always 4 CPUs in -dev working since 7 hours :-)).
https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4639
The runtimes of your CMS-tasks returned this morning are all about 7 hours.
Were the tasks stopped by the program or did you gracefully shut down the tasks yourself?
Normally a task would run between about 10 and 18 hours.
6) Message boards : CMS Application : New version v61.00 (Message 8430)
Posted 27 Apr 2024 by Crystal Pellet
Post:
After a reset of the LHC Dev project, new CMS files were downloaded with the required signatures.
The names of the xml and vdi files were changed to contain 2024_04_26a.

27 Apr 09:09:52	Resetting project	
27 Apr 09:10:30	work fetch resumed by user	
27 Apr 09:10:35	Master file download succeeded	
27 Apr 09:10:40	Sending scheduler request: To fetch work.	
27 Apr 09:10:40	Requesting new tasks for CPU	
27 Apr 09:10:41	Scheduler request completed: got 1 new tasks	
27 Apr 09:10:41	Project requested delay of 61 seconds	
27 Apr 09:10:43	Started download of vboxwrapper_26207_windows_x86_64.exe	
27 Apr 09:10:46	Finished download of vboxwrapper_26207_windows_x86_64.exe (1986048 bytes)	
27 Apr 09:10:46	Started download of CMS_2024_04_26a_mt_dev.xml	
27 Apr 09:10:47	Finished download of CMS_2024_04_26a_mt_dev.xml (657 bytes)	
27 Apr 09:10:47	Started download of CMS_2024_04_26a_mt_dev.vdi
27 Apr 09:25:29	Finished download of CMS_2024_04_26a_mt_dev.vdi (4040163328 bytes)	
7) Message boards : General Discussion : Number of tasks limited by Max # CPUs (Message 8428)
Posted 26 Apr 2024 by Crystal Pellet
Post:
This is a very old issue, but still present.

When I have set the number of tasks (Max # jobs) in my preference to e.g. 8, I only get a max of 2 tasks when Max # CPUs is set to one (1)
Requesting new tasks for e.g. Theory gives the server message: No tasks are available for Theory Simulation although enough tasks are available.

When Max # CPUs is set to 2 I get 4 tasks
When Max # CPUs is set to 3 I get 6 tasks
When Max # CPUs is set to 4 I get 8 tasks

There seems to be a wrong correlation in BOINC's server configuration.
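The numbers above fit a simple pattern: the number of tasks received is twice Max # CPUs. A minimal sketch of that observed behaviour follows. It is not BOINC server code, and capping the result at Max # jobs is an assumption, since the post only shows values up to 8.

```python
def observed_task_limit(max_jobs, max_cpus):
    """Empirical pattern from the post, NOT the actual BOINC scheduler logic.

    The factor 2 is what was observed; the cap at max_jobs is an assumption.
    """
    return min(max_jobs, 2 * max_cpus)


# Reproduce the observations for a preference of Max # jobs = 8.
for cpus in (1, 2, 3, 4):
    print(f"Max # CPUs = {cpus}: {observed_task_limit(8, cpus)} tasks")
```

If the two preferences were independent, every line would show 8 tasks.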
8) Message boards : Theory Application : New version v6.00 (Message 8421)
Posted 26 Apr 2024 by Crystal Pellet
Post:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3324292


2024-04-26 11:58:24 (6916): Adding virtual disk drive to VM. (Theory_2024_04_26_dev.xml)
2024-04-26 11:58:33 (6916): Attempts: 5
2024-04-26 11:58:40 (6916): Error in check if parent hdd is registered.
Command:
VBoxManage -q showhdinfo "C:\ProgramData\BOINC/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2024_04_26_dev.xml" 
Output:
VBoxManage.exe: error: Could not get the storage format of the medium 'C:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\Theory_2024_04_26_dev.xml' (VERR_NOT_SUPPORTED)
VBoxManage.exe: error: Details: code VBOX_E_IPRT_ERROR (0x80bb0005), component MediumWrap, interface IMedium, callee IUnknown
VBoxManage.exe: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 201 of file VBoxManageDisk.cpp

2024-04-26 11:58:40 (6916): Could not create VM
2024-04-26 11:58:40 (6916): ERROR: VM failed to start
2024-04-26 11:58:40 (6916): Powering off VM.
2024-04-26 11:58:40 (6916): Deregistering VM. (boinc_3e4fc01cce596b2f, slot#0)
2024-04-26 11:58:40 (6916): Removing network bandwidth throttle group from VM.
2024-04-26 11:58:40 (6916): Removing VM from VirtualBox.
9) Message boards : CMS Application : CMS multi-core (Message 8414)
Posted 17 Apr 2024 by Crystal Pellet
Post:
Woo Hoo....ran my usual pair of 4-core and one was the typical one that just uses one core BUT the other that started at the same time was an actual 4-core. (as usual the credits are not much different)

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3319894
Since your task stopped just after 18 hours wall time, it's very likely that the task ended because it exceeded the job duration limit,
so it's not certain whether the four jobs inside were finished and returned successfully to CERN.
10) Message boards : CMS Application : CMS multi-core (Message 8406)
Posted 11 Apr 2024 by Crystal Pellet
Post:
4-core running as single core . . .
11) Message boards : CMS Application : CMS multi-core (Message 8371)
Posted 28 Mar 2024 by Crystal Pellet
Post:
The 2 tasks I requested this morning created 2 dual core VMs, but each is running only 1 single core job.
12) Message boards : CMS Application : CMS multi-core (Message 8366)
Posted 27 Mar 2024 by Crystal Pellet
Post:
I requested a dual core task. That task did 3 jobs within 4.5 hours, but now I don't get a new sub-job, so the VM is almost idling.
I'm not so sure any longer that the VM is not processing events.
No cmsRun process with up to 200% CPU, or any other process with high CPU usage, is shown in Console ALT-F3 (top),
but the total CPU used by the VM since the beginning is ~184% (incl. init phase) and there is also data transferred.
Data was downloaded at the start of the first three jobs I saw, at 27-mar-2024 05:38:10.10, 27-mar-2024 07:41:24.24 and 27-mar-2024 10:01:54.54.
For the last two jobs, where I did not see a cmsRun, data was downloaded at 27-mar-2024 12:27:03.03 and 27-mar-2024 15:03:15.15.
13) Message boards : CMS Application : CMS multi-core (Message 8365)
Posted 27 Mar 2024 by Crystal Pellet
Post:
After I tested the 4-core VM (not really a success, doing 4 cms jobs in almost 18 hours),
I requested a dual core task. That task did 3 jobs within 4.5 hours, but now I don't get a new sub-job, so the VM is almost idling.
14) Message boards : News : Multi-core jobs available for CMS@Home-dev (Message 8359)
Posted 26 Mar 2024 by Crystal Pellet
Post:
This was luck ;-))
No luck for CMS, because the running cms jobs did not return a result.
Your task was valid BOINC-wise, but a valid cms-job should end with: VM Completion Message: glidein exited with return value 0.
0 means no error. Your job ended by a shutdown given by vboxwrapper after 18 hours wall clock time.
15) Message boards : News : Multi-core jobs available for CMS@Home-dev (Message 8357)
Posted 26 Mar 2024 by Crystal Pellet
Post:
Crystal,
seeing the same: 4 cmsExternalGene... 153 minutes hours on 4-Cores!
Your task https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3311748 seems to have been killed by the 18 hours deadline.

Your result ends with:
2024-03-26 16:45:34 (33976): Status Report: Elapsed Time: '60000.000000'
2024-03-26 16:45:34 (33976): Status Report: CPU Time: '216839.343750'
2024-03-26 18:05:37 (33976): Powering off VM.
2024-03-26 18:05:38 (33976): Successfully stopped VM.
2024-03-26 18:05:38 (33976): Deregistering VM. (boinc_9f8cb45e4f80768f, slot#27)
2024-03-26 18:05:38 (33976): Removing network bandwidth throttle group from VM.
2024-03-26 18:05:38 (33976): Removing VM from VirtualBox.


Mine with:
2024-03-26 13:06:15 (1020): Guest Log: [INFO] glidein exited with return value 0.
2024-03-26 13:06:15 (1020): Guest Log: [INFO] Shutting Down.
2024-03-26 13:06:15 (1020): VM Completion File Detected.
2024-03-26 13:06:15 (1020): VM Completion Message: glidein exited with return value 0.
.
2024-03-26 13:06:15 (1020): Powering off VM.
2024-03-26 13:06:16 (1020): Successfully stopped VM.
2024-03-26 13:06:16 (1020): Deregistering VM. (boinc_15fe7a25adba3060, slot#0)
2024-03-26 13:06:16 (1020): Removing network bandwidth throttle group from VM.
2024-03-26 13:06:16 (1020): Removing VM from VirtualBox.
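The difference between the two endings can be checked mechanically by looking for the completion message in the task's log output. The sketch below uses shortened copies of the two logs above; the function name `finished_cleanly` is made up for the example.

```python
# Marker line that a successful cms job leaves in the vboxwrapper log,
# copied from the clean example in the post.
SUCCESS_MARKER = "VM Completion Message: glidein exited with return value 0."


def finished_cleanly(log_text: str) -> bool:
    """True if the log contains the glidein completion message (hypothetical helper)."""
    return any(SUCCESS_MARKER in line for line in log_text.splitlines())


# Shortened excerpts of the two logs quoted in the post.
killed_log = """\
2024-03-26 16:45:34 (33976): Status Report: Elapsed Time: '60000.000000'
2024-03-26 18:05:37 (33976): Powering off VM.
2024-03-26 18:05:38 (33976): Successfully stopped VM.
"""

clean_log = """\
2024-03-26 13:06:15 (1020): VM Completion File Detected.
2024-03-26 13:06:15 (1020): VM Completion Message: glidein exited with return value 0.
2024-03-26 13:06:15 (1020): Powering off VM.
"""

print(finished_cleanly(killed_log))
print(finished_cleanly(clean_log))
```

A task that was shut down at the deadline can still be valid for BOINC, so a check like this says more about the cms jobs inside than the task status does.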
16) Message boards : News : Multi-core jobs available for CMS@Home-dev (Message 8354)
Posted 26 Mar 2024 by Crystal Pellet
Post:
Still running after 13.5 hours the first 4 jobs:
...
...
On machines that are slow but a bit faster than this one, it could happen that the second set of 4 jobs is killed by the 18 hours deadline.
Why not run 1 single job 4 times faster, like you do with the dual core tasks? There, 1 cmsRun uses 200% cpu and is twice as fast.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3311582



Just in time:

Run time 17 hours 32 min 57 sec
Only 4 concurrently running jobs.
17) Message boards : News : Multi-core jobs available for CMS@Home-dev (Message 8353)
Posted 26 Mar 2024 by Crystal Pellet
Post:
Still running after 13.5 hours the first 4 jobs:



On machines that are slow but a bit faster than this one, it could happen that the second set of 4 jobs is killed by the 18 hours deadline.
Why not run 1 single job 4 times faster, like you do with the dual core tasks? There, 1 cmsRun uses 200% cpu and is twice as fast.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3311582
18) Message boards : News : Multi-core jobs available for CMS@Home-dev (Message 8351)
Posted 25 Mar 2024 by Crystal Pellet
Post:
tripled
19) Message boards : News : Multi-core jobs available for CMS@Home-dev (Message 8350)
Posted 25 Mar 2024 by Crystal Pellet
Post:
dupe
20) Message boards : News : Multi-core jobs available for CMS@Home-dev (Message 8349)
Posted 25 Mar 2024 by Crystal Pellet
Post:
What is the process cmsExternalGene doing?
Now I see 4 of those processes on my 4-core VM using up to 100% cpu each.
Do they do the event processing, and if yes: does it mean I'm processing 4 jobs concurrently?
That would not make much sense.

