Message boards : CMS Application : New Version 60.67

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1051
Credit: 294,070
RAC: 0
Message 7838 - Posted: 2 Nov 2022, 15:04:36 UTC

Testing the upstream version of the vboxwrapper.
computezrmle
Joined: 28 Jul 16
Posts: 404
Credit: 374,791
RAC: 0
Message 7839 - Posted: 2 Nov 2022, 15:39:08 UTC - in response to Message 7838.  

That vboxwrapper reports its version as 26205, but BOINC does not yet offer it for download.
The download page only offers vboxwrapper up to 26204:
https://boinc.berkeley.edu/dl/


ATLAS and CMS on prod already use a 26205 build taken from a GitHub artefact (compiled by the BOINC CI).
That artefact will become the official 26205 once the BOINC process owners put it on the download page.

Where is the version used for this CMS app taken from?
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1051
Credit: 294,070
RAC: 0
Message 7840 - Posted: 2 Nov 2022, 16:15:44 UTC - in response to Message 7839.  

This is pre-release testing.

- Linux: https://github.com/BOINC/boinc/releases/download/vboxwrapper%2F26205/vboxwrapper_26205_x86_64-pc-linux-gnu.zip
- Windows: https://github.com/BOINC/boinc/releases/download/vboxwrapper%2F26205/vboxwrapper_26205_windows_x86_64.exe.zip
computezrmle
Joined: 28 Jul 16
Posts: 404
Credit: 374,791
RAC: 0
Message 7841 - Posted: 2 Nov 2022, 16:33:36 UTC - in response to Message 7840.  

OK.
These look like freshly compiled executables built from the known source code.
I agree they should be tested, since changes on GitHub might have affected the compilers/development tools.
computezrmle
Joined: 28 Jul 16
Posts: 404
Credit: 374,791
RAC: 0
Message 7842 - Posted: 2 Nov 2022, 16:56:44 UTC

Got a task that started fine.

The good news from stderr.txt: it has no issues getting X509 credentials.
2022-11-02 17:51:33 (91056): Guest Log: [INFO] Reading volunteer information
2022-11-02 17:51:35 (91056): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2022-11-02 17:51:36 (91056): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2022-11-02 17:51:37 (91056): Guest Log: [INFO] CMS application starting. Check log files.


@Laurence
This might be helpful to solve the X509 issue on prod.
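For anyone who wants to run the same check on their own tasks, here is a minimal Python sketch that scans the text of a task's stderr.txt for the "Requesting an X509 credential from ..." guest log lines and for the "CMS application starting" line. It assumes the line format stays exactly as in the excerpt above; `x509_summary` is a hypothetical helper, not part of BOINC or the CMS app.

```python
import re

# Matches guest log lines such as (format assumed from the stderr.txt excerpt):
# 2022-11-02 17:51:35 (91056): Guest Log: [INFO] Requesting an X509 credential from LHC@home
CRED_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \(\d+\): Guest Log: \[INFO\] "
    r"Requesting an X509 credential from (\S+)\s*$"
)
START_MARKER = "CMS application starting"


def x509_summary(stderr_text: str) -> dict:
    """Collect the credential sources requested and whether the app reported a start."""
    sources = []
    started = False
    for line in stderr_text.splitlines():
        m = CRED_RE.match(line)
        if m:
            sources.append(m.group(1))
        elif START_MARKER in line:
            started = True
    return {"sources": sources, "app_started": started}


if __name__ == "__main__":
    sample = "\n".join([
        "2022-11-02 17:51:35 (91056): Guest Log: [INFO] Requesting an X509 credential from LHC@home",
        "2022-11-02 17:51:36 (91056): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev",
        "2022-11-02 17:51:37 (91056): Guest Log: [INFO] CMS application starting. Check log files.",
    ])
    print(x509_summary(sample))
    # {'sources': ['LHC@home', 'vLHC@home-dev'], 'app_started': True}
```

An empty `sources` list, or `app_started` staying `False`, would point at the kind of credential problem seen on prod.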
computezrmle
Joined: 28 Jul 16
Posts: 404
Credit: 374,791
RAC: 0
Message 7843 - Posted: 2 Nov 2022, 17:13:24 UTC

At the beginning of the VM's "MasterLog" there are lots of error messages like:
...Failed to decode JWT in keyfile...

The same messages are printed to the StartdLog.

This should be forwarded to the developers of the scientific app.

11/02/22 17:53:59 (pid:15609) ******************************************************
11/02/22 17:53:59 (pid:15609) ** condor_master (CONDOR_MASTER) STARTING UP
11/02/22 17:53:59 (pid:15609) ** /tmp/glide_4xHWR4/main/condor/usr/sbin/condor_master
11/02/22 17:53:59 (pid:15609) ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
11/02/22 17:53:59 (pid:15609) ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
11/02/22 17:53:59 (pid:15609) ** $CondorVersion: 9.0.11 Mar 12 2022 BuildID: 578027 PackageID: 9.0.11-1 $
11/02/22 17:53:59 (pid:15609) ** $CondorPlatform: x86_64_CentOS7 $
11/02/22 17:53:59 (pid:15609) ** PID = 15609
11/02/22 17:53:59 (pid:15609) ** Log last touched time unavailable (No such file or directory)
11/02/22 17:53:59 (pid:15609) ******************************************************
11/02/22 17:53:59 (pid:15609) Using config source: /tmp/glide_4xHWR4/condor_config
11/02/22 17:53:59 (pid:15609) config Macros = 326, Sorted = 326, StringBytes = 21599, TablesBytes = 11776
11/02/22 17:53:59 (pid:15609) CLASSAD_CACHING is OFF
11/02/22 17:53:59 (pid:15609) Daemon Log is logging: D_ALWAYS D_ERROR
11/02/22 17:53:59 (pid:15609) Daemoncore: Listening at <0.0.0.0:44699> on TCP (ReliSock).
11/02/22 17:53:59 (pid:15609) DaemonCore: command socket at <10.0.2.15:44699?addrs=10.0.2.15-44699&alias=408-3406-14250&noUDP>
11/02/22 17:53:59 (pid:15609) DaemonCore: private command socket at <10.0.2.15:44699?addrs=10.0.2.15-44699&alias=408-3406-14250>
11/02/22 17:54:00 (pid:15609) Failed to decode JWT in keyfile '/tmp/glide_4xHWR4/ticket/myproxy'; ignoring.
((repeats this message about 80 times))


11/02/22 17:54:00 (pid:15609) CCBListener: registered with CCB server vocms0840.cern.ch as ccbid 137.138.156.85:9618?addrs=[2001-1458-d00-14--b3]-9618+137.138.156.85-9618&alias=vocms0840.cern.ch#7702380
11/02/22 17:54:00 (pid:15609) Master restart (GRACEFUL) is watching /tmp/glide_4xHWR4/main/condor/sbin/condor_master (mtime:1667407914)
11/02/22 17:54:00 (pid:15609) Started DaemonCore process "/tmp/glide_4xHWR4/main/condor/sbin/condor_startd", pid and pgroup = 15612
11/02/22 17:54:00 (pid:15609) Daemons::StartAllDaemons all daemons were started
11/02/22 17:54:02 (pid:15609) Setting ready state 'Ready' for STARTD
11/02/22 17:54:05 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute GLIDEIN_Resource_Slots = Iotokens,80,,type=main.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
11/02/22 17:54:05 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute STARTD_JOB_ATTRS =  x509userproxysubject x509UserProxyFQAN x509UserProxyVOName x509UserProxyEmail x509UserProxyExpiration,MemoryUsage,ResidentSetSize,ProportionalSetSizeKb.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
11/02/22 17:54:05 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute STARTD_PARTITIONABLE_SLOT_ATTRS = MemoryUsage,ProportionalSetSizeKb.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
11/02/22 17:54:05 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute GLIDEIN_Resource_Slots = Iotokens,80,,type=main.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
11/02/22 17:54:05 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute STARTD_JOB_ATTRS =  x509userproxysubject x509UserProxyFQAN x509UserProxyVOName x509UserProxyEmail x509UserProxyExpiration,MemoryUsage,ResidentSetSize,ProportionalSetSizeKb.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
11/02/22 17:54:05 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute STARTD_PARTITIONABLE_SLOT_ATTRS = MemoryUsage,ProportionalSetSizeKb.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
((repeats this message about 80 times))


11/02/22 17:58:43 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute GLIDEIN_Resource_Slots = Iotokens,80,,type=main.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
11/02/22 17:58:43 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute STARTD_JOB_ATTRS =  x509userproxysubject x509UserProxyFQAN x509UserProxyVOName x509UserProxyEmail x509UserProxyExpiration,MemoryUsage,ResidentSetSize,ProportionalSetSizeKb.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
11/02/22 17:58:43 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute STARTD_PARTITIONABLE_SLOT_ATTRS = MemoryUsage,ProportionalSetSizeKb.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
11/02/22 17:58:43 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute GLIDEIN_Resource_Slots = Iotokens,80,,type=main.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
11/02/22 17:58:43 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute STARTD_JOB_ATTRS =  x509userproxysubject x509UserProxyFQAN x509UserProxyVOName x509UserProxyEmail x509UserProxyExpiration,MemoryUsage,ResidentSetSize,ProportionalSetSizeKb.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
11/02/22 17:58:43 (pid:15609) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute STARTD_PARTITIONABLE_SLOT_ATTRS = MemoryUsage,ProportionalSetSizeKb.  The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the MASTER ad.
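To give the developers exact counts instead of "about 80 times", a small Python sketch like the following could summarise a MasterLog or StartdLog: it counts the "Failed to decode JWT" lines per keyfile and the "CONFIGURATION PROBLEM" lines per ClassAd attribute. The regular expressions are assumptions based only on the lines quoted in this post, and `summarize_condor_log` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns assumed from the MasterLog lines quoted in this thread.
JWT_RE = re.compile(r"Failed to decode JWT in keyfile '([^']+)'")
CONFIG_RE = re.compile(
    r"CONFIGURATION PROBLEM: Failed to insert ClassAd attribute (\w+) ="
)


def summarize_condor_log(text: str) -> tuple[Counter, Counter]:
    """Count JWT decode failures per keyfile and ClassAd insert failures per attribute."""
    jwt_failures: Counter = Counter()
    config_problems: Counter = Counter()
    for line in text.splitlines():
        jwt = JWT_RE.search(line)
        if jwt:
            jwt_failures[jwt.group(1)] += 1
            continue
        cfg = CONFIG_RE.search(line)
        if cfg:
            config_problems[cfg.group(1)] += 1
    return jwt_failures, config_problems
```

Pass it the full text of the log file. HTCondor's own hint in these messages is that the values (e.g. `Iotokens,80,,type=main`) are unquoted strings; whether quoting them is the right fix is for the app developers to confirm.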
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 654
Credit: 10,929,747
RAC: 1,604
Message 7844 - Posted: 2 Nov 2022, 22:09:49 UTC - in response to Message 7838.  

Testing the upstream version of the vboxwrapper.

Thanks for the warning, Laurence.
Good luck. The 2 tasks I have running so far began a few hours ago and have had no problems. I can update the other 2 in less than 6 hours from now, so I can leave these running without checking on them during the update.

Begin processing the 7110th record. Run 1, Event 78597110, LumiSection 157195 on stream 0 at 02-Nov-2022 22:59:47.425 CET
Begin processing the 7111th record. Run 1, Event 78597111, LumiSection 157195 on stream 0 at 02-Nov-2022 22:59:49.140 CET
Begin processing the 7112th record. Run 1, Event 78597112, LumiSection 157195 on stream 0 at 02-Nov-2022 22:59:51.755 CET
Begin processing the 7113th record. Run 1, Event 78597113, LumiSection 157195 on stream 0 at 02-Nov-2022 22:59:53.421 CET
Begin processing the 7114th record. Run 1, Event 78597114, LumiSection 157195 on stream 0 at 02-Nov-2022 22:59:55.036 CET
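A rough throughput figure can be read straight from lines like these. The Python sketch below (the regex and `seconds_per_record` are hypothetical, based only on the five lines quoted here) divides the elapsed time between the first and last record by the number of records in between; for these five lines it gives about 1.9 s per record.

```python
import re
from datetime import datetime

# Matches lines such as:
# Begin processing the 7110th record. Run 1, Event 78597110, ... at 02-Nov-2022 22:59:47.425 CET
RECORD_RE = re.compile(
    r"Begin processing the (\d+)(?:st|nd|rd|th) record\..* at "
    r"(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3})"
)


def seconds_per_record(lines):
    """Average wall-clock seconds per record between the first and last matching line."""
    points = []
    for line in lines:
        m = RECORD_RE.search(line)
        if m:
            # %b expects English month abbreviations (default C locale).
            when = datetime.strptime(m.group(2), "%d-%b-%Y %H:%M:%S.%f")
            points.append((int(m.group(1)), when))
    if len(points) < 2:
        return None  # not enough data for a rate
    (first_n, first_t), (last_n, last_t) = points[0], points[-1]
    if last_n == first_n:
        return None  # same record twice, no interval to divide by
    return (last_t - first_t).total_seconds() / (last_n - first_n)
```

The time zone suffix (CET) is ignored, which is fine as long as all lines come from the same log.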


Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 654
Credit: 10,929,747
RAC: 1,604
Message 7845 - Posted: 2 Nov 2022, 22:13:46 UTC - in response to Message 7843.  

At the beginning of the VM's "MasterLog" there are lots of error messages like:
...Failed to decode JWT in keyfile...

The same messages are printed to the StartdLog.

Should be forwarded to the developers of the scientific app.

They always say that in the MasterLog. Remind you of anything?
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1147
Credit: 754,546
RAC: 10
Message 7847 - Posted: 3 Nov 2022, 9:45:57 UTC - in response to Message 7838.  
Last modified: 3 Nov 2022, 9:47:32 UTC

I tested 1 CMS-task v60.67 and that returned fine. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3136136
Combination: BOINC 7.20.2 and VBox 7.0.2.
Somewhere in the middle of the task, I suspended it for a few minutes, with its state saved to disk.
Towards the end, I suspended it once more with "keep in memory" active.

Nothing to do with this vboxwrapper, but I noticed that VBox 7 keeps the used hard disks in the VirtualBox.xml file, even after a reboot and with no BOINC VMs in use.

    <MediaRegistry>
      <HardDisks>
        <HardDisk uuid="{6f08958e-7bfd-4804-8dd7-c7b4408cb126}" location="D:/Boinc1/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi" format="VDI" type="MultiAttach"/>
        <HardDisk uuid="{997e0796-142b-4278-9763-0bceb3ac71bc}" location="D:/Boinc1/projects/lhcathomedev.cern.ch_lhcathome-dev/ATLAS_vbox_1.17_image.vdi" format="VDI" type="MultiAttach"/>
        <HardDisk uuid="{dae25e8f-de18-4971-b11c-eca764ede402}" location="D:/Boinc1/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07_prod.vdi" format="VDI" type="MultiAttach"/>
        <HardDisk uuid="{8fb925ef-3497-4bfb-88e3-bbab2930787f}" location="D:/Boinc1/projects/lhcathomedev.cern.ch_lhcathome-dev/CMS_2022_09_07.vdi" format="VDI" type="MultiAttach"/>
      </HardDisks>
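To see which multiattach disks a VirtualBox installation still has registered, a read-only Python sketch like this lists the `HardDisk` entries with `type="MultiAttach"` from the text of a VirtualBox.xml file. It assumes the file may declare a default XML namespace, so it compares local tag names only; `registered_multiattach_disks` is a hypothetical helper, not a VirtualBox tool.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def registered_multiattach_disks(xml_text: str) -> list[tuple[str, str]]:
    """Return (uuid, location) for every MultiAttach HardDisk entry in the file."""
    root = ET.fromstring(xml_text)
    disks = []
    # iter() walks the whole tree, so nested (child) HardDisk entries are included too.
    for elem in root.iter():
        if local_name(elem.tag) == "HardDisk" and elem.get("type") == "MultiAttach":
            disks.append((elem.get("uuid"), elem.get("location")))
    return disks
```

It only reads the file; editing VirtualBox.xml by hand while VirtualBox is running is not a good idea.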
computezrmle
Joined: 28 Jul 16
Posts: 404
Credit: 374,791
RAC: 0
Message 7848 - Posted: 3 Nov 2022, 10:48:39 UTC - in response to Message 7847.  

...I noticed that VBox 7 keeps the used hard disks in the VirtualBox.xml file, even after a reboot and with no BOINC VMs in use.

    <MediaRegistry>
      <HardDisks>
        <HardDisk uuid="{6f08958e-7bfd-4804-8dd7-c7b4408cb126}" location="D:/Boinc1/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi" format="VDI" type="MultiAttach"/>
        <HardDisk uuid="{997e0796-142b-4278-9763-0bceb3ac71bc}" location="D:/Boinc1/projects/lhcathomedev.cern.ch_lhcathome-dev/ATLAS_vbox_1.17_image.vdi" format="VDI" type="MultiAttach"/>
        <HardDisk uuid="{dae25e8f-de18-4971-b11c-eca764ede402}" location="D:/Boinc1/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07_prod.vdi" format="VDI" type="MultiAttach"/>
        <HardDisk uuid="{8fb925ef-3497-4bfb-88e3-bbab2930787f}" location="D:/Boinc1/projects/lhcathomedev.cern.ch_lhcathome-dev/CMS_2022_09_07.vdi" format="VDI" type="MultiAttach"/>
      </HardDisks>

Indeed, v7.x writes the multiattach parents to the global media registry, while older versions write them to the description files of the VMs that use the parent.
This may allow future vboxwrapper versions to skip the workaround if they detect VirtualBox >= v7.

Since the workaround is only executed in case of an error, and we have done lots of tests to get it stable, I'd like to avoid any change for now.
But I'll keep that in mind.
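If a future vboxwrapper did go that way, the version check itself is small. A Python sketch, assuming the version string comes from `VBoxManage --version` (which prints something like `7.0.2r154219`); the helper names are hypothetical and this is not vboxwrapper code, which is written in C++.

```python
import re

VERSION_RE = re.compile(r"\s*(\d+)\.(\d+)\.(\d+)")


def vbox_version(version_output: str) -> tuple[int, int, int]:
    """Parse major.minor.patch from output such as '7.0.2r154219'."""
    m = VERSION_RE.match(version_output)
    if not m:
        raise ValueError(f"unrecognised VirtualBox version string: {version_output!r}")
    major, minor, patch = (int(part) for part in m.groups())
    return major, minor, patch


def parents_in_global_registry(version_output: str) -> bool:
    """Hypothetical check: VirtualBox 7.x keeps multiattach parents in the global registry."""
    return vbox_version(version_output) >= (7, 0, 0)
```

Comparing tuples rather than the raw strings avoids surprises such as "10.0" sorting before "7.0".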
computezrmle
Joined: 28 Jul 16
Posts: 404
Credit: 374,791
RAC: 0
Message 7849 - Posted: 3 Nov 2022, 11:03:50 UTC

After a project reset on my test client, I ran a CMS v60.67 task (reduced to single-core) last night.
The task finished successfully.

Today I got another task that runs as 2-core.
Checked:
- pausing/resuming the task
- pausing -> BOINC shutdown -> waiting a while -> restarting BOINC -> resuming the task

So far the task passed all tests and runs fine.
It will take some hours until it is done.



©2023 CERN