New Version 50.00
Author | Message |
---|---|
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Cleaning up the CVMFS configuration so that it no longer contacts the old proxy. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
I have just downloaded a CMS task with a 1.1 GByte vdi: CMS Simulation 49.00 (vbox64_mt_mcore_cms), name CMS_3491221_1582997091.166045 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2879060 The application entry shows version 50, but the BOINC properties show version 49: Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU, 50.00 (vbox64_mt_mcore_cms), 5 Mar 2020, 9:53:35 UTC, 0 GigaFLOPS. Edit: I saw a download of singularity. The graphics view shows: [DIR] Parent Directory - [ ] MasterLog 05-Mar-2020 18:20 4.5K [ ] StartLog 05-Mar-2020 18:20 11K [TXT] StarterLog 05-Mar-2020 18:19 7.9K [ ] running.log 05-Mar-2020 18:18 39 [ ] stderr.log 05-Mar-2020 18:18 32 [TXT] stdout.log 05-Mar-2020 18:18 43 The task was cancelled by the system; nothing is shown in F2 or F3. VirtualBox is 6.0.14. Production tasks for ATLAS are running well on this computer. |
Joined: 8 Apr 15 Posts: 782 Credit: 12,484,111 RAC: 4,610 |
The server is still giving me Version 49.00 also. |
Joined: 13 Feb 15 Posts: 1188 Credit: 874,807 RAC: 1,040 |
For a few days now I have been getting real CMS jobs running instead of NO_SUB_TASKS. At the moment job ireid_TC_SLC7_IDR_CMS_Home_200522_095132_4941 is running. With every task, after 1 hour of run time I get messages such as: 05/24/20 10:39:06 PERMISSION DENIED to gsi@unmapped from host 10.0.2.15 for command 448 (GIVE_STATE), access level READ: reason: READ authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15,10.0.2.15, hostname size = 1, original ip address = 10.0.2.15 05/24/20 10:39:06 DC_AUTHENTICATE: Command not authorized, done! I don't know whether this is bad for the job outcome. |
Joined: 13 Feb 15 Posts: 1188 Credit: 874,807 RAC: 1,040 |
At the moment job ireid_TC_SLC7_IDR_CMS_Home_200522_095132_4941 is running. After the VM has run this first job, it looks like the same job runs a second time within the VM, and maybe a third time once the second has finished, all within the 12-hour VM lifetime. Although the same job name is used (ireid_TC_SLC7_IDR_CMS_Home_200522_095132_4941), the parameters differ: 4224 for the first job and 4355 for the second. The event numbering differs as well: Begin processing the 1st record. Run 1, Event 3150001, LumiSection 6301 on stream 0 at 24-May-2020 07:44:49.763 UTC Begin processing the 2nd record. Run 1, Event 3150002, LumiSection 6301 on stream 0 at 24-May-2020 07:45:09.446 UTC Begin processing the 1st record. Run 1, Event 4460001, LumiSection 8921 on stream 0 at 24-May-2020 12:23:58.676 UTC Begin processing the 2nd record. Run 1, Event 4460002, LumiSection 8921 on stream 0 at 24-May-2020 12:24:19.565 UTC |
Joined: 8 Apr 15 Posts: 782 Credit: 12,484,111 RAC: 4,610 |
I ran a few on the 22nd and got 4 Valids, so I tried more and they failed. One was a *DC_NOP failed!* and I think the rest were [ERROR] Condor ended after.... (I may have done a suspend/restart with those.) I might try a few more, since Valids have been hiding for months with CMS (but none of my Valids gave me 3000 credits). |
Joined: 8 Apr 15 Posts: 782 Credit: 12,484,111 RAC: 4,610 |
I noticed a few members have some Valid CMS tasks running now (including Ivan) with Linux OS, so I decided to try some here again. As usual they take close to 17 minutes just to get past the HTCondor ping, and once again the task failed on Windows 10: Guest Log: [ERROR] Condor ended after 3290 seconds. 2020-07-10 19:10:22 (5428): Guest Log: [INFO] Shutting Down. 2020-07-10 19:10:22 (5428): VM Completion File Detected. 2020-07-10 19:10:22 (5428): VM Completion Message: Condor ended after 3290 seconds. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
It still hangs at "HT condor ping" in win10. Maybe a good idea to fix this, as the last post about this was in July ?? |
Joined: 8 Apr 15 Posts: 782 Credit: 12,484,111 RAC: 4,610 |
I decided to run some of these again since that is all we have. They are reported as Valid, but when I check the log it has hours of: warn [frontier.c:1136]: Trying next server cms4-frontier.openhtc.io[2606:4700:3030::681f:4f63] warn [frontier.c:1014]: Request 255 on chan 1 failed at Fri Jan 1 22:49:03 2021: -9 [fn-socket.c:85]: network error on connect to 2606:4700:3030::681f:4f63: Network is unreachable error [frontier.c:1159]: No more servers/proxies. Last error was: Request 255 on chan 1 failed at Fri Jan 1 22:49:03 2021: -9 [fn-socket.c:85]: network error on connect to 2606:4700:3030::681f:4f63: Network is unreachable ----- Begin Fatal Exception 01-Jan-2021 22:49:03 UTC----------------------- An exception of category 'StdException' occurred while [0] Constructing the EventProcessor [1] Constructing ESSource: class=PoolDBESSource label='GlobalTag' Exception Message: A std::exception was thrown. Can not get data (Additional Information: [frontier.c:1159]: No more servers/proxies. Last error was: Request 255 on chan 1 failed at Fri Jan 1 22:49:03 2021: -9 [fn-socket.c:85]: network error on connect to 2606:4700:3030::681f:4f63: Network is unreachable) ( CORAL : "coral::FrontierAccess::Statement::execute" from "CORAL/RelationalPlugins/frontier" ) ----- End Fatal Exception ------------------------------------------------- Complete process id is 280 status is 66 I am surprised to even get Valids with these CMS tasks here. I haven't tried them at LHC, where they are the same version 50.00. I have another running right now. (Of course I looked up that *cloudflare* number.) |
Joined: 8 Apr 15 Posts: 782 Credit: 12,484,111 RAC: 4,610 |
|
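The Frontier errors quoted in the post above all name the same IPv6 address. As a rough aid for reading such logs, here is a minimal Python sketch that counts failed connection attempts per IP family. The function name `failed_address_families` and the regular expression are illustrative assumptions, not part of any CMS or Frontier tool, and the pattern is modelled only on the log lines quoted in this thread. A count dominated by IPv6 would merely hint that the VM has no usable IPv6 route; it does not prove the cause.

```python
import ipaddress
import re
from collections import Counter

# Pattern modelled on the quoted lines, e.g.
# "network error on connect to 2606:4700:3030::681f:4f63: Network is unreachable"
# (assumption: other Frontier client versions may word this differently).
CONNECT_ERROR = re.compile(r"network error on connect to ([0-9A-Fa-f:.]+): ")


def failed_address_families(log_text):
    """Count failed connection attempts per IP family ("IPv4" / "IPv6")."""
    counts = Counter()
    for match in CONNECT_ERROR.finditer(log_text):
        try:
            address = ipaddress.ip_address(match.group(1))
        except ValueError:
            # Not a literal IP address (for example a host name); skip it.
            continue
        counts[f"IPv{address.version}"] += 1
    return counts
```

Calling `failed_address_families(open("stderr.txt").read())` on a saved log would return a `Counter` keyed by family; the file name here is only an example.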
Joined: 18 Sep 16 Posts: 17 Credit: 984,836 RAC: 1 |
I'm sorry this is long, but it's all I have. Is the problem with my Win10 laptop and these tasks my crappy wifi? Output of stderr.txt: <core_client_version>7.16.11</core_client_version> <![CDATA[ <message> The ring 2 stack is in use. (0xcf) - exit code 207 (0xcf)</message> <stderr_txt> 2021-05-08 09:23:03 (2868): Detected: vboxwrapper 26197 2021-05-08 09:23:03 (2868): Detected: BOINC client v7.7 2021-05-08 09:23:04 (2868): Detected: VirtualBox VboxManage Interface (Version: 6.1.18) 2021-05-08 09:23:04 (2868): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds) 2021-05-08 09:23:04 (2868): Successfully copied 'init_data.xml' to the shared directory. 2021-05-08 09:23:05 (2868): Create VM. (boinc_f63e846a4cf0a22f, slot#2) 2021-05-08 09:23:06 (2868): Setting Memory Size for VM. (3688MB) 2021-05-08 09:23:06 (2868): Setting CPU Count for VM. (3) 2021-05-08 09:23:06 (2868): Setting Chipset Options for VM. 2021-05-08 09:23:06 (2868): Setting Boot Options for VM. 2021-05-08 09:23:07 (2868): Setting Network Configuration for NAT. 2021-05-08 09:23:07 (2868): Enabling VM Network Access. 2021-05-08 09:23:07 (2868): Disabling USB Support for VM. 2021-05-08 09:23:08 (2868): Disabling COM Port Support for VM. 2021-05-08 09:23:08 (2868): Disabling LPT Port Support for VM. 2021-05-08 09:23:08 (2868): Disabling Audio Support for VM. 2021-05-08 09:23:08 (2868): Disabling Clipboard Support for VM. 2021-05-08 09:23:09 (2868): Disabling Drag and Drop Support for VM. 2021-05-08 09:23:09 (2868): Adding storage controller(s) to VM. 2021-05-08 09:23:09 (2868): Adding virtual disk drive to VM. (vm_image.vdi) 2021-05-08 09:23:09 (2868): Adding VirtualBox Guest Additions to VM. 2021-05-08 09:23:10 (2868): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB) 2021-05-08 09:23:10 (2868): forwarding host port 63086 to guest port 80 2021-05-08 09:23:10 (2868): Enabling remote desktop for VM.
2021-05-08 09:23:10 (2868): Required extension pack not installed, remote desktop not enabled. 2021-05-08 09:23:10 (2868): Enabling shared directory for VM. 2021-05-08 09:23:11 (2868): Starting VM using VBoxManage interface. (boinc_f63e846a4cf0a22f, slot#2) 2021-05-08 09:23:15 (2868): Successfully started VM. (PID = '12300') 2021-05-08 09:23:15 (2868): Reporting VM Process ID to BOINC. 2021-05-08 09:23:15 (2868): Guest Log: BIOS: VirtualBox 6.1.18 2021-05-08 09:23:15 (2868): Guest Log: CPUID EDX: 0x178bfbff 2021-05-08 09:23:15 (2868): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63 2021-05-08 09:23:15 (2868): VM state change detected. (old = 'PoweredOff', new = 'Running') 2021-05-08 09:23:15 (2868): Detected: Web Application Enabled (http://localhost:63086) 2021-05-08 09:23:15 (2868): Preference change detected 2021-05-08 09:23:15 (2868): Setting CPU throttle for VM. (90%) 2021-05-08 09:23:15 (2868): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 900 seconds) or (Vbox_job.xml: 600 seconds)) 2021-05-08 09:23:17 (2868): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032 2021-05-08 09:23:17 (2868): Guest Log: BIOS: Booting from Hard Disk... 
2021-05-08 09:23:20 (2868): Guest Log: BIOS: KBD: unsupported int 16h function 03 2021-05-08 09:23:20 (2868): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 2021-05-08 09:23:36 (2868): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds 2021-05-08 09:23:36 (2868): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000) 2021-05-08 09:24:07 (2868): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log 2021-05-08 09:24:07 (2868): Guest Log: 00:00:00.000144 main Log opened 2021-05-08T13:24:06.939445000Z 2021-05-08 09:24:07 (2868): Guest Log: 00:00:00.000343 main OS Product: Linux 2021-05-08 09:24:07 (2868): Guest Log: 00:00:00.000381 main OS Release: 4.14.157-17.cernvm.x86_64 2021-05-08 09:24:07 (2868): Guest Log: 00:00:00.000409 main OS Version: #1 SMP Wed Dec 4 17:26:45 CET 2019 2021-05-08 09:24:07 (2868): Guest Log: 00:00:00.000436 main Executable: /usr/share/vboxguest52/usr/sbin/VBoxService 2021-05-08 09:24:07 (2868): Guest Log: 00:00:00.000436 main Process ID: 3035 2021-05-08 09:24:07 (2868): Guest Log: 00:00:00.000437 main Package type: LINUX_64BITS_GENERIC 2021-05-08 09:24:07 (2868): Guest Log: 00:00:00.001616 main 5.2.6 r120293 started. Verbose level = 0 2021-05-08 09:24:24 (2868): Guest Log: [INFO] Mounting the shared directory 2021-05-08 09:24:24 (2868): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor 2021-05-08 09:24:24 (2868): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80 2021-05-08 09:24:25 (2868): Guest Log: [DEBUG] Connection to cern.ch 80 port [tcp/http] succeeded! 2021-05-08 09:24:25 (2868): Guest Log: [DEBUG] 0 2021-05-08 09:24:25 (2868): Guest Log: [DEBUG] Testing VCCS connection to vccs.cern.ch on port 443 2021-05-08 09:24:25 (2868): Guest Log: [DEBUG] Connection to vccs.cern.ch 443 port [tcp/https] succeeded! 
2021-05-08 09:24:25 (2868): Guest Log: [DEBUG] 0 2021-05-08 09:24:25 (2868): Guest Log: [DEBUG] Testing connection to Condor server on port 9618 2021-05-08 09:24:25 (2868): Guest Log: [DEBUG] Connection to vocms0840.cern.ch 9618 port [tcp/condor] succeeded! 2021-05-08 09:24:25 (2868): Guest Log: [DEBUG] 0 2021-05-08 09:24:25 (2868): Guest Log: [DEBUG] Testing connection to WMAgent server on port 4080 2021-05-08 09:24:26 (2868): Guest Log: [DEBUG] Connection to vocms0267.cern.ch 4080 port [tcp/lorica-in] succeeded! 2021-05-08 09:24:26 (2868): Guest Log: [DEBUG] 0 2021-05-08 09:24:26 (2868): Guest Log: [DEBUG] Testing connection to Frontier server on port 8080 2021-05-08 09:24:26 (2868): Guest Log: [DEBUG] Connection to cms-frontier.openhtc.io 8080 port [tcp/webcache] succeeded! 2021-05-08 09:24:26 (2868): Guest Log: [DEBUG] 0 2021-05-08 09:24:27 (2868): Guest Log: [INFO] CVMFS and Frontier will use DIRECT instead of an HTTP proxy. 2021-05-08 09:24:27 (2868): Guest Log: [INFO] A local HTTP proxy could help making the network usage 2021-05-08 09:24:27 (2868): Guest Log: [INFO] of this application more efficient. It would also help to 2021-05-08 09:24:27 (2868): Guest Log: [INFO] offload the project servers. 2021-05-08 09:24:27 (2868): Guest Log: [INFO] Details can be found in the project forum. 2021-05-08 09:24:34 (2868): Guest Log: [INFO] Reloading the CVMFS configuration (can take a while) ... 2021-05-08 09:25:43 (2868): Guest Log: [INFO] Probing CVMFS ... 
2021-05-08 09:25:50 (2868): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE 2021-05-08 09:25:50 (2868): Guest Log: 2.4.4.0 3815 1 26124 15129 3 1 1243096 4096000 2 65024 0 2 100 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1 2021-05-08 09:26:06 (2868): Guest Log: [INFO] Reading volunteer information 2021-05-08 09:26:06 (2868): Guest Log: [INFO] Volunteer: mikey (419) 2021-05-08 09:26:06 (2868): Guest Log: [INFO] VMID: 49b2fac1-df25-48d2-a4ee-4612ca6a31f8 2021-05-08 09:26:06 (2868): Guest Log: [INFO] Requesting an X509 credential from LHC@home 2021-05-08 09:26:07 (2868): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev 2021-05-08 09:26:09 (2868): Guest Log: [INFO] Running the fast benchmark. 2021-05-08 09:27:23 (2868): Guest Log: [INFO] Machine performance 20.27 HEPSPEC06 2021-05-08 09:27:23 (2868): Guest Log: [INFO] CMS application starting. Check log files. 2021-05-08 09:27:24 (2868): Guest Log: [DEBUG] HTCondor ping 2021-05-08 09:27:26 (2868): Guest Log: [DEBUG] 0 2021-05-08 09:28:11 (2868): VM state change detected. (old = 'Running', new = 'Paused') 2021-05-08 09:28:19 (2868): VM state change detected. (old = 'Paused', new = 'Running') 2021-05-08 09:28:21 (2868): VM state change detected. (old = 'Running', new = 'Paused') 2021-05-08 09:28:41 (2868): VM state change detected. (old = 'Paused', new = 'Running') 2021-05-08 09:38:44 (2868): Guest Log: Did the tarball get created? 
2021-05-08 09:38:44 (2868): Guest Log: /tmp/CMS_2647666_1619986436.227913_0.tgz 2021-05-08 09:38:44 (2868): Guest Log: Here is the upload output 2021-05-08 09:38:44 (2868): Guest Log: Here is the upload error 2021-05-08 09:38:44 (2868): Guest Log: Here is the condor directory 2021-05-08 09:38:44 (2868): Guest Log: MasterLog 2021-05-08 09:38:44 (2868): Guest Log: ProcLog 2021-05-08 09:38:44 (2868): Guest Log: StarterLog 2021-05-08 09:38:44 (2868): Guest Log: StartLog 2021-05-08 09:38:44 (2868): Guest Log: XferStatsLog 2021-05-08 09:38:44 (2868): Guest Log: Here is the MasterLog 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 ****************************************************** 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 ** condor_master (CONDOR_MASTER) STARTING UP 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 ** /usr/sbin/condor_master 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1) 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 ** $CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $ 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 ** $CondorPlatform: x86_64_RedHat6 $ 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 ** PID = 10784 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 ** Log last touched time unavailable (No such file or directory) 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 ****************************************************** 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 Using config source: /etc/condor/condor_config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 Using local config sources: 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 /etc/condor/config.d/10_security.config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 
/etc/condor/config.d/14_network.config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 /etc/condor/config.d/20_workernode.config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 /etc/condor/config.d/30_lease.config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 /etc/condor/config.d/35_cms.config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 /etc/condor/config.d/40_ccb.config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 /etc/condor/config.d/62-benchmark.conf 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 /etc/condor/condor_config.local 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 config Macros = 173, Sorted = 173, StringBytes = 6803, TablesBytes = 6332 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 CLASSAD_CACHING is OFF 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 Daemon Log is logging: D_ALWAYS D_ERROR 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 Daemoncore: Listening at <10.0.2.15:46425> on TCP (ReliSock). 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 DaemonCore: command socket at <10.0.2.15:46425?addrs=10.0.2.15-46425&noUDP> 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:26 DaemonCore: private command socket at <10.0.2.15:46425?addrs=10.0.2.15-46425> 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 CCBListener: registered with CCB server vocms0840.cern.ch as ccbid 137.138.156.85:9618?addrs=137.138.156.85-9618+[2001-1458-d00-14--b3]-9618#823960 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1520893905) 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 13715 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:45 Setting ready state 'Ready' for STARTD 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Got SIGTERM. Performing graceful shutdown. 
2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Sent SIGTERM to STARTD (pid 13715) 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 AllReaper unexpectedly called on pid 13715, status 0. 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 The STARTD (pid 13715) exited with status 0 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 All daemons are gone. Exiting. 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 **** condor_master (condor_MASTER) pid 10784 EXITING WITH STATUS 0 2021-05-08 09:38:44 (2868): Guest Log: Here is the KernelTuning.log 2021-05-08 09:38:44 (2868): Guest Log: Here is the StartLog 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 ****************************************************** 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 ** condor_startd (CONDOR_STARTD) STARTING UP 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 ** /usr/sbin/condor_startd 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1) 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 ** $CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $ 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 ** $CondorPlatform: x86_64_RedHat6 $ 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 ** PID = 13715 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 ** Log last touched time unavailable (No such file or directory) 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 ****************************************************** 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 Using config source: /etc/condor/condor_config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 Using local config sources: 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 /etc/condor/config.d/10_security.config 2021-05-08 09:38:44 
(2868): Guest Log: 05/08/21 09:27:36 /etc/condor/config.d/14_network.config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 /etc/condor/config.d/20_workernode.config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 /etc/condor/config.d/30_lease.config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 /etc/condor/config.d/35_cms.config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 /etc/condor/config.d/40_ccb.config 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 /etc/condor/config.d/62-benchmark.conf 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 /etc/condor/condor_config.local 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 config Macros = 174, Sorted = 174, StringBytes = 6830, TablesBytes = 6368 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 CLASSAD_CACHING is ENABLED 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 Daemon Log is logging: D_ALWAYS D_ERROR 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 Daemoncore: Listening at <10.0.2.15:43253> on TCP (ReliSock). 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 DaemonCore: command socket at <10.0.2.15:43253?addrs=10.0.2.15-43253&noUDP> 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:36 DaemonCore: private command socket at <10.0.2.15:43253?addrs=10.0.2.15-43253> 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:41 CCBListener: registered with CCB server vocms0840.cern.ch as ccbid 137.138.156.85:9618?addrs=137.138.156.85-9618+[2001-1458-d00-14--b3]-9618#823961 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 VM-gahp server reported an internal error 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 VM universe will be tested to check if it is available 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 History file rotation is enabled. 
2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 Maximum history file size is: 20971520 bytes 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 Number of rotated history files is: 2 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto 2021-05-08 09:38:44 (2868): Guest Log: slot type 0: Cpus: 1.000000, Memory: 2000, Swap: 33.33%, Disk: 33.33% 2021-05-08 09:38:44 (2868): Guest Log: slot type 0: Cpus: 1.000000, Memory: 2000, Swap: 33.33%, Disk: 33.33% 2021-05-08 09:38:44 (2868): Guest Log: slot type 0: Cpus: 1.000000, Memory: 2000, Swap: 33.33%, Disk: 33.33% 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot1: New machine resource allocated 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 Setting up slot pairings 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot2: New machine resource allocated 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 Setting up slot pairings 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot3: New machine resource allocated 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 Setting up slot pairings 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 CronJobList: Adding job 'multicore' 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 CronJob: Initializing job 'multicore' (/usr/local/bin/multicore-shutdown) 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 CronJobList: Adding job 'mips' 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 CronJobList: Adding job 'kflops' 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 CronJob: Initializing job 'mips' (/usr/libexec/condor/condor_mips) 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 CronJob: Initializing job 'kflops' (/usr/libexec/condor/condor_kflops) 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot1: State change: IS_OWNER is false 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 
09:27:42 slot1: Changing state: Owner -> Unclaimed 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 State change: RunBenchmarks is TRUE 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot1: Changing activity: Idle -> Benchmarking 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 BenchMgr:StartBenchmarks() 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot2: State change: IS_OWNER is false 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot2: Changing state: Owner -> Unclaimed 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 State change: RunBenchmarks is TRUE 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot2: Changing activity: Idle -> Benchmarking 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot2: Changing activity: Benchmarking -> Idle 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot3: State change: IS_OWNER is false 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot3: Changing state: Owner -> Unclaimed 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 State change: RunBenchmarks is TRUE 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot3: Changing activity: Idle -> Benchmarking 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:42 slot3: Changing activity: Benchmarking -> Idle 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:45 Initial update sent to collector(s) 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:27:45 Sending DC_SET_READY message to master <10.0.2.15:46425?addrs=10.0.2.15-46425> 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:28:04 State change: benchmarks completed 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:28:04 slot1: Changing activity: Benchmarking -> Idle 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 No resources have been claimed for 600 seconds 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Shutting down Condor on this machine. 
2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Got SIGTERM. Performing graceful shutdown. 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 shutdown graceful 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Cron: Killing all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Killing job multicore 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Cron: Killing all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Killing job mips 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Killing job kflops 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Deleting cron job manager 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Cron: Killing all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Killing job multicore 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Cron: Killing all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Killing job multicore 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 CronJobList: Deleting all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 CronJobList: Deleting job 'multicore' 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 CronJob: Deleting job 'multicore' (/usr/local/bin/multicore-shutdown), timer 9 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Cron: Killing all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 CronJobList: Deleting all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Deleting benchmark job mgr 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Cron: Killing all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Killing job mips 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Killing job kflops 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Cron: Killing all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Killing job mips 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Killing job kflops 
2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 CronJobList: Deleting all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 CronJobList: Deleting job 'mips' 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 CronJob: Deleting job 'mips' (/usr/libexec/condor/condor_mips), timer -1 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 CronJobList: Deleting job 'kflops' 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 CronJob: Deleting job 'kflops' (/usr/libexec/condor/condor_kflops), timer -1 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 Cron: Killing all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 CronJobList: Deleting all jobs 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 All resources are free, exiting. 2021-05-08 09:38:44 (2868): Guest Log: 05/08/21 09:38:12 **** condor_startd (condor_STARTD) pid 13715 EXITING WITH STATUS 0 2021-05-08 09:38:44 (2868): Guest Log: [ERROR] No jobs were available to run. 2021-05-08 09:38:44 (2868): Guest Log: [INFO] Shutting Down. 2021-05-08 09:38:44 (2868): VM Completion File Detected. 2021-05-08 09:38:44 (2868): VM Completion Message: No jobs were available to run. . 2021-05-08 09:38:44 (2868): Powering off VM. 2021-05-08 09:43:45 (2868): VM did not power off when requested. 2021-05-08 09:43:45 (2868): VM was successfully terminated. 2021-05-08 09:43:45 (2868): Deregistering VM. (boinc_f63e846a4cf0a22f, slot#2) 2021-05-08 09:43:45 (2868): Removing network bandwidth throttle group from VM. 2021-05-08 09:43:45 (2868): Removing VM from VirtualBox. 09:43:51 (2868): called boinc_finish(207) </stderr_txt> ]]> |
Joined: 18 Sep 16 Posts: 17 Credit: 984,836 RAC: 1 |
|
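A log as long as the stderr.txt above is easier to discuss once it is reduced to the few lines that decide the outcome. The following Python sketch is one hedged way to do that. It assumes the file has one log entry per line (the forum paste runs them together), and the function name `summarize_stderr` and the three patterns are illustrative assumptions taken only from the entries quoted in this thread, not from the vboxwrapper source.

```python
import re

# Patterns copied from entries visible in the quoted log (assumption: other
# vboxwrapper versions may phrase these lines differently).
GUEST_ERROR = re.compile(r"Guest Log: \[ERROR\] (.*)")
COMPLETION = re.compile(r"VM Completion Message: (.*)")
EXIT_CODE = re.compile(r"called boinc_finish\((\d+)\)")


def summarize_stderr(text):
    """Collect guest errors, the VM completion message and the exit code."""
    summary = {"guest_errors": [], "completion_message": None, "exit_code": None}
    for line in text.splitlines():
        match = GUEST_ERROR.search(line)
        if match:
            summary["guest_errors"].append(match.group(1).strip())
            continue
        match = COMPLETION.search(line)
        if match:
            summary["completion_message"] = match.group(1).strip()
            continue
        match = EXIT_CODE.search(line)
        if match:
            summary["exit_code"] = int(match.group(1))
    return summary
```

Applied to the log above, the interesting fields would be the guest error "No jobs were available to run." and the exit code passed to `boinc_finish`, which is 207 in that paste.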
Joined: 8 Apr 15 Posts: 782 Credit: 12,484,111 RAC: 4,610 |
Hi Mikey, you just got one of those *[ERROR] No jobs were available to run*. It happens once in a while with CMS, BUT I have to say that I have over 60 of these Valid in the last several months, which is nice to see. This is the *multicore version of 50.00* (vbox64_mt_mcore_cms), and the one over at LHC public is the single-core 50.00. https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=3792 I have both versions running at the same time on this host, and just the -dev version on my faster PCs with plenty of RAM. I usually only run the -dev version, but when a few members said they had problems over there with CMS I decided to run a few, and ALL of mine have been Valid, so I don't know what their problem is other than the ISP and maybe security. The ones that crashed for me did so only because they had not got connected and running, since they were on a laptop that had a Zoom meeting running, and that screwed things up. |
Joined: 18 Sep 16 Posts: 17 Credit: 984,836 RAC: 1 |
Hi Mikey I knew it was an MT task and I allowed it 4 CPU cores to run. It ran for a few minutes but then errored out like every other task I have had. I'm using Win10 and version 6.1.18 of VBox, but they all error out. I do have a couple of Linux laptops, but they always say that VBox is not installed and won't even download a task; I need to work on that, I guess. Next week I will have much better internet, so I will try again then. Thanks for your help |
Joined: 8 Apr 15 Posts: 782 Credit: 12,484,111 RAC: 4,610 |
I think running 2-core works best for these, and all of mine are Windows 10. Another thing you have to watch is that CMS uses more RAM than Theory tasks, so you have to be careful when running 4 cores. You can look at your Task Manager and see whether they are getting too close to the limit you have. I know that running 2 cores of CMS will use close to 6.5 GB on Windows 10, which is why I only run 2 tasks on this laptop, but I can run all 8 cores on my other 8-core hosts (either 4 two-core tasks or 8 single-core tasks) since on those I have 28GB RAM. You probably know how to do all of that on Linux. Mad Scientist For Life |
Joined: 8 Apr 15 Posts: 782 Credit: 12,484,111 RAC: 4,610 |
One thing that I should never trust is CMS tasks. It happens every time: I have close to 70 Valids in a row and actually believe I don't have to watch them running in my sleep, and sure enough, I just checked for the first time today and what do I see? 19 of these in a row: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2968956 This is usually because the internet speed is too slow. This same PC tried to get one or two to start over and over, but these take over 2 hours to actually start running with fast speed (just watch it happen on the localhost running log). So this host (my best one, with 28GB RAM and 8 cores) has nothing running, since I stopped it and sent back the last 2 error tasks (it was trying to run 2-core tasks). Now I will check my internet speed, since the other 4 tasks are running on the other hosts and they tend to only need to send back data, and for some reason via satellite they let me have the faster speed up and slow me down when trying to receive data. I did a check and it seems to have the 3Mbps now, so I will try again here (one at a time). |
Send message Joined: 8 Apr 15 Posts: 782 Credit: 12,484,111 RAC: 4,610 |
As far as I can see here, only Mikey and myself tried more than one core with these CMS multicores, and we couldn't get them to work... but these work fine with one core. Both of us used Windows 10; Mikey used a Ryzen and I used Intel 7. Mikey got EXIT_NO_SUB_TASKS, and my last 2 are also EXIT_NO_SUB_TASKS. Mikey did get several Atlas multicore tasks to end up Valid here, but they only ran for about 30 minutes each and used just a couple of CPU minutes. The 2-core tasks I just tried again here were with full-speed internet, so it wasn't that this time. And the single-core v.50.00 tasks I ran over at LHC have been 100% Valid. So it looks like the very thing we run version 50.00 here for is failing. It would be nice if a member tried a 2-core task with a Linux OS here to see if that makes any difference, and we could go from there as far as getting it to work with Windows 10. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2968989 (mine, with 2 cores) https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2968859 (Mikey, with 3 cores) Mad Scientist For Life |
Send message Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
Be nice ... Tried a 2-core task that failed: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2968996 stderr.txt looks fine until: 2021-05-11 07:09:12 (93214): Guest Log: [DEBUG] HTCondor ping 2021-05-11 07:09:14 (93214): Guest Log: [DEBUG] 0 The VM's logfiles are empty except for "StarterLog", which shows just 2 lines: 05/11/21 07:09:17 (pid:5020) DOCKER is undefined. 05/11/21 07:09:17 (pid:5020) DockerAPI::detect() failed to detect the Docker version; assuming absent. A currently running 1-core VM from -prod shows more lines: 05/10/21 22:20:01 (pid:5474) DOCKER is undefined. 05/10/21 22:20:01 (pid:5474) DockerAPI::detect() failed to detect the Docker version; assuming absent. 05/10/21 22:20:25 (pid:8276) ****************************************************** 05/10/21 22:20:25 (pid:8276) ** condor_starter (CONDOR_STARTER) STARTING UP 05/10/21 22:20:25 (pid:8276) ** /usr/sbin/condor_starter . . . Neither my firewall nor my squid logfiles show any unusual entries that would point to a network problem. The test was done using a separate BOINC client with plain vanilla settings except for the proxy configuration. There was no app_config.xml. The number of cores was set to 2 via the project server. Other BOINC clients on the same box are working fine with CMS tasks from -prod. |
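The StarterLog comparison above can be turned into a small check. This is a minimal sketch, assuming that the startup banner line seen in the working VM's log is a usable sign that a job was actually started; that reading comes from the two excerpts in the post, not from HTCondor documentation, and the function and variable names are hypothetical.

```python
# Marker copied from the working VM's StarterLog excerpt quoted in the post.
STARTUP_MARKER = "condor_starter (CONDOR_STARTER) STARTING UP"


def starter_started(log_text):
    """Return True if any line of the StarterLog text contains the startup banner."""
    return any(STARTUP_MARKER in line for line in log_text.splitlines())


# Excerpt from the failed 2-core VM: only the two Docker lines.
FAILED_LOG = """\
05/11/21 07:09:17 (pid:5020) DOCKER is undefined.
05/11/21 07:09:17 (pid:5020) DockerAPI::detect() failed to detect the Docker version; assuming absent.
"""

# Excerpt from the working 1-core VM: Docker lines followed by the banner.
WORKING_LOG = """\
05/10/21 22:20:01 (pid:5474) DOCKER is undefined.
05/10/21 22:20:01 (pid:5474) DockerAPI::detect() failed to detect the Docker version; assuming absent.
05/10/21 22:20:25 (pid:8276) ******************************************************
05/10/21 22:20:25 (pid:8276) ** condor_starter (CONDOR_STARTER) STARTING UP
05/10/21 22:20:25 (pid:8276) ** /usr/sbin/condor_starter
"""

if __name__ == "__main__":
    print("failed VM started a job: ", starter_started(FAILED_LOG))
    print("working VM started a job:", starter_started(WORKING_LOG))
```

In practice the log text would be read from the VM's StarterLog file; the inline strings are used here only so the sketch runs on its own.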
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 10 |
The docker message is harmless -- we don't use docker. Sorry I haven't been following this, I've had a rough year and when I can get motivated I have to concentrate on getting the "mainstream" version into Production. ...an endless round of meetings, reports, meetings... I'll fire up an instance and see what I see. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 10 |
First investigations suggest that this is due to the condor jobs not matching VMs with more than one CPU, hence the familiar "no jobs" message. I'll need to get more specialists involved to find out when and why this changed -- it looks like it comes from the WMAgent side rather than CMS@home-dev itself. |
©2025 CERN