Message boards :
ATLAS Application :
ATLAS vbox v.1.13
Message board moderation
Previous · 1 · 2 · 3 · 4 · Next
Author | Message |
---|---|
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
Yes Ray, something went wrong with -dev and production-tasks. Have deleted .vdi of Atlas from -dev VBoxManage.exe: error: Could not find file for the medium 'C:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\ATLAS_vbox_1.13_image.vdi' (VERR_FILE_NOT_FOUND). and waiting for the end of three CMS-Tasks (about three hours). Then making a reset of the -dev Project and hoping all is ok. Using for both vboxwrapper 204 (in production renamed the wrapper). https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3097402 |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
Strange behaviour of three tasks so far waiting for: *** Job finished *** https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3097299 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3097310 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3097320 Very long runtimes. After the 20 events were ready, 'Preparing HITS file was displayed for almost 2 hours. I forced the first task to start from scratch. The second task was restarted after a suspend and the 3rd was in a running state all the time. After >2 hours they finished and uploaded a result file. |
Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
I checked the full logs and there is a strange timeshift in the middle of the task: 2022-07-04 18:42:41,307 | INFO | executing command: export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase;source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh --quiet;source $AtlasSetup/scripts/asetup.sh AtlasOffline,21.0.15,notest --platform x86_64-slc6-gcc49-opt --makeflags='$MAKEFLAGS' 2022-07-04 16:43:00,181 | INFO | nothing to verify (for remote files) Did you change the system time or something like that? This could have confused the software into hanging until the time "caught up" with the previous time: 2022-07-04 18:40:12,136 | INFO | will abort job monitoring soon since job state=finished (job is still in queue) 2022-07-04 18:41:13,716 | INFO | monitor loop #110: job 0:5515971838 is in state 'finished' 2022-07-04 18:41:13,716 | INFO | will abort job monitoring soon since job state=finished (job is still in queue) 2022-07-04 18:42:15,298 | INFO | monitor loop #111: job 0:5515971838 is in state 'finished' 2022-07-04 18:42:15,298 | INFO | will abort job monitoring soon since job state=finished (job is still in queue) 2022-07-04 18:42:50,074 | INFO | finished stage-out for finished payload, adding job to finished_jobs queue 2022-07-04 18:42:52,785 | INFO | job 5515971838 has state=finished I checked some others' tasks and they look ok with a runtime of around one hour or less. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
Did you change the system time or something like that? This could have confused the software into hanging until the time "caught up" with the previous time: I did not touched the system accept monitoring the Consoles F2. (5 running tasks at that moment) After seen the message "Preparing HITS file" for over an hour, I tried to rivive the mentioned first result. Before I took a copy of stderr.txt. All three last lines like: 2022-07-04 18:40:13 (5868): Guest Log: ATHENA_PROC_NUMBER=4 2022-07-04 18:40:14 (5868): Guest Log: *** Starting ATLAS job. (PandaID=5515971838 taskID=NULL&cor) *** 2022-07-04 18:41:47 (5868): Guest Log: VBoxService 5.2.32 r132073 (verbosity: 0) linux.amd64 (Jul 12 2019 10:32:28) release log 2022-07-04 18:41:47 (5868): Guest Log: 00:00:00.000104 main Log opened 2022-07-04T18:41:45.667110000Z 2022-07-04 18:41:47 (5868): Guest Log: 00:00:00.000164 main OS Product: Linux 2022-07-04 18:41:47 (5868): Guest Log: 00:00:00.000188 main OS Release: 3.10.0-957.27.2.el7.x86_64 2022-07-04 18:41:47 (5868): Guest Log: 00:00:00.000204 main OS Version: #1 SMP Mon Jul 29 17:46:05 UTC 2019 2022-07-04 18:41:47 (5868): Guest Log: 00:00:00.000222 main Executable: /opt/VBoxGuestAdditions-5.2.32/sbin/VBoxService 2022-07-04 18:41:47 (5868): Guest Log: 00:00:00.000223 main Process ID: 8530 2022-07-04 18:41:47 (5868): Guest Log: 00:00:00.000223 main Package type: LINUX_64BITS_GENERIC 2022-07-04 18:41:47 (5868): Guest Log: 00:00:00.000797 main 5.2.32 r132073 started. Verbose level = 0 2022-07-04 18:41:57 (5868): Guest Log: 00:00:10.002218 timesync vgsvcTimeSyncWorker: Radical guest time change: -7 188 620 002 000ns (GuestNow=1 656 952 917 049 102 000 ns GuestLast=1 656 960 105 669 104 000 ns fSetTimeLastLoop=true ) Normally followed by *** Job finished ***. About 2 hours difference. My Local time and the VM-time (UTC) differs 2 hours !!?? |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
After a Reset, 4 Tasks are downloading and .vdi (1,09 GByte). .vdi download stuck at 84,17% (0.92 from 1.09) Speed 3005.37 KBps. After retry finished. Three tasks running now. Taking a look tomorrow. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
First three Tasks successful, the next one: Verschoben:VM environment needs to be cleaned up. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
2022-07-04 23:23:45 (13952): Guest Log: 00:00:00.002313 main 5.2.32 r132073 started. Verbose level = 0 2022-07-04 23:23:55 (13952): Guest Log: 00:00:10.012626 timesync vgsvcTimeSyncWorker: Radical guest time change: -7 189 313 248 000ns (GuestNow=1 656 969 834 861 988 000 ns GuestLast=1 656 977 024 175 236 000 ns fSetTimeLastLoop=true ) 2022-07-04 23:24:08 (13952): Guest Log: CVMFS is ok Seeing this after start of inside VM of the Windows-Task. Don't know, if this can be a sign. |
Send message Joined: 31 Aug 21 Posts: 13 Credit: 1,118,469 RAC: 0 |
I run 125 tasks so far using mainly two hosts (number 1 & 2 on the list below). No problems of any kind. "HITS file was successfully produced" every time. Tested configurations (tasks running parallel and cores per task) 1 x 1 core, 2 x 1 core, 2 x 2 cores, 3 x 2 cores ... depending on the RAM available. I didn't have any other kind of tasks running at the same time. Host 1 : Boinc 7.20.1 , VirtualBox 6.1.34 , Windows 11 , Intel Xeon X5660 overclocked to 3.8 GHz , 16 GB RAM Host 2 : Boinc 7.20.1 , VirtualBox 6.1.34 , Windows 10 , Intel Xeon X5650 overclocked to 3.9 GHz , 12 GB RAM Host 3 : Boinc 7.20.1 , VirtualBox 6.1.34 , Windows 10 , Intel Xeon X5660 overclocked to 3.8 GHz , 48 GB RAM Host 3 : Boinc 7.20.1 , VirtualBox 6.1.34 , Windows 7 , Intel Core 2 Quad Q9550 overclocked to 3.5 GHz , 8 GB RAM Alles funkzionieren magnifico with this ATLAS edizione v1.13 und that software combination even though those chips are molto vecchio already (launched in 2010 / 2008). I'll continue now by running this v1.13 parallel to v2.00 (LHC@home) on hosts 1 & 2 to look for signs of weird behaviour. 1 x 2 cores + 1 x 2 cores ... and ... 1 x 1 cores + 1 x 1 cores * The first v1.13 tasks running parallel to v2.00 on both hosts completed, der Ausgang ist gut. |
Send message Joined: 24 Oct 19 Posts: 170 Credit: 543,238 RAC: 655 |
Alles funkzionieren magnifico with this ATLAS edizione v1.13 und that software combination even though those chips are molto vecchio already (launched in 2010 / 2008). Are you Salvatore of Monserrat (from "The name of the Rose")?? :-P |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
boboviz, Please, only facts. This is -dev! |
Send message Joined: 24 Oct 19 Posts: 170 Credit: 543,238 RAC: 655 |
boboviz, Ok, ok, calm down My Atlas wus are all ok and this is a fact! |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
When Vbox Guest Extensions are active a timesync with the host's RTC happens every 10 seconds. It's a Vbox function and must not be switched off. 2022-07-04 18:41:47 (5868): Guest Log: 00:00:00.000797 main 5.2.32 r132073 started. Verbose level = 0 2022-07-04 18:41:57 (5868): Guest Log: 00:00:10.002218 timesync vgsvcTimeSyncWorker: Radical guest time change: -7 188 620 002 000ns (GuestNow=1 656 952 917 049 102 000 ns GuestLast=1 656 960 105 669 104 000 ns fSetTimeLastLoop=true ) I wouldn't be afraid of a guest time change if it is a multiple of 3600 seconds since this just shows the VM adjusts the timezone. What makes me wonder is the small deviation of ~11 seconds in the example above (change: -7 188...). This may cause a fight between the NTP service inside the VM and the Vbox timesync since the VM can't set the host's RTC. Thus, the host should always be in sync with an NTP time source. In addition the host's RTC should be set to UTC. This is the default on Linux and (as far as I know) on Apple. Windows uses local time but can be configured to use UTC. Open a powershell with administrator rights and run: reg add "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\TimeZoneInformation" /v RealTimeIsUniversal /d 1 /t REG_DWORD /f Reboot and check/adjust the BIOS time to the correct UTC value. @Richie_unstable @boboviz Thanks for your comments |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
[quote]Windows uses local time but can be configured to use UTC. Open a powershell with administrator rights and run: reg add "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\TimeZoneInformation" /v RealTimeIsUniversal /d 1 /t REG_DWORD /f Reboot and check/adjust the BIOS time to the correct UTC value./quote] This is shown during some Tasks - doing nothing from my side! |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
[quote]Windows uses local time but can be configured to use UTC. Open a powershell with administrator rights and run: reg add "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\TimeZoneInformation" /v RealTimeIsUniversal /d 1 /t REG_DWORD /f Reboot and check/adjust the BIOS time to the correct UTC value./quote] This is shown during some Tasks - doing nothing from my side! 2022-07-04 23:23:45 (9592): Guest Log: VBoxService 5.2.32 r132073 (verbosity: 0) linux.amd64 (Jul 12 2019 10:32:28) release log 2022-07-04 23:23:45 (9592): Guest Log: 00:00:00.000345 main Log opened 2022-07-04T23:23:44.220636000Z 2022-07-04 23:23:45 (9592): Guest Log: 00:00:00.000458 main OS Product: Linux 2022-07-04 23:23:45 (9592): Guest Log: 00:00:00.000480 main OS Release: 3.10.0-957.27.2.el7.x86_64 2022-07-04 23:23:45 (9592): Guest Log: 00:00:00.000500 main OS Version: #1 SMP Mon Jul 29 17:46:05 UTC 2019 2022-07-04 23:23:45 (9592): Guest Log: 00:00:00.000518 main Executable: /opt/VBoxGuestAdditions-5.2.32/sbin/VBoxService |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
Restart of Boinc, vboxwrapper for production AND -dev 204. Four Atlas from -dev (last night postponed) are restarted. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
In addition the host's RTC should be set to UTC. This is the default on Linux and (as far as I know) on Apple.It's not the default for Windows. I adjusted my registry and the BIOS-clock to use UTC. At least the timesync in Guest Log is gone now. I'm trying to introduce maeax's postponed task problems and are running LHC@home Theory together with dev-ATLAS on my laptop. No problem so far. Testing this I noted that ATLAS VM is showing UTC-time as it did before my registry change and Theory is using local time as it did before. |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
VBoxService 5.2.32 r132073 VBoxGuestAdditions-5.2.32 This is the client part of VBoxGuestAdditions running inside the ATLAS VM. It is included in the ATLAS vdi file. The functions we need (shared folders) didn't change in more recent versions, hence 5.2.32 is OK. Nonetheless, if you install the host part of the VBoxGuestAdditions they should always match the main Vbox version. |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Vboxwrapper since v26204 forwards the host's RTC setting to the guest. It assumes UTC on Linux/Apple hosts and reads the registry setting on Windows hosts. The VMs display (and log) the time according to the timezone setting of the running VM-user. More important would be to sync the time with NTP as this avoids jumps. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
Restart of Boinc, vboxwrapper for production AND -dev 204. Seeing ATLAS_vbox_1.13_image.vdi for this 4 -dev Tasks only as entry in Manager. Have downloaded 30 Tasks in Production and stopped testing in -dev, so long. If Virtualbox 6.0.14 is a Problem (Version from Boinc-Website), testing with 6.1.34 next. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
I'm trying to introduce maeax's postponed task problems and are running LHC@home Theory together with dev-ATLAS on my laptop.Postponed issue reproduced. I have one Theory running from LHC@home (production server) and 1 ATLAS from dev and 7 dev-ATLAS-tasks in queue ready to start. After the running ATLAS-task finished all queued tasks start for ~40 seconds and get postponed. When you try a BOINC restart these tasks don't have a boot medium. lhcathome-dev 05 Jul 12:36:30 Computation for task djbKDmmucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmOIHKDmV3NHZm_0 finished lhcathome-dev 05 Jul 12:36:31 Starting task fNKMDmvucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmWIHKDm6BKY3m_0 lhcathome-dev 05 Jul 12:36:34 Started upload of djbKDmmucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmOIHKDmV3NHZm_0_r769731847_ATLAS_result lhcathome-dev 05 Jul 12:36:34 Started upload of djbKDmmucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmOIHKDmV3NHZm_0_r769731847_ATLAS_hits lhcathome-dev 05 Jul 12:36:46 Finished upload of djbKDmmucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmOIHKDmV3NHZm_0_r769731847_ATLAS_hits lhcathome-dev 05 Jul 12:37:09 Finished upload of djbKDmmucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmOIHKDmV3NHZm_0_r769731847_ATLAS_result lhcathome-dev 05 Jul 12:37:13 Sending scheduler request: To report completed tasks. lhcathome-dev 05 Jul 12:37:13 Reporting 1 completed tasks lhcathome-dev 05 Jul 12:37:13 Not requesting tasks: "no new tasks" requested via Manager lhcathome-dev 05 Jul 12:37:14 Scheduler request completed lhcathome-dev 05 Jul 12:37:14 Project requested delay of 61 seconds lhcathome-dev 05 Jul 12:37:16 Task fNKMDmvucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmWIHKDm6BKY3m_0 postponed for 86400 seconds: VM environment needs to be cleaned up. lhcathome-dev 05 Jul 12:37:16 Starting task 3dPMDmuucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmVIHKDmKasUfm_0 lhcathome-dev 05 Jul 12:38:02 Task 3dPMDmuucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmVIHKDmKasUfm_0 postponed for 86400 seconds: VM environment needs to be cleaned up. lhcathome-dev 05 Jul 12:38:02 Starting task QILMDmtucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmUIHKDmbODRwm_0 lhcathome-dev 05 Jul 12:38:47 Task QILMDmtucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmUIHKDmbODRwm_0 postponed for 86400 seconds: VM environment needs to be cleaned up. lhcathome-dev 05 Jul 12:38:48 Starting task NFCLDmiucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmKIHKDmnUTkDo_0 lhcathome-dev 05 Jul 12:39:33 Task NFCLDmiucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmKIHKDmnUTkDo_0 postponed for 86400 seconds: VM environment needs to be cleaned up. lhcathome-dev 05 Jul 12:39:33 Starting task n5pMDmrucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmTIHKDmhLE0Gm_0 lhcathome-dev 05 Jul 12:40:18 Task n5pMDmrucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmTIHKDmhLE0Gm_0 postponed for 86400 seconds: VM environment needs to be cleaned up. lhcathome-dev 05 Jul 12:40:18 Starting task A9kKDmjucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmLIHKDmcwiPhm_0 lhcathome-dev 05 Jul 12:41:03 Task A9kKDmjucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmLIHKDmcwiPhm_0 postponed for 86400 seconds: VM environment needs to be cleaned up. lhcathome-dev 05 Jul 12:41:04 Starting task exKLDmoucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmQIHKDmOBKfUm_0 lhcathome-dev 05 Jul 12:41:48 Task exKLDmoucS1n7Olcko1bjSoqABFKDmABFKDmKFbSDmQIHKDmOBKfUm_0 postponed for 86400 seconds: VM environment needs to be cleaned up. LHC@home 05 Jul 12:48:08 Computation for task Theory_2390-1098582-268_0 finished I let them crash by using an atlas done file. See the error results: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4600&offset=0&show_names=0&state=6&appid=5 So there seems to be a real issue when mixing the new multi attach vboxwrapper with the older LHC@home tasks. Edit: I reproduced it on my other PC too: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3097580 |
©2024 CERN