Testing CentOS 7 vbox image
Author | Message |
---|---|
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
That's it, finally. The last 4 events: Size ATLAS_hits-file 236120.72 K Result: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2824181 |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
Crystal +1: one of those long-runners is finishing successfully. The other was suspended for hours tonight and is running its last 110 collisions.
24:24:49.283821 Changing the VM state from 'RUNNING' to 'SUSPENDING'
24:24:49.415819 PDMR3Suspend: 131 902 163 ns run time
24:24:49.415850 Changing the VM state from 'SUSPENDING' to 'SUSPENDED'
24:24:49.415869 Console: Machine state changed to 'Paused'
28:47:11.407187 Changing the VM state from 'SUSPENDED' to 'RESUMING'
28:47:11.425768 Changing the VM state from 'RESUMING' to 'RUNNING'
28:47:11.425794 Console: Machine state changed to 'Running'
28:47:12.092070 TMR3UtcNow: nsNow=1 569 044 722 483 850 882 nsPrev=1 569 028 975 666 006 203 -> cNsDelta=15 746 817 844 679 (offLag=43 938 155 218 offVirtualSync=49 368 776 404 844 offVirtualSyncGivenUp=49 324 838 249 626, NowAgain=1 569 044 766 422 006 100)
28:47:12.105710 VMMDev: Guest Log: 24:23:24.531842 timesync vgsvcTimeSyncWorker: Radical host time change: 15 746 817 000 000ns (HostNow=1 569 044 722 483 000 000 ns HostLast=1 569 028 975 666 000 000 ns)
28:47:22.107541 VMMDev: Guest Log: 24:23:34.533605 timesync vgsvcTimeSyncWorker: Radical guest time change: 15 731 453 162 000ns (GuestNow=1 569 044 732 497 850 000 ns GuestLast=1 569 029 001 044 688 000 ns fSetTimeLastLoop=true ) |
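For readers who want to check how long the VM was actually paused, here is a minimal Python sketch. It assumes the leading `HH:MM:SS.ffffff` field in these log lines is elapsed time since the VM session started (which is why the hour can exceed 23). The function names are made up for illustration and are not part of VirtualBox or BOINC.

```python
import re

# State-change lines copied from the log excerpt in the post above.
LOG = """\
24:24:49.283821 Changing the VM state from 'RUNNING' to 'SUSPENDING'
24:24:49.415850 Changing the VM state from 'SUSPENDING' to 'SUSPENDED'
28:47:11.407187 Changing the VM state from 'SUSPENDED' to 'RESUMING'
28:47:11.425768 Changing the VM state from 'RESUMING' to 'RUNNING'
"""

STATE_RE = re.compile(
    r"^(\d+):(\d{2}):(\d{2}\.\d+) Changing the VM state from '\w+' to '(\w+)'"
)


def elapsed_seconds(hours, minutes, seconds):
    """Convert an elapsed HH:MM:SS.ffffff stamp to seconds (hours may exceed 23)."""
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def suspended_seconds(log_text):
    """Return the time between reaching SUSPENDED and returning to RUNNING."""
    start = end = None
    for line in log_text.splitlines():
        match = STATE_RE.match(line)
        if match is None:
            continue
        hours, minutes, seconds, new_state = match.groups()
        if new_state == "SUSPENDED":
            start = elapsed_seconds(hours, minutes, seconds)
        elif new_state == "RUNNING" and start is not None:
            end = elapsed_seconds(hours, minutes, seconds)
    if start is None or end is None:
        raise ValueError("no complete suspend/resume pair found")
    return end - start


paused = suspended_seconds(LOG)
# cNsDelta from the TMR3UtcNow line, converted from nanoseconds to seconds.
host_jump = 15_746_817_844_679 / 1e9

print(f"paused for {paused:.1f} s, host clock jump {host_jump:.1f} s")
```

The two values agree to within a few seconds (roughly 4 h 22 min), which is consistent with the "Radical host time change" message simply reflecting the pause.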
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
RDP shows the collision times in UTC (Windows 10 Pro). |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
The second finished after more than 2 days: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2824209
2019-09-22 11:40:53 (10716): Guest Log: HITS file was successfully produced
2019-09-22 11:40:53 (10716): Guest Log: -rw-------. 1 atlas atlas 249017334 Sep 22 09:38 /home/atlas/RunAtlas/HITS.19000509._016046.pool.root.1 |
Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
I'm planning to finally release the CentOS 7 image into production next week. From what I can see here most tasks finish ok except for the usual badly configured machines and internal ATLAS errors. Please shout if you have any last-minute problems or requests! |
Send message Joined: 15 Apr 15 Posts: 38 Credit: 227,251 RAC: 0 |
It’s great! In my case, ALT-F2 takes a couple of tries before it works. |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
"I'm planning to finally release the CentOS 7 image into production next week. From what I can see here most tasks finish ok except for the usual badly configured machines and internal ATLAS errors."
Found this in the stderr.txt of a recently downloaded task:
2019-10-04 16:45:25 (86164): Guest Log: Failed to check CVMFS, check output from cvmfs_config probe:
2019-10-04 16:45:25 (86164): Guest Log: Probing /cvmfs/atlas.cern.ch... Failed!
2019-10-04 16:45:36 (86164): Guest Log: Probing /cvmfs/atlas-condb.cern.ch... OK
Despite the error, this appears a few lines later:
2019-10-04 16:48:10 (86164): Guest Log: *** Starting ATLAS job. (PandaID=4495700049 taskID=19056301) ***
No top output at tty3 (or at any other tty). The task requested lots of files from lcgft-atlas.gridpp.rl.ac.uk, which usually indicates it has started correctly, but after 25 min there's still no finished event reported at tty2. |
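To spot this kind of failure in a long stderr.txt, here is a minimal Python sketch that pulls the CVMFS probe results out of the guest log. It assumes the probe lines always have the `Probing <repository>... OK` or `... Failed!` shape seen in the excerpt above. `probe_results` is a hypothetical helper, not part of BOINC or CVMFS.

```python
import re

# Lines copied from the stderr.txt excerpt in the post above.
STDERR = """\
2019-10-04 16:45:25 (86164): Guest Log: Failed to check CVMFS, check output from cvmfs_config probe:
2019-10-04 16:45:25 (86164): Guest Log: Probing /cvmfs/atlas.cern.ch... Failed!
2019-10-04 16:45:36 (86164): Guest Log: Probing /cvmfs/atlas-condb.cern.ch... OK
"""

# Repository path, then a literal "...", then the probe verdict.
PROBE_RE = re.compile(r"Guest Log: Probing (/cvmfs/\S+?)\.\.\. (OK|Failed!)")


def probe_results(stderr_text):
    """Map each probed repository to True (OK) or False (Failed!)."""
    return {
        match.group(1): match.group(2) == "OK"
        for match in PROBE_RE.finditer(stderr_text)
    }


results = probe_results(STDERR)
failed = [repo for repo, ok in results.items() if not ok]
print("failed probes:", failed)
```

Note that, as the post shows, a failed probe did not stop the ATLAS job from starting here, so a non-empty `failed` list is a warning sign rather than proof that the task will fail.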
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
"... there's still no finished event reported at tty2."
Now the progress log appears at tty2, after more than 1400 s to finish the 1st event. Still no top output. |
Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
I just made version 0.87 which is the final(!) test before deploying in production. This one takes the bootstrap script from CVMFS instead of my personal web site. Assuming there are no problems I'll use this image in production. PS The top on console 3 works for me but each refresh slowly scrolls up the screen, which is a bit annoying. I'm not sure exactly how to fix this. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
"Assuming there are no problems I'll use this image in production."
Please change the <rsc_disk_bound> of 8000000000.000000 to 10000000000.000000 before going into production. See my message.
I got my first event after 1932 seconds on tty2. |
Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
"Assuming there are no problems I'll use this image in production."
"Please change the rsc_disk_bound of 8000000000.000000 to 10000000000.000000 before going into production."
Thanks for the reminder. I've done that right now, because the disk limit is set when WU are submitted, so we need all the unsent WU in the queue to have the new value before the new app version is released. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
The new .vdi (0.87) is a 1.07 GByte download on Windows. Download time: 2 min :-). https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2829147 The vboxwrapper is 26198ab7. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
"Assuming there are no problems I'll use this image in production."
"Please change the rsc_disk_bound of 8000000000.000000 to 10000000000.000000 before going into production."
Thanks, that was necessary. I suspended a task that was almost 60% done (118 events), and the slot's total space on disk was 8,052,105,216 bytes.
Saving the VM state took almost 29 seconds on an idle system:
15:21:11.076331 Console: Machine state changed to 'Saving'
15:21:39.928831 Console: Machine state changed to 'Saved' |
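The numbers in this post can be checked with a few lines of Python. This is only an arithmetic sketch: the constant and function names are made up for illustration, and it assumes (as the thread implies) that slot usage is compared directly against rsc_disk_bound in bytes.

```python
from datetime import datetime

OLD_DISK_BOUND = 8_000_000_000   # previous rsc_disk_bound, in bytes
NEW_DISK_BOUND = 10_000_000_000  # requested rsc_disk_bound, in bytes
SLOT_BYTES = 8_052_105_216       # slot size reported at ~60% done


def headroom(bound_bytes, used_bytes):
    """Bytes left before the bound is reached (negative means over the bound)."""
    return bound_bytes - used_bytes


# Time between the 'Saving' and 'Saved' console messages.
STAMP_FORMAT = "%H:%M:%S.%f"
save_seconds = (
    datetime.strptime("15:21:39.928831", STAMP_FORMAT)
    - datetime.strptime("15:21:11.076331", STAMP_FORMAT)
).total_seconds()

print("headroom with old bound:", headroom(OLD_DISK_BOUND, SLOT_BYTES))
print("headroom with new bound:", headroom(NEW_DISK_BOUND, SLOT_BYTES))
print(f"state save took {save_seconds:.2f} s")
```

With the old bound the slot was already about 52 MB over at 60% progress. The new bound leaves just under 2 GB of headroom at that point.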
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
Something wrong with the server?
lhcathome-dev 17 Oct 21:18:52 Giving up on download of 7OBLDmFwUbvnShfckohDCDFpABFKDmABFKDmMX2WDmABFKDmo2sdXm_EVNT.18605762._000224.pool.root.1: permanent HTTP error
lhcathome-dev 17 Oct 21:20:58 Giving up on download of ZEIODmuL3ZvnShfckohDCDFpABFKDmABFKDmDoiUDmABFKDmZy8cum_EVNT.18605557._000447.pool.root.1: permanent HTTP error
lhcathome-dev 17 Oct 21:23:02 Giving up on download of RFBLDmIiXcvnShfckohDCDFpABFKDmABFKDmxbeYDmABFKDm6X4tgn_EVNT.18605806._000117.pool.root.1: permanent HTTP error |
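If many such messages pile up in the event log, a small Python sketch like the one below can summarise them. It assumes the message format is exactly as pasted above. `failed_downloads` is a hypothetical helper, and the number after `_EVNT.` is extracted only as an opaque identifier, since the thread does not say what it means.

```python
import re
from collections import Counter

# Event-log lines copied from the post above.
MESSAGES = """\
lhcathome-dev 17 Oct 21:18:52 Giving up on download of 7OBLDmFwUbvnShfckohDCDFpABFKDmABFKDmMX2WDmABFKDmo2sdXm_EVNT.18605762._000224.pool.root.1: permanent HTTP error
lhcathome-dev 17 Oct 21:20:58 Giving up on download of ZEIODmuL3ZvnShfckohDCDFpABFKDmABFKDmDoiUDmABFKDmZy8cum_EVNT.18605557._000447.pool.root.1: permanent HTTP error
lhcathome-dev 17 Oct 21:23:02 Giving up on download of RFBLDmIiXcvnShfckohDCDFpABFKDmABFKDmxbeYDmABFKDm6X4tgn_EVNT.18605806._000117.pool.root.1: permanent HTTP error
"""

GIVE_UP_RE = re.compile(r"Giving up on download of (\S+): (.+)$", re.MULTILINE)
EVNT_RE = re.compile(r"_EVNT\.(\d+)\._(\d+)\.pool\.root")


def failed_downloads(log_text):
    """Return (file name, reason) pairs for every abandoned download."""
    return [(m.group(1), m.group(2).strip()) for m in GIVE_UP_RE.finditer(log_text)]


failures = failed_downloads(MESSAGES)
reasons = Counter(reason for _name, reason in failures)
# First number after "_EVNT." in each file name, kept as a string.
evnt_ids = [EVNT_RE.search(name).group(1) for name, _reason in failures]

print(reasons)
print(evnt_ids)
```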
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
14 hours later:
lhcathome-dev 18 Oct 11:04:34 Giving up on download of QcoMDmDrAavnShfckohDCDFpABFKDmABFKDmRbuUDmABFKDmflLBMo_EVNT.18605557._000447.pool.root.1: permanent HTTP error |
Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
I think the tasks remaining here were already cancelled upstream but not cancelled properly in BOINC, so that's why the input files were deleted on the server. I have manually cancelled all these tasks. Today I re-activated older versions of the vbox apps to debug the new "top" console that caused problems on the production project last week. I also submitted some test WU but beware these WU are likely to hang forever. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
This task: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1946347 is running in VirtualBox 6.0.12. Over RDP, ALT+F2 shows "localhost login: ^[[[B" and ALT+F3 shows "^[[[C". The ATLAS version is 0.86. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
A new task downloaded the v0.84 vdi, but the application running is v0.86. The consoles don't show anything informative. The task is using ~22% CPU; it should be 50% when 4 athenas are running. I aborted the task: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2831883 |
Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
Now it looks like the problems are fixed so new tasks should work. Please abort any tasks you downloaded up to now since they will never finish. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
I got one running - no Unsent left. Console ALT-F3 ('top') is working. I'm able to give input, e.g. to show tasks from user 'atlas' only. 4 athenas started after ~10 minutes of uptime. For console ALT-F2 I have to wait another half an hour. I guess the task will be returned tomorrow afternoon. Edit: The runtime of the events from PandaID=4002876565 taskID=000649-2 seems to be a bit shorter, so the return will be earlier. It looks like this was 'only' a test task with 10 events. |
©2024 CERN