Message boards : ATLAS Application : Testing CentOS 7 vbox image

David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 154
Credit: 1,332,133
RAC: 500
Message 6643 - Posted: 18 Sep 2019, 9:36:55 UTC - in response to Message 6642.  

- Occasionally a result is not uploaded even though the HITS file was produced (very annoying with 200-event tasks).


In my opinion this is the most serious problem, and a good reason to use the LHC wrapper. I just released 0.85 for Windows, which uses v26198ab7, so let's see if it helps with this problem.

That's up to you. I'm testing with Windows 7 and VBox 6.0.12. I have no idea how Linux and Windows 10 will behave.


I'm running Linux with VBox 6.0.12 and everything works OK, so I have kept the old wrapper for now. Most Linux users run the native version anyway, so this is less important than on Windows.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1146
Credit: 750,252
RAC: 1,445
Message 6644 - Posted: 18 Sep 2019, 13:31:13 UTC - in response to Message 6643.  

I just released 0.85 for Windows, which uses v26198ab7, so let's see if it helps with this problem.
I got some tasks for the LHC wrapper. First one: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2822886
I noticed you distribute vboxwrapper_26198ab1_windows_x86_64.pdb, so it was not built from the source of the ab7 version.
It is therefore probably useless, and even with the correct version, pdb files are normally only used when developing a new wrapper. By the way, several BOINC projects using VBox, such as Cosmology@Home, do not distribute the pdb files at all.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1146
Credit: 750,252
RAC: 1,445
Message 6645 - Posted: 18 Sep 2019, 15:24:32 UTC

With task https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2822826 there was an issue.

After 30 minutes of uptime there were still no athena processes running. I suppose there was a temporary network problem reaching a server.
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,368,424
RAC: 2,196
Message 6646 - Posted: 18 Sep 2019, 17:35:50 UTC
Last modified: 18 Sep 2019, 17:39:08 UTC

This task has now finished under Win10 Pro. The graphics and RDP are now active.
(10 collisions instead of 200.)
I ran it on 3 cores using app_config.
The wrapper is as Crystal described. VirtualBox 5.2.32.
The top display on the F3 console refreshes every 5 seconds.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2822874
2019-09-18 19:25:46 (5692): Guest Log: HITS file was successfully produced
2019-09-18 19:25:46 (5692): Guest Log: -rw-------. 1 atlas atlas 9186657 Sep 18 17:24 /home/atlas/RunAtlas/HITS.000649-403691-15393._078090.pool.root.1
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,368,424
RAC: 2,196
Message 6647 - Posted: 19 Sep 2019, 0:50:16 UTC

On all three Win10 Pro machines, ATLAS has produced a HITS file:
2019-09-18 20:36:22 (13484): Guest Log: HITS file was successfully produced
2019-09-18 20:36:22 (13484): Guest Log: -rw-------. 1 atlas atlas 9192759 Sep 18 18:34 /home/atlas/RunAtlas/HITS.000649-404167-26542._078090.pool.root.1
2019-09-18 21:58:19 (7476): Guest Log: HITS file was successfully produced
2019-09-18 21:58:19 (7476): Guest Log: -rw-------. 1 atlas atlas 9044926 Sep 18 19:55 /home/atlas/RunAtlas/HITS.000649-403632-27311._078090.pool.root.1
The third is in my previous message.
Thank you, David, for finding a wrapper that works with Win10 Pro.
If there is a newer one, we can test it again.
The .vdi was still the 084 one, as before, and was not downloaded again.
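A volunteer checking several hosts could pick the HITS file size out of guest log lines like the ones above. This is a minimal Python sketch, not part of vboxwrapper or the ATLAS app; it assumes (as in these excerpts) that the success message is followed by an `ls -l`-style line with exactly nine fields after `Guest Log: `, and the function names are hypothetical.

```python
def parse_hits_line(line: str) -> tuple[int, str]:
    """Return (size_in_bytes, path) from an ls -l style 'Guest Log:' line."""
    _, found, listing = line.partition("Guest Log: ")
    fields = listing.split()
    # Assumed layout: perms, links, owner, group, size, month, day, time, path
    if not found or len(fields) != 9 or "HITS." not in fields[8]:
        raise ValueError("not a HITS file listing")
    return int(fields[4]), fields[8]


def find_hits_files(log_lines: list[str]) -> list[tuple[int, str]]:
    """Collect every HITS file listing found in a sequence of log lines."""
    hits = []
    for line in log_lines:
        try:
            hits.append(parse_hits_line(line))
        except ValueError:
            continue  # ordinary log line, skip it
    return hits


log = [
    "2019-09-18 20:36:22 (13484): Guest Log: HITS file was successfully produced",
    "2019-09-18 20:36:22 (13484): Guest Log: -rw-------. 1 atlas atlas 9192759 "
    "Sep 18 18:34 /home/atlas/RunAtlas/HITS.000649-404167-26542._078090.pool.root.1",
]
for size, path in find_hits_files(log):
    print(size, path)
```

The success message itself is skipped because it does not have the nine-field listing layout; only the listing line is returned.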
Crystal Pellet
Volunteer tester

Send message
Joined: 13 Feb 15
Posts: 1146
Credit: 750,252
RAC: 1,445
Message 6648 - Posted: 19 Sep 2019, 5:47:39 UTC - in response to Message 6647.  

On all three Win10 Pro machines, ATLAS has produced a HITS file:
..
If there is a newer one, we can test it again.
Good to see that switching to v0.85 with the VBoxManage interface solved your problem on Win10.

The cause could also be related to VBox version 5.2.32, which you are running on all three Win10 machines together with vboxwrapper_26202.
Maybe even in combination with the AMD processors.
Is there a reason not to update to VBox 6.0.12?
v5.2.32 is now only recommended for 32-bit systems/OSes, and Oracle support for it will stop in July 2020.
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,368,424
RAC: 2,196
Message 6649 - Posted: 19 Sep 2019, 5:58:23 UTC - in response to Message 6648.  
Last modified: 19 Sep 2019, 6:05:42 UTC

Hi Crystal,
no, there is no reason; it's just the work of doing it on five machines plus the Linux VMs ;-)
It will happen this year. I have one machine that fails to upgrade to 10.00.18362.00 (nine attempts with no success, each with a 3 GB OS download!).
Once that succeeds in October, I will upgrade VirtualBox to 6.0.x.
For now I'm happy to see that it works.
If there is a better vboxwrapper version, we can test it.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 154
Credit: 1,332,133
RAC: 500
Message 6650 - Posted: 19 Sep 2019, 8:34:46 UTC - in response to Message 6644.  

I just released 0.85 for Windows, which uses v26198ab7, so let's see if it helps with this problem.
I got some tasks for the LHC wrapper. First one: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2822886
I noticed you distribute vboxwrapper_26198ab1_windows_x86_64.pdb, so it was not built from the source of the ab7 version.
It is therefore probably useless, and even with the correct version, pdb files are normally only used when developing a new wrapper. By the way, several BOINC projects using VBox, such as Cosmology@Home, do not distribute the pdb files at all.


The ATLAS WUs have used the pdb from the beginning; I always copied the setup from the previous app version without questioning whether it was useful or not :)

Good to see that other problems were fixed with this wrapper. I will try to pass this information upstream to the developers.

I notice this version of the wrapper adds an extra newline to each guest log message, which is a bit annoying, so I will try to remove it.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 154
Credit: 1,332,133
RAC: 500
Message 6651 - Posted: 19 Sep 2019, 9:37:22 UTC - in response to Message 6650.  

I just made v0.86, which doesn't use the pdb; let's see if it still works.
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,368,424
RAC: 2,196
Message 6652 - Posted: 19 Sep 2019, 12:26:50 UTC
Last modified: 19 Sep 2019, 12:42:14 UTC

No problems:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2823029
2019-09-19 14:17:15 (3260): Guest Log: HITS file was successfully produced
2019-09-19 14:17:15 (3260): Guest Log: -rw-------. 1 atlas atlas 9055739 Sep 19 12:14 /home/atlas/RunAtlas/HITS.000649-2620510-10830._078090.pool.root.1

Edit: the VM had to be deleted manually in VirtualBox:

2019-09-19 14:17:15 (3260): VM Completion File Detected.
2019-09-19 14:17:15 (3260): Powering off VM.
2019-09-19 14:22:18 (3260): VM did not power off when requested.
2019-09-19 14:22:18 (3260): VM was successfully terminated.
2019-09-19 14:22:18 (3260): Deregistering VM. (boinc_619308a4cd5573bb, slot#2
Crystal Pellet
Volunteer tester

Send message
Joined: 13 Feb 15
Posts: 1146
Credit: 750,252
RAC: 1,445
Message 6653 - Posted: 19 Sep 2019, 15:01:13 UTC - in response to Message 6652.  

Edit: the VM had to be deleted manually in VirtualBox:

2019-09-19 14:17:15 (3260): VM Completion File Detected.
2019-09-19 14:17:15 (3260): Powering off VM.
2019-09-19 14:22:18 (3260): VM did not power off when requested.
2019-09-19 14:22:18 (3260): VM was successfully terminated.
2019-09-19 14:22:18 (3260): Deregistering VM. (boinc_619308a4cd5573bb, slot#2
This message is meaningless. It has always been that way in the LHC wrapper; it's only cosmetic. If Laurence gets bored, he could have a look at it.
Although the VM is powered off immediately (you can see it in VBox Manager), it is only cleaned up 5 minutes later.
The wrapper concluding that the "VM did not power off when requested" must be a bug in the script; by then the VM has already been off for 5 minutes.
See my Theory results, which show the same message towards the end: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10360630&offset=0&show_names=0&state=4&appid=13
The remnant you found in VirtualBox Manager was perhaps from this older task of yours: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2822072
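The 5-minute delay described here can be read directly from the timestamps in the quoted log excerpt. A minimal Python sketch, assuming every wrapper log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in these excerpts; it only measures the gap and says nothing about why the wrapper waits, which would need a look at the wrapper source.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def log_time(line: str) -> datetime:
    # The first 19 characters hold the timestamp, e.g. "2019-09-19 14:17:15"
    return datetime.strptime(line[:19], TIMESTAMP_FORMAT)


def seconds_between(earlier: str, later: str) -> int:
    return int((log_time(later) - log_time(earlier)).total_seconds())


powering_off = "2019-09-19 14:17:15 (3260): Powering off VM."
not_powered_off = "2019-09-19 14:22:18 (3260): VM did not power off when requested."
gap = seconds_between(powering_off, not_powered_off)
print(f"{gap} s = {gap // 60} min {gap % 60} s")  # 303 s = 5 min 3 s
```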
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,368,424
RAC: 2,196
Message 6654 - Posted: 19 Sep 2019, 16:15:37 UTC

Yes Crystal,
the old VM was from another task.
I started a new task and its VM was deleted correctly.
2019-09-19 18:04:03 (12396): Guest Log: HITS file was successfully produced
2019-09-19 18:04:03 (12396): Guest Log: -rw-------. 1 atlas atlas 9192759 Sep 19 16:02 /home/atlas/RunAtlas/HITS.000649-2895230-8215._078090.pool.root.1
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,368,424
RAC: 2,196
Message 6655 - Posted: 19 Sep 2019, 17:38:05 UTC - in response to Message 6654.  

Now we have no new tasks.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 154
Credit: 1,332,133
RAC: 500
Message 6656 - Posted: 19 Sep 2019, 19:33:09 UTC - in response to Message 6655.  

I have started sending some real tasks here (the same tasks which are being sent to the production server).
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1146
Credit: 750,252
RAC: 1,445
Message 6657 - Posted: 19 Sep 2019, 20:34:06 UTC - in response to Message 6656.  

I have started sending some real tasks here (the same tasks which are being sent to the production server).
Surprise, surprise. I didn't expect new tasks this evening. Suddenly I got a few, and one started immediately with my setting of 2 cores.
That will take a long time, because it has 200 events with more complicated particle interactions.
I suspended the task, discarded the saved state with VirtualBox Manager, and changed the settings from 2 to 4 cores and from 4800 to 6600 MB RAM.
I started the VM myself, without BOINC, and let it run for a few minutes. Then I saved the VM state and resumed the task in BOINC.
In the Remote Display, 4 athena processes are running.
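For anyone who prefers the command line to VirtualBox Manager, the manual steps above roughly map onto standard VBoxManage subcommands. This sketch only prints the commands and uses a headless start; the VM name is hypothetical (the real one is the `boinc_...` name shown in VirtualBox Manager), the BOINC task is assumed to be suspended already with "Leave applications in memory" off, and you still need to wait a few minutes between starting the VM and saving its state.

```python
import shlex


def reconfigure_commands(vm_name: str, cpus: int, memory_mb: int) -> list[list[str]]:
    """VBoxManage equivalents of the manual steps (discard, modify, start, save)."""
    return [
        # Throw away the saved state so the VM settings can be edited
        ["VBoxManage", "discardstate", vm_name],
        # Change the number of CPUs and the RAM (in MB)
        ["VBoxManage", "modifyvm", vm_name, "--cpus", str(cpus), "--memory", str(memory_mb)],
        # Boot the VM outside BOINC, without a GUI window
        ["VBoxManage", "startvm", vm_name, "--type", "headless"],
        # Run this one only after the VM has been up for a few minutes
        ["VBoxManage", "controlvm", vm_name, "savestate"],
    ]


# Hypothetical VM name; copy the real one from VirtualBox Manager
for cmd in reconfigure_commands("boinc_0123456789abcdef", cpus=4, memory_mb=6600):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the pause between `startvm` and `savestate` under the user's control.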
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1146
Credit: 750,252
RAC: 1,445
Message 6658 - Posted: 20 Sep 2019, 5:33:15 UTC - in response to Message 6657.  
Last modified: 20 Sep 2019, 5:37:24 UTC

That will take a long time, because it has 200 events with more complicated particle interactions.
Event processing takes between 1599 and 2947 seconds per event (58 done so far).
That makes about 31 hours and 40 minutes for the whole task, plus some time for the suspend tests.
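The 31 h 40 min figure is consistent with the 200 events being shared evenly by the 4 athena processes at roughly 2280 s per event (50 events each × 2280 s = 114,000 s). A minimal Python sketch of that back-of-the-envelope estimate, assuming an even split across workers and a constant time per event, which real tasks will not quite have:

```python
import math


def estimate_wall_seconds(n_events: int, n_workers: int, sec_per_event: float) -> float:
    """Wall-clock time if events are split evenly over parallel athena workers."""
    events_per_worker = math.ceil(n_events / n_workers)
    return events_per_worker * sec_per_event


def hours_minutes(seconds: float) -> str:
    minutes = int(seconds // 60)
    return f"{minutes // 60} h {minutes % 60} min"


# 200 events on 4 workers, using the per-event times reported above
for label, sec in [("fastest", 1599), ("midpoint", (1599 + 2947) / 2), ("slowest", 2947)]:
    print(f"{label}: {hours_minutes(estimate_wall_seconds(200, 4, sec))}")
```

The midpoint gives about 31 h 34 min, close to the estimate in the post, while the two extremes span roughly 22 h to 41 h, which shows how much the result depends on the event mix.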
First suspend with "Leave applications in memory" (LAIM) off. To give the test a chance to succeed, I first suspended the 4 Theory tasks running alongside the ATLAS task, so no other threads were busy.
OK, the first suspend test went well. Saving took 29.3 seconds, which is fast, but the rest of the system was almost idle. Save-set size on disk: 3,415,400,448 bytes; vm_image.vdi in the slot: almost 4 GB.
Resuming the task went well. To speed up the task a bit, I will run fewer Theory tasks (or none) alongside the ATLAS task.
maeax

Send message
Joined: 22 Apr 16
Posts: 601
Credit: 1,368,424
RAC: 2,196
Message 6659 - Posted: 20 Sep 2019, 6:49:53 UTC
Last modified: 20 Sep 2019, 6:54:02 UTC

Crystal,
I have the same trouble ;-) One computer got an unexpected task, but for the test it's OK.
Is it possible to run it natively? -dev has no test parameter in the preferences.
Edit: at the moment I'm seeing a lot of lines in RDP containing only the characters 222222, followed by a new line with the event number.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 154
Credit: 1,332,133
RAC: 500
Message 6660 - Posted: 20 Sep 2019, 10:52:51 UTC - in response to Message 6659.  
Last modified: 20 Sep 2019, 10:57:30 UTC

I have disabled the native app here for now because the purpose is to test the new VBox image.

I have a couple of successful tasks so far, but maybe I got lucky (or you were unlucky), since the time per event is around 400 s. The RDP console looks OK for me.

Edit: I suspended one task after 3 hours of running; the save took 35 s for a 3.5 GB vdi. On resume the task continued from where it was suspended.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1146
Credit: 750,252
RAC: 1,445
Message 6661 - Posted: 20 Sep 2019, 11:20:58 UTC - in response to Message 6660.  
Last modified: 20 Sep 2019, 11:24:08 UTC

I have a couple of successful tasks so far, but maybe I got lucky (or you were unlucky) since the time per event is around 400s. The RDP console looks ok for me.
You must be lucky. I'm still busy with my first task, with 82 events to go.
Although your machine is a 4790 and mine a 2600, yours will not be twice as fast.
Since I'm running only this ATLAS task, the event times have decreased to between 607 and 1555 seconds. You are probably smashing different hadrons.
Crystal Pellet
Volunteer tester

Send message
Joined: 13 Feb 15
Posts: 1146
Credit: 750,252
RAC: 1,445
Message 6662 - Posted: 20 Sep 2019, 13:05:33 UTC

I suspended my first long runner a second time after 141 events processed.

Saving took 28.3 seconds on an otherwise idle system.
Save-set on disk: 3,313,393,664 bytes
VM image in slot: 4211 MB
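The two suspend measurements in this thread can be compared as an average write rate (saved-state size divided by save time). A minimal Python sketch, assuming the save time is dominated by writing the saved state to disk; the function name is hypothetical:

```python
def write_rate_mb_s(size_bytes: int, seconds: float) -> float:
    """Average write rate in MB/s, with 1 MB = 10**6 bytes."""
    return size_bytes / seconds / 1e6


suspends = [
    ("first suspend", 3_415_400_448, 29.3),
    ("second suspend", 3_313_393_664, 28.3),
]
for label, size, secs in suspends:
    print(f"{label}: {write_rate_mb_s(size, secs):.1f} MB/s")
```

Both come out at about 117 MB/s (116.6 and 117.1 MB/s) on this otherwise idle host.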


©2022 CERN