Message boards : Theory Application : New Version 5.30
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 481
Credit: 394,720
RAC: 0
Message 7398 - Posted: 19 Jun 2022, 15:33:19 UTC - in response to Message 7396.  

I'm obviously doing something wrong with the graceful shutdown thing

No.
You're doing it right.
You get computation errors because the task/VM shuts down before it has returned a result file to BOINC.
This leads to:
<message>
upload failure: <file_xfer_error>
  <file_name>Theory_2390-1129919-264_0_r791023812_result</file_name>
  <error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>




2 dev and 1 Production currently running happily

Sooner or later they will fail, either dev or prod at least.
Please be patient and don't run dev and prod concurrently until Laurence has fixed it (early next week, I guess).
ID: 7398 · Rating: 0
Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,952,515
RAC: 456
Message 7399 - Posted: 19 Jun 2022, 16:19:38 UTC - in response to Message 7398.  
Last modified: 19 Jun 2022, 16:30:11 UTC

Ah, ok. I was hoping, perhaps too optimistically, that it would be more elegant, with a return of the partially completed work.
ID: 7399 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 854,677
RAC: 8
Message 7401 - Posted: 19 Jun 2022, 17:50:36 UTC - in response to Message 7399.  

Ah, ok. I was hoping, perhaps too optimistically, that it would be more elegant, with a return of the partially completed work.
It was graceful in the past, but since the task lifetime was extended to 10 days from the earlier 18 hours, the shutdown procedure is very rarely used by the wrapper itself.
It means that if you have such a rarely useful long-running job, it will be killed after 10 days and error out: invalid, no credit, and thank you for your time and watts.
When not testing, I have removed the 10-day limit, so I decide for myself whether a task is useful and is not running too long.
ID: 7401 · Rating: 0
boboviz

Joined: 24 Oct 19
Posts: 163
Credit: 354,864
RAC: 403
Message 7432 - Posted: 24 Jun 2022, 18:10:48 UTC

Do you plan to pass this new app on to the LHC@home project?
ID: 7432 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 675
Credit: 1,971,671
RAC: 1,605
Message 7448 - Posted: 29 Jun 2022, 3:23:31 UTC - in response to Message 7329.  

This update provides a new version of the VboxWrapper which supports the multiattach mode. Please let me know if there are any issues.

I started one Theory task and see this in stderr.txt:
2022-06-29 05:04:38 (5020): Detected: vboxwrapper 26204
2022-06-29 05:04:38 (5020): Detected: BOINC client v7.16.20
2022-06-29 05:04:39 (5020): Detected: VirtualBox VboxManage Interface (Version: 6.1.34)
2022-06-29 05:04:39 (5020): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2022-06-29 05:04:40 (5020): Guest Log: BIOS: VirtualBox 6.1.34
2022-06-29 05:04:40 (5020): Guest Log: CPUID EDX: 0x178bfbff
2022-06-29 05:04:40 (5020): Guest Log: BIOS: No PCI IDE controller, not probing IDE
2022-06-29 05:04:40 (5020): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2022-06-29 05:04:40 (5020): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=80
2022-06-29 05:04:40 (5020): Guest Log: BIOS: Boot from Hard Disk 0 failed
2022-06-29 05:04:40 (5020): Guest Log: BIOS: Boot : bseqnr=2, bootseq=0003
2022-06-29 05:04:40 (5020): Guest Log: BIOS: CDROM boot failure code : 0002
2022-06-29 05:04:40 (5020): Guest Log: BIOS: Boot from CD-ROM failed
2022-06-29 05:04:40 (5020): Guest Log: Could not read from the boot medium! System halted.
2022-06-29 05:04:40 (5020): Starting VM using VBoxManage interface. (boinc_9f8eee29257d3a7a, slot#9)
2022-06-29 05:04:51 (5020): Successfully started VM. (PID = '1916')
2022-06-29 05:04:51 (5020): Reporting VM Process ID to BOINC.
2022-06-29 05:04:51 (5020): VM state change detected. (old = 'poweredoff', new = 'running')
2022-06-29 05:04:51 (5020): Preference change detected
2022-06-29 05:04:51 (5020): Setting CPU throttle for VM. (100%)
2022-06-29 05:04:52 (5020): Setting checkpoint interval to 1200 seconds. (Higher value of (Preference: 1200 seconds) or (Vbox_job.xml: 600 seconds))

The running task has no RDP in BOINC: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=2191880
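
For anyone who wants to scan a stderr.txt for this pattern, here is a minimal Python sketch. It assumes the vboxwrapper line layout seen above (timestamp, PID in parentheses, message), and the list of failure markers is only a guess based on this one log, not an official list.

```python
import re

# Matches lines such as "2022-06-29 05:04:40 (5020): Guest Log: ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")

# Substrings treated as boot failures here (assumption taken from this log).
BOOT_FAILURE_MARKERS = (
    "Boot from Hard Disk 0 failed",
    "Could not read from the boot medium",
)


def find_boot_failures(stderr_text):
    """Return (timestamp, message) pairs for lines that indicate a failed boot."""
    hits = []
    for line in stderr_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip continuation lines without a timestamp
        timestamp, _pid, message = match.groups()
        if any(marker in message for marker in BOOT_FAILURE_MARKERS):
            hits.append((timestamp, message))
    return hits


# A few lines copied from the log above.
sample = """\
2022-06-29 05:04:40 (5020): Guest Log: BIOS: Boot from Hard Disk 0 failed
2022-06-29 05:04:40 (5020): Guest Log: BIOS: Boot from CD-ROM failed
2022-06-29 05:04:40 (5020): Guest Log: Could not read from the boot medium! System halted.
2022-06-29 05:04:51 (5020): Successfully started VM. (PID = '1916')
"""

for timestamp, message in find_boot_failures(sample):
    print(timestamp, message)
```

A hit on the hard disk line fits the picture of a VM that was started without its disk attached.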
ID: 7448 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 854,677
RAC: 8
Message 7450 - Posted: 29 Jun 2022, 5:26:24 UTC - in response to Message 7448.  

From the result:

Another VirtualBox management application has locked the session for
this VM. BOINC cannot properly monitor this VM
and so this job will be aborted.

2022-06-28 20:52:50 (8028): Could not create VM
2022-06-28 20:52:50 (8028): ERROR: VM failed to start
2022-06-28 20:52:55 (8028):
NOTE: VM session lock error encountered.
BOINC will be notified that it needs to clean up the environment.
This might be a temporary problem and so this job will be rescheduled for another time.


I think you have to clean up yourself. It's best to reset the dev project.
If necessary, also clean the project directory and the slots, and remove remnants of disks etc. with the Virtual Media Manager.
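
To see which disks are still registered before cleaning up, the output of `VBoxManage list hdds` can be parsed. The Python sketch below assumes the usual "Key: value" blocks separated by blank lines. The sample text is made up for illustration (UUIDs and the `example` project folder are placeholders), and the sketch only lists candidates; it does not remove anything.

```python
def parse_hdd_list(output):
    """Split `VBoxManage list hdds` style output into one dict per disk."""
    disks, current = [], {}
    for line in output.splitlines():
        if not line.strip():
            # A blank line ends the current disk block.
            if current:
                disks.append(current)
                current = {}
            continue
        # Split at the first colon only, so "C:\..." paths stay intact.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        disks.append(current)
    return disks


def boinc_disks(disks, marker="BOINC"):
    """Keep only disks whose location contains the marker (hypothetical filter)."""
    return [d for d in disks if marker.lower() in d.get("Location", "").lower()]


# Made-up sample output; real UUIDs and paths will differ.
sample = r"""
UUID:           11111111-1111-1111-1111-111111111111
Parent UUID:    base
State:          inaccessible
Type:           multiattach
Location:       C:\ProgramData\BOINC\projects\example\Theory_2020_05_08.vdi

UUID:           22222222-2222-2222-2222-222222222222
Parent UUID:    base
State:          created
Type:           normal (base)
Location:       C:\Users\me\VirtualBox VMs\other\other.vdi
"""

for disk in boinc_disks(parse_hdd_list(sample)):
    print(disk["UUID"], disk["State"], disk["Location"])
```

Leftover entries found this way can then be removed in the Virtual Media Manager, or with `VBoxManage closemedium`, once no BOINC task is using them.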
ID: 7450 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 675
Credit: 1,971,671
RAC: 1,605
Message 7451 - Posted: 29 Jun 2022, 6:09:47 UTC

Thanks,
I have cleaned up the old Theory tasks from Production that were running with the SIGUSR1 error.
Now four CMS tasks from -dev are running well.
When they are finished, I will reboot the machine. This morning I also got an optional Win10 Pro update.
I also have a CentOS8 VM in emergency mode at the moment, because of Hyper-V testing last week.
This computer is for all testing (8 cores).
After the reboot I will try a Theory task in -dev again AND an ATLAS ;-).
ID: 7451 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 675
Credit: 1,971,671
RAC: 1,605
Message 7466 - Posted: 30 Jun 2022, 11:45:17 UTC - in response to Message 7451.  
Last modified: 30 Jun 2022, 11:54:37 UTC

CMS is running, ATLAS has no tasks, and Theory shows:
Theory_2390-1115156-266
Status
Postponed: VM environment needs to be cleaned up.
The VirtualBox Manager is clean. Theory shows an entry for multiattach mode (Theory_2020_05_08.vdi).
FATAL: Could not read from the boot medium! System halted.
ID: 7466 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 675
Credit: 1,971,671
RAC: 1,605
Message 7568 - Posted: 8 Jul 2022, 5:28:48 UTC - in response to Message 7466.  

2022-07-08 06:59:44 (19180): Adding storage controller(s) to VM.
2022-07-08 06:59:44 (19180): Adding virtual disk drive to VM. (Theory_2020_05_08.vdi)
2022-07-08 07:00:17 (19180): Error in storage attach (fixed disk - multiattach mode) for VM: -2135228409
Command:
VBoxManage -q storageattach "boinc_4c0d24e8ecb45ebb" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "C:/ProgramData/BOINC/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2020_05_08.vdi"
Output:
VBoxManage.exe: error: Cannot attach medium 'C:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\Theory_2020_05_08.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later

https://www.virtualbox.org/ticket/18296
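
The negative number in the wrapper log is a 32-bit result code printed as a signed integer. A short Python sketch of the conversion (the function names are mine, not part of BOINC or VirtualBox):

```python
def to_hresult(code):
    """Show a signed 32-bit result code as an unsigned hex value."""
    return f"0x{code & 0xFFFFFFFF:08X}"


def to_signed32(value):
    """Inverse direction: interpret an unsigned 32-bit value as signed."""
    return value - (1 << 32) if value & 0x80000000 else value


print(to_hresult(-2135228409))  # 0x80BB0007
print(to_signed32(0x80BB0007))  # -2135228409
```

0x80BB0007 is the value VirtualBox defines as VBOX_E_INVALID_OBJECT_STATE.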
ID: 7568 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 481
Credit: 394,720
RAC: 0
Message 7569 - Posted: 8 Jul 2022, 6:12:27 UTC - in response to Message 7568.  

ID: 7569 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 675
Credit: 1,971,671
RAC: 1,605
Message 7570 - Posted: 8 Jul 2022, 9:01:31 UTC - in response to Message 7568.  
Last modified: 8 Jul 2022, 9:15:20 UTC

Which volunteer did the idea of using multiattach come from?
Boinc 5.2.44 is running without being postponed!
ID: 7570 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 675
Credit: 1,971,671
RAC: 1,605
Message 7603 - Posted: 21 Jul 2022, 8:17:13 UTC - in response to Message 7570.  
Last modified: 21 Jul 2022, 8:40:56 UTC

With VirtualBox 6.1.36, Theory is postponed after 46 sec.
Now running: 1 ATLAS from production AND 2 CMS from -dev!
ID: 7603 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 675
Credit: 1,971,671
RAC: 1,605
Message 7604 - Posted: 21 Jul 2022, 19:18:40 UTC - in response to Message 7603.  

The settings could not be saved.

Cannot attach medium 'S:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\Theory_2020_05_08.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later.

Result Code: VBOX_E_INVALID_OBJECT_STATE (0x80BB0007)
Component: SessionMachine
Interface: IMachine {85632c68-b5bb-4316-a900-5eb28d3413df}
ID: 7604 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 854,677
RAC: 8
Message 7605 - Posted: 21 Jul 2022, 19:59:06 UTC - in response to Message 7604.  

It could be that the two Theory_2020_05_08.vdi files from dev and production have the same UUID.
VirtualBox doesn't like that, and it could be a reason for your problem.

Find the path to vboxmanage.exe, go there in a cmd window, and check it with these commands:

vboxmanage.exe S:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome\Theory_2020_05_08.vdi
and
vboxmanage.exe S:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\Theory_2020_05_08.vdi
ID: 7605 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 675
Credit: 1,971,671
RAC: 1,605
Message 7606 - Posted: 22 Jul 2022, 3:42:53 UTC - in response to Message 7605.  
Last modified: 22 Jul 2022, 4:06:26 UTC

This message appears when Theory_2020_05_08.vdi is attached manually in the settings of the VirtualBox Manager after the task has been postponed (43 sec. after the start of the -dev task).
No .vdi has been attached by the Theory task itself.
2022-07-21 21:14:07 (5288): Adding virtual disk drive to VM. (Theory_2020_05_08.vdi)
2022-07-21 21:14:40 (5288): Error in storage attach (fixed disk - multiattach mode) for VM: -2135228409
Command:
VBoxManage -q storageattach "boinc_34e4e5af62a61d18" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "S:/ProgramData/BOINC/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2020_05_08.vdi"

CMS has attached the multiattach file CMS_2022_06_22.vdi correctly, and
the differencing .vdi is attached in the Snapshots folder of the BOINC task folder.
ID: 7606 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 854,677
RAC: 8
Message 7607 - Posted: 22 Jul 2022, 7:27:54 UTC - in response to Message 7606.  
Last modified: 22 Jul 2022, 7:33:08 UTC

Sorry maeax,

In the commands in my previous post, I forgot the subcommand showmediuminfo, which has to go after vboxmanage.exe and before the path to BOINC's virtual disk.

But even when the UUIDs are different, the postponed error may occur:

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3101226

Your bolded line about VirtualBox 4.0 is, in my opinion, a wrong interpretation of the error by VirtualBox.
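
With `showmediuminfo` in place, the UUID check could be scripted roughly as follows. This is a sketch only: it assumes `vboxmanage.exe` can be found on the PATH and that the output contains a line starting with `UUID:`. The two sample outputs are invented so the comparison can be shown without a VirtualBox installation.

```python
import re
import subprocess

# Only a line that starts with "UUID:" counts, not "Parent UUID:".
UUID_RE = re.compile(r"^UUID:\s*([0-9a-fA-F-]{36})\s*$", re.MULTILINE)


def showmediuminfo_command(vdi_path, vboxmanage="vboxmanage.exe"):
    """Build the command line: vboxmanage.exe showmediuminfo <path>."""
    return [vboxmanage, "showmediuminfo", vdi_path]


def extract_uuid(output):
    """Return the disk UUID from showmediuminfo output, or None if absent."""
    match = UUID_RE.search(output)
    return match.group(1).lower() if match else None


def read_uuid(vdi_path):
    """Run VBoxManage for one disk (needs VirtualBox; not called below)."""
    result = subprocess.run(
        showmediuminfo_command(vdi_path), capture_output=True, text=True, check=False
    )
    return extract_uuid(result.stdout)


prod = r"S:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome\Theory_2020_05_08.vdi"
dev = r"S:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\Theory_2020_05_08.vdi"
print(" ".join(showmediuminfo_command(prod)))
print(" ".join(showmediuminfo_command(dev)))

# Invented outputs, for illustration only.
out_prod = "UUID:           11111111-2222-3333-4444-555555555555\nParent UUID:    base\n"
out_dev = "UUID:           11111111-2222-3333-4444-555555555555\nParent UUID:    base\n"
print("same UUID:", extract_uuid(out_prod) == extract_uuid(out_dev))
```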
ID: 7607 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 675
Credit: 1,971,671
RAC: 1,605
Message 7608 - Posted: 22 Jul 2022, 8:27:01 UTC - in response to Message 7607.  

Ok Crystal,
but the two of us seem to be alone with Windows and multiattach.
CMS, no problems, can be transferred to Production.
The other two (ATLAS and Theory) need more investigation.
ID: 7608 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 854,677
RAC: 8
Message 7609 - Posted: 22 Jul 2022, 9:10:59 UTC - in response to Message 7608.  
Last modified: 22 Jul 2022, 9:22:27 UTC

CMS, no problems, can be transferred to Production.
I'm not so sure.

It's hard to reproduce the error exactly, so it's really not known what's causing it.
Most of the time the tasks run OK, but not always.
CMS runs between 12 and 18 hours, so a VM is created less often than with Theory and these ATLAS test tasks.
The only thing we know now: the problem occurs at the start of a multiattach VM where no HD can be attached, although the hard disk controller has been added to the VM.
If we can't find the cause of the problem, a solution could be to rewrite vboxwrapper to abort a task instead of postponing it.
Last Theory error: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3101191

I'll try to reproduce the error with CMS, but for that I'll have to shorten the CMS tasks to force the creation of new CMS VMs more often.
This will lead to CMS errors, but in the end this is a development system, so Ivan has to deal with it.
ID: 7609 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 675
Credit: 1,971,671
RAC: 1,605
Message 7610 - Posted: 22 Jul 2022, 9:27:38 UTC - in response to Message 7604.  

the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later.

To me it seems to be a timing problem (ATLAS and Theory). A wait of about 10 sec. is needed before the HD is mounted.
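
If it really is a timing problem, a retry with a pause would be one way to test the idea. The Python sketch below only illustrates that hypothesis; it is not how vboxwrapper works, and `attach` stands for any hypothetical function that tries the storage attach and returns True on success. The 10 second delay is the guess from above.

```python
import time


def attach_with_retry(attach, attempts=3, delay_seconds=10.0, sleep=time.sleep):
    """Call attach() until it succeeds; return the attempt number or None."""
    for attempt in range(1, attempts + 1):
        if attach():
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)  # give VirtualBox time before the next try
    return None


# Demo with a fake attach that fails twice, and a fake sleep that only records.
calls = {"count": 0}
waits = []


def fake_attach():
    calls["count"] += 1
    return calls["count"] >= 3


result = attach_with_retry(fake_attach, sleep=waits.append)
print("succeeded on attempt", result, "after waits", waits)
```

If a real attach still fails after several waits, timing alone is probably not the cause.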
ID: 7610 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 854,677
RAC: 8
Message 7611 - Posted: 22 Jul 2022, 10:00:03 UTC - in response to Message 7609.  

I'll try to reproduce the error with CMS, but for that I'll have to shorten the CMS tasks to force the creation of new CMS VMs more often.


That was fast: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3101384

and a second one, because this one started before I could clean up:

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3101387
ID: 7611 · Rating: 0

©2024 CERN