41) Message boards : ATLAS Application : ATLAS vbox v.1.18 (Message 7884)
Posted 17 Nov 2022 by Crystal Pellet
Post:
Yesterday the tasks used the new wrapper v26206 as announced, but were still running the old version 1.17 of the application.

Today the new application 1.18-vdi was downloaded.
42) Message boards : News : Server Release 1.4.0 (Message 7883)
Posted 16 Nov 2022 by Crystal Pellet
Post:
I get this line

set_cached_data(): can't open ../cache/f2/server_status.php_job_status

at the top of the server status page.
43) Message boards : News : Server Release 1.4.0 (Message 7871)
Posted 10 Nov 2022 by Crystal Pellet
Post:
When forum posts are set to sort newest first, the newest post is shown below the previous (older) one.
44) Message boards : ATLAS Application : ATLAS vbox v.1.17 (Message 7867)
Posted 8 Nov 2022 by Crystal Pellet
Post:
@Laurence and/or David,

Any plans to update vboxwrapper to the officially released version 26206, here at -dev and in production?
45) Message boards : CMS Application : New Version 60.68 (Message 7850)
Posted 4 Nov 2022 by Crystal Pellet
Post:
A new CMS version was deployed this morning.
No new files were downloaded to my system.

What is the difference from the previous version?

Is it more than avoiding requesting multiple idtokens?
46) Message boards : CMS Application : New Version 60.67 (Message 7847)
Posted 3 Nov 2022 by Crystal Pellet
Post:
I tested one CMS task v60.67 and it returned fine: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3136136
Combination: BOINC 7.20.2 and VBox 7.0.2.
Somewhere in the middle of the task I suspended it for a few minutes, with the state saved to disk.
Towards the end I did one task suspend with "keep in memory" active.

Nothing to do with this vboxwrapper, but I noticed that VBox 7 keeps the used hard disks registered in the VirtualBox.xml file, even after a reboot and with no BOINC VMs in use:

    <MediaRegistry>
      <HardDisks>
        <HardDisk uuid="{6f08958e-7bfd-4804-8dd7-c7b4408cb126}" location="D:/Boinc1/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi" format="VDI" type="MultiAttach"/>
        <HardDisk uuid="{997e0796-142b-4278-9763-0bceb3ac71bc}" location="D:/Boinc1/projects/lhcathomedev.cern.ch_lhcathome-dev/ATLAS_vbox_1.17_image.vdi" format="VDI" type="MultiAttach"/>
        <HardDisk uuid="{dae25e8f-de18-4971-b11c-eca764ede402}" location="D:/Boinc1/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07_prod.vdi" format="VDI" type="MultiAttach"/>
        <HardDisk uuid="{8fb925ef-3497-4bfb-88e3-bbab2930787f}" location="D:/Boinc1/projects/lhcathomedev.cern.ch_lhcathome-dev/CMS_2022_09_07.vdi" format="VDI" type="MultiAttach"/>
      </HardDisks>
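If you want to check which disk images your own installation still has registered, here is a minimal Python sketch that lists every `<HardDisk>` entry in VirtualBox.xml. It assumes the file is well-formed XML. Because the root element may declare a default namespace, it matches on the local tag name. The function names are hypothetical and are not part of any VirtualBox tool.

```python
from __future__ import annotations

import sys
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop an ElementTree '{namespace}' prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def registered_hard_disks(xml_data: str | bytes) -> list[dict[str, str]]:
    """List every <HardDisk> element (including nested ones) in the document.

    Matching on the local tag name keeps this working whether or not the
    root element declares a default XML namespace.
    """
    root = ET.fromstring(xml_data)
    disks = []
    for elem in root.iter():
        if local_name(elem.tag) == "HardDisk":
            disks.append({
                "uuid": elem.get("uuid", ""),
                "location": elem.get("location", ""),
                "type": elem.get("type", ""),
            })
    return disks


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_vbox_disks.py <path to VirtualBox.xml>
    with open(sys.argv[1], "rb") as f:
        for disk in registered_hard_disks(f.read()):
            print(disk["uuid"], disk["type"], disk["location"])
```

The script only reads the file. Changing the media registry is best done through VirtualBox itself while no VMs are running.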
47) Message boards : ATLAS Application : ATLAS vbox v.1.17 (Message 7832)
Posted 21 Oct 2022 by Crystal Pellet
Post:
Tested 1 ATLAS-task with the newest VirtualBox version 7.0.2:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3126629
48) Message boards : General Discussion : Xtrack beam simulation (Message 7831)
Posted 20 Oct 2022 by Crystal Pellet
Post:
This afternoon I got 5 Xtrack beam simulation tasks, all 5 of them resends: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4600&offset=0&show_names=0&state=0&appid=16
The tasks are not shown in BOINC's task list. This is what is shown in BOINC's event log:
598	lhcathome-dev	20 Oct 17:00:50	Requesting new tasks for CPU	
599	lhcathome-dev	20 Oct 17:00:52	[error] Can't parse file info in scheduler reply: file name is empty or has '..'	
600	lhcathome-dev	20 Oct 17:00:52	Scheduler request completed: got 1 new tasks	
601	lhcathome-dev	20 Oct 17:00:52	Project requested delay of 61 seconds	
602	lhcathome-dev	20 Oct 17:00:52	[error] State file error: missing file ../xboinc_input.bin	
603	lhcathome-dev	20 Oct 17:00:52	[error] State file error: missing input file ../xboinc_input.bin	
604	lhcathome-dev	20 Oct 17:00:52	[error] Can't handle task Xtrack_3838109_1664549883.039677 in scheduler reply	
605	lhcathome-dev	20 Oct 17:00:52	[error] State file error: missing task Xtrack_3838109_1664549883.039677	
606	lhcathome-dev	20 Oct 17:00:52	[error] Can't handle task Xtrack_3838109_1664549883.039677_2 in scheduler reply	
607	lhcathome-dev	20 Oct 17:00:54	Started download of xboinc_011-windows_x86_64.exe	
608	lhcathome-dev	20 Oct 17:00:59	Finished download of xboinc_011-windows_x86_64.exe	
620	lhcathome-dev	20 Oct 17:19:05	Sending scheduler request: To fetch work.	
622	lhcathome-dev	20 Oct 17:19:05	Requesting new tasks for CPU	
623	lhcathome-dev	20 Oct 17:19:07	[error] Can't parse file info in scheduler reply: file name is empty or has '..'	
624	lhcathome-dev	20 Oct 17:19:07	[error] Can't parse file info in scheduler reply: file name is empty or has '..'	
625	lhcathome-dev	20 Oct 17:19:07	Scheduler request completed: got 2 new tasks	
626	lhcathome-dev	20 Oct 17:19:07	Project requested delay of 61 seconds	
627	lhcathome-dev	20 Oct 17:19:07	[error] State file error: missing file ../xboinc_input.bin	
628	lhcathome-dev	20 Oct 17:19:07	[error] State file error: missing input file ../xboinc_input.bin	
629	lhcathome-dev	20 Oct 17:19:07	[error] Can't handle task Xtrack_3838316_1664550120.666061 in scheduler reply	
630	lhcathome-dev	20 Oct 17:19:07	[error] State file error: missing file ../xboinc_input.bin	
631	lhcathome-dev	20 Oct 17:19:07	[error] State file error: missing input file ../xboinc_input.bin	
632	lhcathome-dev	20 Oct 17:19:07	[error] Can't handle task Xtrack_3838326_1664550122.364841 in scheduler reply	
633	lhcathome-dev	20 Oct 17:19:07	[error] State file error: missing task Xtrack_3838316_1664550120.666061	
634	lhcathome-dev	20 Oct 17:19:07	[error] Can't handle task Xtrack_3838316_1664550120.666061_2 in scheduler reply	
635	lhcathome-dev	20 Oct 17:19:07	[error] State file error: missing task Xtrack_3838326_1664550122.364841	
636	lhcathome-dev	20 Oct 17:19:07	[error] Can't handle task Xtrack_3838326_1664550122.364841_2 in scheduler reply	
637	lhcathome-dev	20 Oct 17:34:24	Sending scheduler request: To fetch work.	
638	lhcathome-dev	20 Oct 17:34:24	Requesting new tasks for CPU	
639	lhcathome-dev	20 Oct 17:34:25	[error] Can't parse file info in scheduler reply: file name is empty or has '..'	
640	lhcathome-dev	20 Oct 17:34:25	[error] Can't parse file info in scheduler reply: file name is empty or has '..'	
641	lhcathome-dev	20 Oct 17:34:25	Scheduler request completed: got 2 new tasks	
642	lhcathome-dev	20 Oct 17:34:25	Project requested delay of 61 seconds	
643	lhcathome-dev	20 Oct 17:34:25	[error] State file error: missing file ../xboinc_input.bin	
644	lhcathome-dev	20 Oct 17:34:25	[error] State file error: missing input file ../xboinc_input.bin	
645	lhcathome-dev	20 Oct 17:34:25	[error] Can't handle task Xtrack_3838302_1664550118.050580 in scheduler reply	
646	lhcathome-dev	20 Oct 17:34:25	[error] State file error: missing file ../xboinc_input.bin	
647	lhcathome-dev	20 Oct 17:34:25	[error] State file error: missing input file ../xboinc_input.bin	
648	lhcathome-dev	20 Oct 17:34:25	[error] Can't handle task Xtrack_3838322_1664550121.701115 in scheduler reply	
649	lhcathome-dev	20 Oct 17:34:25	[error] State file error: missing task Xtrack_3838302_1664550118.050580	
650	lhcathome-dev	20 Oct 17:34:25	[error] Can't handle task Xtrack_3838302_1664550118.050580_2 in scheduler reply	
651	lhcathome-dev	20 Oct 17:34:25	[error] State file error: missing task Xtrack_3838322_1664550121.701115	
652	lhcathome-dev	20 Oct 17:34:25	[error] Can't handle task Xtrack_3838322_1664550121.701115_2 in scheduler reply	
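Reading the log, the scheduler reply apparently refers to the input file as `../xboinc_input.bin`. The client rejects that name, so the workunit and its result cannot be created either. Below is a minimal Python sketch of the check as the error message describes it (empty name, or a name containing `..`). This is not BOINC's actual source. It also includes a small helper that collects the task names from the "Can't handle task" lines. Both function names are hypothetical.

```python
from __future__ import annotations

import re

# Matches client log lines such as:
#   [error] Can't handle task Xtrack_3838109_1664549883.039677_2 in scheduler reply
REJECTED_TASK = re.compile(r"Can't handle task (\S+) in scheduler reply")


def file_name_acceptable(name: str) -> bool:
    """The condition quoted in the error message: the name must be
    non-empty and must not contain '..'. Not BOINC's actual source."""
    return bool(name) and ".." not in name


def rejected_names(log_text: str) -> list[str]:
    """Return each task/workunit name the client refused, in first-seen order."""
    seen: dict[str, None] = {}
    for name in REJECTED_TASK.findall(log_text):
        seen.setdefault(name, None)
    return list(seen)
```

In the log above, each rejection appears twice: once for the workunit name and once for the result name with the `_2` suffix.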
49) Message boards : ATLAS Application : ATLAS vbox v.1.17 (Message 7827)
Posted 19 Oct 2022 by Crystal Pellet
Post:
I tested several tasks, but I'm not sure which problem was supposed to be solved.
https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4547&offset=0&show_names=0&state=0&appid=5
https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4600&offset=0&show_names=0&state=0&appid=5
On host ID 4547 I had three errors. One was among the first 5 tasks starting at once; for that task I got a popup with a VBoxHeadless.exe application error.

With 2 other tasks I tested the network connection problem by interrupting my internet connection while an ATLAS task was starting.
I got "Testing CVMFS ....", of course without a response. The tasks kept on running endlessly.
After reconnecting to the internet, nothing positive happened either; the tasks kept running, doing nothing.
So I suspended the tasks, removed the saved states, rebooted the VMs and saved the state again after more than 5 minutes of runtime.
Then I resumed the tasks in BOINC.
One task got the same VBoxHeadless application error as above, and the other was cancelled by the server although it was running well.
50) Message boards : General Discussion : Server error (Message 7817)
Posted 21 Sep 2022 by Crystal Pellet
Post:
lhcathome-dev 21 Sep 13:02:14 Sending scheduler request: To report completed tasks.
lhcathome-dev 21 Sep 13:02:14 Reporting 1 completed tasks
lhcathome-dev 21 Sep 13:02:14 Not requesting tasks: "no new tasks" requested via Manager
lhcathome-dev 21 Sep 13:02:15 Scheduler request completed
lhcathome-dev 21 Sep 13:02:15 Server error: recompile needed
lhcathome-dev 21 Sep 13:02:15 Project requested delay of 3600 seconds
51) Message boards : CMS Application : New Version 60.66 (Message 7813)
Posted 17 Sep 2022 by Crystal Pellet
Post:
Thought it was an interruption at 13 UTC in the CMS servers (WM-Agent upgrade).
No, there were enough jobs.

Since 05:30 UTC this morning we have been out of CMS jobs.
52) Message boards : CMS Application : New Version 60.66 (Message 7811)
Posted 17 Sep 2022 by Crystal Pellet
Post:
For 2 hours (since 13 UTC) the task has been doing nothing. The 9th task has 1.4 MByte of data but has not finished.
Runtime is now 13 hours 45 min.

Your task also ended after the hard-coded job duration of 64800 seconds (18 hours).
So did Ivan's v60.66 tasks:
Runtime (s)   CPU time (s)
64,967.01     39,813.72
64,966.98     39,689.98
64,885.72      5,572.52
64,886.27      5,588.20
64,907.93      5,049.78

The shutdown after 12 hours of runtime and a finished job is not working.
On LHC@home there is an even more sophisticated method to calculate whether it's worth requesting a new job, even before the first 12 hours are over.
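The numbers above can be checked quickly. 64800 s is exactly 18 × 3600 s. Every one of Ivan's runtimes is slightly above that, so all of them hit the hard limit. The CPU/runtime ratio shows that the last three VMs were idle most of the time. The sketch below recomputes this and adds a purely hypothetical end-of-job rule in the spirit of the 12-hour shutdown. It is not the actual CMS or vboxwrapper logic, and the LHC@home method mentioned above is not known here. The names `after_job_finished` and `avg_job_s` are illustrative assumptions.

```python
from __future__ import annotations

HARD_LIMIT_S = 64_800          # hard-coded job duration quoted in the post
GRACEFUL_AFTER_S = 12 * 3600   # "shutdown after 12 hours runtime and a finished job"

# (runtime s, CPU s) for Ivan's v60.66 tasks, copied from the table above
IVANS_TASKS = [
    (64_967.01, 39_813.72),
    (64_966.98, 39_689.98),
    (64_885.72, 5_572.52),
    (64_886.27, 5_588.20),
    (64_907.93, 5_049.78),
]


def cpu_efficiency(runtime_s: float, cpu_s: float) -> float:
    """Fraction of wall-clock runtime the VM spent using CPU."""
    return cpu_s / runtime_s


def after_job_finished(elapsed_s: float, avg_job_s: float) -> str:
    """Hypothetical decision taken each time a job completes inside the VM."""
    if elapsed_s >= GRACEFUL_AFTER_S:
        return "shutdown"        # the step reported as not working
    if elapsed_s + avg_job_s > HARD_LIMIT_S:
        return "shutdown"        # a new job would likely be cut off at 18 h
    return "fetch_next_job"


if __name__ == "__main__":
    print(f"hard limit = {HARD_LIMIT_S / 3600:.0f} h")
    for runtime, cpu in IVANS_TASKS:
        print(f"{runtime:>10,.2f} s  CPU efficiency {cpu_efficiency(runtime, cpu):.0%}")
```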
53) Message boards : CMS Application : New Version 60.66 (Message 7808)
Posted 16 Sep 2022 by Crystal Pellet
Post:
The task I started yesterday evening is still in a running state, but is not running a CMS job. 4 jobs have finished.
It has not (yet) been finished gracefully by the VM itself, although:

09/16/22 12:09:14 (pid:15847) The DaemonShutdown expression "(STARTD_StartTime =?= 0)" evaluated to TRUE: starting graceful shutdown
09/16/22 12:09:14 (pid:15847) Got SIGTERM. Performing graceful shutdown.
09/16/22 12:09:14 (pid:15847) About to tell the ProcD to exit
09/16/22 12:09:14 (pid:15847) All daemons are gone.  Exiting.
09/16/22 12:09:14 (pid:15847) **** condor_master (condor_MASTER) pid 15847 EXITING WITH STATUS 99

Run time is over 15 hours and CPU time over 14 hours.
There is only 1 BOINC process active inside the VM: bash.
I'll wait another hour, or until the hard shutdown by vboxwrapper after 18 hours.

pid 15847 EXITING WITH STATUS 99: What does that mean?

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3112985
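The DaemonShutdown expression uses HTCondor's ClassAd `=?=` ("is identical to") operator. Unlike `==`, it never evaluates to UNDEFINED, so a missing attribute simply compares as FALSE. The TRUE in the log therefore means that STARTD_StartTime was present and equal to 0. Going by the attribute name, the startd was presumably no longer running. Below is a rough Python model of the two operators, restricted to numeric operands. String comparison rules differ between them and are not modelled. All names here are hypothetical. The sketch does not explain the exit status 99.

```python
from __future__ import annotations


class _Undefined:
    """Stand-in for the ClassAd UNDEFINED value (e.g. a missing attribute)."""

    def __repr__(self) -> str:
        return "UNDEFINED"


UNDEFINED = _Undefined()


def classad_eq(a, b):
    """Rough model of '==' for numbers: UNDEFINED if either side is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def classad_meta_eq(a, b) -> bool:
    """Rough model of '=?=' for numbers: always True or False."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return a == b


def daemon_shutdown(ad: dict) -> bool:
    """Evaluate 'STARTD_StartTime =?= 0' against a dict standing in for the master ad."""
    return classad_meta_eq(ad.get("STARTD_StartTime", UNDEFINED), 0)
```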
54) Message boards : CMS Application : New Version 60.66 (Message 7805)
Posted 16 Sep 2022 by Crystal Pellet
Post:
The 2nd, 3rd and 4th job of this task did not have these connection issues.
55) Message boards : CMS Application : New Version 60.66 (Message 7802)
Posted 15 Sep 2022 by Crystal Pellet
Post:
That certainly looks better. Transient network problems earlier?
I had no network problems then and have none now. I'm not using any proxy.
56) Message boards : CMS Application : New Version 60.66 (Message 7800)
Posted 15 Sep 2022 by Crystal Pellet
Post:
Jobs now available again. I believe the little glitch here in -dev has been repaired but I won't be able to confirm it myself until mid-morning tomorrow.

Not looking good so far:
cmsRun  -j FrameworkJobReport.xml PSet.py
warn  [frontier.c:1014]: Request 507 on chan 1 failed at Thu Sep 15 21:48:52 2022: -6 [fn-socket.c:239]: read from 172.64.206.32 timed out after 10 seconds
warn  [frontier.c:1136]: Trying next server cms-frontier.openhtc.io[172.64.207.32]
warn  [frontier.c:1014]: Request 701 on chan 1 failed at Thu Sep 15 21:49:07 2022: -6 [fn-socket.c:239]: read from 172.64.207.32 timed out after 10 seconds
warn  [frontier.c:1136]: Trying next server cms-frontier.openhtc.io[2606:4700:e6::ac40:cf20]
warn  [frontier.c:1014]: Request 702 on chan 1 failed at Thu Sep 15 21:49:07 2022: -9 [fn-socket.c:85]: network error on connect to 2606:4700:e6::ac40:cf20: Network is unreachable
warn  [frontier.c:1136]: Trying next server cms-frontier.openhtc.io[2606:4700:e6::ac40:ce20]
warn  [frontier.c:1014]: Request 703 on chan 1 failed at Thu Sep 15 21:49:07 2022: -9 [fn-socket.c:85]: network error on connect to 2606:4700:e6::ac40:ce20: Network is unreachable
warn  [frontier.c:1136]: Trying next server cms1-frontier.openhtc.io
warn  [frontier.c:1014]: Request 704 on chan 1 failed at Thu Sep 15 21:49:27 2022: -6 [fn-urlparse.c:178]: host name cms1-frontier.openhtc.io problem: Name or service not known
warn  [frontier.c:1136]: Trying next server cms2-frontier.openhtc.io

... but finally:
Begin processing the 1st record. Run 1, Event 1920001, LumiSection 3841 on stream 0 at 15-Sep-2022 21:51:36.256 CEST
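The warnings show the client failing over. Each request that times out after 10 seconds, or cannot connect at all, moves on to the next server in the list: first the IPv4 addresses, then the IPv6 addresses (unreachable here), then the cms1/cms2 host names. This continues until a server answers. Here is a minimal Python sketch of that pattern, with `fetch` as a hypothetical injected function. It is not the Frontier client's code.

```python
from __future__ import annotations

from typing import Callable, Sequence


class AllServersFailed(Exception):
    """Raised when no server in the list answered."""


def fetch_with_failover(
    servers: Sequence[str],
    fetch: Callable[[str, float], bytes],
    timeout_s: float = 10.0,
) -> tuple[str, bytes]:
    """Try each server in order; return (server, payload) from the first success."""
    errors: list[str] = []
    for server in servers:
        try:
            return server, fetch(server, timeout_s)
        except OSError as exc:  # covers timeouts, unreachable networks, DNS errors
            errors.append(f"{server}: {exc}")
    raise AllServersFailed("; ".join(errors))
```

In real use, `fetch` would wrap an HTTP request that honours the timeout. Python's `urllib.error.URLError`, `TimeoutError` and `socket.gaierror` are all subclasses of `OSError`, so they land in the same retry branch.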
57) Message boards : CMS Application : New Version 60.65 (Message 7774)
Posted 1 Sep 2022 by Crystal Pellet
Post:
After initializing and before processing the events, I see

cmsRun -j FrameworkJobReport.xml PSet.py
warn [fn-htclient.c:530]: Retrying after system error
58) Message boards : CMS Application : New Version 60.64 (Message 7724)
Posted 11 Aug 2022 by Crystal Pellet
Post:
The information that should be shown when using the ALT keys is still not displayed in this version.

ALT-F1 Startup console is shown
ALT-F2 shows only the dummy message (job output - event processing)
ALT-F3 "top" is shown correctly
ALT-F4 shows only the dummy message (output of job wrapper)
ALT-F5 shows only the dummy message (error messages)
ALT-F6 login screen is shown
59) Message boards : ATLAS Application : ATLAS vbox v.1.15 (Message 7715)
Posted 2 Aug 2022 by Crystal Pellet
Post:
100 ATLAS tasks per day per PC.
These never-ending tasks are only a handful per day.
You can see this in production for these two PCs.

ALT+F1, ALT+F2 or ALT+F3 in VirtualBox is not available.
Even if there are only a few, it's worth finding the cause, because they run endlessly, occupying a slot.
F-keys not being available in BOINC's console is a sign that the VM did not start up completely.
In Oracle VM VirtualBox Manager you can select the VM when you have that issue and press the button with the green right arrow, "Show" (Zeigen), in the top menu.
Maybe you only get a black screen and have to wake up the display, e.g. by pressing just the Alt key.
You could probably also improve your throughput by reducing the number of cores to 8 per VM and increasing the number of tasks to 7 or even 8.
60) Message boards : ATLAS Application : ATLAS vbox v.1.15 (Message 7710)
Posted 2 Aug 2022 by Crystal Pellet
Post:
There are not so many ATLAS tasks in -dev to see such a problem.
Why is there no chance for the ATLAS team to take a deeper look?

Is it OK to test this when the new wrapper205 is in production?
It's always OK to test things, especially if something uncommon happens.
You have to provide as much information as possible.
What else is running on the machine? Do you use a second (Linux) VM on a (Windows) machine and run BOINC from there?
Of special interest: what do you see in the VM console with ALT-F1?
I very often see "Checking CVMFS ....", but without a response.
Do you start several VMs at once?




©2024 CERN