1)
Message boards : Sixtrack Application : Xtrack beam simulation (Message 7934)
Posted 7 Mar 2023
Post: Two Xtracks that I'm about to euthanise: 2283960 and 2283961. Both completed on Linux in a couple of seconds, but the Windows wingman Aborted after 2 days. Is that significant? Mine (Windows) are at 29 and 21 hours respectively. Both have been using all of a core each, but I'm pretty sure they haven't actually done anything. This host only has onboard graphics. Are these supposed to go only to proper GPUs?
Boboviz's GitHub link looks far too technical for me. If setting all that up is required to get these running, I'll set my preferences not to accept these for now and just go back to Theory tasks.
2)
Message boards : Sixtrack Application : Xtrack beam simulation (Message 7916)
Posted 2 Feb 2023
Post: I don't know what the expected behaviour of these is, so here are my observations in the hope that they are helpful. I had a successful 20-second Xtrack yesterday, where the wingmen completed in similar time but with errors, and another that has been running since yesterday: Progress 100%, 33 hrs elapsed, none remaining. In Properties, CPU Time and Since Last Checkpoint are ~1/2 hr behind Elapsed Time but, in the slot, stderr says "12:29:53 (13692): app started; CPU time 20.000000, flags:" and nothing has been written to any of the slot files since yesterday. It's using all of one core. The wingman completed in 2.05s, so I suspect it is stuck.
A restart of Boinc reset the clocks. Initial estimate of 3? mins; 50% reached at 2 mins, but progress, and time remaining, getting progressively slower. 91% at 7 mins. 99.60% at 23 mins, remaining 00:00:00, using ~70% of a core. 99.999% at 33 mins. 100% at 45 mins, using almost all of a core. I'll let it do that overnight, but I expect it will have to be euthanised tomorrow.
3)
Message boards : General Discussion : Boinc VM App (Message 7835)
Posted 21 Oct 2022
Post: It was an experimental app from July/August 2019 that was useful in developing the apps we are running now. Those unsent tasks became "stuck" when that experiment concluded with the release of the app(s) that replaced it. It's probably more trouble than it's worth to manually remove the offending tasks.
4)
Message boards : Theory Application : New Version 5.40 (Message 7719)
Posted 2 Aug 2022
Post: Great, thanks for that clarification update 🙂
5)
Message boards : Theory Application : New Version 5.40 (Message 7707)
Posted 1 Aug 2022
Post: Ah, ok. I've not been paying attention recently, just been letting it run with the VBox window closed. Really, we're waiting on VBox themselves for a proper fix.
6)
Message boards : Theory Application : New Version 5.40 (Message 7705)
Posted 1 Aug 2022
Post: 5.40 Theory -dev and Production Theory seem to have been playing nicely together overnight and today, but I would still like to see whether, on finishing, the last attached -dev task allows the .vdi to be released. The transition where there was no -dev attached but the vdi was not released, so that subsequent tasks could not re-attach to it, was where I previously had postponements. As is typical when I want to watch that, I have a task that is at 60% after 14 hrs, so it'll be tomorrow evening before I can watch what happens.
Win10, Boinc 7.20.2, VBox 6.1.36
7)
Message boards : ATLAS Application : ATLAS vbox v.1.13 (Message 7505)
Posted 4 Jul 2022
Post: There still appears to be a conflict when running LHC and -dev together. I tried a few LHC tasks over the weekend and THEY ran fine, but new -dev tasks all stopped with a similar error. Setting No New Tasks and allowing the LHC ones to finish, I then exited Boinc, deleted everything in the slots and removed the .vdi from VBox, then closed the VBox window as advised previously. Restarted Boinc, and all is well again with only -dev tasks running.
(This is with Theory, but it seems it might also apply to Atlas if Maeax is running LHC and -dev.)
8)
Message boards : Theory Application : New Version 5.30 (Message 7399)
Posted 19 Jun 2022
Post: Ah, ok. I was hoping, perhaps too optimistically, that it would be more elegant, with a return of the partially completed work.
9)
Message boards : Theory Application : New Version 5.30 (Message 7396)
Posted 19 Jun 2022
Post: I wasn't able to pay attention yesterday, but I had left Boinc running with a 1/3 LHC/dev resource share so there would always be at least 1 dev task attached. They have played well together. This may be because the Theory vdi was never left without a task attached. I have yet to test whether a new task attaches after the vdi is completely released.
@Computezrmle Both tasks stopped at 2022-06-17 19:29:17 BST and successfully restarted at 2022-06-17 19:29:51 BST. I had previously just Aborted postponed tasks, thinking them to be unrecoverable. This successful restart would have been where I removed the vdi from VBox. I may have then closed the VBox window, as suggested earlier.
@Crystal I'll give that shutdown a try on a Production task first. 1 of the dev tasks is a 4-day Sherpa, so I'd rather let it run to completion or, if the practice shutdown is successful, return at least partial useful work. I'd rather euthanise it than murder it.
[Later] ☹️ Well, I'm obviously doing something wrong with the graceful shutdown thing. The VMs did shut down, but not gracefully, so all 3 attempts were lost to Computation Error 😢. However, with the VBox window not open, the vdi is removed on shutdown, whereas when I had it open to see what was happening, the vdi was not removed. So I believe the problem is with VBox not allowing the unattached Theory image to be fully released, and thus made available for reattachment, while that window is open, rather than with the wrapper itself. The vm_image.vdi does get released even when viewing VBox, so I don't know why the Theory one doesn't 🤔. The easiest solution is, therefore, to not be viewing VBox while the last attached task is ending.
With VBox closed, 2 dev tasks started at the same time and both failed (not postponed, however). Starting one then another seems to be OK. 2 dev and 1 Production currently running happily 🙂. I hope someone can make sense of what's been happening over the past couple of days and that it will help in finding a more robust solution. Thanks C & CP for your assistance.
10)
Message boards : Theory Application : New Version 5.30 (Message 7384)
Posted 17 Jun 2022
Post: <VirtualBox xmlns="http://www.virtualbox.org/" version="1.12-windows">
Similar to Crystal, I almost always have VBox open so I can see whether the VMs are being created and starting.
Both tasks that replaced today's earlier successes stopped Postponed after 40? seconds. When the first successful one finished, Boinc didn't request new work until the 2nd one had finished (possibly due to my app_config limiting the number of running tasks, which I have temporarily removed), so the vdi was left unattached to anything and the replacements failed to attach to it. The app_config contains nothing other than max_concurrent lines but seemed to be causing a blockage. I sacrificed 2 LHC tasks to give -dev a clear run and so as not to cause myself unnecessary confusion.
I tried a few things to get those 2 Postponed ones running:
Exit Boinc, delete powered-off VMs, restart Boinc -- Postponed
Exit Boinc, delete VM, empty each slot, restart Boinc -- Postponed
Exit Boinc, delete VM, empty slots, remove vdi from VBox -- Successful start (I did start them individually, just in case)
The removal of the app_config allowed a new task to download on completion of the short-running replacement, which immediately started up and successfully attached to the image that was still attached to the other running VM. While I have been writing this, another task has ended and been successfully replaced 🙂. This part, at least, is now working for me.
I'm still concerned that new tasks might be unable to attach to the image if there is not another one already attached. I have yet to master how to gracefully end a task, so I will have to wait a few hours until these tasks finish to test whether a new task will attach to the vdi when no others are already attached. Crystal said earlier that they do, but my original problem was because they didn't.
[In the job xml there is the line <completion_trigger_file>shutdown</completion_trigger_file>. I created a text file named shutdown in the slot, but I have no idea how to implement it. Oops. I'm more graphical than Command Line, so I killed one, accidentally, using fsutil to create a shutdown file, but it did that very much less than gracefully and resulted in a Computation error ☹️]
Probably Sunday before I get another chance to play with it, so I've upped the resource share so it shouldn't run dry and fired up LHC again in case it does. Not problems; just learning opportunities. Anything is possible with the right attitude ... and a hammer 🙂
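For anyone wanting to try the same trigger, creating that file from a Command Prompt looks something like the sketch below. The slot number and the ProgramData path are only examples (the default Boinc data directory on Windows), and since the attempt described above still ended in a Computation error, treat it as an illustration of the mechanism rather than a proven recipe.

    rem Sketch only: drop an empty "shutdown" file into the running task's slot
    rem directory so the wrapper's completion trigger can see it. Slot 3 is an
    rem example; check Boinc Manager for the slot the task is actually using.
    rem Run from an elevated prompt so the file can be written under ProgramData.
    cd /d "C:\ProgramData\BOINC\slots\3"
    fsutil file createnew shutdown 0
    rem ("type nul > shutdown" creates the same zero-byte file without fsutil)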
11)
Message boards : Theory Application : New Version 5.30 (Message 7369)
Posted 17 Jun 2022
Post: The 2 that I started this morning have completed successfully 8¬) but I won't know if their replacements are OK until I get home. Both were attached to the vdi, as seen in Media Manager by 2 separate long strings of characters in a dropdown.
12)
Message boards : Theory Application : New Version 5.30 (Message 7364)
Posted 17 Jun 2022
Post: Thanks for looking; I'll look for that when I get home. I don't know what my earliest version of VBox was, but it would have been from the time of the first Theory jobs. There have been many uninstalls and upgrades since then, so I wouldn't think there would be any of an old version left over, unless there is some fragment lurking somewhere in the registry.
The overnight test didn't work as well as expected, with another Postponed task. I Aborted it and again manually removed the powered-off VM and the image to allow another to start. I'll report in again when I get home, c. 17:00 UTC.
13)
Message boards : Theory Application : New Version 5.30 (Message 7360)
Posted 16 Jun 2022
Post: Conjecture awaiting further observation: A single task registers the .vdi in VBox Media Manager, and the VM attaches and starts up successfully. Starting a 2nd task also successfully attaches to the image and runs (all good so far). If one of those tasks finishes while another is still running, the ending one detaches and a new one attaches (again, all good). If there is continuity of at least one VM attached to the image, then there is continued success, BUT if an ending task is not replaced and the last connected VM detaches, such that there is no VM attached to the image, the image remains in Media Manager but subsequent tasks are unable to attach to it, resulting in the Postponed/cleanup error. Manual removal of the image in VBox before a new task starts allows normal service to resume.
Overnight, I have limited LHC to only one running task to test Part 1, so there should always be at least one -dev task attached, with rolling replacement, and I don't expect any problem. Part 2 will need closer observation to confirm, but I won't be able to do that until Friday evening after work, as my other host has died so I'm down to only this one.
14)
Message boards : Theory Application : New Version 5.30 (Message 7358)
Posted 16 Jun 2022
Post: My tasks: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?userid=196
1 success, then a few Postponed: environment needs to be cleaned up (or similar wording). A Project reset got it working again for a while but, returning home this evening, I find 2 more Postponed. A Boinc restart lets them start again but, although running in Boinc, the VM shows FATAL: could not read from the boot medium! System halted. I suspect they would do nothing useful until timeout, so I have Aborted them, which leaves behind the Powered-off VM, which has to be manually removed. The Theory_2020_05_08.vdi also needs to be Removed (but kept) to allow the next task to start successfully. (2 instances are attached to it when they are running correctly.)
I have 3 cores allocated to Boinc with a maximum of 2 from either LHC or -dev allowed to run concurrently (2 LHC, 1 -dev or 1 LHC, 2 -dev), so 5 consecutive successes suggest that it does sometimes clean up on the way out, but the Postponed ones suggest this is not always the case. Maybe sometimes being Multi-attached, sometimes singly, is confusing it?
1 -dev & 2 LHC running just now after manually doing the cleanup. I don't see others reporting similar issues, but I hope this input is helpful.
Win 10
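For reference, the same manual cleanup can also be done from a Command Prompt with VBoxManage instead of the GUI. The VM name below is only a placeholder, so substitute whatever the list commands actually report.

    rem Exit Boinc first, then tidy up via VBoxManage (found in the VirtualBox
    rem install directory if it is not already on the PATH).
    rem Remove the leftover powered-off VM; "boinc_example_task" is a placeholder,
    rem use the name or UUID shown by "list vms".
    VBoxManage list vms
    VBoxManage unregistervm "boinc_example_task"
    rem Release the orphaned image from the Media Manager while keeping the .vdi
    rem file on disk (no --delete). Give the full path or the UUID that
    rem "list hdds" shows for Theory_2020_05_08.vdi.
    VBoxManage list hdds
    VBoxManage closemedium disk "Theory_2020_05_08.vdi"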
15)
Message boards : Theory Application : New Version v5.19 (Message 6946)
Posted 15 Jan 2020
Post: My Linux VBox Theory 5.19s show the same output, so it's not just Windows. Otherwise it works fine; it just doesn't show the job in Alt-F1. It's still available from the 1st line of running.log. 5.20 for Windows does indeed fix this.
16)
Message boards : ATLAS Application : Testing CentOS 7 vbox image (Message 6802)
Posted 2 Nov 2019
Post: 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED
Quite annoying after 4+ days, as it otherwise ran fine. The logs show
2019-11-01 23:36:09 (1796): Guest Log: HITS file was successfully produced
2019-11-01 23:36:42 (1796): Guest Log: Successfully finished the ATLAS job!
and
2019-11-01 23:36:49 (1796): Guest Log: *** Success! Shutting down the machine. ***
so it would seem the error happened during post-processing.
Forgot to mention I'm also seeing the full download if there is a break in work. If work is contiguous, it only downloads the stuff to run the new job, but if there is a break between finishing and uploading a job and requesting a new one, the vdi is downloaded again. I've not checked whether the vdi is being deleted on completion or whether it is being overwritten. I'll look tomorrow.
17)
Message boards : Theory Application : New version 5.00 (Message 6707)
Posted 27 Sep 2019
Post: Yes, "Show Graphics" now goes to the actual running job.
On the Linux host, I'm getting yellow-triangle ghost images left behind in VBox Media Manager when a task finishes, which have to be manually deleted. I freely admit to not being much good with Linux, so it could be something I haven't set up correctly here. No ghosts on Windows hosts.
18)
Message boards : Theory Application : New version 5.00 (Message 6695)
Posted 26 Sep 2019
Post: The Windows host also completed its task, reported, was credited and, with all other cores busy, booted up and started a new job.
19)
Message boards : Theory Application : New version 5.00 (Message 6694)
Posted 26 Sep 2019
Post: I allowed the Linux machine and 1 Windows host to get 1 of the latest 5.07s each, so as not to have to Abort any on finding that it didn't work. From CP's comments about them possibly not being too happy if they weren't getting all the attention on start-up, I suspended most other work on those hosts, and both are currently running Jobs. WooHoo. I resumed all other LHC sixtracktest tasks, but I have limited these to single-core, single-Job, so we'll see if they start up OK normally when these finish.
The Linux box finished 1 job: Task reported and credited, new task booted fine and a new Job started.
Strangely, "Show graphics" on both hosts lands on the SAME partially complete Vincia job, which I'm not running, but clicking through to the logs gets to the logs of the actual running jobs. Even the new Tasks land there again.
20)
Message boards : Theory Application : New version 5.00 (Message 6677)
Posted 23 Sep 2019
Post: Single-core Base Memory has gone up from 730MB to 1500MB, which means that where my 2-core Linux host with 4GB of RAM used to be able to run 2 x Theory (VBox or Native), it can now only run 1, with another "waiting for memory".
1 just started on a Windows host: 2-core (accidentally, I usually prefer singles), memory 2250MB.
From the BoincVM thread: "it should be possible to manage the BOINC client in the Guest via a Web browser." The "Show Graphics" button gets to an Apache landing page. Is this where that control will be?
Same output as CP. Comparing the stderr of Laurence's successful task, mine gets to the line corresponding to
2019-09-23 16:00:54 (11016): Guest Log: 00:00:00.008616 main 5.2.6 r120293 started. Verbose level = 0
but no further. The next line is the shutdown, so I'll leave it for now and see what happens in an hour or so. These might just be "Blanks", not intended to do any actual work, as presumably there would be details of a Job between those 2 lines.