Message boards : Number crunching : exceeded disk limit
Author | Message |
---|---|
Joined: 8 Apr 15 Posts: 795 Credit: 13,775,303 RAC: 8,930 |
4/10/2015 1:10:30 PM | ATLAS@home | Aborting task 40GKDmnAi0lnDDn7oo6G73TpABFKDmABFKDmiiLKDmABFKDmhOKPIn_2: exceeded disk limit: 6528.21MB > 4291.53MB The first CMS task ran fine while I was also running Einstein GPU x2, vLHC x2, and ATLAS x2. During the second CMS task it would only let me run Einstein GPU x1, vLHC x2, and ATLAS x2. During the third CMS task it will only let me run vLHC x2 and Einstein GPU x1; I can no longer run any ATLAS tasks at the same time. Never had this happen before. Is that why not many members did more than one CMS task before they quit? Mad Scientist For Life |
Joined: 5 Apr 15 Posts: 3 Credit: 1,606,870 RAC: 0 |
CMS-dev workunits like to leave their disk images behind. Check your slots; there's probably one or more with only a .vdi left in them. If you delete the .vdi you should be good to go again. Why this causes other projects to get that exceeded disk space error even if BOINC has lots of available space left I don't know, but it's very annoying... |
Joined: 8 Apr 15 Posts: 795 Credit: 13,775,303 RAC: 8,930 |
Yes, I always remove .vdi files on all of my hosts, since I am a long-time T4T/vLHC member, and the same with ATLAS, so I am used to doing that on my 6 hosts. I only have CMS running on my 8-core, and it is my only SSD. Of course I also run all of mine 24/7 and in my preferences have them all set at no limit. Strange that it worked normally during the first CMS task and has got worse over the following 2 tasks here. But then I did not have any problem getting the first 2 tasks here completed and validated. |
Joined: 5 Apr 15 Posts: 3 Credit: 1,606,870 RAC: 0 |
My first CMS-dev tasks were also run on a rig using an SSD and I had the same result. I thought it might be due to lack of space on the SSD and so switched CMS over to two other dualie rigs that use HDDs. What was puzzling was that it still happened on one of those even though BOINC has 180GB+ of space available. This rig was also running 2x ATLAS, 2x Edge and 8x SRBase WUs plus an Einstein on the GPU. Once I removed the leftover .vdi it has been back to zero errors. The second rig had no leftovers and no disk space error problems. Both are due to finish their second CMS tasks tomorrow and it will be interesting to see what happens then. I also run T4T, vLHC and ATLAS and haven't had this particular problem with them. I just find it weird that the CMS WUs seem to keep running fine but yet cause problems for WUs from other projects... |
Joined: 9 Apr 15 Posts: 57 Credit: 230,221 RAC: 0 |
4/10/2015 1:10:30 PM | ATLAS@home | Aborting task 40GKDmnAi0lnDDn7oo6G73TpABFKDmABFKDmiiLKDmABFKDmhOKPIn_2: exceeded disk limit: 6528.21MB > 4291.53MB Hello, new member too. My first job failed as well, and a bunch of Sixtrack and SETI tasks bombed out with the Disk Usage Exceeded error too. Job no. 2 is running so far... [edit]The VDI has grown to over 4GB, and in the VM the load average has grown to about 8, with each task now taking over 30 mins.[/edit] |
Joined: 9 Apr 15 Posts: 57 Credit: 230,221 RAC: 0 |
The VDI has grown to over 4GB, and in the VM the load average has grown to about 8, with each task now taking over 30 mins. In fact the VM has now stalled, using 1 CPU core at 100%; the console shows a frozen top and Apache won't connect. Let's try 26125. |
Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0 |
14 errors for me on this host since joining CMS-dev a couple of days ago. Sixtrack WU fails with LHC@home 1.0 | Aborting task.....exceeded disk limit: 6860.42MB > 572.20MB before it gets a chance to start. CMS itself appears to run fine and I'm not seeing any debris left behind after it finishes. I don't know how or why there should be this interaction, but others have reported similar here, and further down this thread. No interference with the vLHC VM, but I'm only allowing 1 of them to run just now while there's Sixtrack work available. Currently running: 1 x CMS, 1 x vLHC, 2 x Sixtrack. 8GB RAM with another 4GB allocated on a ReadyBoost USB stick. BOINC 7.4.42, VBox 4.3.26. |
Joined: 15 Apr 15 Posts: 38 Credit: 227,251 RAC: 0 |
When my first CMS unit completed, I noticed it caused one of my three running ATLAS tasks to error out with a "computation error". The ATLAS output message is as follows: Stderr output <core_client_version>7.4.42</core_client_version> <![CDATA[ <message> Maximum disk usage exceeded </message> <stderr_txt> </stderr_txt> ]]> |
Joined: 15 Apr 15 Posts: 38 Credit: 227,251 RAC: 0 |
I am also getting a bunch of these ATLAS errors now: Stderr output <core_client_version>7.4.42</core_client_version> <![CDATA[ <message> Maximum disk usage exceeded </message> <stderr_txt> 2015-04-16 18:04:25 (12728): vboxwrapper (7.5.26110): starting 2015-04-16 18:04:25 (12728): Feature: Checkpoint interval offset (391 seconds) 2015-04-16 18:04:26 (12728): Detected: VirtualBox 4.3.26r98988 2015-04-16 18:04:26 (12728): Detected: Minimum checkpoint interval (900.000000 seconds) 2015-04-16 18:04:26 (12728): successfully copied 'init_data.xml' to the shared directory. 2015-04-16 18:04:34 (12728): Create VM. (boinc_d5363a11d7601e3a, slot#1) 2015-04-16 18:04:34 (12728): Updating drive controller type and model for desired configuration. 2015-04-16 18:04:35 (12728): Setting CPU Count for VM. (1) 2015-04-16 18:04:35 (12728): Setting Memory Size for VM. (2048MB) 2015-04-16 18:04:35 (12728): Setting Chipset Options for VM. 2015-04-16 18:04:35 (12728): Setting Boot Options for VM. 2015-04-16 18:04:36 (12728): Setting Network Configuration for NAT. 2015-04-16 18:04:36 (12728): Disabling USB Support for VM. 2015-04-16 18:04:36 (12728): Disabling COM Port Support for VM. 2015-04-16 18:04:36 (12728): Disabling LPT Port Support for VM. 2015-04-16 18:04:37 (12728): Disabling Audio Support for VM. 2015-04-16 18:04:37 (12728): Disabling Clipboard Support for VM. 2015-04-16 18:04:37 (12728): Disabling Drag and Drop Support for VM. 2015-04-16 18:04:38 (12728): Adding storage controller to VM. 2015-04-16 18:04:38 (12728): Adding virtual ISO 9660 disk drive to VM. (vm_isocontext.iso) 2015-04-16 18:04:38 (12728): Adding VirtualBox Guest Additions to VM. 2015-04-16 18:04:38 (12728): Adding virtual cache disk drive to VM. (vm_cache.vdi) 2015-04-16 18:04:39 (12728): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB) 2015-04-16 18:04:39 (12728): Enabling network access for VM. 
2015-04-16 18:04:39 (12728): forwarding host port 52085 to guest port 80 2015-04-16 18:04:39 (12728): Enabling remote desktop for VM. 2015-04-16 18:04:40 (12728): Enabling shared directory for VM. 2015-04-16 18:04:40 (12728): WARNING: Stale VirtualBox VM Log used. 2015-04-16 18:04:40 (12728): WARNING: Stale VirtualBox VM Log Not Found. 2015-04-16 18:04:40 (12728): WARNING: Stale VirtualBox VM Log used. 2015-04-16 18:04:40 (12728): WARNING: Stale VirtualBox VM Log Not Found. 2015-04-16 18:04:40 (12728): Starting VM. </stderr_txt> ]]> Guess I will need to clean up my VM, will that help? |
Joined: 13 Feb 15 Posts: 1223 Credit: 935,551 RAC: 1,039 |
Guess I will need to clean up my VM, will that help? Also have a look in BOINCData slot directories for unwanted files. |
Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0 |
Follow-up on my Sixtrack exceeded-space errors below: I have noticed that when CMS finishes, it doesn't quite do a full cleanup and leaves a disk image (.vdi) in the slot. I guess that Sixtrack tries to use that slot, thinking it is empty, but finds the .vdi, which pushes it over the permitted size and causes the error. I have reset LHC 1.0 just in case anything else may have become corrupted, but I'll need to wait for the next batch of work to see if that fixes it. Until a fix can be found I'll only let CMS and Sixtrack run when the other isn't, and will delete the relevant slot to clear the debris when required. |
Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0 |
I've just posted this over at Einstein in the hope that my observations will get to whoever can make use of them. |
Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
A 'private drop' (bare BOINC client, no installer, place in existing BOINC program directory) has been made available for people willing to test this. I believe it will also have the extra <slot_debug> logging that David mentioned. x86: http://boinc.berkeley.edu/dl/boinc.040515.x86.zip x64: http://boinc.berkeley.edu/dl/boinc.040515.x64.zip If you use it, please enable the extra logging and report back (whether success or failure) - we're looking for proper deletion of the .vdi file in all cases. |
Joined: 13 Feb 15 Posts: 1223 Credit: 935,551 RAC: 1,039 |
If you use it, please enable the extra logging and report back (whether success or failure) - we're looking for proper deletion of the .vdi file in all cases. 1 05 May 12:41:16 Starting BOINC client version 7.5.1 for windows_x86_64 2 05 May 12:41:16 This a development version of BOINC and may not function properly 3 05 May 12:41:16 log flags: file_xfer, sched_ops, task, slot_debug 40 05 May 12:41:16 Version change (7.4.42 -> 7.5.1) 81 CMS-dev 05 May 12:41:38 task CMS_30750_1427806618.586224_0 resumed by user 82 05 May 12:41:39 [slot] removed file slots/0/init_data.xml 83 05 May 12:41:39 [slot] removed file slots/0/boinc_temporary_exit |
Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
That seems normal - I'm seeing init_data.xml being removed and replaced regularly with earlier versions too. The interesting thing will be to compare the list of files deleted at the end of the run, between the earlier versions (which I will post - later today) and this private drop version - and to compare both with what happens in the slot directory, of course. |
Joined: 13 Feb 15 Posts: 1223 Credit: 935,551 RAC: 1,039 |
Btw: I'm running vboxwrapper 26166 instead of the stock application 26165 and have set the flags <enable_vm_savestate_usage/> <disable_automatic_checkpoints/> in the vbox job xml. However, I made 1 snapshot myself between switching BOINC releases. I've seen with vLHC that those manual snapshots are properly cleared when the task finishes. |
Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
OK, my first task finished, running under the previous pre-release BOINC v7.5.0 05/05/2015 16:51:14 | CMS-dev | Message from task: 0 It looks like a proper match with the file list for the directory while the task was running: D:\BOINCdata\slots\6\boinc_lockfile D:\BOINCdata\slots\6\boinc_task_state.xml D:\BOINCdata\slots\6\init_data.xml D:\BOINCdata\slots\6\stderr.txt D:\BOINCdata\slots\6\VBox.log D:\BOINCdata\slots\6\vboxwrapper_26165_windows_x86_64.exe D:\BOINCdata\slots\6\vboxwrapper_26165_windows_x86_64.pdb D:\BOINCdata\slots\6\vbox_checkpoint.xml D:\BOINCdata\slots\6\vbox_job.xml D:\BOINCdata\slots\6\vbox_remote_desktop.xml D:\BOINCdata\slots\6\vbox_webapi.xml D:\BOINCdata\slots\6\vm_floppy_6.img D:\BOINCdata\slots\6\vm_image.vdi and yes, vm_image.vdi was removed as it should have been. Rats. OK, upgrading to the private drop for the next one. |
Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0 |
Upgraded to 7.5.1 and set the slots flag, and I get similar output to Richard, with all files deleted as they should be. (As soon as you take your car to a mechanic, the rattle stops!) Maybe something else has changed between 7.4.42 and 7.5.1 which has unintentionally fixed the slot clean-out error? The task on the other machine is due to finish in an hour or so, so I'll watch it finish as well. [Afterthought] When the second one finishes, I'll go back to the earlier version but will leave the .pdb in the BOINC folder to see if it will produce any different output when the next one finishes. |
Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
My second one completed as it should, cleaning up all files and not causing any errors for the following task. So, who's seeing these errors, and what's the common factor? Version of VBox? Version of Windows? Running as a service? In my case - 4.3.26, 7, no. |
Joined: 13 Feb 15 Posts: 1223 Credit: 935,551 RAC: 1,039 |
This is what programmers love: an error that happens every now and then, but not now ;) Task finished after 24½ hours wall time, leaving an empty folder. From the messages: CMS-dev 06 May 09:18:28 Message from task: 0 06 May 09:18:28 [slot] cleaning out slots/0: handle_exited_app() 06 May 09:18:28 [slot] removed file slots/0/boinc_finish_called 06 May 09:18:28 [slot] removed file slots/0/boinc_task_state.xml 06 May 09:18:28 [slot] removed file slots/0/init_data.xml 06 May 09:18:28 [slot] removed file slots/0/output 06 May 09:18:28 [slot] removed file slots/0/stderr.txt 06 May 09:18:28 [slot] removed file slots/0/VBox.log 06 May 09:18:28 [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.exe 06 May 09:18:28 [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.pdb 06 May 09:18:28 [slot] removed file slots/0/vbox_checkpoint.xml 06 May 09:18:28 [slot] removed file slots/0/vbox_job.xml 06 May 09:18:28 [slot] removed file slots/0/vbox_remote_desktop.xml 06 May 09:18:28 [slot] removed file slots/0/vbox_webapi.xml 06 May 09:18:28 [slot] removed file slots/0/vm_floppy_0.img 06 May 09:18:28 [slot] removed file slots/0/vm_image.vdi CMS-dev 06 May 09:18:28 Computation for task CMS_30750_1427806618.586224_0 finished Contents of slot 0 including subdirs just before finish: Directory of D:\Boinc1\slots\0 06-05-2015 07:13 <DIR> . 06-05-2015 07:13 <DIR> .. 
05-05-2015 12:41 <DIR> boinc_dfa50baba0af731e 05-05-2015 12:41 0 boinc_lockfile 06-05-2015 09:16 508 boinc_task_state.xml 05-05-2015 12:41 9.251 init_data.xml 06-05-2015 07:24 9.149 stderr.txt 05-05-2015 12:41 73.010 VBox.log 05-05-2015 08:39 102 vboxwrapper_26165_windows_x86_64.exe 05-05-2015 08:40 102 vboxwrapper_26165_windows_x86_64.pdb 06-05-2015 09:16 217 vbox_checkpoint.xml 05-05-2015 08:39 85 vbox_job.xml 05-05-2015 08:40 69 vbox_remote_desktop.xml 05-05-2015 08:40 53 vbox_webapi.xml 05-05-2015 10:45 28.672 vm_floppy_0.img 05-05-2015 10:45 2.219.835.392 vm_image.vdi 13 File(s) 2.219.956.610 bytes Directory of D:\Boinc1\slots\0\boinc_dfa50baba0af731e 05-05-2015 12:41 <DIR> . 05-05-2015 12:41 <DIR> .. 05-05-2015 12:41 19.016 boinc_dfa50baba0af731e.vbox 05-05-2015 12:41 19.016 boinc_dfa50baba0af731e.vbox-prev 05-05-2015 12:41 <DIR> Logs 05-05-2015 12:41 <DIR> Snapshots 2 File(s) 38.032 bytes Directory of D:\Boinc1\slots\0\boinc_dfa50baba0af731e\Logs 05-05-2015 12:41 <DIR> . 05-05-2015 12:41 <DIR> .. 05-05-2015 12:41 73.010 VBox.log 05-05-2015 12:39 92.068 VBox.log.1 05-05-2015 10:45 93.041 VBox.log.2 05-05-2015 12:42 390.350 VBoxStartup.log 4 File(s) 648.469 bytes Directory of D:\Boinc1\slots\0\boinc_dfa50baba0af731e\Snapshots 05-05-2015 12:41 <DIR> . 05-05-2015 12:41 <DIR> .. 05-05-2015 10:45 389.243.928 2015-05-05T08-45-12-419914300Z.sav 05-05-2015 21:15 348.127.232 {1ad7d725-67f3-41da-8324-2d472759e787}.vdi 2 File(s) 737.371.160 bytes Total Files Listed: 21 File(s) 2.958.014.271 bytes 11 Dir(s) 694.943.940.608 bytes free VBox 4.3.26; Win7-64; no service install. Next step: I'll create an error result :( |
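
The manual check suggested earlier in the thread (look through the slot directories for one that holds nothing but a .vdi) could be automated along the following lines. This is only a rough sketch, not anything shipped with BOINC: the function name `find_leftover_vdi` is made up for illustration, the slots path has to be supplied by the user (for example `D:\BOINCdata\slots` as seen in one of the posts), and the rule "a slot containing only .vdi files is a leftover" is an assumption taken from the description above. It only reports candidates and deletes nothing, since a slot that is still in use should not be touched.

```python
from pathlib import Path


def find_leftover_vdi(slots_dir):
    """Return (slot_dir, vdi_file, size_in_bytes) for slots holding only .vdi files.

    Hypothetical helper: the 'only .vdi files left' rule is an assumption
    based on the forum description, not BOINC's own clean-up logic.
    """
    leftovers = []
    for slot in sorted(Path(slots_dir).iterdir()):
        if not slot.is_dir():
            continue
        # Collect every regular file in the slot, including subdirectories.
        files = [p for p in slot.rglob("*") if p.is_file()]
        # A slot in use also holds init_data.xml, boinc_lockfile, etc.,
        # so it is skipped; an empty slot is skipped as well.
        if files and all(p.suffix.lower() == ".vdi" for p in files):
            for vdi in files:
                leftovers.append((slot, vdi, vdi.stat().st_size))
    return leftovers


def report(slots_dir):
    """Print one line per leftover disk image, with its size in MB."""
    for slot, vdi, size in find_leftover_vdi(slots_dir):
        print(f"slot {slot.name}: {vdi.name} ({size / 2**20:.2f} MB)")
```

Calling `report(r"D:\BOINCdata\slots")` with BOINC stopped would list the candidate images; whether to delete them is left to the user.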
©2025 CERN