Message boards : Number crunching : exceeded disk limit

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 750
Credit: 11,602,269
RAC: 1,722
Message 241 - Posted: 10 Apr 2015, 20:59:29 UTC
Last modified: 10 Apr 2015, 21:06:28 UTC

4/10/2015 1:10:30 PM | ATLAS@home | Aborting task 40GKDmnAi0lnDDn7oo6G73TpABFKDmABFKDmiiLKDmABFKDmhOKPIn_2: exceeded disk limit: 6528.21MB > 4291.53MB
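For anyone collecting these abort messages from several hosts, here is a minimal sketch that pulls the two sizes out of a line like the one above and reports how far over the limit the task was. The function name `parse_disk_limit` is made up for illustration, and the pattern only assumes the "exceeded disk limit: X MB > Y MB" wording seen in this thread; the task name in the sample is shortened.

```python
import re

# Matches only the "exceeded disk limit" wording quoted in this thread.
DISK_LIMIT_RE = re.compile(
    r"exceeded disk limit: (?P<used>\d+(?:\.\d+)?)MB > (?P<limit>\d+(?:\.\d+)?)MB"
)


def parse_disk_limit(line):
    """Return (used_mb, limit_mb, overage_mb), or None if the line does not match."""
    m = DISK_LIMIT_RE.search(line)
    if m is None:
        return None
    used = float(m.group("used"))
    limit = float(m.group("limit"))
    return used, limit, round(used - limit, 2)


# Sample line with the task name shortened for readability.
line = ("4/10/2015 1:10:30 PM | ATLAS@home | Aborting task EXAMPLE_2: "
        "exceeded disk limit: 6528.21MB > 4291.53MB")
print(parse_disk_limit(line))
```

The third value is simply used minus limit, so for the message above the task was roughly 2.2 GB over what it was allowed.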


The first CMS task ran fine while I was also running Einstein GPU x2, vLHC x2, and Atlas x2.

During the second CMS task it would only let me run Einstein GPU x1, vLHC x2, and Atlas x2.

During the third CMS task it will only let me run vLHC x2 and Einstein GPU x1.

I can no longer run any Atlas tasks at the same time.

I have never had this happen before.

Is that why not many members did more than one CMS task before they quit?
Mad Scientist For Life

LCB001
Joined: 5 Apr 15
Posts: 3
Credit: 1,606,870
RAC: 0
Message 242 - Posted: 10 Apr 2015, 21:42:40 UTC - in response to Message 241.  

CMS-dev workunits like to leave their disk images behind. Check your slots; there's probably one or more with only a .vdi left in them.

If you delete the vdi you should be good to go again.
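If you would rather not click through every slot by hand, something like the following sketch could list the slots that contain nothing but .vdi files. The function name `slots_with_only_vdi` is made up, and the data directory path is only an example (adjust it to your own installation). It only reports; it does not delete anything, and it is probably safest to stop BOINC before removing files yourself.

```python
from pathlib import Path


def slots_with_only_vdi(slots_dir):
    """Return slot directories whose only regular files are .vdi images."""
    leftovers = []
    for slot in sorted(Path(slots_dir).iterdir()):
        if not slot.is_dir():
            continue
        files = [p for p in slot.rglob("*") if p.is_file()]
        # An empty slot is fine; a slot holding only disk images is suspicious.
        if files and all(p.suffix.lower() == ".vdi" for p in files):
            leftovers.append(slot)
    return leftovers


if __name__ == "__main__":
    # Example path only; point this at your own BOINC data directory.
    slots = Path("D:/BOINCdata/slots")
    if slots.is_dir():
        for slot in slots_with_only_vdi(slots):
            size = sum(p.stat().st_size for p in slot.rglob("*") if p.is_file())
            print(f"{slot}: {size / 1024**2:.2f} MB of leftover disk image")
```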

Why this causes other projects to get that exceeded disk space error even if BOINC has lots of available space left, I don't know, but it's very annoying...

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 750
Credit: 11,602,269
RAC: 1,722
Message 243 - Posted: 10 Apr 2015, 23:17:39 UTC - in response to Message 242.  
Last modified: 10 Apr 2015, 23:18:13 UTC

Yes, I always remove leftover .vdi images on all of my hosts. I am a long-time T4T/vLHC member, and the same with Atlas, so I am used to doing that on my 6 hosts.

I only have CMS running on my 8-core, and it is my only host with an SSD.

Of course I also run all of mine 24/7 and in my preferences have them all set at no limit.

Strange that it worked normally during the first CMS task and has gotten worse over the following 2 tasks here.

But then I did not have any problem getting the first 2 tasks here complete and validated.

LCB001
Joined: 5 Apr 15
Posts: 3
Credit: 1,606,870
RAC: 0
Message 244 - Posted: 11 Apr 2015, 0:59:47 UTC - in response to Message 243.  

My first CMS-dev tasks were also run on a rig using an SSD and I had the same result.

I thought it might be due to a lack of space on the SSD, so I switched CMS over to two other dual-core rigs that use HDDs.

What was puzzling was that it still happened on one of those even though BOINC has 180GB+ of space available.

This rig was also running 2x Atlas, 2x Edge and 8x SRBase WUs plus an Einstein task on the GPU.

Once I removed the leftover vdi it has been back to zero errors.

The second rig had no leftovers and no disk space error problems.

Both are due to finish their second CMS tasks tomorrow and it will be interesting to see what happens then.

I also run T4T, vLHC and Atlas and haven't had this particular problem with them.

I just find it weird that the CMS WUs seem to keep running fine and yet cause problems for WUs from other projects...

Phil
Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 246 - Posted: 11 Apr 2015, 6:08:05 UTC - in response to Message 241.  
Last modified: 11 Apr 2015, 6:22:04 UTC

[quote]4/10/2015 1:10:30 PM | ATLAS@home | Aborting task 40GKDmnAi0lnDDn7oo6G73TpABFKDmABFKDmiiLKDmABFKDmhOKPIn_2: exceeded disk limit: 6528.21MB > 4291.53MB


First CMS task ran fine while at the same time I was running Einstein GPU X2,vLHC X2,and Atlas X2

Second CMS task it would only let me run E-GPU X1 and vLHC X2 and Atlas X2

Third CMS task it will only let me run vLHC X2 and Einstein GPU X1

No longer can run any Atlas at the same time.

Never had this happen before.[/quote]


Hello, new member here too.
My first job failed as well, and a bunch of Sixtrack and SETI tasks bombed out with a "Disk Usage Exceeded" error too.
Job No. 2 is running so far...

[edit]The VDI has grown to over 4 GB, and in the VM the load average has grown to about 8, with each task now taking over 30 minutes.[/edit]

Phil
Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 249 - Posted: 11 Apr 2015, 7:08:43 UTC - in response to Message 246.  

[quote]The VDI has grown to over 4Gb, and in the VM the loadav has grown to about 8, with each task now taking over 30 mins.[/quote]

In fact the VM has now stalled, using 1 CPU core at 100%; the console shows a frozen top and Apache won't connect.
Let's try 26125.

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 257 - Posted: 16 Apr 2015, 8:40:54 UTC

14 errors for me on this host since joining CMS-dev a couple of days ago.
Sixtrack WUs fail with "LHC@home 1.0 | Aborting task.....exceeded disk limit: 6860.42MB > 572.20MB" before they get a chance to start.

CMS itself appears to run fine and I'm not seeing any debris left behind after it finishes.

Don't know how or why there should be this interaction but others have reported similar here, and further down this thread.

No interference with vLHC VM but I'm only allowing 1 of them to run just now while there's Sixtrack work available.
Currently running:
1 x CMS
1 x vLHC
2 x Sixtrack

8GB RAM with another 4GB allocated on a ReadyBoost USB stick.
Boinc 7.4.42
VBox 4.3.26

rbpeake
Joined: 15 Apr 15
Posts: 38
Credit: 227,251
RAC: 0
Message 258 - Posted: 16 Apr 2015, 19:56:33 UTC

When my first CMS unit completed, I noticed it caused one of my three running ATLAS tasks to error out with a "computation error". The ATLAS output message is as follows:

Stderr output
<core_client_version>7.4.42</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>

</stderr_txt>
]]>
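When going through a batch of failed results, a small helper like this sketch can pull out just the `<message>` text from a stderr output such as the one above. The function name `extract_message` is made up, and it uses a plain regular expression rather than an XML parser because the output above is not a complete XML document.

```python
import re


def extract_message(stderr_output):
    """Return the text inside the first <message>...</message> pair, or None."""
    m = re.search(r"<message>\s*(.*?)\s*</message>", stderr_output, re.DOTALL)
    return m.group(1) if m else None


sample = """<core_client_version>7.4.42</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>

</stderr_txt>
]]>"""

print(extract_message(sample))  # prints: Maximum disk usage exceeded
```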

rbpeake
Joined: 15 Apr 15
Posts: 38
Credit: 227,251
RAC: 0
Message 259 - Posted: 16 Apr 2015, 23:48:25 UTC - in response to Message 258.  

I am also getting a bunch of these ATLAS errors now:

Stderr output
<core_client_version>7.4.42</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>
2015-04-16 18:04:25 (12728): vboxwrapper (7.5.26110): starting
2015-04-16 18:04:25 (12728): Feature: Checkpoint interval offset (391 seconds)
2015-04-16 18:04:26 (12728): Detected: VirtualBox 4.3.26r98988
2015-04-16 18:04:26 (12728): Detected: Minimum checkpoint interval (900.000000 seconds)
2015-04-16 18:04:26 (12728): successfully copied 'init_data.xml' to the shared directory.
2015-04-16 18:04:34 (12728): Create VM. (boinc_d5363a11d7601e3a, slot#1)
2015-04-16 18:04:34 (12728): Updating drive controller type and model for desired configuration.
2015-04-16 18:04:35 (12728): Setting CPU Count for VM. (1)
2015-04-16 18:04:35 (12728): Setting Memory Size for VM. (2048MB)
2015-04-16 18:04:35 (12728): Setting Chipset Options for VM.
2015-04-16 18:04:35 (12728): Setting Boot Options for VM.
2015-04-16 18:04:36 (12728): Setting Network Configuration for NAT.
2015-04-16 18:04:36 (12728): Disabling USB Support for VM.
2015-04-16 18:04:36 (12728): Disabling COM Port Support for VM.
2015-04-16 18:04:36 (12728): Disabling LPT Port Support for VM.
2015-04-16 18:04:37 (12728): Disabling Audio Support for VM.
2015-04-16 18:04:37 (12728): Disabling Clipboard Support for VM.
2015-04-16 18:04:37 (12728): Disabling Drag and Drop Support for VM.
2015-04-16 18:04:38 (12728): Adding storage controller to VM.
2015-04-16 18:04:38 (12728): Adding virtual ISO 9660 disk drive to VM. (vm_isocontext.iso)
2015-04-16 18:04:38 (12728): Adding VirtualBox Guest Additions to VM.
2015-04-16 18:04:38 (12728): Adding virtual cache disk drive to VM. (vm_cache.vdi)
2015-04-16 18:04:39 (12728): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2015-04-16 18:04:39 (12728): Enabling network access for VM.
2015-04-16 18:04:39 (12728): forwarding host port 52085 to guest port 80
2015-04-16 18:04:39 (12728): Enabling remote desktop for VM.
2015-04-16 18:04:40 (12728): Enabling shared directory for VM.
2015-04-16 18:04:40 (12728): WARNING: Stale VirtualBox VM Log used.
2015-04-16 18:04:40 (12728): WARNING: Stale VirtualBox VM Log Not Found.
2015-04-16 18:04:40 (12728): WARNING: Stale VirtualBox VM Log used.
2015-04-16 18:04:40 (12728): WARNING: Stale VirtualBox VM Log Not Found.
2015-04-16 18:04:40 (12728): Starting VM.

</stderr_txt>
]]>

Guess I will need to clean up my VM, will that help?

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 431
Message 260 - Posted: 17 Apr 2015, 5:35:10 UTC - in response to Message 259.  

[quote]Guess I will need to clean up my VM, will that help?[/quote]

Also have a look in BOINCData slot directories for unwanted files.

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 262 - Posted: 18 Apr 2015, 17:15:26 UTC

Follow-up on my sixtrack exceeded-space errors below:
I have noticed that when CMS finishes, it doesn't quite do a full cleanup and leaves a disk image .vdi in the slot. I guess that Sixtrack tries to use that slot, thinking it is empty but finds the .vdi which pushes it over the permitted size, causing the error.
I have reset LHC 1.0 just in case anything else may have become corrupted but I'll need to wait for the next batch of work to see if that fixes it. Until a fix can be found I'll only let CMS and sixtrack run when the other isn't and will delete the relevant slot to clear the debris when required.
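A rough way to test the guess above on your own host: add up everything already sitting in a slot directory and compare it with the limit from the error message. This is only a sketch. The names `dir_size_mb` and `would_exceed` are made up, and it assumes 1 MB = 1,048,576 bytes; the limits quoted in this thread look consistent with that (572.20 MB would be 6e8 bytes), but that is an inference, not something confirmed here.

```python
import os


def dir_size_mb(path):
    """Total size of all regular files under path, in MB (1 MB = 1024 * 1024 bytes)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / (1024 * 1024)


def would_exceed(slot_path, limit_mb):
    """True if the files already in the slot are larger than the given disk limit."""
    return dir_size_mb(slot_path) > limit_mb
```

For example, `would_exceed(r"D:\BOINCdata\slots\6", 572.20)` (the path is just an example) would return True for a slot that still holds a multi-GB .vdi, which matches the "6860.42MB > 572.20MB" pattern reported earlier in the thread.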

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 311 - Posted: 4 May 2015, 20:24:32 UTC - in response to Message 310.  

I've just posted this over at Einstein in the hope that my observations will get to whoever can make use of them.

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 312 - Posted: 5 May 2015, 9:33:10 UTC

A 'private drop' (bare BOINC client, no installer, place in existing BOINC program directory) has been made available for people willing to test this. I believe it will also have the extra <slot_debug> logging that David mentioned.

x86:
http://boinc.berkeley.edu/dl/boinc.040515.x86.zip

x64:
http://boinc.berkeley.edu/dl/boinc.040515.x64.zip

If you use it, please enable the extra logging and report back (whether success or failure) - we're looking for proper deletion of the .vdi file in all cases.
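For testers who have not set a log flag before: the flags normally live in a `<log_flags>` section of cc_config.xml in the BOINC data directory. The sketch below only builds that XML as a string; the function name `build_cc_config` is made up, and if you already have a cc_config.xml it is better to add the flag to the existing file by hand than to overwrite it with this output.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a cc_config.xml document that sets each named log flag to 1."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(["slot_debug"]))
```

After saving the file, the client has to re-read its config files (or be restarted) before the extra [slot] lines show up in the event log.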

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 431
Message 313 - Posted: 5 May 2015, 10:43:26 UTC - in response to Message 312.  

[quote]If you use it, please enable the extra logging and report back (whether success or failure) - we're looking for proper deletion of the .vdi file in all cases.[/quote]

05 May 12:41:16 Starting BOINC client version 7.5.1 for windows_x86_64
05 May 12:41:16 This a development version of BOINC and may not function properly
05 May 12:41:16 log flags: file_xfer, sched_ops, task, slot_debug
05 May 12:41:16 Version change (7.4.42 -> 7.5.1)
CMS-dev 05 May 12:41:38 task CMS_30750_1427806618.586224_0 resumed by user
05 May 12:41:39 [slot] removed file slots/0/init_data.xml
05 May 12:41:39 [slot] removed file slots/0/boinc_temporary_exit

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 314 - Posted: 5 May 2015, 10:55:11 UTC - in response to Message 313.  

That seems normal - I'm seeing init_data.xml being removed and replaced regularly with earlier versions too. The interesting thing will be to compare the list of files deleted at the end of the run, between the earlier versions (which I will post - later today) and this private drop version - and to compare both with what happens in the slot directory, of course.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 431
Message 315 - Posted: 5 May 2015, 12:40:09 UTC - in response to Message 314.  

Btw: I'm running vboxwrapper 26166 instead of the stock application 26165 and have set the following flags in the vbox job XML:
<enable_vm_savestate_usage/>
<disable_automatic_checkpoints/>


However I made 1 snapshot myself between switching BOINC releases.
I've seen with vLHC that those manual snapshots are properly cleared when the task finishes.

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 316 - Posted: 5 May 2015, 16:02:21 UTC

OK, my first task finished, running under the previous pre-release BOINC v7.5.0

05/05/2015 16:51:14 | CMS-dev | Message from task: 0
05/05/2015 16:51:14 | | [slot] cleaning out slots/6: handle_exited_app()
05/05/2015 16:51:14 | | [slot] removed file slots/6/boinc_finish_called
05/05/2015 16:51:14 | | [slot] removed file slots/6/boinc_task_state.xml
05/05/2015 16:51:14 | | [slot] removed file slots/6/init_data.xml
05/05/2015 16:51:14 | | [slot] removed file slots/6/output
05/05/2015 16:51:14 | | [slot] removed file slots/6/stderr.txt
05/05/2015 16:51:14 | | [slot] removed file slots/6/VBox.log
05/05/2015 16:51:14 | | [slot] removed file slots/6/vboxwrapper_26165_windows_x86_64.exe
05/05/2015 16:51:14 | | [slot] removed file slots/6/vboxwrapper_26165_windows_x86_64.pdb
05/05/2015 16:51:14 | | [slot] removed file slots/6/vbox_checkpoint.xml
05/05/2015 16:51:14 | | [slot] removed file slots/6/vbox_job.xml
05/05/2015 16:51:14 | | [slot] removed file slots/6/vbox_remote_desktop.xml
05/05/2015 16:51:14 | | [slot] removed file slots/6/vbox_webapi.xml
05/05/2015 16:51:14 | | [slot] removed file slots/6/vm_floppy_6.img
05/05/2015 16:51:14 | | [slot] removed file slots/6/vm_image.vdi
05/05/2015 16:51:14 | CMS-dev | Computation for task CMS_30526_1427806613.426709_0 finished
05/05/2015 16:51:14 | | [slot] cleaning out slots/6: get_free_slot()

It looks like a proper match with the file list for the directory while the task was running:

D:\BOINCdata\slots\6\boinc_lockfile
D:\BOINCdata\slots\6\boinc_task_state.xml
D:\BOINCdata\slots\6\init_data.xml
D:\BOINCdata\slots\6\stderr.txt
D:\BOINCdata\slots\6\VBox.log
D:\BOINCdata\slots\6\vboxwrapper_26165_windows_x86_64.exe
D:\BOINCdata\slots\6\vboxwrapper_26165_windows_x86_64.pdb
D:\BOINCdata\slots\6\vbox_checkpoint.xml
D:\BOINCdata\slots\6\vbox_job.xml
D:\BOINCdata\slots\6\vbox_remote_desktop.xml
D:\BOINCdata\slots\6\vbox_webapi.xml
D:\BOINCdata\slots\6\vm_floppy_6.img
D:\BOINCdata\slots\6\vm_image.vdi

and yes, vm_image.vdi was removed as it should have been. Rats. OK, upgrading to the private drop for the next one.
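For anyone who wants to do the same comparison without checking the two lists by eye, here is a minimal sketch. The function names `removed_files` and `not_logged_as_removed` are made up, and the pattern assumes the "[slot] removed file slots/N/name" wording shown in the log above. The sample data is a cut-down version of the two lists.

```python
import re
from pathlib import PureWindowsPath

# Matches the slot_debug lines quoted above, capturing only the file name.
REMOVED_RE = re.compile(r"\[slot\] removed file slots/\d+/(\S+)")


def removed_files(log_lines):
    """File names that the client logged as removed from a slot."""
    names = set()
    for line in log_lines:
        m = REMOVED_RE.search(line)
        if m:
            names.add(m.group(1))
    return names


def not_logged_as_removed(listing, log_lines):
    """Names from a directory listing that never appear in a 'removed file' line."""
    present = {PureWindowsPath(p).name for p in listing}
    return sorted(present - removed_files(log_lines))


log = [
    "05/05/2015 16:51:14 | | [slot] removed file slots/6/init_data.xml",
    "05/05/2015 16:51:14 | | [slot] removed file slots/6/vm_image.vdi",
]
listing = [
    r"D:\BOINCdata\slots\6\boinc_lockfile",
    r"D:\BOINCdata\slots\6\init_data.xml",
    r"D:\BOINCdata\slots\6\vm_image.vdi",
]
print(not_logged_as_removed(listing, log))  # prints: ['boinc_lockfile']
```

Run against the full lists above, the only name in the directory listing without a matching "removed file" line is boinc_lockfile; vm_image.vdi is accounted for.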

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 317 - Posted: 5 May 2015, 18:44:05 UTC - in response to Message 316.  
Last modified: 5 May 2015, 19:40:06 UTC

Upgraded to 7.5.1 and set the slots flag and get similar output to Richard with all files deleted as they should. (As soon as you take your car to a mechanic, the rattle stops!)

Maybe something else has changed between 7.4.42 and 7.5.1 which has unintentionally fixed the slot clean-out error?

Task on the other machine is due to finish in an hour or so, so I'll watch it finish as well.

[Afterthought]
When the second one finishes, I'll go back to the earlier version but will leave the .pdb in the Boinc folder to see if it will produce any different output when the next one finishes.

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 318 - Posted: 5 May 2015, 22:39:29 UTC

My second one completed as it should, cleaning up all files and not causing any errors for the following task.

So, who's seeing these errors, and what's the common factor?

Version of VBox?
Version of Windows?
Running as a service?

In my case - 4.3.26, 7, no.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 431
Message 319 - Posted: 6 May 2015, 7:28:37 UTC - in response to Message 318.  
Last modified: 6 May 2015, 7:54:43 UTC

This is what programmers love: an error that happens every time, but not now ;)
The task finished after 24½ hours of wall time, leaving an empty folder. From the messages:

CMS-dev 06 May 09:18:28 Message from task: 0
06 May 09:18:28 [slot] cleaning out slots/0: handle_exited_app()
06 May 09:18:28 [slot] removed file slots/0/boinc_finish_called
06 May 09:18:28 [slot] removed file slots/0/boinc_task_state.xml
06 May 09:18:28 [slot] removed file slots/0/init_data.xml
06 May 09:18:28 [slot] removed file slots/0/output
06 May 09:18:28 [slot] removed file slots/0/stderr.txt
06 May 09:18:28 [slot] removed file slots/0/VBox.log
06 May 09:18:28 [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.exe
06 May 09:18:28 [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.pdb
06 May 09:18:28 [slot] removed file slots/0/vbox_checkpoint.xml
06 May 09:18:28 [slot] removed file slots/0/vbox_job.xml
06 May 09:18:28 [slot] removed file slots/0/vbox_remote_desktop.xml
06 May 09:18:28 [slot] removed file slots/0/vbox_webapi.xml
06 May 09:18:28 [slot] removed file slots/0/vm_floppy_0.img
06 May 09:18:28 [slot] removed file slots/0/vm_image.vdi
CMS-dev 06 May 09:18:28 Computation for task CMS_30750_1427806618.586224_0 finished


Contents of slot 0 including subdirs just before finish:
Directory of D:\Boinc1\slots\0

06-05-2015 07:13 <DIR> .
06-05-2015 07:13 <DIR> ..
05-05-2015 12:41 <DIR> boinc_dfa50baba0af731e
05-05-2015 12:41 0 boinc_lockfile
06-05-2015 09:16 508 boinc_task_state.xml
05-05-2015 12:41 9.251 init_data.xml
06-05-2015 07:24 9.149 stderr.txt
05-05-2015 12:41 73.010 VBox.log
05-05-2015 08:39 102 vboxwrapper_26165_windows_x86_64.exe
05-05-2015 08:40 102 vboxwrapper_26165_windows_x86_64.pdb
06-05-2015 09:16 217 vbox_checkpoint.xml
05-05-2015 08:39 85 vbox_job.xml
05-05-2015 08:40 69 vbox_remote_desktop.xml
05-05-2015 08:40 53 vbox_webapi.xml
05-05-2015 10:45 28.672 vm_floppy_0.img
05-05-2015 10:45 2.219.835.392 vm_image.vdi
13 File(s) 2.219.956.610 bytes

Directory of D:\Boinc1\slots\0\boinc_dfa50baba0af731e

05-05-2015 12:41 <DIR> .
05-05-2015 12:41 <DIR> ..
05-05-2015 12:41 19.016 boinc_dfa50baba0af731e.vbox
05-05-2015 12:41 19.016 boinc_dfa50baba0af731e.vbox-prev
05-05-2015 12:41 <DIR> Logs
05-05-2015 12:41 <DIR> Snapshots
2 File(s) 38.032 bytes

Directory of D:\Boinc1\slots\0\boinc_dfa50baba0af731e\Logs

05-05-2015 12:41 <DIR> .
05-05-2015 12:41 <DIR> ..
05-05-2015 12:41 73.010 VBox.log
05-05-2015 12:39 92.068 VBox.log.1
05-05-2015 10:45 93.041 VBox.log.2
05-05-2015 12:42 390.350 VBoxStartup.log
4 File(s) 648.469 bytes

Directory of D:\Boinc1\slots\0\boinc_dfa50baba0af731e\Snapshots

05-05-2015 12:41 <DIR> .
05-05-2015 12:41 <DIR> ..
05-05-2015 10:45 389.243.928 2015-05-05T08-45-12-419914300Z.sav
05-05-2015 21:15 348.127.232 {1ad7d725-67f3-41da-8324-2d472759e787}.vdi
2 File(s) 737.371.160 bytes

Total files listed:
21 File(s) 2.958.014.271 bytes
11 Dir(s) 694.943.940.608 bytes free


VBox 4.3.26; Win7-64; no service install.

Next step: I'll create an error result :(


©2024 CERN