Message boards : Theory Application : DISK_LIMIT_EXCEEDED

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 4128 - Posted: 19 Sep 2016, 6:49:06 UTC
Last modified: 19 Sep 2016, 6:55:27 UTC

My last 3 Theory tasks, on 3 separate hosts, have ended with this error:

259960
259772
259685

Off to work so no time this morning for further investigation. Will check in later.
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 358
Message 4129 - Posted: 19 Sep 2016, 8:12:38 UTC - in response to Message 4128.  
Last modified: 19 Sep 2016, 8:13:03 UTC

259960: Peak disk usage 7,651.36 MB
259772: Peak disk usage 8,038.34 MB
259685: Peak disk usage 7,840.43 MB
Please let us know what's in BOINC's Event Log.
You may have to search stdoutdae.txt or stdoutdae.old for the relevant entries.
Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 4130 - Posted: 19 Sep 2016, 10:43:40 UTC - in response to Message 4129.  

Won't be able to check the logs till I get home, by which time the relevant entries will likely be long gone. I do see tasks successfully completed today, so it's possibly my own fault for micromanaging BOINC over the weekend, with quite a few stops and starts to give priority to Sixtrack work. (Maybe the VMs didn't properly tidy up checkpoints or something, allowing the space used to grow above the limit.)
Coming up to 12 years there so I like to catch any work that's intermittently available.
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 358
Message 4131 - Posted: 19 Sep 2016, 14:36:45 UTC - in response to Message 4130.  

The maximum disk space allowed for one slot, including subfolders, is 8,000,000,000 bytes, which equals 7,629.39453125 MB (counting 1 MB as 1,048,576 bytes).
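
For anyone who wants to check the arithmetic, here is a minimal Python sketch (not project code). It assumes BOINC counts 1 MB as 1,048,576 bytes, which is consistent with the figure above, and compares the limit with the peak disk usage values quoted earlier in this thread. The helper names are made up for illustration.

```python
# Assumption: 1 MB = 1024 * 1024 bytes, which matches the 7629.39 MB figure.
BYTES_PER_MB = 1024 * 1024

# Slot limit quoted in this thread, in bytes.
SLOT_LIMIT_BYTES = 8_000_000_000


def bytes_to_mb(n_bytes: int) -> float:
    """Convert a byte count to MB (1 MB = 1,048,576 bytes)."""
    return n_bytes / BYTES_PER_MB


LIMIT_MB = bytes_to_mb(SLOT_LIMIT_BYTES)

# Peak disk usage (MB) for the three IDs listed earlier in this thread.
peaks_mb = {259960: 7651.36, 259772: 8038.34, 259685: 7840.43}


def overshoot_mb(peak_mb: float, limit_mb: float = LIMIT_MB) -> float:
    """Return how far a peak value lies above the limit (negative if below)."""
    return peak_mb - limit_mb


if __name__ == "__main__":
    print(f"limit = {LIMIT_MB:.2f} MB")
    for task_id, peak in peaks_mb.items():
        print(f"{task_id}: peak {peak:.2f} MB, over by {overshoot_mb(peak):.2f} MB")
```

All three peaks come out above the limit, the smallest by only about 22 MB.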
Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 4134 - Posted: 19 Sep 2016, 19:07:51 UTC
Last modified: 19 Sep 2016, 19:31:40 UTC

Didn't fiddle with this one at all as I've been out all day. A Benchmark did go through which may have swapped out the Theory one.

19-Sep-2016 18:03:11 [vLHCathome-dev] Aborting task Theory_14915_1474113130.683715_0: exceeded disk limit: 9296.32MB > 7629.39MB
19-Sep-2016 18:03:11 [vLHCathome-dev] [task] task_state=ABORT_PENDING for Theory_14915_1474113130.683715_0 from request_abort
19-Sep-2016 18:03:11 [---] request_abort(): PID 1064 has 1 descendants
19-Sep-2016 18:03:11 [---] PID 4740
19-Sep-2016 18:03:11 [vLHCathome-dev] [task] result state=COMPUTE_ERROR for Theory_14915_1474113130.683715_0 from CS::report_result_error
19-Sep-2016 18:03:11 [vLHCathome-dev] [task] result state=ABORTED for Theory_14915_1474113130.683715_0 from abort_task
19-Sep-2016 18:03:34 [vLHCathome-dev] [task] Process for Theory_14915_1474113130.683715_0 exited, exit code 194, task state 5
19-Sep-2016 18:03:34 [vLHCathome-dev] [task] task_state=ABORTED for Theory_14915_1474113130.683715_0 from handle_exited_app
19-Sep-2016 18:03:34 [vLHCathome-dev] Computation for task Theory_14915_1474113130.683715_0 finished
19-Sep-2016 18:03:34 [vLHCathome-dev] [task] result state=COMPUTE_ERROR for Theory_14915_1474113130.683715_0 from CS::app_finished
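
If the Event Log has already scrolled away, the same lines should still be in stdoutdae.txt or stdoutdae.old. Below is a rough Python sketch for pulling out the "exceeded disk limit" entries. The regular expression is only modelled on the log line shown above, so treat the format as an assumption; `find_disk_aborts` is a made-up helper name.

```python
import re

# Pattern modelled on the single "Aborting task" line quoted in this thread.
# Assumption: other client versions may format the message differently.
ABORT_RE = re.compile(
    r"^(?P<when>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] "
    r"Aborting task (?P<task>\S+): exceeded disk limit: "
    r"(?P<used>[\d.]+)MB > (?P<limit>[\d.]+)MB"
)


def find_disk_aborts(lines):
    """Return one dict per log line reporting a disk-limit abort."""
    hits = []
    for line in lines:
        m = ABORT_RE.match(line.strip())
        if m:
            hits.append({
                "when": m["when"],
                "project": m["project"],
                "task": m["task"],
                "used_mb": float(m["used"]),
                "limit_mb": float(m["limit"]),
            })
    return hits


# Two lines copied from the log above, used as sample input.
sample = [
    "19-Sep-2016 18:03:11 [vLHCathome-dev] Aborting task "
    "Theory_14915_1474113130.683715_0: exceeded disk limit: 9296.32MB > 7629.39MB",
    "19-Sep-2016 18:03:11 [---] request_abort(): PID 1064 has 1 descendants",
]

if __name__ == "__main__":
    for hit in find_disk_aborts(sample):
        over = hit["used_mb"] - hit["limit_mb"]
        print(f"{hit['when']} {hit['task']}: {over:.2f} MB over the limit")
```

To scan a real log, pass an open file object instead of `sample`; the location of stdoutdae.txt depends on where the BOINC data directory is on the host.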

The other host had two 2-core Theory tasks, both just short of 5 GB after 6 hours, so I have gracefully ended them. Even the errored tasks seem to have been returning jobs and adding to McPlots, so there can't be much wrong other than the VM getting too big for its slot.

Could be coincidental, but the problem started after migrating to the new HTTPS URL. Maybe something didn't enjoy the change or got corrupted along the way. I have reset the project on two hosts to get all new clean files (keeping my own app_config, which has worked well up to now) and will do the 3rd when it finishes its current Alice task. So we'll see in the morning whether or not I'm back in action.
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 358
Message 4135 - Posted: 19 Sep 2016, 19:53:43 UTC
Last modified: 19 Sep 2016, 20:19:14 UTC

Just testing your config with a dual-core VM.
After running for 6 hours, the total contents of the slot, including subdirectories, come to 5,022,515,200 bytes.
An optional snapshot will add about 400 MB to it.
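
To take the same kind of measurement on another host, something like the following Python sketch adds up every file under a slot directory, subdirectories included. It is only an approximation: how BOINC itself does the accounting is not shown in this thread, and `dir_size_bytes` is a made-up helper name.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root, including subdirectories."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so a linked file is not counted here
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file disappeared between listing and stat
    return total


if __name__ == "__main__":
    import sys

    # Pass the slot directory as the first argument; defaults to the current dir.
    slot_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    size = dir_size_bytes(slot_dir)
    print(f"{slot_dir}: {size} bytes = {size / (1024 * 1024):.2f} MB")
```

Running it periodically against a slot directory would show whether the VM image keeps growing towards the 8,000,000,000 byte limit.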

On the production project I also found tasks erroring out with EXIT_DISK_LIMIT_EXCEEDED as the exit status.

Example of Peak disk usage 11,177.55 MB: http://lhcathome2.cern.ch/vLHCathome/result.php?resultid=6493208
Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 4139 - Posted: 20 Sep 2016, 21:12:11 UTC

Thanks for looking, CP.
Could it have been the dodgy CERN router not allowing the VMs to send back their data, letting it build up too far, or BOINC leaving them suspended too long while running Sixtrack tasks?
Anyway, after the project reset all seems fine again with the VMs staying well within limits.

I did have to gracefully end 2 tasks today that had finished their jobs but wouldn't self-terminate. (I didn't save the console, but there were multiple messages about something having expired.) Again, the VMs on that host had been swapped out to run Sixtrack, possibly for too long (4 hrs, then 2 hrs soon after), although I thought suspend/resume had already been successfully tested over 8+ hours. I might also have forgotten to set LAIM back on yesterday (oops), which won't have helped.