Message boards : Number crunching : exceeded disk limit

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 320 - Posted: 6 May 2015, 8:13:14 UTC - in response to Message 319.  

VBox 4.3.26; Win7-64; no service install.

Next step: I'll create an error result :(

Slot properly cleaned.

109 CMS-dev 06 May 09:54:59 Scheduler request completed: got 1 new tasks
110 06 May 09:55:01 [slot] cleaning out slots/0: get_free_slot()
111 CMS-dev 06 May 09:55:01 [slot] assigning slot 0 to CMS_30830_1427806620.502770_0
112 CMS-dev 06 May 09:55:01 [slot] linked ../../projects/boincai05.cern.ch_CMS-dev/vboxwrapper_26165_windows_x86_64.exe to slots/0/vboxwrapper_26165_windows_x86_64.exe
113 CMS-dev 06 May 09:55:01 [slot] linked ../../projects/boincai05.cern.ch_CMS-dev/CMS_26_03_2015a.xml to slots/0/vbox_job.xml
114 CMS-dev 06 May 09:55:01 Starting task CMS_30830_1427806620.502770_0
115 06 May 09:55:56 [slot] removed file slots/0/init_data.xml
116 CMS-dev 06 May 09:55:56 [slot] linked ../../projects/boincai05.cern.ch_CMS-dev/vboxwrapper_26165_windows_x86_64.pdb to slots/0/vboxwrapper_26165_windows_x86_64.pdb
117 CMS-dev 06 May 10:08:20 task CMS_30830_1427806620.502770_0 suspended by user
118 CMS-dev 06 May 10:09:04 task CMS_30830_1427806620.502770_0 resumed by user
119 06 May 10:09:05 [slot] removed file slots/0/init_data.xml
120 06 May 10:09:05 [slot] removed file slots/0/boinc_temporary_exit
121 06 May 10:09:06 [slot] cleaning out slots/0: handle_exited_app()
122 06 May 10:09:06 [slot] removed file slots/0/boinc_c622628eff463924/boinc_c622628eff463924.vbox
123 06 May 10:09:06 [slot] removed file slots/0/boinc_c622628eff463924/boinc_c622628eff463924.vbox-prev
124 06 May 10:09:06 [slot] removed file slots/0/boinc_c622628eff463924/Logs/VBox.log
125 06 May 10:09:06 [slot] removed file slots/0/boinc_c622628eff463924/Logs/VBoxStartup.log
126 06 May 10:09:06 [slot] removed file slots/0/boinc_c622628eff463924/Snapshots/2015-05-06T08-08-22-948266400Z.sav
127 06 May 10:09:06 [slot] removed file slots/0/boinc_lockfile
128 06 May 10:09:06 [slot] removed file slots/0/boinc_task_state.xml
129 06 May 10:09:06 [slot] removed file slots/0/init_data.xml
130 06 May 10:09:06 [slot] removed file slots/0/stderr.txt
131 06 May 10:09:06 [slot] removed file slots/0/VBox.log
132 06 May 10:09:06 [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.exe
133 06 May 10:09:06 [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.pdb
134 06 May 10:09:06 [slot] removed file slots/0/vbox_checkpoint.xml
135 06 May 10:09:06 [slot] removed file slots/0/vbox_job.xml
136 06 May 10:09:06 [slot] removed file slots/0/vbox_remote_desktop.xml
137 06 May 10:09:06 [slot] removed file slots/0/vbox_webapi.xml
138 06 May 10:09:06 [slot] removed file slots/0/vm_floppy_0.img
139 06 May 10:09:06 [slot] removed file slots/0/vm_image.vdi
140 CMS-dev 06 May 10:09:06 Computation for task CMS_30830_1427806620.502770_0 finished

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 321 - Posted: 6 May 2015, 9:22:53 UTC

All the error message reports that I've seen refer to problems with a .vdi file of 5 GB or more. Neither of the two I've run so far got that large - the second one was a bit short of 4 GB. I don't see why DeleteFile() should have problems with any particular file size, but if there is a boundary, 4 GB sounds like a significant number. I don't know what feature of CMS controls the growth of the .vdi, but that might be something to watch.
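
To illustrate why 4 GB stands out as a boundary: any file size held in an unsigned 32-bit field tops out at 2^32 - 1 bytes. The sketch below only shows that arithmetic. It is not a claim about what DeleteFile() or BOINC does internally, and the helper names are made up for the example.

```python
UINT32_MAX = 2**32 - 1  # largest value an unsigned 32-bit field can hold


def fits_in_uint32(size_bytes: int) -> bool:
    """True if the size can be stored in 32 bits without wrapping."""
    return 0 <= size_bytes <= UINT32_MAX


def wrapped_uint32(size_bytes: int) -> int:
    """Value a 32-bit field would hold if the size were silently truncated."""
    return size_bytes & UINT32_MAX


# Hypothetical sizes, chosen only to sit either side of the boundary.
for label, size in [("just under 4 GiB", 4 * 1024**3 - 1), ("5 GiB", 5 * 1024**3)]:
    print(label, fits_in_uint32(size), wrapped_uint32(size))
```

Anything that stores or compares sizes through a 32-bit type would see a 5 GiB file as a 1 GiB one, or reject it outright, which is the kind of failure that only shows up once a .vdi passes 4 GB.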

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 322 - Posted: 6 May 2015, 9:23:48 UTC

Next step: BOINC running as a service and wait:

1 06 May 11:14:13 Starting BOINC client version 7.5.1 for windows_x86_64
2 06 May 11:14:13 This a development version of BOINC and may not function properly
3 06 May 11:14:13 log flags: file_xfer, sched_ops, task, slot_debug
5 06 May 11:14:13 Running as a daemon (GPU computing disabled)
101 CMS-dev 06 May 11:14:52 Scheduler request completed: got 1 new tasks
102 CMS-dev 06 May 11:14:54 [slot] assigning slot 0 to CMS_30682_1427806617.088559_0
103 CMS-dev 06 May 11:14:54 [slot] linked ../../projects/boincai05.cern.ch_CMS-dev/vboxwrapper_26165_windows_x86_64.exe to slots/0/vboxwrapper_26165_windows_x86_64.exe
104 CMS-dev 06 May 11:14:54 [slot] linked ../../projects/boincai05.cern.ch_CMS-dev/CMS_26_03_2015a.xml to slots/0/vbox_job.xml
105 CMS-dev 06 May 11:14:54 Starting task CMS_30682_1427806617.088559_0
106 06 May 11:14:56 [slot] removed file slots/0/init_data.xml
107 CMS-dev 06 May 11:14:56 [slot] linked ../../projects/boincai05.cern.ch_CMS-dev/vboxwrapper_26165_windows_x86_64.pdb to slots/0/vboxwrapper_26165_windows_x86_64.pdb

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 323 - Posted: 6 May 2015, 9:30:44 UTC - in response to Message 321.  
Last modified: 6 May 2015, 9:38:19 UTC

All the error message reports that I've seen refer to problems with a .vdi file of 5 GB or more. Neither of the two I've run so far got that large - the second one was a bit short of 4 GB. I don't see why DeleteFile() should have problems with any particular file size, but if there is a boundary, 4 GB sounds like a significant number. I don't know what feature of CMS controls the growth of the .vdi, but that might be something to watch.

I've noticed that too and mentioned it in the WCG forums, which you have probably seen.
http://www.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=491651

That's also the reason why I ran my first test for 24 hours, to let the vdi grow, but the file size was only 2,219,835,392 bytes.

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 324 - Posted: 6 May 2015, 10:13:58 UTC - in response to Message 322.  

06 May 11:14:13 Running as a daemon (GPU computing disabled)

Yea! That'll cut down on the help-desk workload.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 325 - Posted: 6 May 2015, 12:33:47 UTC

Interesting! It's not vm_image.vdi that was left in the slot directory, but stderr.txt:

124 CMS-dev 06 May 14:22:48 Message from task: 0
125 06 May 14:22:48 [slot] cleaning out slots/0: handle_exited_app()
126 06 May 14:22:48 [slot] removed file slots/0/boinc_finish_called
127 06 May 14:22:48 [slot] removed file slots/0/boinc_task_state.xml
128 06 May 14:22:48 [slot] removed file slots/0/init_data.xml
129 06 May 14:22:48 [slot] removed file slots/0/output
130 06 May 14:22:48 [slot] failed to remove file slots/0/stderr.txt: unlink() failed
131 06 May 14:22:48 [slot] removed file slots/0/VBox.log
132 06 May 14:22:48 [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.exe
133 06 May 14:22:48 [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.pdb
134 06 May 14:22:48 [slot] removed file slots/0/vbox_checkpoint.xml
135 06 May 14:22:48 [slot] removed file slots/0/vbox_job.xml
136 06 May 14:22:48 [slot] removed file slots/0/vbox_remote_desktop.xml
137 06 May 14:22:48 [slot] removed file slots/0/vbox_webapi.xml
138 06 May 14:22:48 [slot] removed file slots/0/vm_floppy_0.img
139 06 May 14:22:48 [slot] removed file slots/0/vm_image.vdi
140 CMS-dev 06 May 14:22:48 Computation for task CMS_30682_1427806617.088559_0 finished
141 CMS-dev 06 May 14:22:52 Sending scheduler request: To report completed tasks.
142 CMS-dev 06 May 14:22:52 Reporting 1 completed tasks
143 CMS-dev 06 May 14:22:52 Not requesting tasks: "no new tasks" requested via Manager
144 CMS-dev 06 May 14:22:54 Scheduler request completed

VBox 4.3.26; Win7-64; BOINC installed as service.
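
For anyone wanting to watch for this in their own event log (with the slot_debug flag enabled), here is a minimal sketch that pulls out the "failed to remove file" lines. The regular expression is an assumption based only on the log format pasted above, and `failed_removals` is a made-up helper name.

```python
import re

# Pattern assumed from the slot_debug lines shown in this thread.
FAILED = re.compile(r"\[slot\] failed to remove file (?P<path>\S+): (?P<reason>.+)$")


def failed_removals(log_lines):
    """Yield (path, reason) for every slot file the client could not delete."""
    for line in log_lines:
        match = FAILED.search(line.rstrip())
        if match:
            yield match.group("path"), match.group("reason")


sample = [
    "129 06 May 14:22:48 [slot] removed file slots/0/output",
    "130 06 May 14:22:48 [slot] failed to remove file slots/0/stderr.txt: unlink() failed",
    "131 06 May 14:22:48 [slot] removed file slots/0/VBox.log",
]

for path, reason in failed_removals(sample):
    print(path, "->", reason)
```

Note that this only catches failures the client reports. A file that never appears in the log at all would need a comparison against the actual directory contents.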

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 326 - Posted: 6 May 2015, 12:45:18 UTC - in response to Message 325.  

Verrrrrrrrry interesting. There's been a lot of speculation about that at SETI - this may be a smoking gun.

I wonder why just that one file. Have you still got it, and does it contain anything significant?

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 327 - Posted: 6 May 2015, 13:15:30 UTC - in response to Message 326.  

Verrrrrrrrry interesting. There's been a lot of speculation about that at SETI - this may be a smoking gun.

I wonder why just that one file. Have you still got it, and does it contain anything significant?

I saved that file, but the contents are just what's in the result's Stderr output.

After restarting the BOINC service:

06 May 14:48:08 [slot] cleaning out slots/0: delete old slot dirs
06 May 14:48:08 [slot] removed file slots/0/stderr.txt


I fetched a new CMS-task using an extended VM-size (1.8GB) and running Rom's newest vboxwrapper 26167.
Maybe I need several steps to approach the 4GB.

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,942,314
RAC: 3,195
Message 328 - Posted: 6 May 2015, 13:40:13 UTC - in response to Message 321.  

All the error message reports that I've seen refer to problems with a .vdi file of 5 GB or more. Neither of the two I've run so far got that large - the second one was a bit short of 4 GB. I don't see why DeleteFile() should have problems with any particular file size, but if there is a boundary, 4 GB sounds like a significant number. I don't know what feature of CMS controls the growth of the .vdi, but that might be something to watch.

My impression is that it's the result files being held on "disk" that cause the growth of the VM. I suppose you've noticed, Richard, that the VM has a 10 GB limit, which caused me some loss of hair the first time I encountered it, because my BOINC limits were way above the 10 GB it was claiming as a limit. Notice the blue line, where a >5 GB VM was persisting in the slot directory, causing the task to fail at 10 GB.


Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 331 - Posted: 6 May 2015, 15:15:01 UTC

Jason suggests that for Windows, the DeleteFile() API call - which underpins all BOINC's cleansing calls - may return 'success' (file queued for deletion), but with a large file, subsequent calls by other threads - e.g. the disk limit check by the next task - may find that it is still present pending the completion of other OS housekeeping operations.

I'm not sure that this would account for cases like ritterm's, where - if I read him right - the file was still visible for manual deletion some hours or days later.
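
If the delete call really can report success before the file is gone, one defensive pattern is to verify afterwards instead of trusting the return value. Below is a minimal sketch of that idea. It is not BOINC's code: the function name, timeout and polling interval are all assumptions chosen for illustration.

```python
import os
import time


def delete_and_verify(path: str, timeout_s: float = 5.0, poll_s: float = 0.1) -> bool:
    """Delete `path`, then poll until it is really gone or the timeout expires.

    Returns True only when the file no longer exists on disk.
    """
    try:
        os.remove(path)
    except FileNotFoundError:
        return True   # already gone
    except OSError:
        return False  # the OS refused outright (e.g. the file is still open)

    deadline = time.monotonic() + timeout_s
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            return False  # delete "succeeded" but the file is still visible
        time.sleep(poll_s)
    return True
```

A short timeout like this would cover a brief pending-delete window, but it would not explain (or cure) a file that is still there hours or days later.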

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 332 - Posted: 6 May 2015, 21:11:02 UTC

Reproducible. BOINC running as a service.

06-May-2015 23:06:30 [CMS-dev] Message from task: 0
06-May-2015 23:06:30 [---] [slot] cleaning out slots/0: handle_exited_app()
06-May-2015 23:06:30 [---] [slot] removed file slots/0/boinc_finish_called
06-May-2015 23:06:30 [---] [slot] removed file slots/0/boinc_task_state.xml
06-May-2015 23:06:30 [---] [slot] removed file slots/0/init_data.xml
06-May-2015 23:06:30 [---] [slot] removed file slots/0/output
06-May-2015 23:06:30 [---] [slot] failed to remove file slots/0/stderr.txt: unlink() failed
06-May-2015 23:06:30 [---] [slot] removed file slots/0/VBox.log
06-May-2015 23:06:30 [---] [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.exe
06-May-2015 23:06:30 [---] [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.pdb
06-May-2015 23:06:30 [---] [slot] removed file slots/0/vbox_checkpoint.xml
06-May-2015 23:06:30 [---] [slot] removed file slots/0/vbox_job.xml
06-May-2015 23:06:30 [---] [slot] removed file slots/0/vbox_remote_desktop.xml
06-May-2015 23:06:30 [---] [slot] removed file slots/0/vbox_webapi.xml
06-May-2015 23:06:30 [---] [slot] removed file slots/0/vm_floppy_0.img
06-May-2015 23:06:30 [---] [slot] removed file slots/0/vm_image.vdi
06-May-2015 23:06:30 [CMS-dev] Computation for task CMS_30819_1427806620.244734_0 finished

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 333 - Posted: 7 May 2015, 7:00:49 UTC

Returned another task, this time with BOINC not running as a service again.
Everything in the slot was cleaned. The vdi file was almost 2GB.

07-May-2015 08:52:24 [CMS-dev] Message from task: 0
07-May-2015 08:52:24 [---] [slot] cleaning out slots/0: handle_exited_app()
07-May-2015 08:52:24 [---] [slot] removed file slots/0/boinc_finish_called
07-May-2015 08:52:24 [---] [slot] removed file slots/0/boinc_task_state.xml
07-May-2015 08:52:24 [---] [slot] removed file slots/0/init_data.xml
07-May-2015 08:52:24 [---] [slot] removed file slots/0/output
07-May-2015 08:52:24 [---] [slot] removed file slots/0/stderr.txt
07-May-2015 08:52:24 [---] [slot] removed file slots/0/VBox.log
07-May-2015 08:52:24 [---] [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.exe
07-May-2015 08:52:24 [---] [slot] removed file slots/0/vboxwrapper_26165_windows_x86_64.pdb
07-May-2015 08:52:24 [---] [slot] removed file slots/0/vbox_checkpoint.xml
07-May-2015 08:52:24 [---] [slot] removed file slots/0/vbox_job.xml
07-May-2015 08:52:24 [---] [slot] removed file slots/0/vbox_remote_desktop.xml
07-May-2015 08:52:24 [---] [slot] removed file slots/0/vbox_webapi.xml
07-May-2015 08:52:24 [---] [slot] removed file slots/0/vm_floppy_0.img
07-May-2015 08:52:24 [---] [slot] removed file slots/0/vm_image.vdi
07-May-2015 08:52:24 [CMS-dev] Computation for task CMS_30674_1427806616.874120_0 finished

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,942,314
RAC: 3,195
Message 334 - Posted: 7 May 2015, 14:56:06 UTC - in response to Message 331.  

Jason suggests that for Windows, the DeleteFile() API call - which underpins all BOINC's cleansing calls - may return 'success' (file queued for deletion), but with a large file, subsequent calls by other threads - e.g. the disk limit check by the next task - may find that it is still present pending the completion of other OS housekeeping operations.

I'm not sure that this would account for cases like ritterm's, where - if I read him right - the file was still visible for manual deletion some hours or days later.

Richard, my original instance (which prompted the graphs above) is detailed in the thread VBox Wrappers Updated to 26157. Note especially Message 172 ff.

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 335 - Posted: 7 May 2015, 16:51:05 UTC - in response to Message 334.  

Jason suggests that for Windows, the DeleteFile() API call - which underpins all BOINC's cleansing calls - may return 'success' (file queued for deletion), but with a large file, subsequent calls by other threads - e.g. the disk limit check by the next task - may find that it is still present pending the completion of other OS housekeeping operations.

I'm not sure that this would account for cases like ritterm's, where - if I read him right - the file was still visible for manual deletion some hours or days later.

Richard, my original instance (which prompted the graphs above) is detailed in the thread VBox Wrappers Updated to 26157. Note especially Message 172 ff.

Ah - I see you've been round this cycle internally once before. Sorry, I'm a late arrival - I only got involved when '196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED' became a live issue at other projects too, and was eventually tracked back to those same 5.7 GB .vdi leftovers that you saw in the slot directories back in March.

But my question remains - how on earth do those files manage to survive three separate BOINC attempts to avoid them? As I posted over at SETI - read from message 1674370 - responsibility for clearing the slot directories lies with BOINC, and

In theory, according to David's initial response, the current logic specifies:

1) Delete everything in the slot folder on task exit (this failed in the example)
2) Delete everything - i.e. anything remaining - in the slot before reuse
3) Don't reuse the slot if (2) fails

The error we were originally investigating implies that step (3) failed. That may have been because step (2) failed without returning an error ...

The implication of Jason's reply was that Windows' retval from DeleteFile() might be 'success', but in practice signify a queued housekeeping task, to be actioned later. I'm sceptical that the disk latencies involved could be so macroscopic as to cause failures hours or even days later - and if they are, then I suggest that it's important that BOINC hardens its attitude to verification that the assumed OS sub-operation has indeed performed its job as expected.

I became involved in this issue because of errors reported at Milkyway, Einstein and (I discovered later) WCG. As I put in my initial email to boinc_alpha, "Cross-project errors like this, where the behaviour of one project leads to errors for another project, are hard for project staff to analyse and remediate. Would it be wise for BOINC to perform an additional safety check, that a slot directory which it is proposing to re-use is indeed empty as expected?".
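
As a rough sketch of the extra safety check being asked for: clean the slot, then judge the result by what is actually left in the directory, not by the return codes of the individual delete calls. This is not BOINC's implementation, and `clean_slot` and `slot_is_reusable` are hypothetical names.

```python
import os
import shutil


def clean_slot(slot_dir: str) -> list[str]:
    """Try to remove everything in slot_dir; return the names still present."""
    for name in os.listdir(slot_dir):
        path = os.path.join(slot_dir, name)
        try:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
        except OSError:
            pass  # judged below by what is left, not by individual errors
    return sorted(os.listdir(slot_dir))


def slot_is_reusable(slot_dir: str) -> bool:
    """Steps 2 and 3: clean before reuse, refuse the slot if anything survives."""
    leftovers = clean_slot(slot_dir)
    return not leftovers
```

One caveat: a check like this is only as good as the directory listing it relies on. If the scan itself fails to report a file, both the clean-up and the verification would miss it.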

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 336 - Posted: 7 May 2015, 20:22:03 UTC

I don't know how long the undeleted images would have remained without manual deletion, but certainly overnight, as I normally check my machines before going to work in the morning and on returning in the evening. Restarting BOINC wouldn't clear them, and BOINC would continually try to reuse the non-empty slot, resulting in a whole batch of failed LHC/Sixtrack tasks, which is what prompted my search for a cause. Only manual deletion seemed to resolve the problem and free up the slot for use. (As stated in an earlier post, if CMS reused the slot, the debris image would be overwritten by the new image, restarting at the initial size.)

Now that we're watching, the slots are being properly cleaned out on exit. The image is only getting to a little short of 2GB, so it may just have been the 5GB size that was causing the issue. Perhaps the file deletion was started but not completed by the time BOINC called exit, so the deletion was then cancelled. However, as Richard says, BOINC should notice that the slot is not empty and therefore not try to reuse it, and/or clean out any debris before reallocating that slot.

Maybe not much help, but might point someone in the right direction.

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 337 - Posted: 8 May 2015, 18:49:18 UTC

After four days of watching (not continuously!), we have a winner - or perhaps a loser.

After a seemingly normal run and exit, I was left with

D:\BOINCdata\slots\1>dir
 Volume in drive D is Data
 Volume Serial Number is 7031-B70C

 Directory of D:\BOINCdata\slots\1

08/05/2015  19:27    <DIR>          .
08/05/2015  19:27    <DIR>          ..
08/05/2015  19:27     5,148,508,160 vm_image.vdi
               1 File(s)  5,148,508,160 bytes
               2 Dir(s)  445,699,624,960 bytes free

D:\BOINCdata\slots\1>

- and that's the first vm_image.vdi file over 4GB that I've seen.

Slot cleaning appeared normal:

08/05/2015 19:27:43 | CMS-dev | Message from task: 0
08/05/2015 19:27:43 | | [slot] cleaning out slots/1: handle_exited_app()
08/05/2015 19:27:43 | | [slot] removed file slots/1/boinc_finish_called
08/05/2015 19:27:43 | | [slot] removed file slots/1/boinc_task_state.xml
08/05/2015 19:27:43 | | [slot] removed file slots/1/init_data.xml
08/05/2015 19:27:43 | | [slot] removed file slots/1/output
08/05/2015 19:27:43 | | [slot] removed file slots/1/stderr.txt
08/05/2015 19:27:43 | | [slot] removed file slots/1/VBox.log
08/05/2015 19:27:43 | | [slot] removed file slots/1/vboxwrapper_26165_windows_x86_64.exe
08/05/2015 19:27:43 | | [slot] removed file slots/1/vboxwrapper_26165_windows_x86_64.pdb
08/05/2015 19:27:43 | | [slot] removed file slots/1/vbox_checkpoint.xml
08/05/2015 19:27:43 | | [slot] removed file slots/1/vbox_job.xml
08/05/2015 19:27:43 | | [slot] removed file slots/1/vbox_remote_desktop.xml
08/05/2015 19:27:43 | | [slot] removed file slots/1/vbox_webapi.xml
08/05/2015 19:27:43 | | [slot] removed file slots/1/vm_floppy_1.img
08/05/2015 19:27:43 | CMS-dev | Computation for task CMS_30776_1427806619.199316_0 finished
08/05/2015 19:27:43 | | [slot] cleaning out slots/1: get_free_slot()
08/05/2015 19:27:43 | NumberFields@home | [slot] assigning slot 1 to wu_sf3_DS-10x271_Grp502611of682667_0
08/05/2015 19:27:43 | NumberFields@home | [slot] linked ../../projects/numberfields.asu.edu_NumberFields/GetDecics_2.00_windows_intelx86 to slots/1/GetDecics_2.00_windows_intelx86
08/05/2015 19:27:43 | NumberFields@home | [slot] linked ../../projects/numberfields.asu.edu_NumberFields/sf3_DS-10x271_Grp502611of682667.dat to slots/1/in
08/05/2015 19:27:43 | NumberFields@home | [slot] linked ../../projects/numberfields.asu.edu_NumberFields/wu_sf3_DS-10x271_Grp502611of682667_0_0 to slots/1/out
08/05/2015 19:27:43 | NumberFields@home | [cpu_sched] Restarting task wu_sf3_DS-10x271_Grp502611of682667_0 using GetDecics version 200 in slot 1
08/05/2015 19:27:44 | NumberFields@home | Aborting task wu_sf3_DS-10x271_Grp502611of682667_0: exceeded disk limit: 4910.01MB > 244.14MB
08/05/2015 19:27:44 | NumberFields@home | [sched_op] Deferring communication for 00:01:23
08/05/2015 19:27:44 | NumberFields@home | [sched_op] Reason: Unrecoverable error for task wu_sf3_DS-10x271_Grp502611of682667_0
08/05/2015 19:27:45 | | [slot] cleaning out slots/1: handle_exited_app()
08/05/2015 19:27:45 | | [slot] removed file slots/1/boinc_lockfile
08/05/2015 19:27:45 | | [slot] removed file slots/1/GetDecics_2.00_windows_intelx86
08/05/2015 19:27:45 | | [slot] removed file slots/1/in
08/05/2015 19:27:45 | | [slot] removed file slots/1/init_data.xml
08/05/2015 19:27:45 | | [slot] removed file slots/1/out
08/05/2015 19:27:45 | | [slot] removed file slots/1/stderr.txt
08/05/2015 19:27:45 | NumberFields@home | Computation for task wu_sf3_DS-10x271_Grp502611of682667_0 finished

- but notice there's no reference to vm_image.vdi in either 'cleaning out slots' loop. 15 minutes later (after drafting the above), the file remained visible to both Command Prompt and Windows Explorer, and could be copied to another folder (taking the length of time you'd expect for a 4GB file).

But it remained invisible to BOINC's clear-out routine:

08/05/2015 19:43:15 | NumberFields@home | task wu_sf3_DS-10x271_Grp502822of682667_0 resumed by user
08/05/2015 19:43:16 | | [slot] cleaning out slots/1: get_free_slot()
08/05/2015 19:43:16 | NumberFields@home | [slot] assigning slot 1 to wu_sf3_DS-10x271_Grp502822of682667_0
08/05/2015 19:43:16 | NumberFields@home | [slot] linked ../../projects/numberfields.asu.edu_NumberFields/GetDecics_2.00_windows_intelx86 to slots/1/GetDecics_2.00_windows_intelx86
08/05/2015 19:43:16 | NumberFields@home | [slot] linked ../../projects/numberfields.asu.edu_NumberFields/sf3_DS-10x271_Grp502822of682667.dat to slots/1/in
08/05/2015 19:43:16 | NumberFields@home | [slot] linked ../../projects/numberfields.asu.edu_NumberFields/wu_sf3_DS-10x271_Grp502822of682667_0_0 to slots/1/out
08/05/2015 19:43:16 | NumberFields@home | [cpu_sched] Restarting task wu_sf3_DS-10x271_Grp502822of682667_0 using GetDecics version 200 in slot 1
08/05/2015 19:43:17 | NumberFields@home | Aborting task wu_sf3_DS-10x271_Grp502822of682667_0: exceeded disk limit: 4910.01MB > 244.14MB
08/05/2015 19:43:17 | NumberFields@home | [sched_op] Deferring communication for 00:01:32
08/05/2015 19:43:17 | NumberFields@home | [sched_op] Reason: Unrecoverable error for task wu_sf3_DS-10x271_Grp502822of682667_0
08/05/2015 19:43:18 | | [slot] cleaning out slots/1: handle_exited_app()
08/05/2015 19:43:18 | | [slot] removed file slots/1/boinc_lockfile
08/05/2015 19:43:18 | | [slot] removed file slots/1/GetDecics_2.00_windows_intelx86
08/05/2015 19:43:18 | | [slot] removed file slots/1/in
08/05/2015 19:43:18 | | [slot] removed file slots/1/init_data.xml
08/05/2015 19:43:18 | | [slot] removed file slots/1/out
08/05/2015 19:43:18 | | [slot] removed file slots/1/stderr.txt
08/05/2015 19:43:18 | NumberFields@home | Computation for task wu_sf3_DS-10x271_Grp502822of682667_0 finished

Starting a new CMS-dev task in the same slot replaced the old file with a new one, with a new datestamp and a new file-size. QED.
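
As a cross-check, the figure in the abort message matches the directory listing. The arithmetic below assumes the client's "MB" means 2^20 bytes, and the 256,000,000-byte bound is only a guess at a value that would print as 244.14MB; neither is confirmed from the BOINC source.

```python
MIB = 1024 * 1024  # assumption: the client's "MB" means 2**20 bytes


def to_mib(size_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return size_bytes / MIB


def exceeds_limit(used_bytes: int, limit_bytes: int) -> bool:
    """True when the slot's disk usage is over the task's bound."""
    return used_bytes > limit_bytes


vdi_bytes = 5_148_508_160   # vm_image.vdi, from the dir listing
limit_bytes = 256_000_000   # hypothetical bound that would print as 244.14MB

print(f"{to_mib(vdi_bytes):.2f} MB used by the leftover image")
print(f"{to_mib(limit_bytes):.2f} MB allowed")
print("abort:", exceeds_limit(vdi_bytes, limit_bytes))
```

On those assumptions the leftover image accounts for 4910.00 of the 4910.01MB reported, so the NumberFields task was over its limit before it had written anything of note.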

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 338 - Posted: 8 May 2015, 20:59:24 UTC

That's great news, Richard.

I got almost to 4GB, but it's hard to get there.

I hope Ivan/Laurence are reading here too, because I found the circumstance under which the vm_image.vdi file grows more than normal.
You would expect that to be when the VM is doing normal work, but no, it's when CMS's job queue (not the BOINC queue) is empty and the VM doesn't get any jobs.
That was/is the case today, and that's why your VM could grow above 4GB.

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 339 - Posted: 8 May 2015, 21:20:56 UTC - in response to Message 338.  

Those are presumably the times when the VM console just shows an endless queue of

tail: /home/boinc/stderr: file truncated

- as both machines have been showing every time I've looked today.

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 340 - Posted: 8 May 2015, 21:35:09 UTC

David's found the problem:

fa3f6be5128071fe7e15563e727b5478c45a63b8

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 341 - Posted: 9 May 2015, 7:03:39 UTC - in response to Message 340.  
Last modified: 9 May 2015, 7:22:49 UTC

David's found the problem

I've been testing other circumstances where BOINC has to handle >4GB VM-files.

1. The project has a 4.5GB base VM in the project directory.
BOINC is not able to copy that file into a slot directory. Every minute or every few minutes a new copy_temp file is created, mostly with 0 bytes, and init_data.xml is also renewed.
The task is running, but a VM can never be created.

2. Task is suspended with LAIM off. A VM with a >4 GB vdi file in a slot is saved.
Resuming the task takes a bit longer and the vdi file is shrunk to 1.35GB, but the VM is restored and running.

I've started using the BOINC client build of May 8th, which is also called 7.5.1.