exceeded disk limit
Author | Message |
---|---|
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
David says "Thanks." and Rom says "Here is a new private drop:" Looking at the code change, I think there's a reasonable expectation that this will solve the copying problem too: they've switched from a 32-bit to a 64-bit version of a low-level Windows library function. |
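For anyone wondering why 4 GB is the magic number here, a minimal sketch of what happens when a file size is squeezed into an unsigned 32-bit field. This is an illustration under that assumption only; the thread says no more than that a 32-bit Windows library function was swapped for its 64-bit counterpart, and the helper names `size_as_seen_by_32bit` and `fits_in_32bit` are made up for the example.

```python
# Illustration only: how a file size behaves in an unsigned 32-bit field.
# This is NOT the BOINC client code, just the arithmetic behind the 4 GB limit.

UINT32_LIMIT = 2 ** 32  # 4,294,967,296 bytes = 4 GiB


def size_as_seen_by_32bit(size_bytes: int) -> int:
    """Return the value an unsigned 32-bit field would hold (hypothetical helper)."""
    return size_bytes % UINT32_LIMIT


def fits_in_32bit(size_bytes: int) -> bool:
    """True if the size can be stored in 32 bits without wrapping."""
    return 0 <= size_bytes < UINT32_LIMIT


if __name__ == "__main__":
    # Sizes on either side of the 4 GiB boundary
    for size in (3_668_000_000, 4_294_967_295, 4_294_967_296, 5_200_000_000):
        print(size, fits_in_32bit(size), size_as_seen_by_32bit(size))
```

Anything at or above 4,294,967,296 bytes wraps around, so code relying on such a field would see a much smaller (or zero-length) file, which would be consistent with the copy and delete trouble reported for images over 4 GB.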
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 849,977 RAC: 1,466 |
"Looking at the code change, I think there's a reasonable expectation that this will solve the copying problem too - they've switched from using a 32-bit to a 64-bit version of a low-level Windows library function." Thanks Richard. I'll check the fix for the delete problem first. The VM in the slot is now 3.668 MB with 21 hours to go. I will not touch the task: no pause, suspend, BOINC restart etc. If that succeeds, I'll test the BOINC copy to a slot from a >4GB project VM. |
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 849,977 RAC: 1,466 |
"The VM in the slot is now 3.668 MB and 21 hours to go." Required file size of vm_image.vdi achieved: 4,22 GB and 18 hours to go. |
Send message Joined: 13 Apr 15 Posts: 138 Credit: 2,945,852 RAC: 0 |
Good work David et al. This is the bit of Boincing I enjoy(?) most: helping to find, identify and isolate a problem so that others can hopefully find a resolution, as I've no idea about the coding side and can't help there. Much better than those who have a problem but just go away until it's fixed. I've currently got one of these with the "file truncated" messages. At 4 hours the image was 2.25GB, at 14 hours 4.5GB, and currently at 21 hours 5.2GB. CPU usage is 2% so it's clearly not doing anything useful, yet the image continues to grow. (That's another issue, for Dr. Ivan to look at.) I've had a few of these before and I think a manual reset through VBox sparked them back into useful work. Before doing that with this one, is there a particular log file that I could copy to provide useful diagnostics on what caused the apparent blockage in the first place? Change of plan: <3 hours for that one to finish, so shortly before it does I'm going to suspend it and upgrade to the test Boinc, hoping that the VM won't reset itself in that process. If that works, we will soon see if the large file gets deleted. |
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
I have one with a 5.2 GB file, already running under the new private drop - but it's not due to finish for another 4 hours, so Ray will beat me to it. Likewise, I had to stop BOINC to upgrade part-way through: the VM console is still showing nothing but "file truncated" messages, and very low CPU usage. Although I did a little Test4Theory testing in the early days, I've forgotten what little I learned about the fine control of the VM from outside or from within BOINC. It's a pity that the VM/BOINC combination doesn't yet allow a VM-based project to release un-needed resources back for other BOINC projects to use if, as Crystal Pellet suggests, the looping messages only appear "when CMS's job queue (not BOINC-queue) is empty and the VM don't get jobs". That's a question I remember raising with Ben Segal at the 2010 BOINC workshop in London, and I fear it will deter some of the more competitive volunteers from participating in this type of project. |
Send message Joined: 13 Apr 15 Posts: 138 Credit: 2,945,852 RAC: 0 |
I've been wondering whether these will accept a "graceful finish" by editing the checkpoint file like T4T do/did? Experiment for 2moro. |
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 849,977 RAC: 1,466 |
"I've been wondering whether these will accept a 'graceful finish' by editing the checkpoint file like T4T do/did? Experiment for 2moro." It will, but as stated before, at least with the BOINC client of the 4th of May, a vdi file >4GB will shrink, so the delete fix can't be tested until the vdi has grown over 4GB again. |
Send message Joined: 13 Apr 15 Posts: 138 Credit: 2,945,852 RAC: 0 |
Success 8¬) using 080515.x64
Image had grown to 6GB but everything in the slot got cleared out. Task 56691
09/05/2015 18:32:46 | CMS-dev | Message from task: 0
09/05/2015 18:32:46 | | [slot] cleaning out slots/4: handle_exited_app()
09/05/2015 18:32:46 | | [slot] removed file slots/4/boinc_finish_called
09/05/2015 18:32:46 | | [slot] removed file slots/4/boinc_task_state.xml
09/05/2015 18:32:46 | | [slot] removed file slots/4/init_data.xml
09/05/2015 18:32:46 | | [slot] removed file slots/4/output
09/05/2015 18:32:46 | | [slot] removed file slots/4/stderr.txt
09/05/2015 18:32:46 | | [slot] removed file slots/4/VBox.log
09/05/2015 18:32:46 | | [slot] removed file slots/4/vboxwrapper_26165_windows_x86_64.exe
09/05/2015 18:32:46 | | [slot] removed file slots/4/vboxwrapper_26165_windows_x86_64.pdb
09/05/2015 18:32:46 | | [slot] removed file slots/4/vbox_checkpoint.xml
09/05/2015 18:32:46 | | [slot] removed file slots/4/vbox_job.xml
09/05/2015 18:32:46 | | [slot] removed file slots/4/vbox_remote_desktop.xml
09/05/2015 18:32:46 | | [slot] removed file slots/4/vbox_webapi.xml
09/05/2015 18:32:46 | | [slot] removed file slots/4/vm_floppy_4.img
09/05/2015 18:32:46 | | [slot] removed file slots/4/vm_image.vdi
09/05/2015 18:32:46 | CMS-dev | Computation for task CMS_30897_1427806622.031532_0 finished
New task again has the "file truncated" messages and in ALT-F5 in red: ERROR:root:No message received!: Nothing to do! so I guess there is no actual work available for the VM so I'll suspend this one for now while upgrading the other machine and will let the new T4T Databridge use that core for a while. |
Send message Joined: 9 Apr 15 Posts: 57 Credit: 230,221 RAC: 0 |
"Success 8¬) using 080515.x64" Well done! I'll try that build tomorrow. "ERROR:root:No message received!: Nothing to do!" Same here, I tried starting and stopping it a few times but it's sulking. |
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
And the same here, with the image file close to the 5.7 GB that was reported in the error reports that brought me here.
09/05/2015 19:46:19 | CMS-dev | Message from task: 0
09/05/2015 19:46:19 | | [slot] cleaning out slots/1: handle_exited_app()
09/05/2015 19:46:19 | | [slot] removed file slots/1/boinc_finish_called
09/05/2015 19:46:19 | | [slot] removed file slots/1/boinc_task_state.xml
09/05/2015 19:46:19 | | [slot] removed file slots/1/init_data.xml
09/05/2015 19:46:19 | | [slot] removed file slots/1/output
09/05/2015 19:46:19 | | [slot] removed file slots/1/stderr.txt
09/05/2015 19:46:19 | | [slot] removed file slots/1/VBox.log
09/05/2015 19:46:19 | | [slot] removed file slots/1/vboxwrapper_26165_windows_x86_64.exe
09/05/2015 19:46:19 | | [slot] removed file slots/1/vboxwrapper_26165_windows_x86_64.pdb
09/05/2015 19:46:19 | | [slot] removed file slots/1/vbox_checkpoint.xml
09/05/2015 19:46:19 | | [slot] removed file slots/1/vbox_job.xml
09/05/2015 19:46:19 | | [slot] removed file slots/1/vbox_remote_desktop.xml
09/05/2015 19:46:19 | | [slot] removed file slots/1/vbox_webapi.xml
09/05/2015 19:46:19 | | [slot] removed file slots/1/vm_floppy_1.img
09/05/2015 19:46:19 | | [slot] removed file slots/1/vm_image.vdi
09/05/2015 19:46:19 | CMS-dev | Computation for task CMS_30909_1427806622.286095_0 finished
09/05/2015 19:46:19 | | [slot] cleaning out slots/1: get_free_slot()
09/05/2015 19:46:19 | NumberFields@home | [slot] assigning slot 1 to wu_sf3_DS-10x271_Grp504817of682667_0
09/05/2015 19:46:19 | | [slot] removed file slots/1/init_data.xml
09/05/2015 19:46:19 | NumberFields@home | [slot] linked ../../projects/numberfields.asu.edu_NumberFields/GetDecics_2.00_windows_intelx86 to slots/1/GetDecics_2.00_windows_intelx86
09/05/2015 19:46:19 | NumberFields@home | [slot] linked ../../projects/numberfields.asu.edu_NumberFields/sf3_DS-10x271_Grp504817of682667.dat to slots/1/in
09/05/2015 19:46:19 | NumberFields@home | [slot] linked ../../projects/numberfields.asu.edu_NumberFields/wu_sf3_DS-10x271_Grp504817of682667_0_0 to slots/1/out
09/05/2015 19:46:19 | | [slot] removed file slots/1/boinc_temporary_exit
09/05/2015 19:46:19 | NumberFields@home | Starting task wu_sf3_DS-10x271_Grp504817of682667_0
09/05/2015 19:46:19 | NumberFields@home | [cpu_sched] Starting task wu_sf3_DS-10x271_Grp504817of682667_0 using GetDecics version 200 in slot 1
and as you can see, the following task from another project started cleanly and is running normally. |
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 849,977 RAC: 1,466 |
And delete success here too with the newest private BOINC-client boinc.080515.x64, leading to an empty slot:
CMS-dev 10 May 09:04:16 Message from task: 0
10 May 09:04:16 [slot] cleaning out slots/8: handle_exited_app()
10 May 09:04:16 [slot] removed file slots/8/boinc_finish_called
10 May 09:04:16 [slot] removed file slots/8/boinc_task_state.xml
10 May 09:04:16 [slot] removed file slots/8/init_data.xml
10 May 09:04:16 [slot] removed file slots/8/output
10 May 09:04:16 [slot] removed file slots/8/stderr.txt
10 May 09:04:16 [slot] removed file slots/8/VBox.log
10 May 09:04:16 [slot] removed file slots/8/vboxwrapper_26165_windows_x86_64.exe
10 May 09:04:16 [slot] removed file slots/8/vboxwrapper_26165_windows_x86_64.pdb
10 May 09:04:16 [slot] removed file slots/8/vbox_checkpoint.xml
10 May 09:04:16 [slot] removed file slots/8/vbox_job.xml
10 May 09:04:16 [slot] removed file slots/8/vbox_remote_desktop.xml
10 May 09:04:16 [slot] removed file slots/8/vbox_webapi.xml
10 May 09:04:16 [slot] removed file slots/8/vm_floppy_8.img
10 May 09:04:16 [slot] removed file slots/8/vm_image.vdi
CMS-dev 10 May 09:04:16 Computation for task CMS_30966_1427806623.586496_0 finished
Directory contents a few minutes before the task finished:
09-05-2015 09:23 0 boinc_lockfile
10-05-2015 09:02 508 boinc_task_state.xml
09-05-2015 11:26 9.255 init_data.xml
10-05-2015 08:21 8.447 stderr.txt
09-05-2015 15:47 123.660 VBox.log
09-05-2015 08:51 102 vboxwrapper_26165_windows_x86_64.exe
09-05-2015 08:51 102 vboxwrapper_26165_windows_x86_64.pdb
10-05-2015 09:02 217 vbox_checkpoint.xml
09-05-2015 08:51 85 vbox_job.xml
09-05-2015 08:51 69 vbox_remote_desktop.xml
09-05-2015 08:51 53 vbox_webapi.xml
09-05-2015 09:19 28.672 vm_floppy_8.img
10-05-2015 08:53 6.429.868.032 vm_image.vdi
Now I'll test the other 2 circumstances:
1. BOINC-copy >4GB file from project directory to a slot - Outcome: It's working with a 4.28GB project file.
2. Resume task with a vdi file >4GB in the slot - Outcome: This is working too. The now 4.44GB vdi file in the slot didn't shrink like before.
Remark 1: Although no real work was done by the VM, the CPU-usage was 19.1% of the elapsed time.
Remark 2: I don't know why I'm getting such high credits for those tasks. |
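If you'd rather not eyeball a saved event log for this, here is a rough sketch that pulls out the "[slot] removed file" entries and checks whether the VM image was among them. It assumes the log lines look like the ones pasted in this thread (`[slot] removed file slots/N/name`); `removed_files` and `image_was_deleted` are made-up helper names, not part of BOINC.

```python
import re
from collections import defaultdict

# Matches event-log entries of the form "[slot] removed file slots/<n>/<name>"
# (format assumed from the log excerpts posted in this thread).
REMOVED_RE = re.compile(r"\[slot\] removed file slots/(\d+)/(\S+)")


def removed_files(log_text: str) -> dict:
    """Map slot number -> list of file names the client reported removing."""
    result = defaultdict(list)
    for slot, name in REMOVED_RE.findall(log_text):
        result[int(slot)].append(name)
    return dict(result)


def image_was_deleted(log_text: str, slot: int, image: str = "vm_image.vdi") -> bool:
    """True if the log reports the VM image being removed from the given slot."""
    return image in removed_files(log_text).get(slot, [])


# A few lines in the style of the logs above, used as sample input.
SAMPLE = """\
10 May 09:04:16 [slot] cleaning out slots/8: handle_exited_app()
10 May 09:04:16 [slot] removed file slots/8/vm_floppy_8.img
10 May 09:04:16 [slot] removed file slots/8/vm_image.vdi
"""

if __name__ == "__main__":
    print(removed_files(SAMPLE))
    print(image_was_deleted(SAMPLE, 8))
```

Note that a "removed file" log entry only tells you what the client reported; checking that the slot directory is really empty afterwards is still the more reliable test.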
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
My second machine also transitioned gracefully from CMS to another project and back again, without errors - but since it happened at 02:30 in the morning, I didn't see what the final image file size was. "I've been wondering whether these will accept a 'graceful finish' by editing the checkpoint file like T4T do/did? Experiment for 2moro." I couldn't find any equivalent number in the current file-set, but I did find <job_duration>86400</job_duration> in CMS_26_03_2015a.xml (project folder). Changing that to 43200 didn't affect the running task, but it did set the next task on course for what looks like a 12-hour run - so the transitions will happen in daylight, for a while at least. Another useful question turned up on the BOINC message boards yesterday: "What is this: vbox_windows set to 0" |
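For anyone who wants to script the job_duration change rather than hand-edit the file, a minimal sketch. It assumes the file contains exactly one `<job_duration>` tag, as seen in CMS_26_03_2015a.xml; the helper name `set_job_duration` and the surrounding `<vbox_job>` wrapper in the sample text are placeholders of mine, not taken from the real file. Back the file up first.

```python
import re

# Matches a single <job_duration>NNN</job_duration> element.
JOB_DURATION_RE = re.compile(r"(<job_duration>)\s*(\d+)\s*(</job_duration>)")


def set_job_duration(xml_text: str, seconds: int) -> str:
    """Return xml_text with the job_duration value replaced (hypothetical helper).

    A regex substitution is used instead of an XML parser so that the rest of
    the file is left byte-for-byte untouched.
    """
    if seconds <= 0:
        raise ValueError("job duration must be a positive number of seconds")
    new_text, count = JOB_DURATION_RE.subn(
        lambda m: f"{m.group(1)}{seconds}{m.group(3)}", xml_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one <job_duration> tag, found {count}")
    return new_text


# Placeholder content: only the job_duration line comes from the thread.
SAMPLE = "<vbox_job>\n  <job_duration>86400</job_duration>\n</vbox_job>\n"

if __name__ == "__main__":
    print(set_job_duration(SAMPLE, 43200))  # 43200 s = 12 hours
```

As observed above, the change didn't affect the running task, only the next one to start.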
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 101 |
It's here <vbox_window>0|1</vbox_window> If enabled, launch VBox applications in full interactive mode. Otherwise, run them silently with VBoxHeadless. John |
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
Yes - that's what I added yesterday, in response to the question ;-) (check the Wiki history tab!) It's based on my own observation, rather than formal guidance from the developers - it wasn't documented when the question was asked. |
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 101 |
Ahh, that explains it. It was rather a long way down the list. Hopefully the developers will realise that updating the documentation is as important as updating the code. I'll go back to sleep. |
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 849,977 RAC: 1,466 |
"I've been wondering whether these will accept a 'graceful finish' by editing the checkpoint file like T4T do/did? Experiment for 2moro." I didn't react earlier because I wasn't sure what would happen after a task resume with an oversized VM in a slot under the new BOINC client. I had to test that first, and it succeeded. As with T4T, you may end your task gracefully by editing vbox_checkpoint.xml in the task's slot. Normally the job duration is 86400 seconds, so if you want an early finish:
- Suspend the task with 'Leave application in memory' off. The VM will be in saved state and not in paused state.
- Change the elapsed time to 86400 in vbox_checkpoint.xml in the slot.
- Resume the task. |
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
"It was rather a long way down the list." Alphabetical order. Recent versions of BOINC auto-generate a fully-populated cc_config.xml template, also in alphabetical order, when some options are set via the GUI. That caused problems for some users, who added their own tags manually and ended up with duplicate entries. Best to keep strict alphabetical order for both the configuration file and its documentation. |
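To spot the duplicate-entry problem in your own file, here is a small sketch that lists option tags appearing more than once and checks whether the options are in alphabetical order. It assumes the usual `<cc_config><options>...</options></cc_config>` layout; the function names are hypothetical and the sample values are invented. A few options (exclusive_app, for instance) may legitimately be repeated, so treat the output as a hint rather than an error report.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def option_tags(cc_config_text: str) -> list:
    """Return the tag names inside <options>, in file order."""
    root = ET.fromstring(cc_config_text)
    options = root.find("options")
    return [] if options is None else [child.tag for child in options]


def duplicate_options(cc_config_text: str) -> list:
    """Return a sorted list of option tags that occur more than once."""
    counts = Counter(option_tags(cc_config_text))
    return sorted(tag for tag, n in counts.items() if n > 1)


def is_alphabetical(cc_config_text: str) -> bool:
    """True if the option tags appear in alphabetical order."""
    tags = option_tags(cc_config_text)
    return tags == sorted(tags)


# Invented sample: a manually added vbox_window next to a generated one.
SAMPLE = """\
<cc_config>
  <options>
    <max_file_xfers>4</max_file_xfers>
    <vbox_window>0</vbox_window>
    <vbox_window>1</vbox_window>
  </options>
</cc_config>
"""

if __name__ == "__main__":
    print(duplicate_options(SAMPLE))
    print(is_alphabetical(SAMPLE))
```

Run it against a copy of the file's contents; it only reads the text and never writes anything back.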
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
David thanks us all, but implicitly asks us to keep our eyes open for any other glitches: That's good news. Following a question from Sekerob, I've tested that my 4.79 GB .vdi file was properly detected and deleted under a 32-bit Windows OS and the 32-bit version of the private drop. I've now set host 393 up to run under the 32-bit versions of VBox wrapper, but otherwise mimicking the stock delivery - a 1-hour first test run seems to have started normally, and has reached the 'file truncated' loop. If this run works, I'll let the next one run the full 24 hours, but suspend work fetch after that (I have to be away for a few days next week). |
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 849,977 RAC: 1,466 |
Hi Richard, I don't know whether testing 32-bit is useful at CMS, because CMS has only vbox64 applications and the Linux VM is 64-bit too. |
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
Well, it seems to be running as well as the others at the moment - we won't know for certain until the back end starts supplying jobs again, of course. CMS only supplies 64-bit apps, sure - but the app it supplies is BOINC's 64-bit VBox wrapper. I simply set up an app_info file and substituted the 32-bit wrapper files, and off it went. The whole point of a VM is that the guest OS doesn't have to match the host: my hardware on this host is fully 64-bit capable, and includes the virtualization hooks to enable VBox to run. |
©2024 CERN