Message boards : Number crunching : exceeded disk limit
Author | Message |
---|---|
Joined: 13 Feb 15 Posts: 1223 Credit: 935,551 RAC: 1,039 | "Well, it seems to be running as well as the others at the moment:" I was aware you used the app_info.xml, vbox32 and BOINC32, but could not see whether your machine was 64-bit capable. I suppose you also placed your 4.79 GB vdi file in the project folder and linked to it in your app_info, so you don't have to wait to see whether the vdi grows beyond 4 GB. |
Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 | "Well, it seems to be running as well as the others at the moment:" Well, I did a quick cheat-test by copying the 4.79 GB file I saved a couple of days ago to the 32-bit machine and putting it in another project's slot directory. 'Exceeded disk limit' came up with the right numbers, the task was aborted and the file deleted, and everything carried on working properly. |
Joined: 12 Sep 14 Posts: 65 Credit: 544 RAC: 0 | Any ideas why this bug affected CMS and not ATLAS? Or did it? Ben |
Joined: 13 Feb 15 Posts: 1223 Credit: 935,551 RAC: 1,039 | "Any ideas why this bug affected CMS and not ATLAS? Or did it?" Hi Ben, Probably because the ATLAS vdi files in the slot never exceed 4 GB. Initially the vdi is 1.57 GB, and the VM runs one single job lasting about 2-3 hours. Maybe you can give some attention to the fact that the CMS-vdi is growing and growing when there are no jobs available in the queue. I've noted here before that 19% CPU is used even when there is nothing to do, and several python processes are running in the VM. Maybe they are writing large logs. |
Joined: 9 Apr 15 Posts: 57 Credit: 230,221 RAC: 0 | "the CMS-vdi is growing and growing when there are no jobs available in the queue." That is one hell of a lot of writing! |
Joined: 20 Jan 15 Posts: 1141 Credit: 8,310,612 RAC: 0 | "Maybe you can give some attention to the fact that the CMS-vdi is growing and growing when there are no jobs available in the queue." I've been digging around on this. It doesn't appear to be logs, but rather something to do with cvmfs. The increase in "disk" usage in /var/lib/cvmfs/shared matches the usage increase in /, and almost matches the size increase of the .vdi image; e.g. in the last hour these increased by 165,700K, 165,780K and 170,918K respectively. (This is on my SLC6 machine.) I'll check overnight growth in the morning. |
Joined: 13 Feb 15 Posts: 1223 Credit: 935,551 RAC: 1,039 | "Any ideas why this bug affected CMS and not ATLAS? Or did it?" I searched for tasks on ATLAS, and the highest slot usage I could find was 3.5 GB. That covers all files in the slot directory and its subdirectories together, including the snapshot files, which are still made by the wrapper ATLAS uses. |
Joined: 20 Jan 15 Posts: 1141 Credit: 8,310,612 RAC: 0 | "Maybe you can give some attention to the fact that the CMS-vdi is growing and growing when there are no jobs available in the queue." Managed to get the overnight usage figures, moments before a campus-wide power failure took out all my computers (it may have been even wider: one of my PCs at home stopped reporting at the same time...). Of course, the breaker to my office tripped, so it was without power for two hours. So, the usage in /var/lib/cvmfs/shared increased by 1,217,932K, the amount used in / increased by 1,220,940K, and the .vdi file increased by 2,213,544K. We probably need a cvmfs expert to tell us what that implies. [Edit] Looked at the figures after I restarted. Image "disk" usage seems to have gone back to normal startup values: /var/lib/cvmfs dropped 1,920,440K from the pre-cut figure and / is using 1,932,470K less. The .vdi file, however, is 174,064K larger. [/Edit] |
Joined: 9 Apr 15 Posts: 57 Credit: 230,221 RAC: 0 | I hit BOINC's "reset" to fetch the wrapper and disk image again. The current task, although doing no work, has not grown its vdi above 1.6 GB. |
Joined: 8 Apr 15 Posts: 795 Credit: 13,775,303 RAC: 8,930 | CMS-dev has been running with no problems for me this month. http://boincai05.cern.ch/CMS-dev/results.php?userid=192 The only time I had any error was when I lost power for a few minutes or rebooted for a Windows update, but I got around that too. ATLAS is another story. Mad Scientist For Life |
Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 | I really think we need to get a grip on this problem. To recap: there's a bug in the BOINC client which means it fails to delete files larger than 4 GB when it should. This project is (still, today) producing files larger than 4 GB. When those files are left lying around, we cause errors for every other BOINC project that our computers may be attached to. That's not nice. Eric M has had to put up a front-page warning at LHC classic so he can get on with his work. The cure is simple and permanent: apply the 080515 hotfix BOINC client. But I had a look through the top 200 hosts yesterday (pretty much the active user base here), and only 9 of the 154 Windows machines had the hotfix applied - take a bow, rbpeake, Crystal Pellet, Ray Murray, and m. (the other three were mine). Since the message clearly isn't getting through, even to people who have posted in this thread, I'm going to send a PM to the admins asking them to reinforce the message via a front-page news item and a BOINC 'Notice' and, if that doesn't work, to enforce a minimum BOINC version of 7.5.1 for Windows computers attached to this project. |
Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 | OK, messages sent to Ivan, Laurence and Hendrik. The files needed to apply the hotfix are: for 64-bit BOINC, boinc.080515.x64.zip; for 32-bit BOINC, boinc.080515.x86.zip. Simply extract the two files for your version from the .zip archive and copy them to your BOINC program folder. You'll need to stop the BOINC client while you do this and restart it afterwards. |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 | A point to note: if you, like me, were running an older version of BOINC (7.2.42; it works well, so why change?) and install the fix, you may have problems because the required version of the Visual C runtimes is missing. You need to change to BOINC 7.4.42 first, then add/replace the installed files with those from the 080515 archive. John. |
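
For anyone wanting to check whether oversized files have been left behind by the deletion bug recapped in the thread, here is a minimal sketch that walks a directory tree and lists files above a size threshold. It is not part of BOINC; the function name `find_large_files` is made up for illustration, and the default threshold of 2**32 bytes is an assumption (the thread only says "4 GB").

```python
import os

# Assumed threshold: 2**32 bytes. The thread only says "4 GB".
FOUR_GIB = 2**32


def find_large_files(root, threshold=FOUR_GIB):
    """Return (path, size_in_bytes) for files under root larger than threshold.

    Results are sorted with the largest file first.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file vanished or cannot be read; skip it.
                continue
            if size > threshold:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Pointing `find_large_files` at your own BOINC data directory (its location depends on your installation) should show any leftover .vdi images in the slot directories that are over the threshold.
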
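
The growth figures reported in the thread can be compared with a little arithmetic. The sketch below is only an illustration: it copies the hourly and overnight numbers from the posts above, treats "K" as an opaque unit, and uses hypothetical names (`parse_k`, `GROWTH`, `vdi_to_cvmfs_ratio`). It computes how much faster the .vdi file grew than the cvmfs cache in each interval.

```python
def parse_k(text):
    """Convert a figure such as '165,700K' into an integer count of K units."""
    return int(text.rstrip("Kk").replace(",", ""))


# Growth figures as quoted in the thread (cvmfs cache, root filesystem, .vdi).
GROWTH = {
    "last hour": {"cvmfs": "165,700K", "root": "165,780K", "vdi": "170,918K"},
    "overnight": {"cvmfs": "1,217,932K", "root": "1,220,940K", "vdi": "2,213,544K"},
}


def vdi_to_cvmfs_ratio(sample):
    """Ratio of .vdi growth to cvmfs cache growth for one interval."""
    return parse_k(sample["vdi"]) / parse_k(sample["cvmfs"])


def report(growth=GROWTH):
    """Return one summary line per interval."""
    lines = []
    for label, sample in growth.items():
        ratio = vdi_to_cvmfs_ratio(sample)
        lines.append(f"{label}: vdi grew {ratio:.2f}x as much as the cvmfs cache")
    return lines
```

A ratio near 1 would mean the image grows in step with the cvmfs cache; a clearly larger ratio suggests something else is also inflating the .vdi, which is the open question in the thread.
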
©2025 CERN