Message boards : Number crunching : exceeded disk limit

Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 362 - Posted: 10 May 2015, 21:30:31 UTC - in response to Message 361.  

Well, it seems to be running as well as the others at the moment:

image

- we won't know for certain until the back end starts supplying jobs again, of course.

CMS only supplies 64-bit apps, sure - but the app it supplies is BOINC's 64-bit VBox wrapper. I simply set up an app_info file and substituted the 32-bit wrapper files, and off it went. The whole point of a VM is that the guest OS doesn't have to match the host: my hardware on this host is fully 64-bit capable, and includes the virtualization hooks to enable VBox to run.

I was aware you used the app_info.xml, vbox32 and BOINC32, but could not see whether your machine was 64-bit capable.
I suppose you also placed your 4.79 GB vdi-file in the project folder and linked to it in your app_info, so you don't have to wait to see whether the vdi grows beyond 4 GB.
ID: 362 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 363 - Posted: 10 May 2015, 21:37:05 UTC - in response to Message 362.  

Well, it seems to be running as well as the others at the moment:

image

- we won't know for certain until the back end starts supplying jobs again, of course.

CMS only supplies 64-bit apps, sure - but the app it supplies is BOINC's 64-bit VBox wrapper. I simply set up an app_info file and substituted the 32-bit wrapper files, and off it went. The whole point of a VM is that the guest OS doesn't have to match the host: my hardware on this host is fully 64-bit capable, and includes the virtualization hooks to enable VBox to run.

I was aware you used the app_info.xml, vbox32 and BOINC32, but could not see whether your machine was 64-bit capable.
I suppose you also placed your 4.79 GB vdi-file in the project folder and linked to it in your app_info, so you don't have to wait to see whether the vdi grows beyond 4 GB.

Well, I did a quick cheat-test by copying the 4.79 GB file I saved a couple of days ago to the 32-bit machine, and putting it in another project's slot directory. 'Exceeded disk limit' came up with the right numbers, the task was aborted and the file deleted, and everything carried on working properly.
ID: 363 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ben Segal
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 12 Sep 14
Posts: 65
Credit: 544
RAC: 0
Message 364 - Posted: 11 May 2015, 4:45:11 UTC - in response to Message 363.  

Any ideas why this bug affected CMS and not ATLAS? Or did it?

Ben
ID: 364 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 365 - Posted: 11 May 2015, 7:23:25 UTC - in response to Message 364.  

Any ideas why this bug affected CMS and not ATLAS? Or did it?

Ben

Hi Ben,

Probably because ATLAS vdi-files in the slot never exceed 4 GB.
Initially the vdi is 1.57 GB, and the VM runs one single job lasting about 2-3 hours.

Maybe you can give some attention to the fact that the CMS-vdi is growing and growing when there are no jobs available in the queue.
I've noted here before that 19% CPU is used even when there is nothing to do, and several python processes are running in the VM.
Maybe they are writing large logs.
ID: 365 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Phil

Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 367 - Posted: 11 May 2015, 16:40:19 UTC - in response to Message 365.  

the CMS-vdi is growing and growing when there are no jobs available in the queue.
I've noted here before that 19% CPU is used even when there is nothing to do, and several python processes are running in the VM.
Maybe they are writing large logs.

That is one hell of a lot of writing!
ID: 367 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,884
RAC: 3,177
Message 368 - Posted: 11 May 2015, 17:09:24 UTC - in response to Message 365.  

Maybe you can give some attention to the fact that the CMS-vdi is growing and growing when there are no jobs available in the queue.
I've noted here before that 19% CPU is used even when there is nothing to do, and several python processes are running in the VM.
Maybe they are writing large logs.

I've been digging around on this. It doesn't appear to be logs, rather something to do with cvmfs. Increase in "disk" usage in /var/lib/cvmfs/shared matches the usage increase in /, and almost matches the size increase of the .vdi image; e.g. in the last hour these increased by 165,700K, 165,780K and 170,918K respectively. (This on my SLC6 machine.) I'll check overnight growth in the morning.
ID: 368 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 369 - Posted: 12 May 2015, 8:59:01 UTC - in response to Message 364.  

Any ideas why this bug affected CMS and not ATLAS? Or did it?

Ben

I have searched for tasks on ATLAS, and the highest slot use I could find was 3.5 GB.
That is the total for all files in the slot directory and its subdirectories, including the snapshot files, which are still made by the wrapper ATLAS has in use.
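
For anyone who wants to repeat this kind of check, here is a minimal sketch of how a slot total can be added up. It is only an illustration: the function names `slot_usage_bytes` and `exceeds_limit` are made up for this example, and it is not the code the BOINC client itself uses to decide on 'exceeded disk limit'.

```python
import os


def slot_usage_bytes(slot_dir):
    """Total size in bytes of all regular files under slot_dir, recursively."""
    total = 0
    for root, _dirs, files in os.walk(slot_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symbolic links so a linked file is not counted as slot usage.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def exceeds_limit(slot_dir, limit_bytes):
    """True if the slot total is strictly above the given limit (hypothetical check)."""
    return slot_usage_bytes(slot_dir) > limit_bytes
```

Called on a slot directory (for example a hypothetical `slots/0` under your BOINC data folder), `slot_usage_bytes(path) / 1e9` gives the total in GB, snapshots and subdirectories included.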
ID: 369 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,884
RAC: 3,177
Message 370 - Posted: 12 May 2015, 11:07:22 UTC - in response to Message 368.  
Last modified: 12 May 2015, 11:16:48 UTC

Maybe you can give some attention to the fact that the CMS-vdi is growing and growing when there are no jobs available in the queue.
I've noted here before that 19% CPU is used even when there is nothing to do, and several python processes are running in the VM.
Maybe they are writing large logs.

I've been digging around on this. It doesn't appear to be logs, rather something to do with cvmfs. Increase in "disk" usage in /var/lib/cvmfs/shared matches the usage increase in /, and almost matches the size increase of the .vdi image; e.g. in the last hour these increased by 165,700K, 165,780K and 170,918K respectively. (This on my SLC6 machine.) I'll check overnight growth in the morning.

Managed to get the overnight usage figures, moments before a campus-wide power failure took out all my computers (it may have been even wider, one of my PCs at home stopped reporting at the same time...). Of course, the breaker to my office tripped so it was without power for two hours.
So, the usage in /var/lib/cvmfs/shared increased by 1,217,932K; amount used in / increased by 1,220,940K, and the .vdi file increased by 2,213,544K. Probably need a cvmfs expert to tell us what that implies.

[Edit] Looked at the figures after I restarted. Image "disk" usage seems to have gone back to normal startup values: /var/lib/cvmfs dropped 1,920,440K from the pre-cut figure and / is using 1,932,470K less. The .vdi file, however, is 174,064K larger. [/Edit]
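
As a quick arithmetic check on the hourly and overnight figures quoted above, the sketch below compares the .vdi growth with the growth seen inside the guest. The dictionary keys and function names are invented for this example; the numbers are the ones reported in this thread, in K.

```python
# Growth figures quoted in this thread, in K (as reported).
hourly = {"cvmfs_shared": 165_700, "root_fs": 165_780, "vdi": 170_918}
overnight = {"cvmfs_shared": 1_217_932, "root_fs": 1_220_940, "vdi": 2_213_544}


def vdi_ratio(sample):
    """Growth of the .vdi file relative to growth in /var/lib/cvmfs/shared."""
    return sample["vdi"] / sample["cvmfs_shared"]


def unexplained_vdi_growth(sample):
    """Growth of the .vdi file (in K) beyond the growth measured in /."""
    return sample["vdi"] - sample["root_fs"]


for label, sample in (("hourly", hourly), ("overnight", overnight)):
    print(f"{label}: vdi/cvmfs = {vdi_ratio(sample):.2f}, "
          f"vdi growth beyond / = {unexplained_vdi_growth(sample):,}K")
```

On these numbers the .vdi grew about 1.03 times as much as the cvmfs cache in the hourly sample but about 1.82 times as much overnight, so close to 1 GB of the overnight image growth is not matched by usage visible inside the guest. What that implies is still a question for someone who knows cvmfs.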
ID: 370 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Phil

Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 371 - Posted: 14 May 2015, 18:01:09 UTC

I hit BOINC's "reset" to fetch the wrapper and disk image again.
The current task, although doing no work, has not grown its vdi above 1.6 GB.
ID: 371 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Magic Quantum Mechanic

Joined: 8 Apr 15
Posts: 755
Credit: 11,756,461
RAC: 8,388
Message 372 - Posted: 14 May 2015, 20:59:51 UTC

CMS-dev has been running with no problems for me this month.

http://boincai05.cern.ch/CMS-dev/results.php?userid=192

The only time I had any error was when I lost power for a few minutes or rebooted for a Windows update, but I got around that too.

Atlas is another story.
Mad Scientist For Life
ID: 372 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 377 - Posted: 19 May 2015, 9:47:27 UTC

I really think we need to get a grip on this problem.

To recap: there's a bug in the BOINC client which means it fails to delete files larger than 4 GB when it should.

This project is (still, today) producing files larger than 4 GB.

When those files are left lying around, we cause errors for every other BOINC project that our computers may be attached to. That's not nice. Eric M has had to put up a front-page warning at LHC classic, so he can get on with his work.

The cure is simple and permanent: apply the 080515 hotfix BOINC client. But I had a look through the top 200 hosts yesterday (pretty much the active user base here), and only 9 of the 154 Windows machines had the hotfix applied - take a bow, rbpeake, Crystal Pellet, Ray Murray, and m. (the other three were mine)

Since the message clearly isn't getting through, even to people who have posted in this thread, I'm going to send a PM to the admins asking them to reinforce the message via a front-page news item and BOINC 'Notice': and if that doesn't work, to ask them to enforce a minimum BOINC version of 7.5.1 for Windows computers attached to this project.
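
To make the "minimum version" idea concrete, here is a small sketch of the comparison involved. It is an illustration only, not the project server's actual mechanism, and `parse_version` and `meets_minimum` are names made up for this example. The point is that dotted versions have to be compared number by number, not as text.

```python
def parse_version(text):
    """Turn a dotted version such as '7.4.42' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(client_version, minimum="7.5.1"):
    """True if client_version is at least the minimum, compared numerically."""
    return parse_version(client_version) >= parse_version(minimum)
```

With this, `meets_minimum("7.4.42")` is false and `meets_minimum("7.5.1")` is true. A plain string comparison would get a version like "7.10.0" wrong, because the text "7.10.0" sorts before "7.5.1".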
ID: 377 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 378 - Posted: 19 May 2015, 10:18:02 UTC

OK, messages sent to Ivan, Laurence and Hendrik.

The files that are needed to apply the hotfix are

For 64-bit BOINC
boinc.080515.x64.zip

For 32-bit BOINC
boinc.080515.x86.zip

Simply extract the two files for your version from the .zip archive, and copy them to your BOINC program folder - you'll need to stop the BOINC client while you do this, and restart it again afterwards.
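
If you would rather script the copy step, the sketch below shows one way it could be done. Treat it as a rough outline under stated assumptions: `apply_hotfix` is a name invented here, the program folder path is whatever your installation uses, it copies every file it finds in the archive, and it does not stop or restart the BOINC client for you, so do that by hand before and after.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def apply_hotfix(archive_path, boinc_program_dir):
    """Extract every file in the archive and copy it into the BOINC program folder.

    Assumes the BOINC client has already been stopped by the caller.
    Returns the sorted list of file names that were copied.
    """
    dest = Path(boinc_program_dir)
    copied = []
    with tempfile.TemporaryDirectory() as staging:
        # Unpack into a scratch folder first, so the program folder is
        # only touched once extraction has succeeded.
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(staging)
        for src in Path(staging).rglob("*"):
            if src.is_file():
                # Overwrites any existing file of the same name.
                shutil.copy2(src, dest / src.name)
                copied.append(src.name)
    return sorted(copied)
```

For example, `apply_hotfix("boinc.080515.x64.zip", r"C:\Program Files\BOINC")`, where the folder shown is only an assumed install location. It is worth keeping a copy of the original files in case you need to roll back.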
ID: 378 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 90
Message 382 - Posted: 19 May 2015, 20:45:07 UTC

A point to note. If you, like me, were running an older version of
BOINC (7.2.42, it works well so why change?) and install the fix,
you may have problems due to not having the required version of the
Visual C runtimes. You need to change to BOINC 7.4.42 first,
then add/replace the installed files with those from the 080515
archive.

John.
ID: 382 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Magic Quantum Mechanic

Joined: 8 Apr 15
Posts: 755
Credit: 11,756,461
RAC: 8,388
Message 385 - Posted: 21 May 2015, 7:11:28 UTC


Mad Scientist For Life
ID: 385 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote


©2024 CERN