Message boards : News : VBox Wrappers Updated to 26157

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 96
Message 154 - Posted: 22 Mar 2015, 23:04:51 UTC
Last modified: 22 Mar 2015, 23:06:13 UTC

The VirtualBox wrappers for Windows, Linux and Mac have been upgraded to 26157. Also, the tag enable_cern_dataformat has been removed from the job XML file.

Let us know how it goes.
ID: 154 · Rating: 0

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 90
Message 156 - Posted: 23 Mar 2015, 2:00:09 UTC - in response to Message 154.  
Last modified: 23 Mar 2015, 2:33:48 UTC


Let us know how it goes.


For me, not good.
Ubuntu Linux, VBox 4.3.12, BOINC 7.2.42.

The VM console contains:-

Starting vmcontext_epilog ...
bootlogd: no process killed

tail: /home/boinc/stderr: file truncated
tail: /home/boinc/stderr: file truncated
tail: /home/boinc/stderr: file truncated

This last line is repeated at approximately 2-minute intervals... 25 times as I write.

stderr.log contains repeats, at 2 minute intervals, of this sequence:-

[23/03/15 01:20:31] Traceback (most recent call last):
[23/03/15 01:20:31] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 142, in <module>
[23/03/15 01:20:31] user = config['BOINC_USERID']
[23/03/15 01:20:31] KeyError: 'BOINC_USERID'

edit:- and so to bed.

John.
ID: 156 · Rating: 0

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,232
Message 157 - Posted: 23 Mar 2015, 8:25:38 UTC - in response to Message 154.  

Let us know how it goes.


I tested v26157 (Win64) on VirtualLHC and there it was running fine.
Here the wrapper is running OK too, but is there something wrong with getting CMS-work for the VM?

[23/03/15 09:12:19] Traceback (most recent call last):
[23/03/15 09:12:19] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 142, in <module>
[23/03/15 09:12:19] user = config['BOINC_USERID']
[23/03/15 09:12:19] KeyError: 'BOINC_USERID'
[23/03/15 09:13:03] Traceback (most recent call last):
[23/03/15 09:13:03] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 142, in <module>
[23/03/15 09:13:03] user = config['BOINC_USERID']
[23/03/15 09:13:03] KeyError: 'BOINC_USERID'
[23/03/15 09:15:03] Traceback (most recent call last):
[23/03/15 09:15:03] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 142, in <module>
[23/03/15 09:15:03] user = config['BOINC_USERID']
[23/03/15 09:15:03] KeyError: 'BOINC_USERID'
[23/03/15 09:17:03] Traceback (most recent call last):
[23/03/15 09:17:03] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 142, in <module>
[23/03/15 09:17:03] user = config['BOINC_USERID']
[23/03/15 09:17:03] KeyError: 'BOINC_USERID'
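
The traceback is an ordinary Python dictionary lookup failing because the key is absent, and the agent then retries and fails again every couple of minutes. As a rough illustration only, assuming config behaves like a plain dict (CMSJobAgent.py is not shown in this thread, and get_boinc_userid is a made-up helper name), a more defensive lookup might look like this:

```python
# Hypothetical sketch: the real CMSJobAgent.py is not shown in this thread.
# Only the key name 'BOINC_USERID' is taken from the traceback.

def get_boinc_userid(config):
    """Return the BOINC user id, or None if the key is absent."""
    user = config.get('BOINC_USERID')  # .get() returns None instead of raising KeyError
    if user is None:
        print("BOINC_USERID missing from config; cannot fetch a job")
    return user


# A config without the key no longer raises KeyError.
print(get_boinc_userid({}))
print(get_boinc_userid({'BOINC_USERID': '1234'}))
```

This would only make the failure easier to read in stderr.log; it would not supply the missing value.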
ID: 157 · Rating: 0

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 96
Message 158 - Posted: 23 Mar 2015, 9:06:55 UTC
Last modified: 23 Mar 2015, 9:28:46 UTC

Sorry, I misunderstood an email from Rom Walton. I have put back the enable_cern_dataformat tag in the job XML file.
ID: 158 · Rating: 0

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,232
Message 159 - Posted: 23 Mar 2015, 9:45:09 UTC - in response to Message 158.  

Sorry, I misunderstood an email from Rom Walton. I have put back the enable_cern_dataformat tag in the job XML file.

Got your new CMS_23_03_2015.xml and now it's running again :)

[23/03/15 10:41:59] cmsRun -j FrameworkJobReport.xml PSet.py
ID: 159 · Rating: 0

Ben Segal
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 12 Sep 14
Posts: 65
Credit: 544
RAC: 0
Message 160 - Posted: 23 Mar 2015, 10:16:23 UTC - in response to Message 158.  
Last modified: 23 Mar 2015, 10:16:58 UTC

Sorry, I misunderstood an email from Rom Walton. I have put back the enable_cern_dataformat tag in the job XML file.

Yes, and this is now working on Mac too, with the latest wrapper!
ID: 160 · Rating: 0

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,944,443
RAC: 3,276
Message 161 - Posted: 23 Mar 2015, 10:50:32 UTC

Seems fine here at work, on both Windows and SLC6. On the Linux box it looks like the cable commands have been switched, so that "on" now comes after "off". :-)
ID: 161 · Rating: 0

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,944,443
RAC: 3,276
Message 162 - Posted: 23 Mar 2015, 22:20:21 UTC - in response to Message 161.  

Seems fine here at work, on both Windows and SLC6. On the Linux box it looks like the cable commands have been switched, so that "on" now comes after "off". :-)

...And it's now working on my Linux Mint system at home too.
Has anyone had any problems since this morning? It might be time to move on to the next phase. I'm trying to learn as much as possible about how to run this, but at the moment I still have to defer to the ultimate developers.
ID: 162 · Rating: 0

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 90
Message 163 - Posted: 24 Mar 2015, 0:46:20 UTC
Last modified: 24 Mar 2015, 1:34:16 UTC

Running fine here on Ubuntu Linux - 2 hosts.

Can't get a task on Win7, just reports "No tasks sent"... Keep trying.

Reset the project; it now says "Reached limit on tasks in progress". Since it doesn't have any tasks in progress, I'm stuck.

Edit:- It's sorted itself out and started a task but the VM has a problem.

stdout.log:-

[24/03/15 01:10:16] --- we had a job exeption!
[24/03/15 01:10:16] output was: 
[24/03/15 01:10:16] 
[24/03/15 01:10:16] --- error output was: 
[24/03/15 01:10:16] 
[24/03/15 01:10:16] Done with CMS Job, uploading results to somewehere...
[24/03/15 01:10:16] String is: {"tm_dbs_url": "https://cmsweb.cern.ch/dbs/prod/global/DBSReader", "tm_publication": "F", "tm_job_arch": "slc5_amd64_gcc462", "tm_job_sw": "CMSSW_5_3_4", "tm_user_dn": "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=spiga/CN=606831/CN=Daniele Spiga", "tm_end_injection": "None", "tm_user_sandbox":
....and lots more stuff like this.

stderr.log:-
[24/03/15 01:10:16] Traceback (most recent call last):
[24/03/15 01:10:16] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 151, in <module>
[24/03/15 01:10:16] runJob(req.text)
[24/03/15 01:10:16] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 104, in runJob
[24/03/15 01:10:16] tar.add(name)
[24/03/15 01:10:16] File "/usr/lib64/python2.6/tarfile.py", line 1971, in add
[24/03/15 01:10:16] tarinfo = self.gettarinfo(name, arcname)
[24/03/15 01:10:16] File "/usr/lib64/python2.6/tarfile.py", line 1840, in gettarinfo
[24/03/15 01:10:16] statres = os.lstat(name)
[24/03/15 01:10:16] OSError: [Errno 2] No such file or directory: 'FrameworkJobReport.xml'

The wrapper is still running but it's making no attempt to recover. I'll abort this task and start again.
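
The OSError comes from tar.add() being handed a file name that does not exist, presumably because the job died before writing FrameworkJobReport.xml. A minimal sketch of how the packing step could skip absent files and report them, written for current Python 3 rather than the VM's Python 2.6; add_existing and the file names in the demo are illustrative and not taken from the agent:

```python
import os
import tarfile
import tempfile


def add_existing(tar_path, names):
    """Add only the files that exist; return the names that were skipped."""
    missing = []
    with tarfile.open(tar_path, "w:gz") as tar:
        for name in names:
            if os.path.exists(name):
                tar.add(name, arcname=os.path.basename(name))
            else:
                missing.append(name)  # would have raised OSError in tar.add()
    return missing


# Demo in a scratch directory: one real file, one absent job report.
workdir = tempfile.mkdtemp()
present = os.path.join(workdir, "stdout.log")
with open(present, "w") as fh:
    fh.write("demo\n")
absent = os.path.join(workdir, "FrameworkJobReport.xml")
archive = os.path.join(workdir, "results.tgz")

skipped = add_existing(archive, [present, absent])
print("skipped:", [os.path.basename(n) for n in skipped])
```

Whether a job with no report should be uploaded at all is a separate decision for the developers; the sketch only shows how to avoid the unhandled exception.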

John.
ID: 163 · Rating: 0

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 90
Message 164 - Posted: 24 Mar 2015, 2:11:23 UTC

Win7.
Reset project to get a clean start. Now running OK.

Maybe I should have left the previous attempt to see if the wrapper would
have realised the VM wasn't running and rebooted it eventually.

John.
ID: 164 · Rating: 0

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 90
Message 165 - Posted: 24 Mar 2015, 3:20:52 UTC

On a second Win7 host, ended the previous CMS task (old wrapper) "gracefully". New task downloaded and started without problem.

Noticed these as the VM started:-

Tue Mar 24 02:49:36 2015: Starting atd: [ OK ]
Tue Mar 24 02:49:37 2015: Running CernVM context boot hooks:
Tue Mar 24 02:49:37 2015: cvmfs: unrecognized service
Tue Mar 24 02:49:37 2015: Bringing up loopback interface: [ OK ]

Is the "unrecognised service" message OK?

Now running OK on 2 Linux and 2 Win7 boxes.

Goodnight.
ID: 165 · Rating: 0

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,944,443
RAC: 3,276
Message 166 - Posted: 24 Mar 2015, 10:00:15 UTC - in response to Message 165.  
Last modified: 24 Mar 2015, 10:24:29 UTC

Noticed these as the VM started:-

Tue Mar 24 02:49:36 2015: Starting atd: [ OK ]
Tue Mar 24 02:49:37 2015: Running CernVM context boot hooks:
Tue Mar 24 02:49:37 2015: cvmfs: unrecognized service
Tue Mar 24 02:49:37 2015: Bringing up loopback interface: [ OK ]

Is the "unrecognised service" message OK?

Well, I have it here on my SLC6 box too. I'd noticed it before, when we were having the network problems.
The job on my Windows box failed overnight, but when I looked more closely I saw Exit status 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED!

Hmm, that's strange:
23-Mar-2015 16:57:57 [CMS-dev] Aborting task CMS_8563_1426858394.917974_0: exceeded disk limit: 9657.06MB > 9536.74MB
There's 375 GB free on the disk and my local preferences set disk to unlimited.
Ah! I remember! There's a limit in the job template, we had this trouble before, and raised it from 5 GB to 10 GB then. Is anyone else running into this limit?
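
The 9536.74 MB in the log line is what a bound of 10^10 bytes looks like when expressed in binary megabytes, which fits the "10 GB" recollection. That is an inference from the numbers; the template itself is not shown here. A quick check of the arithmetic:

```python
# Assumption: the job template's disk bound is 10 GB in the decimal sense
# (10e9 bytes) and the client log reports sizes in binary megabytes (2**20 bytes).
MIB = 2 ** 20

assumed_bound_bytes = 10e9
limit_mb = assumed_bound_bytes / MIB
used_mb = 9657.06  # taken from the "exceeded disk limit" log line

print("limit:   %.2f MB" % limit_mb)              # limit:   9536.74 MB
print("over by: %.2f MB" % (used_mb - limit_mb))  # over by: 120.32 MB
```

So the task was only about 120 MB over the bound when it was aborted.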
ID: 166 · Rating: 0

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,232
Message 169 - Posted: 24 Mar 2015, 21:13:02 UTC - in response to Message 166.  

Ah! I remember! There's a limit in the job template, we had this trouble before, and raised it from 5 GB to 10 GB then. Is anyone else running into this limit?

Not yet, but I will let it run for 24 hours this time.
To avoid an early crash from running into the limit, I increased the bound to 19.07 GB.
At the moment, after 6 hours of run time, the size of the CMS slot is 4131 MB.
The sizes of the snapshot files, however, vary from 360 to 700 MB, and
when a snapshot is made there is a short period with 2 snapshots before the older one is deleted.
ID: 169 · Rating: 0

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,944,443
RAC: 3,276
Message 171 - Posted: 24 Mar 2015, 22:42:55 UTC - in response to Message 169.  

Ah! I remember! There's a limit in the job template, we had this trouble before, and raised it from 5 GB to 10 GB then. Is anyone else running into this limit?

Not yet, but I will let it run for 24 hours this time.
To avoid an early crash from running into the limit, I increased the bound to 19.07 GB.
At the moment, after 6 hours of run time, the size of the CMS slot is 4131 MB.
The sizes of the snapshot files, however, vary from 360 to 700 MB, and
when a snapshot is made there is a short period with 2 snapshots before the older one is deleted.

OK, I just started a task here at home and set up a script to save the size of the slots directory tree every 60 seconds. I'll try to ssh into my Windows machine at work and set up the same script so I might have an answer by tomorrow lunchtime.
ID: 171 · Rating: 0

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,944,443
RAC: 3,276
Message 172 - Posted: 24 Mar 2015, 23:05:40 UTC - in response to Message 171.  
Last modified: 24 Mar 2015, 23:14:54 UTC

Ah! I remember! There's a limit in the job template, we had this trouble before, and raised it from 5 GB to 10 GB then. Is anyone else running into this limit?

Not yet, but I will let it run for 24 hours this time.
To avoid an early crash from running into the limit, I increased the bound to 19.07 GB.
At the moment, after 6 hours of run time, the size of the CMS slot is 4131 MB.
The sizes of the snapshot files, however, vary from 360 to 700 MB, and
when a snapshot is made there is a short period with 2 snapshots before the older one is deleted.

OK, I just started a task here at home and set up a script to save the size of the slots directory tree every 60 seconds. I'll try to ssh into my Windows machine at work and set up the same script so I might have an answer by tomorrow lunchtime.

Well, one immediate difference: the directory started out at ~1.2 GB on Mint Linux, but 7.4 GB on Windows 7! There's a 5.8 GB vm_cache.vdi on Windows that doesn't exist on Linux:
[homepc01:BOINC] > ls -lrS slots/8
total 1925292
-rw-r--r-- 1 ivan ivan          0 Mar 24 22:23 boinc_lockfile
-rw-r--r-- 1 ivan ivan         50 Mar 24 22:23 vbox_webapi.xml
-rw-r--r-- 1 ivan ivan         66 Mar 24 22:23 vbox_remote_desktop.xml
-rw-r--r-- 1 ivan ivan         83 Mar 24 22:21 vbox_job.xml
-rw-r--r-- 1 ivan ivan        102 Mar 24 22:21 vboxwrapper_26157_x86_64-pc-linux-gnu
-rw-r--r-- 1 ivan ivan        208 Mar 24 22:54 vbox_checkpoint.xml
-rw-r--r-- 1 ivan ivan        337 Mar 24 22:54 boinc_task_state.xml
-rw-r--r-- 1 ivan ivan       3526 Mar 24 22:54 vbox_replay.txt
-rw-r--r-- 1 ivan ivan       3943 Mar 24 22:54 stderr.txt
drwx------ 4 ivan ivan       4096 Mar 24 22:54 boinc_b66ab0b176951b62
-rw-r--r-- 1 ivan ivan       7801 Mar 24 22:23 init_data.xml
-rw-r--r-- 1 ivan ivan       8192 Mar 24 23:02 boinc_mmap_file
-rw-r--r-- 1 ivan ivan      11745 Mar 24 22:54 vbox_trace.txt
-rw-r--r-- 1 ivan ivan      28672 Mar 24 22:23 vm_floppy_8.img
-rw------- 1 ivan ivan      76583 Mar 24 23:02 VBox.log
-rwxr-xr-x 1 ivan ivan 1971322880 Mar 24 22:54 vm_image.vdi
vs.
admD405@W7-SE-D304-01 /cygdrive/d/ProgramData/BOINC
$ ls -lrS slots/10
total 7474294
-rwx------+ 1 Administrators None          0 Mar 24 22:48 boinc_lockfile
drwx------+ 1 Administrators None          0 Mar 24 22:48 boinc_b497de900721d9bf
-rwx------+ 1 Administrators None         53 Mar 24 22:48 vbox_webapi.xml
-rwx------+ 1 Administrators None         69 Mar 24 22:48 vbox_remote_desktop.xml
-rwx------+ 1 Administrators None         84 Mar 24 22:48 vbox_job.xml
-rwx------+ 1 Administrators None        102 Mar 24 22:48 vboxwrapper_26157_windows_x86_64.pdb
-rwx------+ 1 Administrators None        102 Mar 24 22:48 vboxwrapper_26157_windows_x86_64.exe
-rwx------+ 1 Administrators None        209 Mar 24 22:48 vbox_checkpoint.xml
-rwx------+ 1 Administrators None       3592 Mar 24 22:49 stderr.txt
-rwx------+ 1 Administrators None       9218 Mar 24 22:48 init_data.xml
-rwx------+ 1 Administrators None      28672 Mar 24 22:48 vm_floppy_10.img
-rwx------+ 1 Administrators None      58735 Mar 24 22:49 VBox.log
-rwx------+ 1 Administrators None 1795162112 Mar 24 22:49 vm_image.vdi
-rwx------+ 1 Administrators None 5847908352 Mar 19 14:45 vm_cache.vdi


Could a Mac user please check what's in their slots directory? The bash script I'm using to monitor the slots directory is:
while true; do printf '%s ' "$(date +%T)"; du -s slots/8; sleep 60; done
(with appropriate slot number, of course...) redirected into a text file to save it; I'm monitoring this with tail -f.
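
For anyone without a bash shell handy (e.g. on Windows without Cygwin), roughly the same monitoring can be sketched in Python. This is an illustration rather than a drop-in equivalent: it sums apparent file sizes in bytes, whereas du -s reports allocated blocks, so the numbers can differ somewhat. The names tree_size_bytes and monitor are made up for the example.

```python
import os
import tempfile
import time


def tree_size_bytes(root):
    """Sum the apparent sizes of all files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished between listing and stat, e.g. a deleted snapshot
    return total


def monitor(root, interval=60, samples=None):
    """Print 'HH:MM:SS <bytes>' every interval seconds; samples=None runs until interrupted."""
    taken = 0
    while samples is None or taken < samples:
        print("%s %d" % (time.strftime("%H:%M:%S"), tree_size_bytes(root)), flush=True)
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)


# Bounded demo on a scratch directory holding one 5-byte file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "sample.bin"), "wb") as fh:
    fh.write(b"12345")
monitor(demo_dir, interval=0, samples=1)

# For a real run, something like: monitor(os.path.join("slots", "8"))
```

Redirect the output to a text file as with the bash version.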
ID: 172 · Rating: 0

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 90
Message 173 - Posted: 25 Mar 2015, 0:24:24 UTC
Last modified: 25 Mar 2015, 1:07:37 UTC

After two to three hours running, disk space here is ~5.0G on Win7
and ~5.4G on Linux. I plan to leave all 4 tasks to run to completion naturally.

CMS tasks are sharing hosts with both traditional and VM (T4T) tasks so there is some (boinc) task swapping going on.

RAM is the constraint here, preventing CMS running well with any other VM project except T4T.

Edit:-
Snapshot file sizes are:-

W7 417M and 434M
Linux 438M and 425M

Edit edit:- These numbers are the sum of the "base" snapshot image and the difference file.

I haven't changed anything, they are running as received.
ID: 173 · Rating: 0

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,944,443
RAC: 3,276
Message 174 - Posted: 25 Mar 2015, 0:51:51 UTC - in response to Message 173.  

After two to three hours running, disk space here is ~5.0G on Win7
and ~5.4G on Linux. I plan to leave all 4 tasks to run to completion naturally.

Thanks. What versions of VirtualBox are you running?
ID: 174 · Rating: 0

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 90
Message 175 - Posted: 25 Mar 2015, 1:03:11 UTC - in response to Message 174.  

VBox is 4.3.12r93733 on all hosts.
ID: 175 · Rating: 0

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,944,443
RAC: 3,276
Message 176 - Posted: 25 Mar 2015, 1:10:52 UTC - in response to Message 175.  
Last modified: 25 Mar 2015, 1:11:17 UTC

VBox is 4.3.12r93733 on all hosts.

Thanks. Can you switch to 4.3.26 (or later) as soon as T4T gives the all-clear? Your Windows directory size is somewhat smaller than mine.
ID: 176 · Rating: 0

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 90
Message 177 - Posted: 25 Mar 2015, 1:22:36 UTC - in response to Message 176.  
Last modified: 25 Mar 2015, 1:47:01 UTC

I plan to remain with 4.3.12 unless and until there is a definite reason to change. However I can turn off T4T and change VB version pro tem on one Win7 host if you want. Could do Linux similarly if it will help. I can probably do this tomorrow, er, today if you want.

It will take three or four days to get 24 hours running time (@ ca 7 hours per night)


Edit:- I've downloaded VB 4.3.26-98988 Win & Linux just in case (ISP doesn't count data overnight). I'm off to bed.
ID: 177 · Rating: 0


©2024 CERN