VBox Wrappers Updated to 26157
Author | Message |
---|---|
Joined: 12 Sep 14 Posts: 1067 Credit: 332,243 RAC: 148 |
The VirtualBox wrappers for Windows, Linux and Mac have been upgraded to 26157. Also, the tag enable_cern_dataformat has been removed from the job XML file. Let us know how it goes. |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 11 |
For me, not good. Ubuntu Linux, VBox 4.3.12, BOINC 7.2.42. The VM console contains:- Starting vmcontext_epilog ... bootlogd: no process killed tail: /home/boinc/stderr: file truncated tail: /home/boinc/stderr: file truncated tail: /home/boinc/stderr: file truncated This last line is repeated at approx. 2-minute intervals... 25 times as I write. stderr.log contains repeats, at 2-minute intervals, of this sequence:- [23/03/15 01:20:31] Traceback (most recent call last): [23/03/15 01:20:31] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 142, in <module> [23/03/15 01:20:31] user = config['BOINC_USERID'] [23/03/15 01:20:31] KeyError: 'BOINC_USERID' Edit:- and so to bed. John. |
Joined: 13 Feb 15 Posts: 1185 Credit: 850,198 RAC: 172 |
"Let us know how it goes." I tested v26157 (Win64) on VirtualLHC and it ran fine there. The wrapper is running OK here too, but is something wrong with getting CMS work for the VM? [23/03/15 09:12:19] Traceback (most recent call last): [23/03/15 09:12:19] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 142, in <module> [23/03/15 09:12:19] user = config['BOINC_USERID'] [23/03/15 09:12:19] KeyError: 'BOINC_USERID' [23/03/15 09:13:03] Traceback (most recent call last): [23/03/15 09:13:03] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 142, in <module> [23/03/15 09:13:03] user = config['BOINC_USERID'] [23/03/15 09:13:03] KeyError: 'BOINC_USERID' [23/03/15 09:15:03] Traceback (most recent call last): [23/03/15 09:15:03] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 142, in <module> [23/03/15 09:15:03] user = config['BOINC_USERID'] [23/03/15 09:15:03] KeyError: 'BOINC_USERID' [23/03/15 09:17:03] Traceback (most recent call last): [23/03/15 09:17:03] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 142, in <module> [23/03/15 09:17:03] user = config['BOINC_USERID'] [23/03/15 09:17:03] KeyError: 'BOINC_USERID' |
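The traceback above shows only one line of the agent: a plain dictionary lookup of `BOINC_USERID` that fails when the key is absent. As a minimal sketch of how such a lookup could report the problem more clearly, assuming the config is an ordinary Python mapping (the helper name `get_boinc_user` and the `BOINC_HOSTID` key are hypothetical and not taken from CMSJobAgent.py):

```python
def get_boinc_user(config):
    """Return the BOINC user id, or raise an error that names what is missing.

    Hypothetical helper: the real agent simply does config['BOINC_USERID'].
    """
    try:
        return config["BOINC_USERID"]
    except KeyError:
        raise RuntimeError(
            "BOINC_USERID missing from VM config; available keys: %s"
            % sorted(config)
        )


# Illustrative config contents only; the real keys are not shown in the thread.
good = {"BOINC_USERID": "12345", "BOINC_HOSTID": "678"}
bad = {"BOINC_HOSTID": "678"}

print(get_boinc_user(good))
try:
    get_boinc_user(bad)
except RuntimeError as err:
    print(err)
```

Listing the keys that did arrive makes it easier to tell whether the whole config is empty or only one entry was dropped.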
Joined: 12 Sep 14 Posts: 1067 Credit: 332,243 RAC: 148 |
Sorry, I misunderstood an email from Rom Walton, I have put back the enable_cern_dataformat tag in the job XML file. |
Joined: 13 Feb 15 Posts: 1185 Credit: 850,198 RAC: 172 |
"Sorry, I misunderstood an email from Rom Walton, I have put back the enable_cern_dataformat tag in the job XML file." Got your new CMS_23_03_2015.xml and now it's running again :) [23/03/15 10:41:59] cmsRun -j FrameworkJobReport.xml PSet.py |
Joined: 12 Sep 14 Posts: 65 Credit: 544 RAC: 0 |
"Sorry, I misunderstood an email from Rom Walton, I have put back the enable_cern_dataformat tag in the job XML file." Yes, and this is now working on Mac too, with the latest wrapper! |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,973,351 RAC: 944 |
Seems fine here at work, on both Windows and SLC6. On the Linux box it looks like the cable commands have been switched, so that "on" now comes after "off". :-) |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,973,351 RAC: 944 |
"Seems fine here at work, on both Windows and SLC6. On the Linux box it looks like the cable commands have been switched, so that 'on' now comes after 'off'. :-)" ...And it's now working on my Linux Mint system at home too. Has anyone had any problems since this morning? It might be time to move on to the next phase. I'm trying to learn as much as possible about how to run this, but at the moment I still have to defer to the ultimate developers. |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 11 |
Running fine here on Ubuntu Linux - 2 hosts. Can't get a task on Win7; it just reports "No tasks sent"... Keep trying. Reset project; now it says "Reached limit on tasks in progress". Since it doesn't have any tasks in progress, I'm stuck. Edit:- It's sorted itself out and started a task, but the VM has a problem. stdout.log:- [24/03/15 01:10:16] --- we had a job exeption! [24/03/15 01:10:16] output was: [24/03/15 01:10:16] [24/03/15 01:10:16] --- error output was: [24/03/15 01:10:16] [24/03/15 01:10:16] Done with CMS Job, uploading results to somewehere... [24/03/15 01:10:16] String is: {"tm_dbs_url": "https://cmsweb.cern.ch/dbs/prod/global/DBSReader", "tm_publication": "F", "tm_job_arch": "slc5_amd64_gcc462", "tm_job_sw": "CMSSW_5_3_4", "tm_user_dn": "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=spiga/CN=606831/CN=Daniele Spiga", "tm_end_injection": "None", "tm_user_sandbox": ....and lots more stuff like this. stderr.log:- [24/03/15 01:10:16] Traceback (most recent call last): [24/03/15 01:10:16] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 151, in <module> [24/03/15 01:10:16] runJob(req.text) [24/03/15 01:10:16] File "/cvmfs/cms.cern.ch/CMS@Home/agent/CMSJobAgent.py", line 104, in runJob [24/03/15 01:10:16] tar.add(name) [24/03/15 01:10:16] File "/usr/lib64/python2.6/tarfile.py", line 1971, in add [24/03/15 01:10:16] tarinfo = self.gettarinfo(name, arcname) [24/03/15 01:10:16] File "/usr/lib64/python2.6/tarfile.py", line 1840, in gettarinfo [24/03/15 01:10:16] statres = os.lstat(name) [24/03/15 01:10:16] OSError: [Errno 2] No such file or directory: 'FrameworkJobReport.xml' The wrapper is still running but it's making no attempt to recover. I'll abort this task and start again. John. |
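The second traceback comes from `tar.add(name)` being called on a file that the failed job never produced. A minimal sketch of one way to guard against that, assuming the agent packs a known list of output files (the function name `pack_outputs` and the file list are hypothetical; the real `runJob` code is not shown in the thread, and it ran under Python 2.6 whereas this sketch targets Python 3):

```python
import os
import tarfile
import tempfile


def pack_outputs(archive_path, names):
    """Add each existing file to a gzip'd tar; return the names that were missing."""
    missing = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in names:
            # lexists mirrors the os.lstat call that failed in the traceback.
            if os.path.lexists(name):
                tar.add(name, arcname=os.path.basename(name))
            else:
                missing.append(name)
    return missing


# Demonstration in a scratch directory: one output exists, the report does not.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "cmsRun.log")  # hypothetical output file
with open(log_path, "w") as fh:
    fh.write("job failed\n")
report_path = os.path.join(workdir, "FrameworkJobReport.xml")

missing = pack_outputs(os.path.join(workdir, "results.tgz"), [log_path, report_path])
print("missing outputs:", [os.path.basename(m) for m in missing])
```

Returning the missing names lets the caller decide whether to upload a partial result or report the job as failed, instead of dying with an unhandled exception.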
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 11 |
win7. Reset project to get a clean start. Now running OK. Maybe I should have left the previous attempt to see if the wrapper would have realised the VM wasn't running and rebooted it eventually. John. |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 11 |
On a second Win7 host, ended the previous CMS task (old wrapper) "gracefully". New task downloaded and started without problem. Noticed these as the VM started:- Tue Mar 24 02:49:36 2015: Starting atd: [ OK ] Tue Mar 24 02:49:37 2015: Running CernVM context boot hooks: Tue Mar 24 02:49:37 2015: cvmfs: unrecognized service Tue Mar 24 02:49:37 2015: Bringing up loopback interface: [ OK ] Is the "unrecognised service" message OK? Now running OK on 2 Linux and 2 Win7 boxes. Goodnight. |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,973,351 RAC: 944 |
"Noticed these as the VM started:-" Well, I have it here on my SLC6 box too. I'd noticed it before, when we were having the network problems. The job on my Windows box failed overnight, but when I looked more closely I saw Exit status 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED! Hmm, that's strange: 23-Mar-2015 16:57:57 [CMS-dev] Aborting task CMS_8563_1426858394.917974_0: exceeded disk limit: 9657.06MB > 9536.74MB There's 375 GB free on the disk and my local preferences set disk to unlimited. Ah! I remember! There's a limit in the job template, we had this trouble before, and raised it from 5 GB to 10 GB then. Is anyone else running into this limit? |
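The odd-looking 9536.74MB figure in the abort message is what 10 GB looks like when a decimal byte count is displayed in units of 2^20 bytes. The arithmetic below is a sketch that assumes the template limit is stored as 10 × 10^9 bytes, which matches the message but is not stated in the thread:

```python
MIB = 1024 ** 2  # bytes per binary megabyte, the unit the client appears to print

# Assumption: the job template's disk bound is 10 GB expressed in decimal bytes.
limit_bytes = 10 * 10 ** 9
limit_mib = limit_bytes / MIB
print(round(limit_mib, 2))  # 9536.74, matching the limit in the abort message

# Usage reported in the same message, and how far over the limit it was.
used_mib = 9657.06
over_mib = used_mib - limit_mib
print(round(over_mib, 2))  # roughly 120 MB over
```

On this reading the task was only about 120 MB over the bound when it was aborted, so a modest amount of extra headroom would have been enough for that particular run.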
Joined: 13 Feb 15 Posts: 1185 Credit: 850,198 RAC: 172 |
"Ah! I remember! There's a limit in the job template, we had this trouble before, and raised it from 5 GB to 10 GB then. Is anyone else running into this limit?" Not yet, but I will let it run for 24 hours this time. To avoid an early crash from running into the limit, I increased the bound to 19.07 GB. At the moment, after 6 hours of run time, the size of the CMS slot is 4131 MB. The snapshot files, however, vary in size from 360 to 700 MB, and when a snapshot is made there is a short period with 2 snapshots before the oldest one is deleted. |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,973,351 RAC: 944 |
"Ah! I remember! There's a limit in the job template, we had this trouble before, and raised it from 5 GB to 10 GB then. Is anyone else running into this limit?" OK, I just started a task here at home and set up a script to save the size of the slots directory tree every 60 seconds. I'll try to ssh into my Windows machine at work and set up the same script so I might have an answer by tomorrow lunchtime. |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,973,351 RAC: 944 |
"Ah! I remember! There's a limit in the job template, we had this trouble before, and raised it from 5 GB to 10 GB then. Is anyone else running into this limit?" Well, one immediate difference: the directory started out at ~1.2 GB in Mint Linux, but 7.4 GB in Windows 7! There's a 5.8 GB vm_cache.vdi in Windows that doesn't exist on Linux: [homepc01:BOINC] > ls -lrS slots/8 total 1925292 -rw-r--r-- 1 ivan ivan 0 Mar 24 22:23 boinc_lockfile -rw-r--r-- 1 ivan ivan 50 Mar 24 22:23 vbox_webapi.xml -rw-r--r-- 1 ivan ivan 66 Mar 24 22:23 vbox_remote_desktop.xml -rw-r--r-- 1 ivan ivan 83 Mar 24 22:21 vbox_job.xml -rw-r--r-- 1 ivan ivan 102 Mar 24 22:21 vboxwrapper_26157_x86_64-pc-linux-gnu -rw-r--r-- 1 ivan ivan 208 Mar 24 22:54 vbox_checkpoint.xml -rw-r--r-- 1 ivan ivan 337 Mar 24 22:54 boinc_task_state.xml -rw-r--r-- 1 ivan ivan 3526 Mar 24 22:54 vbox_replay.txt -rw-r--r-- 1 ivan ivan 3943 Mar 24 22:54 stderr.txt drwx------ 4 ivan ivan 4096 Mar 24 22:54 boinc_b66ab0b176951b62 -rw-r--r-- 1 ivan ivan 7801 Mar 24 22:23 init_data.xml -rw-r--r-- 1 ivan ivan 8192 Mar 24 23:02 boinc_mmap_file -rw-r--r-- 1 ivan ivan 11745 Mar 24 22:54 vbox_trace.txt -rw-r--r-- 1 ivan ivan 28672 Mar 24 22:23 vm_floppy_8.img -rw------- 1 ivan ivan 76583 Mar 24 23:02 VBox.log -rwxr-xr-x 1 ivan ivan 1971322880 Mar 24 22:54 vm_image.vdi vs.
admD405@W7-SE-D304-01 /cygdrive/d/ProgramData/BOINC $ ls -lrS slots/10 total 7474294 -rwx------+ 1 Administrators None 0 Mar 24 22:48 boinc_lockfile drwx------+ 1 Administrators None 0 Mar 24 22:48 boinc_b497de900721d9bf -rwx------+ 1 Administrators None 53 Mar 24 22:48 vbox_webapi.xml -rwx------+ 1 Administrators None 69 Mar 24 22:48 vbox_remote_desktop.xml -rwx------+ 1 Administrators None 84 Mar 24 22:48 vbox_job.xml -rwx------+ 1 Administrators None 102 Mar 24 22:48 vboxwrapper_26157_windows_x86_64.pdb -rwx------+ 1 Administrators None 102 Mar 24 22:48 vboxwrapper_26157_windows_x86_64.exe -rwx------+ 1 Administrators None 209 Mar 24 22:48 vbox_checkpoint.xml -rwx------+ 1 Administrators None 3592 Mar 24 22:49 stderr.txt -rwx------+ 1 Administrators None 9218 Mar 24 22:48 init_data.xml -rwx------+ 1 Administrators None 28672 Mar 24 22:48 vm_floppy_10.img -rwx------+ 1 Administrators None 58735 Mar 24 22:49 VBox.log -rwx------+ 1 Administrators None 1795162112 Mar 24 22:49 vm_image.vdi -rwx------+ 1 Administrators None 5847908352 Mar 19 14:45 vm_cache.vdi Could a Mac user please check what's in their slots directory? The bash script I'm using to monitor the slots directory is: for ((;;)) ; do echo -n `date +"%T"`" "; du -s slots/8; sleep 60; done (with appropriate slot number, of course...) redirected into a text file to save it; I'm monitoring this with tail -f. |
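For hosts without bash or Cygwin, the monitoring loop above can be approximated in Python. This is a sketch, not a drop-in replacement: `du -s` reports allocated disk blocks, whereas the hypothetical `tree_size_bytes` helper below sums apparent file sizes in bytes, so the two numbers will differ somewhat. The function names and the `samples` parameter are illustrative.

```python
import os
import time


def tree_size_bytes(root):
    """Sum the apparent sizes of all regular files and links under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, fname)).st_size
            except OSError:
                # Snapshot files can vanish between listing and stat; skip them.
                pass
    return total


def monitor(root, interval_s=60, samples=None, out=print):
    """Emit 'HH:MM:SS <bytes>' every interval_s seconds; run forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        out("%s %d" % (time.strftime("%H:%M:%S"), tree_size_bytes(root)))
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)


# Example call (slot number as appropriate); redirect stdout to a file to keep a log:
# monitor("slots/8")
```

Skipping files that disappear mid-scan matters here because the wrapper deletes the older snapshot shortly after creating a new one.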
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 11 |
After two to three hours running, disk space here is ~5.0G on Win7 and ~5.4G on Linux. I plan to leave all 4 tasks to run to completion naturally. CMS tasks are sharing hosts with both traditional and VM (T4T) tasks, so there is some (BOINC) task swapping going on. RAM is the constraint here, preventing CMS from running well alongside any other VM project except T4T. Edit:- Snapshot file sizes are:- W7 417M and 434M; Linux 438M and 425M. Edit edit:- These numbers are the sum of the "base" snapshot image and the difference file. I haven't changed anything; they are running as received. |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,973,351 RAC: 944 |
"After two to three hours running, disk space here is ~5.0G on Win7" Thanks. What versions of VirtualBox are you running? |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 11 |
VBox is 4.3.12r93733 on all hosts. |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,973,351 RAC: 944 |
"VBox is 4.3.12r93733 on all hosts." Thanks. Can you switch to 4.3.26 (or later) as soon as T4T gives the all-clear? Your Windows directory size is somewhat smaller than mine. |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 11 |
I plan to remain with 4.3.12 unless and until there is a definite reason to change. However, I can turn off T4T and change VB version pro tem on one Win7 host if you want. Could do Linux similarly if it will help. I can probably do this tomorrow, er, today if you want. It will take three or four days to get 24 hours of running time (@ ca. 7 hours per night). Edit:- I've downloaded VB 4.3.26-98988 for Win & Linux just in case (ISP doesn't count data overnight). I'm off to bed. |