Thread 'New Version v47.20'

Author	Message
Laurence CERN Project administrator Project developer Project tester Send message Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 3647 - Posted: 12 Jul 2016, 12:09:37 UTC This updated image uses the production version of CernVM and the CVMFS configuration has been modified to address some issues that we have been seeing with some volunteers. ID: 3647 · Rating: 0 · rate: / Reply Quote

maeax Joined: 22 Apr 16 Posts: 793 Credit: 4,220,534 RAC: 3,895	Message 3668 - Posted: 13 Jul 2016, 15:45:02 UTC Last modified: 13 Jul 2016, 15:49:08 UTC
Win 10 (x64) Pro, VirtualBox 5.0.24, BOINC 7.6.22

Hello Laurence,

The "Job finished with 0." line is missing for the last task, and Condor ended with return value N/A. Is that all right?

2016-07-13 13:26:52 (6900): Guest Log: [INFO] Job finished with 0.
2016-07-13 13:26:52 (6900): Guest Log: [INFO] New Job Starting
2016-07-13 13:26:52 (6900): Guest Log: [INFO] Condor JobID: 1257047
2016-07-13 13:27:12 (6900): Guest Log: [INFO] CRAB ID: 4982
2016-07-13 14:57:52 (6900): Status Report: Job Duration: '64800.000000'
2016-07-13 14:57:52 (6900): Status Report: Elapsed Time: '42055.182323'
2016-07-13 14:57:52 (6900): Status Report: CPU Time: '31666.875000'
2016-07-13 15:45:15 (6900): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2016-07-13 16:37:52 (6900): Status Report: Job Duration: '64800.000000'
2016-07-13 16:37:52 (6900): Status Report: Elapsed Time: '48056.017391'
2016-07-13 16:37:52 (6900): Status Report: CPU Time: '36318.734375'
2016-07-13 16:55:38 (6900): VM Completion File Detected.
2016-07-13 16:55:38 (6900): VM Completion Message: Condor exited with return value N/A. .
2016-07-13 16:55:38 (6900): Powering off VM.
2016-07-13 16:55:41 (6900): Successfully stopped VM.
2016-07-13 16:55:46 (6900): Deregistering VM. (boinc_02bff7bd7c32aff8, slot#1)
2016-07-13 16:55:52 (6900): Removing virtual disk drive(s) from VM.
2016-07-13 16:55:52 (6900): Removing network bandwidth throttle group from VM.
2016-07-13 16:55:52 (6900): Removing storage controller(s) from VM.
2016-07-13 16:55:53 (6900): Removing VM from VirtualBox.
16:55:59 (6900): called boinc_finish(0)
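A minimal sketch of how one might scan a task log for this symptom, assuming the guest log marks each job with the "[INFO] New Job Starting" and "[INFO] Job finished with N." lines seen in the log above. The function name unfinished_jobs is made up for illustration; it is not part of BOINC or of the CMS wrapper.

```python
import re

# Marker strings taken from the log excerpt in the post; everything else
# here (function name, pairing logic) is a hypothetical illustration.
START_MARKER = "[INFO] New Job Starting"
FINISH_PATTERN = re.compile(r"\[INFO\] Job finished with (-?\d+)\.")


def unfinished_jobs(lines):
    """Count jobs that were started but have no finish line logged after them."""
    open_jobs = 0
    for line in lines:
        if START_MARKER in line:
            open_jobs += 1
        elif FINISH_PATTERN.search(line) and open_jobs > 0:
            # A finish line closes the most recently opened job.
            open_jobs -= 1
    return open_jobs
```

Fed the lines of the log in this post, the leading "Job finished with 0." belongs to an earlier job and is ignored, so the one job that started at 13:26:52 is reported as having no finish line.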

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 3672 - Posted: 13 Jul 2016, 18:55:05 UTC - in response to Message 3668.
It looks like a logging problem. The StarterLog shows that the job finished fine:

07/13/16 13:26:03 (pid:6981) Create_Process succeeded, pid=6987
07/13/16 16:49:06 (pid:6981) Process exited, pid=6987, status=0

It could be something like a buffer not being flushed before the VM is powered off.
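To illustrate the general failure mode suspected here (this is not the actual wrapper or guest logging code, which is not shown in this thread), a small Python sketch: a short line written to a file normally sits in a userspace buffer until it is flushed, so it is lost if the machine is powered off first. The helper name and the file name are assumptions made for the example.

```python
import os
import tempfile


def buffered_write_demo():
    """Return the on-disk size of a log file before and after an explicit flush."""
    path = os.path.join(tempfile.mkdtemp(), "running.log")  # hypothetical file name
    fh = open(path, "w")
    fh.write("[INFO] Job finished with 0.\n")  # held in Python's buffer for now
    size_before = os.path.getsize(path)
    fh.flush()                 # hand the buffered bytes to the operating system
    os.fsync(fh.fileno())      # ask the OS to commit them to the disk
    size_after = os.path.getsize(path)
    fh.close()
    return size_before, size_after
```

If a logger flushes (and, where it matters, syncs) after the final status line and before the power-off is requested, the last message should survive.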

maeax Joined: 22 Apr 16 Posts: 793 Credit: 4,220,534 RAC: 3,895	Message 3679 - Posted: 14 Jul 2016, 6:57:18 UTC
Thank you for the answer. The CMS task running at the moment has a finished_2.log from 14-Jul-2016 02:02, a running.log from 14-Jul-2016 02:01, and a running-.log from 14-Jul-2016 08:47.

Is there a second running log because of the date change at 0:00 UTC? Before, I only saw one running.log.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 3681 - Posted: 14 Jul 2016, 7:44:29 UTC - in response to Message 3679.
No. Once you get a new VM, one should be running-slot1.log and the other just running.log. When running multiple jobs there will be one running log per slot, but the console will only show running.log, i.e. running-slot1.log.

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 165	Message 3687 - Posted: 14 Jul 2016, 11:32:19 UTC - in response to Message 3672.
> It looks like it is a logging problem. The StarterLog shows that the job finished fine.
> 07/13/16 13:26:03 (pid:6981) Create_Process succeeded, pid=6987
> 07/13/16 16:49:06 (pid:6981) Process exited, pid=6987, status=0

Yes, everything looks fine at my end; the result file is on the data bridge.

> It could be something like a buffer isn't flushed before the VM is powered off.

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3703 - Posted: 14 Jul 2016, 22:06:04 UTC Last modified: 14 Jul 2016, 22:06:51 UTC
I started a CMS task after a little while. I noticed that the running-.log has a lot of repeated information in it. Its size is 1.7 MB after just one hour. If that is uploaded, it is awfully big. I hope it is compressed before upload.

This appears over and over again:

------- PYTHIA Flag + Mode + Parm + Word + FVec + MVec + PVec Settings (changes only) -------------------------
| Name                                | Now | Default  Min  Max |
|                                     |     |                   |
| Next:numberShowEvent                | 0   | 1        0        |
| ParticleDecays:allowPhotonRadiation | on  | off               |
| ParticleDecays:limitTau0            | on  | off               |
| ParticleDecays:mixB                 | off | on                |
| ProcessLevel:all                    | off | on                |
------- End PYTHIA Flag + Mode + Parm + Word + FVec + MVec + PVec Settings ------------------------------------

-------- PYTHIA Particle Data Table (changed only) ------------------------------------------------------------------------------
id name antiName spn chg col m0 mWidth mMin mMax tau0 res dec ext vis wid
no onMode bRatio meMode products
no particle data has been changed from its default value
-------- End PYTHIA Particle Data Table -----------------------------------------------------------------------------------------

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 165	Message 3707 - Posted: 15 Jul 2016, 8:17:43 UTC - in response to Message 3703.
You should have seen it before I managed to find out how to cut it back! They were 43 MB for 150,000 events -- compressed! (Current batch is 200,000 events.) Uncompressed they were nearly 700 MB...

No, the full log isn't returned. CRAB returns the first 1,000 and last 3,000 lines, so a bit under 500 kB. I can handle that by compressing them on the server; they compress well due to the repetitive nature.
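A rough sketch of the "first 1,000 and last 3,000 lines" idea, for readers who want to trim their own copies of a log the same way. This is not CRAB's code; the function name and defaults are assumptions chosen to match the numbers in the post.

```python
from collections import deque
from itertools import islice


def head_and_tail(lines, head=1000, tail=3000):
    """Return (first `head` lines, last `tail` lines) without overlap.

    `lines` can be any iterable, for example an open file, so the whole
    log never has to be held in memory.
    """
    iterator = iter(lines)
    first = list(islice(iterator, head))     # consume the head
    last = deque(iterator, maxlen=tail)      # keep only the newest `tail` lines
    return first, list(last)
```

With 4,000 lines kept and a total of a bit under 500 kB, the average line works out to at most about 125 bytes.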

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3714 - Posted: 15 Jul 2016, 13:56:00 UTC - in response to Message 3707. Last modified: 15 Jul 2016, 13:56:13 UTC
I compressed one of the log files with 7z (just for fun). It went down from 3.6 MB to 50 kB.
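Going from 3.6 MB to 50 kB is a ratio of about 72:1. The sketch below tries to reproduce the effect on made-up data, using Python's lzma module (LZMA is also the default method of the 7z format). The log content is synthetic: the repeated block imitates the PYTHIA settings lines quoted earlier, and the "event batch" line is a hypothetical stand-in for whatever changes between repetitions.

```python
import lzma

# Synthetic stand-in for the repeated PYTHIA settings block (not a real log).
BLOCK = (
    "| Next:numberShowEvent | 0 | 1 0 |\n"
    "| ParticleDecays:allowPhotonRadiation | on | off |\n"
    "| ParticleDecays:limitTau0 | on | off |\n"
    "| ParticleDecays:mixB | off | on |\n"
    "| ProcessLevel:all | off | on |\n"
)


def synthetic_log(repeats):
    """Build a fake log: one changing line followed by the same block, many times."""
    parts = []
    for i in range(repeats):
        parts.append(f"event batch {i}\n")  # hypothetical line that varies
        parts.append(BLOCK)
    return "".join(parts).encode("ascii")


def compression_ratio(data):
    """Uncompressed size divided by LZMA-compressed size."""
    return len(data) / len(lzma.compress(data))
```

Calling compression_ratio(synthetic_log(2000)) gives a ratio well above what ordinary text achieves, because almost every byte is part of a sequence the compressor has already seen.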

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 165	Message 3715 - Posted: 15 Jul 2016, 18:52:04 UTC - in response to Message 3714.
> I compressed one of the log files with 7z.(just for fun) Went down from 3.6MB to 50K.

If you've ever written an implementation of some of the better compression algorithms, or even just studied them, you'll know why. :-)

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3716 - Posted: 15 Jul 2016, 19:37:27 UTC - in response to Message 3715.
> If you've ever written an implementation of some of the better compression algorithms, or even just studied them, you'll know why. :-)

No, I have not. I am just curious why it was implemented the way it is, even without transmission over the network in mind. The test was just to see what would be theoretically possible, not meant as criticism.

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 165	Message 3717 - Posted: 15 Jul 2016, 22:42:23 UTC - in response to Message 3716. Last modified: 15 Jul 2016, 22:42:59 UTC
>> If you've ever written an implementation of some of the better compression algorithms, or even just studied them, you'll know why. :-)
> No, I have not. I am just curious why it was implemented the way it is, even without transmission over the network in mind. The test was just to see what would be theoretically possible, not meant as criticism.

No, no, no criticism taken or inferred.

OK, it's been a while since I did this, but the implementations were in C on an Atari ST Mega4 sometime in the mid- (or maybe early-) 1990s. Someone was promoting a new algorithm as better than Lempel-Ziv-Welch (LZW), which was a big thing at the time as, IIRC, someone was asserting a patent over LZW and demanding licensing fees... It turned out not to be so good. (I still have the Atari; one day I'll fire it up and recover all the algorithms I wrote at the time, stored on its 1 GB SCSI hard drive...)

The basics of LZW are to look for sequences of characters that have already occurred earlier in the file, and to replace each recurrence with a single short token. As the file progresses, longer and longer repeated sequences get replaced by just one token. In the case of the Pythia recurring output (every 100 events) there is no, or almost no, change in this output, so after a while the algorithm will be replacing identical sequences of several hundred, or even thousands, of bytes with just one token. Hence, the compression ratio goes through the roof!
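For anyone who wants to see the mechanism described above, here is a textbook-style LZW sketch in Python. It is not the Atari C code mentioned in the post, and it is simplified on purpose: the tokens are kept as plain integers in a list, whereas practical implementations pack them into fixed- or variable-width bit fields and cap or reset the table.

```python
def lzw_compress(data):
    """Encode bytes as a list of integer codes using a growing phrase table."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                  # keep extending a known phrase
        else:
            codes.append(table[current])         # emit token for longest match
            table[candidate] = len(table)        # learn the new, longer phrase
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes, reconstructing the same table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Phrase being defined right now: previous phrase + its first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)
```

On input that repeats the same block again and again, the phrases in the table grow steadily longer, so each emitted token stands for more and more bytes, which is the effect described in the post.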

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 1000 Credit: 17,850,696 RAC: 19,033	Message 3718 - Posted: 16 Jul 2016, 3:36:32 UTC

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 165	Message 3720 - Posted: 16 Jul 2016, 10:47:21 UTC - in response to Message 3718.
o o \| \_/

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3729 - Posted: 17 Jul 2016, 20:52:14 UTC
I shut down BOINC for 5 min (LAIM, "leave applications in memory", off). Not only did it fail to resume the job it was working on and start a new one instead, it also disregarded the 12 h limit and nearly continued to the 18 h limit (I shut it down manually to prevent loss of the job).

It seems that the past achievements regarding suspend/resume have not been carried over into the newer versions.

maeax Joined: 22 Apr 16 Posts: 793 Credit: 4,220,534 RAC: 3,895	Message 3836 - Posted: 27 Jul 2016, 19:31:34 UTC
Three CMS tasks are running on one computer after this change to app_config.xml:

<app_config>
  <project_max_concurrent>3</project_max_concurrent>
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>ALICE</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>CMS</name>
    <max_concurrent>3</max_concurrent>
  </app>
  <app>
    <name>LHCb</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>Theory</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>

Development for LHC@home