CMS Application: New Version v47.20
Author | Message |
---|---|
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
This updated image uses the production version of CernVM and the CVMFS configuration has been modified to address some issues that we have been seeing with some volunteers. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 463 |
Win 10 Pro (x64), VirtualBox 5.0.24, BOINC 7.6.22
Hello Laurence, the "Job finished with 0." line is missing for the last task, and Condor ended with return value N/A. Is this all right?
2016-07-13 13:26:52 (6900): Guest Log: [INFO] Job finished with 0.
2016-07-13 13:26:52 (6900): Guest Log: [INFO] New Job Starting
2016-07-13 13:26:52 (6900): Guest Log: [INFO] Condor JobID: 1257047
2016-07-13 13:27:12 (6900): Guest Log: [INFO] CRAB ID: 4982
2016-07-13 14:57:52 (6900): Status Report: Job Duration: '64800.000000'
2016-07-13 14:57:52 (6900): Status Report: Elapsed Time: '42055.182323'
2016-07-13 14:57:52 (6900): Status Report: CPU Time: '31666.875000'
2016-07-13 15:45:15 (6900): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2016-07-13 16:37:52 (6900): Status Report: Job Duration: '64800.000000'
2016-07-13 16:37:52 (6900): Status Report: Elapsed Time: '48056.017391'
2016-07-13 16:37:52 (6900): Status Report: CPU Time: '36318.734375'
2016-07-13 16:55:38 (6900): VM Completion File Detected.
2016-07-13 16:55:38 (6900): VM Completion Message: Condor exited with return value N/A. .
2016-07-13 16:55:38 (6900): Powering off VM.
2016-07-13 16:55:41 (6900): Successfully stopped VM.
2016-07-13 16:55:46 (6900): Deregistering VM. (boinc_02bff7bd7c32aff8, slot#1)
2016-07-13 16:55:52 (6900): Removing virtual disk drive(s) from VM.
2016-07-13 16:55:52 (6900): Removing network bandwidth throttle group from VM.
2016-07-13 16:55:52 (6900): Removing storage controller(s) from VM.
2016-07-13 16:55:53 (6900): Removing VM from VirtualBox.
16:55:59 (6900): called boinc_finish(0) |
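To check for this across many tasks, the wrapper output can be scanned mechanically. A minimal Python sketch, assuming the log lines look exactly like the excerpt above: it pairs each `Condor JobID` line with a later `Job finished with` line and reports the IDs that never got one. `unfinished_jobs` is a hypothetical helper, not part of BOINC, vboxwrapper or the CMS tooling.

```python
import re

# Marker strings copied from the vboxwrapper log excerpt above; the helper
# name and its return format are illustrative only.
JOB_ID = re.compile(r"Condor JobID:\s*(\d+)")
FINISHED = "Job finished with"


def unfinished_jobs(lines):
    """Return Condor job IDs that were never followed by a 'Job finished with' line."""
    started = []        # job IDs in the order they started
    finished = set()    # job IDs that reported completion
    current = None      # ID of the job the VM is currently running
    for line in lines:
        # A finish line closes the current job; one whose job ID is not in
        # the input (current is None) is ignored.
        if FINISHED in line and current is not None:
            finished.add(current)
            current = None
        match = JOB_ID.search(line)
        if match:
            current = match.group(1)
            started.append(current)
    return [job for job in started if job not in finished]


if __name__ == "__main__":
    excerpt = [
        "2016-07-13 13:26:52 (6900): Guest Log: [INFO] Job finished with 0.",
        "2016-07-13 13:26:52 (6900): Guest Log: [INFO] New Job Starting",
        "2016-07-13 13:26:52 (6900): Guest Log: [INFO] Condor JobID: 1257047",
        "2016-07-13 16:55:38 (6900): VM Completion Message: Condor exited with return value N/A. .",
    ]
    print(unfinished_jobs(excerpt))  # ['1257047']
```

The first "Job finished with 0." in the excerpt belongs to the previous job, whose ID is outside the excerpt, so only job 1257047 is reported.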
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
It looks like it is a logging problem. The StarterLog shows that the job finished fine.
It could be something like a buffer isn't flushed before the VM is powered off. |
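The failure mode described here is general: text a process has "written" can still sit in a buffer, either in the program or in the guest kernel's page cache, and vanish if the VM is powered off before it reaches disk. A minimal Python sketch of the usual fix, assuming the status line is written to an ordinary file on the guest's disk; it is not the actual CMS job wrapper (its language and logging setup aren't described in this thread), and `log_final_status` is a hypothetical name.

```python
import os
import tempfile


def log_final_status(path, exit_code):
    """Append the final status line and force it onto disk before returning."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(f"[INFO] Job finished with {exit_code}.\n")
        log.flush()             # move Python's user-space buffer into the OS
        os.fsync(log.fileno())  # ask the OS to write its cached pages to the disk
    # Closing the file alone would flush Python's buffer, but the data could
    # still sit in the guest kernel's page cache when the VM is powered off.


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "running.log")  # hypothetical log path
    log_final_status(path, 0)
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")  # [INFO] Job finished with 0.
```

The order matters: `flush()` first, so the data is in the OS before `fsync()` asks for it to be written out.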
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 463 |
Thank you for the answer. In the CMS task running at the moment there is a finished_2.log from 14-Jul-2016 02:02, a running.log from 14-Jul-2016 02:01 and a running-.log from 14-Jul-2016 08:47. Is the second running log there because of the date change at 00:00 UTC? Before, I only saw one running.log. |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
No. Once you get a new VM, one of them should be named running-slot1.log and the other just running.log. When running multiple jobs, there will be one running log per slot, but the console will only show running.log, i.e. running-slot1.log. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,182,521 RAC: 2,043 |
> It looks like it is a logging problem. The StarterLog shows that the job finished fine.

Yes, everything looks fine at my end; the result file is on the data bridge.

> It could be something like a buffer isn't flushed before the VM is powered off. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I have started a CMS task after a little while. I noticed that the running-.log has a lot of repeated information in it. Its size is 1.7 MB after just one hour. If that is uploaded, it is awfully big; I hope it is compressed before upload. This appears over and over again:
|
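A quick way to see which lines dominate a log like this is to count identical lines. A minimal Python sketch; `most_repeated` is a hypothetical helper, and the sample lines are invented placeholders because the repeated text from the post is not reproduced here.

```python
from collections import Counter


def most_repeated(lines, top=3):
    """Return the `top` most frequent non-blank lines with their counts."""
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return counts.most_common(top)


if __name__ == "__main__":
    # Placeholder lines standing in for a real log file.
    sample = ["status line A\n", "status line B\n", "status line A\n", "\n"]
    print(most_repeated(sample, top=1))  # [('status line A', 2)]
```

For a real log, pass an open file object instead of the list; iterating over a file yields its lines one at a time, so even a large log is not loaded whole.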
Joined: 20 Jan 15 Posts: 1139 Credit: 8,182,521 RAC: 2,043 |
You should have seen it before I managed to find out how to cut it back! They were 43 MB for 150,000 events, compressed! (The current batch is 200,000 events.) Uncompressed, they were nearly 700 MB... No, the full log isn't returned. CRAB returns the first 1,000 and last 3,000 lines, so a bit under 500 kB. I can handle that by compressing them on the server; they compress well because of their repetitive nature. |
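Keeping only the first 1,000 and last 3,000 lines caps what comes back at 4,000 lines; "a bit under 500 kB" then implies an average of under about 125 bytes per line. The Python sketch below shows one way to do this in a single pass with bounded memory. It illustrates the idea only and is not CRAB's implementation; `head_and_tail` is a hypothetical name.

```python
from collections import deque
from itertools import islice


def head_and_tail(lines, head=1000, tail=3000):
    """Keep the first `head` and last `tail` lines of an iterable of lines.

    Reads the input once and holds at most head + tail lines in memory,
    so it also works on a log far too large to load whole.
    """
    it = iter(lines)
    first = list(islice(it, head))   # the first `head` lines
    last = deque(it, maxlen=tail)    # the rest; only the final `tail` survive
    return first, list(last)


if __name__ == "__main__":
    fake_log = [f"line {i}\n" for i in range(10_000)]   # stand-in for a real log
    first, last = head_and_tail(fake_log)
    print(len(first), len(last))  # 1000 3000
```

Because `islice` and the `deque` share one iterator, a short log is returned whole with no line duplicated between the two parts.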
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I compressed one of the log files with 7z (just for fun). It went down from 3.6 MB to 50 kB. |
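7-Zip's .7z format normally uses LZMA or LZMA2, and Python's standard `lzma` module implements the same family, so the effect is easy to reproduce. A sketch on synthetic data: the ratio it prints will not match the 3.6 MB to 50 kB (roughly 72:1) measured above, because the input is invented and far more uniform than a real log.

```python
import lzma

# Synthetic stand-in for a repetitive log: the same 20-line block repeated
# 2,000 times (invented text, not real CMS/Pythia output).
block = "".join(f"status line {i}: counters unchanged\n" for i in range(20))
log_text = (block * 2000).encode("utf-8")

compressed = lzma.compress(log_text)   # .xz container using LZMA2, default preset 6
ratio = len(log_text) / len(compressed)
print(f"{len(log_text):,} bytes -> {len(compressed):,} bytes (about {ratio:.0f}:1)")
```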
Joined: 20 Jan 15 Posts: 1139 Credit: 8,182,521 RAC: 2,043 |
> I compressed one of the log files with 7z (just for fun).

If you've ever written an implementation of some of the better compression algorithms, or even just studied them, you'll know why. :-) |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
> If you've ever written an implementation of some of the better compression algorithms, or even just studied them, you'll know why. :-)

No, I have not. I am just curious why it was implemented the way it is, even leaving aside transmitting it over the network. The test was just to see what would be theoretically possible, not a criticism. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,182,521 RAC: 2,043 |
> If you've ever written an implementation of some of the better compression algorithms, or even just studied them, you'll know why. :-)

No, no, no criticism taken or inferred.

OK, it's been a while since I did this, but the implementations were in C on an Atari ST Mega4 sometime in the mid- (or maybe early-) 1990s. Someone was promoting a new algorithm as better than Lempel-Ziv-Welch (LZW), which was a big thing at the time because, IIRC, someone was asserting a patent on LZW and demanding licensing fees... It turned out not to be so good. (I still have the Atari; one day I'll fire it up and recover all the algorithms I made at the time, stored on its 1 GB SCSI hard drive...)

The basics of LZW are to look for character sequences that have already been seen and to replace each one with a single code: an index into a dictionary built up as the file is read, usually wider than one byte (typically 9 to 16 bits). As the file progresses, each new dictionary entry is an existing entry plus one more character, so longer and longer repeated strings get replaced by just one code. In the case of the Pythia recurring output (every 100 events) there is no, or almost no, change in this output, so after a while the algorithm will be replacing long identical stretches of text with comparatively few codes. Hence, the compression ratio goes through the roof! |
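A compact LZW sketch makes the explanation concrete. It is the textbook algorithm with one simplification: the dictionary grows without bound, whereas real implementations (Unix compress, GIF) cap the code width, e.g. at 12 or 16 bits, and then freeze or reset the dictionary. Each new entry is the longest string just matched plus one byte, which is why the matched strings get longer as the input repeats.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Classic LZW: emit one dictionary code per longest already-seen prefix."""
    table = {bytes([i]): i for i in range(256)}  # codes 0-255: single bytes
    next_code = 256
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                    # keep extending the current match
        else:
            codes.append(table[w])    # emit the code for the longest known string
            table[wc] = next_code     # learn that string plus one more byte
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly and invert lzw_compress."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:       # code defined by this very step (cScSc case)
            entry = w + w[:1]
        else:
            raise ValueError(f"bad LZW code {code}")
        out.append(entry)
        table[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)


if __name__ == "__main__":
    # Hypothetical repetitive input standing in for the recurring Pythia output.
    text = b"100 events processed, statistics unchanged\n" * 500
    codes = lzw_compress(text)
    assert lzw_decompress(codes) == text
    print(f"{len(text)} bytes -> {len(codes)} codes")
```

On repetitive input the number of codes grows much more slowly than the number of bytes; a real implementation would then pack each code into its fixed or variable bit width.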
Joined: 8 Apr 15 Posts: 780 Credit: 12,152,486 RAC: 2,125 |
|
Joined: 20 Jan 15 Posts: 1139 Credit: 8,182,521 RAC: 2,043 |
o o | \_/ |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I shut down BOINC for 5 minutes (LAIM off). Not only did it fail to resume the job it was working on, starting a new one instead, it also disregarded the 12 h limit and nearly ran on to the 18 h limit (I shut it down manually to avoid losing the job). It seems that the past improvements to suspend/resume have not been carried over into the newer versions. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 463 |
Three CMS tasks are running on one computer with this change to app_config.xml:

<app_config>
  <project_max_concurrent>3</project_max_concurrent>
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>ALICE</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>CMS</name>
    <max_concurrent>3</max_concurrent>
  </app>
  <app>
    <name>LHCb</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>Theory</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config> |
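In the BOINC client, `max_concurrent` limits how many tasks of one application run at once and `project_max_concurrent` limits the project as a whole. This file therefore allows up to three CMS tasks, but never more than three tasks from this project in total. The Python sketch below reads the file back with the standard `xml.etree.ElementTree` module and prints the effective per-application limits. `concurrency_limits` is a hypothetical helper, not part of the BOINC client.

```python
import xml.etree.ElementTree as ET

# The app_config.xml from the post above (same elements and values).
APP_CONFIG = """
<app_config>
  <project_max_concurrent>3</project_max_concurrent>
  <app><name>ATLAS</name><max_concurrent>1</max_concurrent></app>
  <app><name>ALICE</name><max_concurrent>1</max_concurrent></app>
  <app><name>CMS</name><max_concurrent>3</max_concurrent></app>
  <app><name>LHCb</name><max_concurrent>1</max_concurrent></app>
  <app><name>Theory</name><max_concurrent>1</max_concurrent></app>
</app_config>
"""


def concurrency_limits(xml_text):
    """Return (project-wide limit or None, {app name: per-app limit})."""
    root = ET.fromstring(xml_text.strip())
    project = root.findtext("project_max_concurrent")
    per_app = {
        app.findtext("name"): int(app.findtext("max_concurrent"))
        for app in root.findall("app")
    }
    return (int(project) if project is not None else None), per_app


if __name__ == "__main__":
    project_limit, per_app = concurrency_limits(APP_CONFIG)
    print("project-wide:", project_limit)
    for name, limit in per_app.items():
        # A per-app limit above the project-wide limit can never be reached.
        effective = min(limit, project_limit) if project_limit is not None else limit
        print(f"{name}: up to {effective}")
```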
©2024 CERN