Message boards : CMS Application : New Version v47.20

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1021
Credit: 274,753
RAC: 0
Message 3647 - Posted: 12 Jul 2016, 12:09:37 UTC

This updated image uses the production version of CernVM, and the CVMFS configuration has been modified to address issues that we have been seeing with some volunteers.

maeax
Joined: 22 Apr 16
Posts: 432
Credit: 1,261,674
RAC: 37
Message 3668 - Posted: 13 Jul 2016, 15:45:02 UTC
Last modified: 13 Jul 2016, 15:49:08 UTC

Windows 10 Pro (x64), VirtualBox 5.0.24, BOINC 7.6.22

Hello Laurence,

The "Job finished with 0." line is missing for the last task,
and Condor exited with return value N/A.
Is that all right?

2016-07-13 13:26:52 (6900): Guest Log: [INFO] Job finished with 0.
2016-07-13 13:26:52 (6900): Guest Log: [INFO] New Job Starting
2016-07-13 13:26:52 (6900): Guest Log: [INFO] Condor JobID: 1257047
2016-07-13 13:27:12 (6900): Guest Log: [INFO] CRAB ID: 4982
2016-07-13 14:57:52 (6900): Status Report: Job Duration: '64800.000000'
2016-07-13 14:57:52 (6900): Status Report: Elapsed Time: '42055.182323'
2016-07-13 14:57:52 (6900): Status Report: CPU Time: '31666.875000'
2016-07-13 15:45:15 (6900): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2016-07-13 16:37:52 (6900): Status Report: Job Duration: '64800.000000'
2016-07-13 16:37:52 (6900): Status Report: Elapsed Time: '48056.017391'
2016-07-13 16:37:52 (6900): Status Report: CPU Time: '36318.734375'
2016-07-13 16:55:38 (6900): VM Completion File Detected.
2016-07-13 16:55:38 (6900): VM Completion Message: Condor exited with return value N/A.
.
2016-07-13 16:55:38 (6900): Powering off VM.
2016-07-13 16:55:41 (6900): Successfully stopped VM.
2016-07-13 16:55:46 (6900): Deregistering VM. (boinc_02bff7bd7c32aff8, slot#1)
2016-07-13 16:55:52 (6900): Removing virtual disk drive(s) from VM.
2016-07-13 16:55:52 (6900): Removing network bandwidth throttle group from VM.
2016-07-13 16:55:52 (6900): Removing storage controller(s) from VM.
2016-07-13 16:55:53 (6900): Removing VM from VirtualBox.
16:55:59 (6900): called boinc_finish(0)

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1021
Credit: 274,753
RAC: 0
Message 3672 - Posted: 13 Jul 2016, 18:55:05 UTC - in response to Message 3668.  

It looks like it is a logging problem. The StarterLog shows that the job finished fine.

07/13/16 13:26:03 (pid:6981) Create_Process succeeded, pid=6987
07/13/16 16:49:06 (pid:6981) Process exited, pid=6987, status=0


It could be something like a buffer not being flushed before the VM is powered off.
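
For anyone who wants to check their own StarterLog the same way, here is a minimal sketch. It assumes the exit lines look exactly like the two quoted above; the helper name `exit_statuses` is made up for illustration and is not part of HTCondor.

```python
import re

# Matches lines such as:
# 07/13/16 16:49:06 (pid:6981) Process exited, pid=6987, status=0
EXIT_RE = re.compile(r"Process exited, pid=(\d+), status=(-?\d+)")


def exit_statuses(log_text):
    """Return a dict mapping each job pid to the exit status found in the log."""
    statuses = {}
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            statuses[int(match.group(1))] = int(match.group(2))
    return statuses


# The two StarterLog lines quoted in this post.
sample = (
    "07/13/16 13:26:03 (pid:6981) Create_Process succeeded, pid=6987\n"
    "07/13/16 16:49:06 (pid:6981) Process exited, pid=6987, status=0\n"
)

print(exit_statuses(sample))
```

A status of 0 for the job's pid means the job itself ended normally, whatever the BOINC-side log says.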

maeax
Joined: 22 Apr 16
Posts: 432
Credit: 1,261,674
RAC: 37
Message 3679 - Posted: 14 Jul 2016, 6:57:18 UTC

Thank you for the answer.

The CMS task that is running at the moment has
a finished_2.log from 14-Jul-2016 02:02,
a running.log from 14-Jul-2016 02:01 and
a running-.log from 14-Jul-2016 08:47.

Is there a second running log because of the date change at 00:00 UTC?

Before, I only saw one running.log.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1021
Credit: 274,753
RAC: 0
Message 3681 - Posted: 14 Jul 2016, 7:44:29 UTC - in response to Message 3679.  

No. Once you get a new VM, one should be running-slot1.log and the other just running.log. When running multiple jobs, there will be one running log per slot, but the console will only show running.log, i.e. running-slot1.log.

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1093
Credit: 6,893,316
RAC: 0
Message 3687 - Posted: 14 Jul 2016, 11:32:19 UTC - in response to Message 3672.  

It looks like it is a logging problem. The StarterLog shows that the job finished fine.

07/13/16 13:26:03 (pid:6981) Create_Process succeeded, pid=6987
07/13/16 16:49:06 (pid:6981) Process exited, pid=6987, status=0

Yes, everything looks fine my end; the result file is on the data bridge.

It could be something like a buffer not being flushed before the VM is powered off.

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 965
Credit: 1,201,500
RAC: 5
Message 3703 - Posted: 14 Jul 2016, 22:06:04 UTC
Last modified: 14 Jul 2016, 22:06:51 UTC

I have started a CMS task after a little while.

I noticed that the running-.log has a lot of repeated information in it.
The size is 1.7 MB after just one hour. If that is uploaded, that is awfully big.
I hope it is compressed before upload.


This appears over and over again:


*------- PYTHIA Flag + Mode + Parm + Word + FVec + MVec + PVec Settings (changes only) -------------------------*
|                                                               |
| Name                                | Now | Default  Min  Max |
|                                     |     |                   |
| Next:numberShowEvent                |   0 |       1    0      |
| ParticleDecays:allowPhotonRadiation |  on |     off           |
| ParticleDecays:limitTau0            |  on |     off           |
| ParticleDecays:mixB                 | off |      on           |
| ProcessLevel:all                    | off |      on           |
|                                                               |
*------- End PYTHIA Flag + Mode + Parm + Word + FVec + MVec + PVec Settings ------------------------------------*

-------- PYTHIA Particle Data Table (changed only) ------------------------------------------------------------------------------

id name antiName spn chg col m0 mWidth mMin mMax tau0 res dec ext vis wid
no onMode bRatio meMode products

no particle data has been changed from its default value

-------- End PYTHIA Particle Data Table -----------------------------------------------------------------------------------------

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1093
Credit: 6,893,316
RAC: 0
Message 3707 - Posted: 15 Jul 2016, 8:17:43 UTC - in response to Message 3703.  

You should have seen it before I managed to find out how to cut it back! They were 43 MB for 150,000 events -- compressed! (Current batch is 200,000 events.) Uncompressed they were nearly 700 MB... No, the full log isn't returned. CRAB returns the first 1,000 and last 3,000 lines, so a bit under 500 kB. I can handle that by compressing them on the server; they compress well due to the repetitive nature.
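
To make the "first 1,000 and last 3,000 lines" idea concrete, here is a minimal sketch. CRAB's real implementation is not shown in this thread, so the function name `head_and_tail` and its behaviour for short logs are assumptions made purely for illustration.

```python
def head_and_tail(lines, head=1000, tail=3000):
    """Keep the first `head` and the last `tail` lines of a log.

    Logs with no more than head + tail lines are returned unchanged,
    so no line is ever duplicated.
    """
    if len(lines) <= head + tail:
        return list(lines)
    # Index from the front so that tail=0 keeps nothing from the end.
    return list(lines[:head]) + list(lines[len(lines) - tail:])


# Hypothetical 10,000-line log.
log = [f"line {i}" for i in range(10000)]
kept = head_and_tail(log)
print(len(kept), "lines kept out of", len(log))
```

The middle of the log is simply dropped, which is usually acceptable here because the repeated Pythia output carries no new information.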

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 965
Credit: 1,201,500
RAC: 5
Message 3714 - Posted: 15 Jul 2016, 13:56:00 UTC - in response to Message 3707.  
Last modified: 15 Jul 2016, 13:56:13 UTC

I compressed one of the log files with 7z (just for fun).
It went down from 3.6 MB to 50 kB.
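
A similar experiment can be reproduced with Python's standard `lzma` module (LZMA is the default method of 7z). This is only a sketch: `BLOCK` is a shortened, made-up stand-in for the repeated Pythia output, and a real log will not compress quite as well because it is not perfectly repetitive.

```python
import lzma

# Hypothetical stand-in for the block that the log keeps repeating.
BLOCK = (
    "-------- PYTHIA Particle Data Table (changed only) --------\n"
    "no particle data has been changed from its default value\n"
    "-------- End PYTHIA Particle Data Table --------\n"
)


def compression_ratio(text):
    """Return the original size divided by the LZMA-compressed size."""
    raw = text.encode("utf-8")
    return len(raw) / len(lzma.compress(raw))


repetitive_log = BLOCK * 5000
ratio = compression_ratio(repetitive_log)
print(f"{len(repetitive_log)} bytes, compression ratio about {ratio:.0f}:1")
```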

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1093
Credit: 6,893,316
RAC: 0
Message 3715 - Posted: 15 Jul 2016, 18:52:04 UTC - in response to Message 3714.  

I compressed one of the log files with 7z (just for fun).
It went down from 3.6 MB to 50 kB.

If you've ever written an implementation of some of the better compression algorithms, or even just studied them, you'll know why. :-)

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 965
Credit: 1,201,500
RAC: 5
Message 3716 - Posted: 15 Jul 2016, 19:37:27 UTC - in response to Message 3715.  

If you've ever written an implementation of some of the better compression algorithms, or even just studied them, you'll know why. :-)


No, I have not.

I am just curious why it was implemented the way it is, even without transmission over the network in mind.

The test was just to see what would be theoretically possible, not as criticism.

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1093
Credit: 6,893,316
RAC: 0
Message 3717 - Posted: 15 Jul 2016, 22:42:23 UTC - in response to Message 3716.  
Last modified: 15 Jul 2016, 22:42:59 UTC

If you've ever written an implementation of some of the better compression algorithms, or even just studied them, you'll know why. :-)


No, I have not.

I am just curious why it was implemented the way it is, even without transmission over the network in mind.

The test was just to see what would be theoretically possible, not as criticism.

No, no, no criticism taken or inferred.

OK, it's been a while since I did this, but the implementations were in C on an Atari ST Mega4 sometime in the mid- (or maybe early-) 1990s. Someone was promoting a new algorithm as better than Lempel-Ziv-Welch (LZW), which was a big thing at the time as, IIRC, someone was asserting patent rights over LZW and demanding licensing fees...

It turned out not to be so good (I still have the Atari; one day I'll fire it up and recover all the algorithms I wrote at the time, stored on its 1 GB SCSI hard drive...). The basic idea of LZW is to look for sequences of consecutive characters that have occurred before and to replace each one with a single short token. As the file progresses, longer and longer repeated sequences get replaced by just one token. In the case of the Pythia recurring output (every 100 events) there is no, or almost no, change in this output, so after a while the algorithm will be replacing identical sequences of several hundred, or even thousands, of bytes with just one token. Hence, the compression ratio goes through the roof!
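
For the curious, here is a minimal textbook-style LZW sketch in Python. It is not the C code from the Atari, and it makes simplifying assumptions: tokens are plain integers and the dictionary is allowed to grow without limit, whereas real implementations pack the codes into bit fields and cap or reset the table.

```python
def lzw_compress(data):
    """Return a list of integer codes, one per longest already-known sequence."""
    table = {bytes([i]): i for i in range(256)}  # start with every single byte
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            codes.append(table[current])   # emit the code for the longest match
            table[candidate] = len(table)  # learn the new, longer sequence
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from the codes made by lzw_compress."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        pieces.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(pieces)


# Hypothetical stand-in for the recurring Pythia output.
data = b"no particle data has been changed from its default value\n" * 500
codes = lzw_compress(data)
print(len(data), "bytes in,", len(codes), "codes out")
```

Because every emitted code also adds a longer sequence to the table, repeated text is covered by ever longer matches, which is why the number of codes grows much more slowly than the input.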

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 540
Credit: 7,616,583
RAC: 1,500
Message 3718 - Posted: 16 Jul 2016, 3:36:32 UTC

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1093
Credit: 6,893,316
RAC: 0
Message 3720 - Posted: 16 Jul 2016, 10:47:21 UTC - in response to Message 3718.  

 o o
  |
 \_/

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 965
Credit: 1,201,500
RAC: 5
Message 3729 - Posted: 17 Jul 2016, 20:52:14 UTC

I shut down BOINC for 5 minutes (LAIM off).
Not only did it fail to resume the job it was working on and start a new one instead, it also disregarded the 12 h limit and nearly ran on to the 18 h limit (I shut it down manually to prevent loss of the job).

It seems that all the past achievements regarding suspend/resume have not been carried over to the newer versions.

maeax
Joined: 22 Apr 16
Posts: 432
Credit: 1,261,674
RAC: 37
Message 3836 - Posted: 27 Jul 2016, 19:31:34 UTC

Three CMS tasks are now running on one computer after this change to app_config.xml:

<app_config>
  <project_max_concurrent>3</project_max_concurrent>
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>ALICE</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>CMS</name>
    <max_concurrent>3</max_concurrent>
  </app>
  <app>
    <name>LHCb</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>Theory</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>
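
If you want to double-check what such a file actually allows before restarting the client, a small script can read the limits back. This is a sketch only: it parses the file with Python's standard XML library, the helper name `concurrency_limits` is made up, and it assumes every `<app>` entry has both a `<name>` and a `<max_concurrent>` element as in the file above.

```python
import xml.etree.ElementTree as ET

# The configuration posted above, reproduced as a string for the example.
APP_CONFIG = """
<app_config>
  <project_max_concurrent>3</project_max_concurrent>
  <app><name>ATLAS</name><max_concurrent>1</max_concurrent></app>
  <app><name>ALICE</name><max_concurrent>1</max_concurrent></app>
  <app><name>CMS</name><max_concurrent>3</max_concurrent></app>
  <app><name>LHCb</name><max_concurrent>1</max_concurrent></app>
  <app><name>Theory</name><max_concurrent>1</max_concurrent></app>
</app_config>
"""


def concurrency_limits(xml_text):
    """Return (project-wide limit, {app name: per-app limit})."""
    root = ET.fromstring(xml_text.strip())
    project_limit = int(root.findtext("project_max_concurrent"))
    per_app = {
        app.findtext("name"): int(app.findtext("max_concurrent"))
        for app in root.findall("app")
    }
    return project_limit, per_app


project_limit, per_app = concurrency_limits(APP_CONFIG)
print("project limit:", project_limit)
for name, limit in per_app.items():
    print(f"{name}: {limit}")
```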
©2020 CERN