Message boards : CMS Application : New Refactored Version (47.01)
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 328,405
RAC: 184
Message 2857 - Posted: 19 Apr 2016, 15:10:43 UTC

A new refactored version (47.01) of the CMS application is now available. It uses the same approach as the Theory application, so the glidein is no longer used. There is a high chance that a few things, such as the consoles and Web logs, are broken, but hopefully it will work in general. Please post issues to this thread. Note that this update is only for the development project; the beta application on the production server will remain on the old version.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,537
RAC: 143
Message 2858 - Posted: 19 Apr 2016, 15:22:30 UTC - in response to Message 2857.  
Last modified: 19 Apr 2016, 15:36:45 UTC

"Giving up on download of CMS_2016_04_19.vdi: permanent HTTP error"

[Edit]
From Task report:
{core_client_version}7.2.33{/core_client_version}
{![CDATA[
{message}
app_version download error: couldn't get input files:
{file_xfer_error}
{file_name}CMS_2016_04_19.vdi{/file_name}
{error_code}-224{/error_code}
{error_message}permanent HTTP error{/error_message}
{/file_xfer_error}

{/message}

[/Edit]
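For anyone collecting these failures, here is a minimal sketch of how the fields in the report above could be pulled out. It assumes the report is pasted exactly as shown, with braces in place of angle brackets; the function name `extract_fields` is made up for illustration and is not part of BOINC.

```python
import re

# Excerpt of the task report quoted above (braces instead of angle brackets).
REPORT = """\
{file_xfer_error}
{file_name}CMS_2016_04_19.vdi{/file_name}
{error_code}-224{/error_code}
{error_message}permanent HTTP error{/error_message}
{/file_xfer_error}
"""

def extract_fields(report: str) -> dict:
    """Return {tag: text} for every single-line {tag}text{/tag} pair."""
    # \1 forces the closing tag to match the opening tag name.
    pattern = re.compile(r"\{(\w+)\}([^{}\n]*)\{/\1\}")
    return {tag: text for tag, text in pattern.findall(report)}

fields = extract_fields(REPORT)
print(fields["file_name"], fields["error_code"], fields["error_message"])
```

Multi-line containers such as `{file_xfer_error}` are deliberately skipped; only the leaf fields are returned.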
Profile Laurence
Project administrator
Project developer
Project tester
Message 2860 - Posted: 19 Apr 2016, 16:02:23 UTC - in response to Message 2858.  

Try again.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 2861 - Posted: 19 Apr 2016, 16:21:45 UTC - in response to Message 2860.  
Last modified: 19 Apr 2016, 16:28:09 UTC

Try again.

Download worked. Thanks.
Just suspended the current jobs and it started running, but it stopped after 42" with "Waiting to run (Scheduler wait: VM environment needed to be cleaned up.)"
[Edit] I stopped BOINC and restarted it. The task has now been running for over 4 minutes but there's no sign of console or log buttons yet. [/Edit]
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 2862 - Posted: 19 Apr 2016, 16:39:05 UTC - in response to Message 2861.  

Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 2863 - Posted: 19 Apr 2016, 16:56:24 UTC - in response to Message 2862.  

It failed after 10 minutes. Job report is http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=149364

OK, aborted all the old jobs, downloaded more and one started (as per my app_config file). Now there are console and log buttons. It started up remarkably quickly, and Alt-F3 shows cmsRun running. No Alt-F4 or Alt-F5 consoles, and just the Condor logs on the Web pages.
This appears to be my job in the condor_status display on the server:
9-22-24270.9-22-24 LINUX X86_64 Claimed Busy 1.350 4500 0+00:09:51
Profile Laurence
Project administrator
Project developer
Project tester
Message 2864 - Posted: 19 Apr 2016, 17:21:00 UTC - in response to Message 2863.  

Will fix the logging tomorrow. It would be interesting to see how long it now takes for people. I know some have done measurements before. As far as I recall the start time was around 8 minutes.
Keiken

Joined: 2 Jul 15
Posts: 15
Credit: 140,962
RAC: 0
Message 2865 - Posted: 19 Apr 2016, 20:53:23 UTC

Just noticed the following line in the MasterLog and StartLog:
PERMISSION DENIED to condor@261-554-14495 from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15,10.0.2.15, hostname size = 1, original ip address = 10.0.2.15

After 10 minutes nothing has happened (no output to the logs). Is this an error?
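To check whether the same denial shows up in other logs, the line quoted above can be broken into its parts. This is only a sketch: the regular expression is written against the single line pasted in this thread, not against any documented HTCondor log format, and `parse_denial` is a hypothetical helper name.

```python
import re

# The log line from this post, truncated after the reason text.
LINE = (
    "PERMISSION DENIED to condor@261-554-14495 from host 10.0.2.15 "
    "for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: "
    "DAEMON authorization policy contains no matching ALLOW entry for this request"
)

# Pattern derived from the line above only (an assumption, not a spec).
DENIED = re.compile(
    r"PERMISSION DENIED to (?P<user>\S+) from host (?P<host>\S+) "
    r"for command (?P<code>\d+) \((?P<name>\w+)\), access level (?P<level>\w+):"
)

def parse_denial(line):
    """Return the named parts of a PERMISSION DENIED line, or None."""
    match = DENIED.search(line)
    return match.groupdict() if match else None

print(parse_denial(LINE))
```

Using `search` rather than `match` means a leading timestamp, as in the MasterLog excerpts later in the thread, does not stop the line from being recognised.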
Profile Laurence
Project administrator
Project developer
Project tester
Message 2866 - Posted: 19 Apr 2016, 21:10:38 UTC - in response to Message 2865.  

It is an error but should not cause any problems. The logs still need to be plugged into the consoles so the best thing to do for now is to look at the output of the top command. I haven't seen any significant drop in the running capacity so it looks like this is working for everyone.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 2867 - Posted: 19 Apr 2016, 21:31:58 UTC

Looks like we have 42 out of 312 converts so far:
Tue Apr 19 22:25:49
[cms005@lcggwms02:~] > condor_status|grep -v glidein|grep LINUX|wc
42 335 3353
Tue Apr 19 22:26:00
[cms005@lcggwms02:~] > condor_status|grep glidein|grep LINUX|wc
270 2160 21600


Are you one of them?
198-288-8135.198-2 LINUX X86_64 Claimed Busy 1.050 4500 0+00:16:28
217-735-15116.217- LINUX X86_64 Claimed Busy 1.070 4500 0+00:52:47
217-735-20188.217- LINUX X86_64 Claimed Busy 1.030 4500 0+00:52:35
217-735-22179.217- LINUX X86_64 Claimed Busy 1.080 4500 0+00:41:22
217-735-632.217-73 LINUX X86_64 Claimed Busy 0.980 4500 0+00:52:41
244-439-11664.244- LINUX X86_64 Claimed Busy 1.070 4500 0+01:10:40
244-439-13224.244- LINUX X86_64 Claimed Busy 1.060 4500 0+00:37:46
244-439-26870.244- LINUX X86_64 Claimed Busy 1.150 4500 0+00:55:26
244-439-30920.244- LINUX X86_64 Claimed Busy 1.210 4500 0+00:20:44
244-439-6552.244-4 LINUX X86_64 Claimed Busy 0.970 4500 0+01:42:47
244-440-31443.244- LINUX X86_64 Claimed Busy 1.140 4500 0+00:33:35
244-440-8273.244-4 LINUX X86_64 Claimed Busy 1.070 4500 0+00:08:23
244-442-2227.244-4 LINUX X86_64 Claimed Busy 1.310 4500 0+01:04:43
244-442-23758.244- LINUX X86_64 Claimed Busy 1.240 4500 0+01:03:30
244-443-15714.244- LINUX X86_64 Claimed Busy 1.050 4500 0+01:24:04
244-443-22958.244- LINUX X86_64 Claimed Busy 1.190 4500 0+00:29:14
244-443-23209.244- LINUX X86_64 Claimed Busy 1.040 4500 0+00:12:43
244-443-29506.244- LINUX X86_64 Claimed Busy 1.000 4500 0+00:00:14
244-444-30450.244- LINUX X86_64 Claimed Busy 1.030 4500 0+01:51:03
244-444-31073.244- LINUX X86_64 Claimed Busy 1.200 4500 0+01:32:24
244-444-8816.244-4 LINUX X86_64 Claimed Busy 1.170 4500 0+00:27:09
244-446-32076.244- LINUX X86_64 Claimed Busy 1.530 4500 0+00:57:50
244-446-6442.244-4 LINUX X86_64 Claimed Busy 1.260 4500 0+00:40:38
244-455-31729.244- LINUX X86_64 Claimed Busy 0.170 4500 0+00:02:25
244-458-7846.244-4 LINUX X86_64 Claimed Busy 1.220 4500 0+00:58:26
244-497-2229.244-4 LINUX X86_64 Claimed Busy 1.010 4500 0+01:28:24
244-497-25124.244- LINUX X86_64 Claimed Busy 1.460 4500 0+01:25:40
275-614-1073.275-6 LINUX X86_64 Claimed Busy 1.380 4500 0+00:30:33
275-614-10834.275- LINUX X86_64 Claimed Busy 1.330 4500 0+00:30:36
275-614-16843.275- LINUX X86_64 Claimed Busy 1.270 4500 0+00:38:30
275-614-30274.275- LINUX X86_64 Claimed Busy 1.090 4500 0+00:25:36
282-625-23922.282- LINUX X86_64 Claimed Busy 1.080 4500 0+00:40:14
320-780-23078.320- LINUX X86_64 Claimed Retiring 1.140 4500 0+00:01:34
320-780-27662.320- LINUX X86_64 Claimed Retiring 1.130 4500 0+00:01:34
320-780-4714.320-7 LINUX X86_64 Claimed Retiring 0.980 4500 0+00:01:34
35-836-20182.35-83 LINUX X86_64 Claimed Busy 1.640 4500 0+00:36:31
361-1065-27903.361 LINUX X86_64 Claimed Busy 1.180 4500 0+00:11:51
55-767-16738.55-76 LINUX X86_64 Claimed Busy 0.890 4500 0+00:08:11
9-22-24038.9-22-24 LINUX X86_64 Claimed Busy 1.940 4500 0+00:17:20
9-22-24270.9-22-24 LINUX X86_64 Claimed Retiring 1.070 4500 0+00:00:43
vc-cms-dev-03.cern LINUX X86_64 Claimed Busy 1.140 4500 0+00:49:04
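
The two grep pipelines above boil down to counting LINUX lines with and without the word "glidein" (the first number that wc prints is the line count, so 42 and 270, giving 312 in total). A rough Python equivalent, assuming the condor_status output has been saved as text, might look like the sketch below. The glidein line in the sample is invented for illustration; the real slot names are not shown in this thread.

```python
def count_slots(status_text):
    """Count LINUX slots, split into (non-glidein, glidein)."""
    refactored = glidein = 0
    for line in status_text.splitlines():
        if "LINUX" not in line:
            continue  # same filter as `grep LINUX`
        if "glidein" in line:
            glidein += 1  # same as `grep glidein`
        else:
            refactored += 1  # same as `grep -v glidein`
    return refactored, glidein

# Three rows copied from the listing above, plus one made-up glidein row.
SAMPLE = """\
198-288-8135.198-2 LINUX X86_64 Claimed Busy 1.050 4500 0+00:16:28
9-22-24038.9-22-24 LINUX X86_64 Claimed Busy 1.940 4500 0+00:17:20
vc-cms-dev-03.cern LINUX X86_64 Claimed Busy 1.140 4500 0+00:49:04
glidein_hypothetical@example LINUX X86_64 Claimed Busy 1.000 4500 0+00:10:00
"""

new, old = count_slots(SAMPLE)
print(new, old)

# Share of converted hosts using the counts reported in this post.
share = 42 / (42 + 270)
print(f"{share:.1%}")
```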

m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 300
Message 2869 - Posted: 20 Apr 2016, 2:29:46 UTC

I let the versions change under their own steam. cmsRun started at 18 minutes, but it took until 29 minutes to start using significant CPU. If anything, that is slower than the previous version.

This error appeared here, too:-

04/20/16 02:50:40 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 3528
04/20/16 02:53:28 PERMISSION DENIED to condor@178-1024-9302 from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15,10.0.2.15, hostname size = 1, original ip address = 10.0.2.15


Since cmsRun is using 80-90% CPU things are, presumably, running OK.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 266
Message 2870 - Posted: 20 Apr 2016, 5:47:12 UTC - in response to Message 2864.  
Last modified: 20 Apr 2016, 5:56:36 UTC

It would be interesting to see how long it now takes for people. I know some have done measurements before. As far as I recall the start time was around 8 minutes.

I don't see any difference to the former version, so I suppose I'm still running the old one.
Because of the flood I still have old tasks in queue and will abort the ones not yet started.

That gives me the chance to monitor the startup with the old version once again and compare with the new version when downloaded later today.
Edit: Meanwhile I have the new version and 2 tasks "Ready to start". Breakfast first.

Boot.log: Wed Apr 20 07:14:10 2016: Setting hostname localhost:
Begin processing the 1st record: at 20-Apr-2016 07:20:20.474 CEST
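
For comparing startup times between versions, the gap between those two timestamps can be computed as in the sketch below. It assumes both lines are in the same time zone (only the second one states CEST) and that the month and day names are parsed in an English locale; `startup_delay` is a hypothetical helper name.

```python
from datetime import datetime

def startup_delay(boot_stamp, first_record_stamp):
    """Time from the Boot.log hostname line to the first processed record."""
    boot = datetime.strptime(boot_stamp, "%a %b %d %H:%M:%S %Y")
    first = datetime.strptime(first_record_stamp, "%d-%b-%Y %H:%M:%S.%f")
    return first - boot

# Timestamps copied from the two lines above, without the trailing "CEST".
delay = startup_delay("Wed Apr 20 07:14:10 2016", "20-Apr-2016 07:20:20.474")
print(delay.total_seconds())
```

This measures boot to first event only; it does not include the time BOINC needs to start the VM.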
Crystal Pellet
Volunteer tester

Message 2871 - Posted: 20 Apr 2016, 6:45:32 UTC - in response to Message 2870.  
Last modified: 20 Apr 2016, 6:50:10 UTC

That gives me the chance to monitor the startup with the old version once again and compare with the new version when downloaded later today.

With the new version, cmsRun started between 2 and 3 minutes after the VM booted.
The only log available is MasterLog. Some red log messages were written over the 'top' window.
The user 'boinc' disappeared (like in the Theory app) and user 'nobody' appeared.
The 'top' console is no longer accessible for commands like 'u', 'h', 'b' etc.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2875 - Posted: 20 Apr 2016, 11:43:20 UTC

The 'top' console is no longer accessible for commands like 'u', 'h', 'b' etc.

I would also like to ask for this feature to be put back, please.

Startup time, from BOINC reporting the task as started to the cmsRun process reaching "steady state" (CPU > 90% on cmsRun): 6 min (10 Mbit down / 1 Mbit up).

No job progress is displayed (neither in the console nor through "show graphics").
Rasputin42
Volunteer tester

Message 2876 - Posted: 20 Apr 2016, 15:00:36 UTC
Last modified: 20 Apr 2016, 15:01:04 UTC

The red text is written over the 'top' console window.
Please redirect it to the correct window.
Profile Laurence
Project administrator
Project developer
Project tester
Message 2878 - Posted: 20 Apr 2016, 19:36:08 UTC - in response to Message 2875.  

Have just pushed an update. Please check the logs and top command in about 1 hour.
Rasputin42
Volunteer tester

Message 2879 - Posted: 21 Apr 2016, 7:44:56 UTC
Last modified: 21 Apr 2016, 7:47:24 UTC

Consoles F4 and F5 have output now.

F5 shows: accessing /var /lib/...../cmsRun-stdout.log :no such file or directory
Crystal Pellet
Volunteer tester

Message 2880 - Posted: 21 Apr 2016, 7:47:13 UTC - in response to Message 2878.  

Have just pushed an update. Please check the logs and top command in about 1 hour.

Some improvements have been made.
'top' commands are accepted again. Thanks!
The colours on consoles 4 and 5 are hard to read.
Console 5 (events processing etc.) can't access the data because the file is not there, although the directory structure for jobs seems to have been created.
However, we can now find 'somewhere' which job is running.
Profile Laurence
Project administrator
Project developer
Project tester
Message 2897 - Posted: 21 Apr 2016, 13:40:41 UTC - in response to Message 2880.  

Fixes published. Will be available for new tasks in about one hour.
Crystal Pellet
Volunteer tester

Message 2913 - Posted: 21 Apr 2016, 20:19:00 UTC

The task has now been running for 13 hours and 16 minutes.
Since the last job finished, the VM has been idle for more than 50 minutes.
No 'nobody' processes have been present during that time.

I would expect the kill mechanism to step in when the VM has been idle for this long.

The last 4 lines were added to the console after the last job was finished.