Message boards : ATLAS Application : New Experimental ATLAS Application

Zurlistuta [Puglia]

Joined: 15 Apr 16
Posts: 3
Credit: 8,855
RAC: 0
Message 2887 - Posted: 21 Apr 2016, 10:33:36 UTC

I've completed 3 tasks today, but all of them finished with the message "App is not supported. Shutting down!". Below is the log of the longest one; the other two ran for only 5-10 min and ended with the same message.
http://lhcathomedev.cern.ch/vLHCathome-dev/results.php?userid=374

2016-04-21 11:03:34 (6456): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2016-04-21 11:34:26 (6456): Status Report: Job Duration: '64800.000000'
2016-04-21 11:34:26 (6456): Status Report: Elapsed Time: '6005.172062'
2016-04-21 11:34:26 (6456): Status Report: CPU Time: '3787.392278'
2016-04-21 13:14:32 (6456): Status Report: Job Duration: '64800.000000'
2016-04-21 13:14:32 (6456): Status Report: Elapsed Time: '12011.862814'
2016-04-21 13:14:32 (6456): Status Report: CPU Time: '7062.929675'
2016-04-21 14:54:38 (6456): Status Report: Job Duration: '64800.000000'
2016-04-21 14:54:38 (6456): Status Report: Elapsed Time: '18017.661955'
2016-04-21 14:54:38 (6456): Status Report: CPU Time: '11542.560790'
2016-04-21 16:34:43 (6456): Status Report: Job Duration: '64800.000000'
2016-04-21 16:34:43 (6456): Status Report: Elapsed Time: '24022.470469'
2016-04-21 16:34:43 (6456): Status Report: CPU Time: '14407.690756'
2016-04-21 18:13:00 (6456): Guest Log: [ERROR] App is not supported. Shutting down!
2016-04-21 18:13:00 (6456): VM Completion File Detected.
2016-04-21 18:13:00 (6456): VM Completion Message: 0
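Each pair of Status Report lines gives the cumulative elapsed (wall-clock) and CPU seconds for the VM. One quick way to compare them is a small Python sketch like the one below; the regular expression and the name `cpu_efficiency` are our own assumptions about how to read these lines, not something the wrapper provides.

```python
import re

# Matches lines such as:
# 2016-04-21 11:34:26 (6456): Status Report: Elapsed Time: '6005.172062'
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def cpu_efficiency(log_text):
    """Return (elapsed_s, cpu_s, cpu_s / elapsed_s) for each status report."""
    reports = []
    elapsed = None
    for line in log_text.splitlines():
        m = STATUS_RE.search(line)
        if not m:
            continue  # skips Job Duration and all other log lines
        key, value = m.group(1), float(m.group(2))
        if key == "Elapsed Time":
            elapsed = value
        elif elapsed is not None:  # CPU Time follows its Elapsed Time line
            reports.append((elapsed, value, value / elapsed))
            elapsed = None
    return reports


log = """\
2016-04-21 11:34:26 (6456): Status Report: Elapsed Time: '6005.172062'
2016-04-21 11:34:26 (6456): Status Report: CPU Time: '3787.392278'
2016-04-21 16:34:43 (6456): Status Report: Elapsed Time: '24022.470469'
2016-04-21 16:34:43 (6456): Status Report: CPU Time: '14407.690756'
"""

for elapsed, cpu, ratio in cpu_efficiency(log):
    print(f"elapsed={elapsed:.0f}s cpu={cpu:.0f}s ratio={ratio:.2f}")
```

For the two reports included here, the ratio comes out at about 0.63 and 0.60.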
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 2890 - Posted: 21 Apr 2016, 11:47:19 UTC - in response to Message 2887.  

[ERROR] App is not supported. Shutting down!


Ignore this message. I don't know why we get it, but it is cosmetic.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2891 - Posted: 21 Apr 2016, 11:50:24 UTC - in response to Message 2890.  

Ignore this message, I don't know why we get this message but it is cosmetic.


Tasks shutting down after 10 min is not.
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 2892 - Posted: 21 Apr 2016, 12:29:01 UTC - in response to Message 2891.  

That is very true! We were out of jobs. We have just submitted some more. Hopefully we can automate this soon and have constant job pressure. The error message issue has been identified, and a fix should be in place within an hour, when new tasks are started.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2893 - Posted: 21 Apr 2016, 12:35:19 UTC

Thanks, Laurence.
Is there any way we can see if JOBS are available?
We can see BOINC tasks on the SSP (server status page).
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 2894 - Posted: 21 Apr 2016, 13:18:34 UTC - in response to Message 2893.  

Not for now. If the task finishes successfully after about 10 minutes with the message "Normal DAEMON_SHUTDOWN encountered", this suggests that we are out of jobs. However, as the results are currently going to /dev/null, we should not put too many cycles into this. The only purpose here is to identify issues and improve the application.
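The rule of thumb above (a task that ends after roughly 10 minutes with "Normal DAEMON_SHUTDOWN encountered") can be written as a small check. This is a hypothetical Python sketch: the function name and the 15-minute cut-off, chosen to leave some margin over "about 10 minutes", are our own assumptions, and it does not verify that the task actually ended successfully; only the message text comes from the post.

```python
# Hypothetical helper; only the marker text is taken from the forum post.
OUT_OF_JOBS_MARKER = "Normal DAEMON_SHUTDOWN encountered"


def looks_out_of_jobs(stderr_text: str, elapsed_s: float,
                      threshold_s: float = 15 * 60) -> bool:
    """Return True if a short task ended with a normal daemon shutdown,
    which (per the heuristic above) suggests no job was available."""
    return elapsed_s <= threshold_s and OUT_OF_JOBS_MARKER in stderr_text


# A 7-minute task whose log ends with the marker would be flagged.
print(looks_out_of_jobs("... Normal DAEMON_SHUTDOWN encountered", 7 * 60))
```

A long task that happens to log the same message at the end is not flagged, since real jobs run for hours.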
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2982 - Posted: 23 Apr 2016, 21:10:45 UTC

Console F2: no output.
Consoles F3, F4, F5 (and F6) have output.
However, there is no progress display, and the info in F4 and F5 seems to contain no useful (real-time) information.

No running.log in "Show graphics".
Rasputin42
Volunteer tester

Send message
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2987 - Posted: 24 Apr 2016, 7:30:37 UTC

I got this error:

stderr.log:

+ for i in '$ol'
+ '[' -f log.08060270._020799.job.log.tgz.1 ']'
+ flist='*.diag ./jobSmallFiles.tgz ./output.list ./output.list ./log.08060270._020799.job.log.1 ./log.08060270._020799.job.log.1 log.08060270._020799.job.log.tgz.1'
+ for i in '$ol'
+ '[' -f HITS.08060270._020799.pool.root.1 ']'
+ flist='*.diag ./jobSmallFiles.tgz ./output.list ./output.list ./log.08060270._020799.job.log.1 ./log.08060270._020799.job.log.1 log.08060270._020799.job.log.tgz.1 HITS.08060270._020799.pool.root.1'
+ tar czvf result.tar.gz lGcKDmwAJEon7jp7oou28CBqABFKDmABFKDmKJHKDmABFKDmFD4epn.diag ./jobSmallFiles.tgz ./output.list ./output.list ./log.08060270._020799.job.log.1 ./log.08060270._020799.job.log.1 log.08060270._020799.job.log.tgz.1 HITS.08060270._020799.pool.root.1
++ pwd
+ gfal-copy file:////var/lib/condor/execute/dir_24345/result.tar.gz https://data-bridge-test.cern.ch/myfed/atlas-boinc/output/3717416_ATLAS_result
gfal-copy error: 110 (Connection timed out) - DESTINATION OVERWRITE Connection timed out
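In the trace above, the wrapper script appears to build the archive list by testing each expected output with `[ -f ... ]` and appending it to `flist`, which leaves duplicate entries (`./output.list` and `./log.08060270._020799.job.log.1` each appear twice) in the `tar czvf` command. The Python sketch below shows the same filter-then-append idea with duplicates dropped; it is our own illustration, not the project's script, and the name `build_file_list` is hypothetical.

```python
import os
from typing import Callable, Iterable, List


def build_file_list(candidates: Iterable[str],
                    exists: Callable[[str], bool] = os.path.isfile) -> List[str]:
    """Keep candidates that exist, in their original order, without repeats."""
    seen = set()
    result = []
    for name in candidates:
        key = os.path.normpath(name)  # "./x" and "x" count as the same file
        if key in seen or not exists(name):
            continue
        seen.add(key)
        result.append(name)
    return result


# Candidate names taken from the trace; a stub `exists` lets this run offline.
candidates = [
    "./output.list",
    "./output.list",
    "./log.08060270._020799.job.log.1",
    "./log.08060270._020799.job.log.1",
    "log.08060270._020799.job.log.tgz.1",
    "HITS.08060270._020799.pool.root.1",
]
print(build_file_list(candidates, exists=lambda path: True))
```

With the default `os.path.isfile` check, outputs that were not produced are skipped, as the `[ -f ... ]` test does in the shell loop.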
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2988 - Posted: 24 Apr 2016, 8:25:05 UTC

Cannot start new task.
It is stuck at requesting credentials.
Server issue?
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 2989 - Posted: 24 Apr 2016, 8:42:24 UTC - in response to Message 2987.  

The server fell over:
http://lhcathomedev.cern.ch/vLHCathome-dev/forum_thread.php?id=203&postid=2985#2985
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2993 - Posted: 24 Apr 2016, 12:47:10 UTC
Last modified: 24 Apr 2016, 12:54:13 UTC

Console F1 bootscreen ok.
Credentials seem to work, but tasks are shutting down after 7 min.

NO JOBS?

http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=156950
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 2995 - Posted: 24 Apr 2016, 13:38:16 UTC - in response to Message 2993.  

You are right! Out of jobs. More submitted.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2999 - Posted: 24 Apr 2016, 14:37:38 UTC - in response to Message 2995.  

Seems to be working.
Thanks.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3003 - Posted: 24 Apr 2016, 17:58:11 UTC

Jobs are quite large.
Upload size about 140MB.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3004 - Posted: 25 Apr 2016, 7:26:32 UTC

Running job output should appear here.


Output at console F2 and "running.log"
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 3005 - Posted: 25 Apr 2016, 8:11:38 UTC - in response to Message 3003.  

Each output file is about 20MB. Where do you get the figure of 140MB?
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3007 - Posted: 25 Apr 2016, 10:10:04 UTC - in response to Message 3005.  
Last modified: 25 Apr 2016, 10:18:45 UTC

I waited until a job was close to finishing and monitored the transfer with Process Explorer. It took about 20 min on a 1 Mbit/s upload.

EDIT: jobs run for about 3h on my machine.
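The 140MB estimate follows from the transfer time and link speed. Assuming the upload ran at the full 1 Mbit/s (10^6 bits per second) for the whole 20 minutes, which is a best case, the arithmetic is:

```python
def transfer_bytes(rate_bits_per_s: float, duration_s: float) -> float:
    """Data volume moved at a constant link rate (8 bits per byte)."""
    return rate_bits_per_s * duration_s / 8


volume = transfer_bytes(1_000_000, 20 * 60)  # 1 Mbit/s for 20 min
print(f"{volume / 1e6:.0f} MB ({volume / 2**20:.0f} MiB)")  # 150 MB (143 MiB)
```

That upper bound of roughly 143 MiB matches "about 140MB"; any slower effective rate would mean less data was actually sent in those 20 minutes.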
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 3008 - Posted: 25 Apr 2016, 10:59:32 UTC - in response to Message 3007.  

At the end of the job, you should see a gfal-copy command in Console 5 (stderr.log). After that has run, it should show the bandwidth experienced:

Bandwidth: xxx
PDW

Joined: 20 May 15
Posts: 217
Credit: 5,211,392
RAC: 8,905
Message 3009 - Posted: 25 Apr 2016, 11:28:19 UTC - in response to Message 3008.  

At the end of the job, you should see in Console 5 (stderr.log) a gfal-copy command. After than has run it should show the bandwidth experienced.

Bandwidth: xxx

I just happened to look whilst it was doing a gfal-copy. It did put up the headers for the various columns describing the copy, but it then scrolled up so fast as it moved on to the next job/process that I didn't get to read it. I can't find it in any of the logs yet.

How many jobs are being run at a time ?
PDW

Joined: 20 May 15
Posts: 217
Credit: 5,211,392
RAC: 8,905
Message 3010 - Posted: 25 Apr 2016, 12:03:36 UTC - in response to Message 3009.  

I was looking at F5, with everything scrolling up fast!
F4 is still showing that the bandwidth was 510497 (23,866,170 bytes).

I'm not sure how many jobs it has done, but it has been running for over 5 hours now.
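The F4 figures can be turned into a transfer time. The log line does not state the unit of the bandwidth value; assuming it is bytes per second (an assumption, not confirmed in the thread), a short calculation gives:

```python
size_bytes = 23_866_170  # file size shown on console F4
bandwidth = 510_497      # ASSUMED to be bytes per second; unit not shown in the log
seconds = size_bytes / bandwidth
print(f"{size_bytes / 1e6:.1f} MB in about {seconds:.0f} s")  # 23.9 MB in about 47 s
```

The 23.9 MB size is in line with the "about 20MB" per output file mentioned earlier in the thread. If the value were in bits per second instead, the same transfer would take about eight times as long.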

©2024 CERN