Message boards : ATLAS Application : New Experimental ATLAS Application
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 3011 - Posted: 25 Apr 2016, 12:21:55 UTC - in response to Message 3010.  

Console 4 (stdout) should now report when a job starts and stops. There should be two gfal-copy calls per job: one for the input and one for the output.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3012 - Posted: 25 Apr 2016, 12:41:50 UTC - in response to Message 3011.  

I have now tested it for the third time.
Console F4 shows "Bandwidth 117683" during the upload process.

Upload for the vboxheadless process: 128MB.
The upload ran at full speed for 18min (checked with task monitor).

IT IS UPLOADING 120-150MB!
You can say whatever you want, but it is happening.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 3013 - Posted: 25 Apr 2016, 13:41:41 UTC - in response to Message 3012.  

If we are observing something different, we should dig a little deeper.

The stdout and stderr web logs (show graphics) now append the output of each job, so they should contain all the information. If you search for gfal-copy in stderr, you should see which file is being copied. Go to the URL below and search for that file.

http://data-bridge-test.cern.ch/myfed/atlas-boinc/output/

The size of that file should be displayed and you can download it to verify.
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 36
Message 3014 - Posted: 25 Apr 2016, 13:45:10 UTC

From stdout.log:

Copying 19992536 bytes file:////var/lib/condor/execute/dir_3618/result.tar.gz => https://data-bridge-test.cern.ch/myfed/atlas-boinc/output/3727707_ATLAS_result
Bandwidth: 1134948
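
For anyone who wants to pull the reported size out of the log automatically, here is a minimal Python sketch. The regular expression is only modelled on the single "Copying ..." line quoted above; other gfal-copy versions may format it differently, and `parse_copy_line` is a hypothetical helper name, not part of any tool.

```python
import re
from urllib.parse import urlparse

# Pattern modelled on the one stdout.log line quoted in this thread (assumption:
# other versions of the tool may print this line in a different format).
COPY_LINE = re.compile(r"^Copying (?P<size>\d+) bytes (?P<src>\S+) => (?P<dst>\S+)$")


def parse_copy_line(line):
    """Return (size_bytes, source, destination), or None if the line does not match."""
    match = COPY_LINE.match(line.strip())
    if match is None:
        return None
    return int(match.group("size")), match.group("src"), match.group("dst")


line = (
    "Copying 19992536 bytes "
    "file:////var/lib/condor/execute/dir_3618/result.tar.gz => "
    "https://data-bridge-test.cern.ch/myfed/atlas-boinc/output/3727707_ATLAS_result"
)
size, src, dst = parse_copy_line(line)

# Report the size in MB (taking 1 MB = 10^6 bytes) and the destination host.
print(f"{size / 1e6:.1f} MB -> {urlparse(dst).hostname}")
```

For the line above this reports roughly 20 MB going to data-bridge-test.cern.ch.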
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3015 - Posted: 25 Apr 2016, 13:48:59 UTC - in response to Message 3013.  
Last modified: 25 Apr 2016, 14:27:17 UTC

Thanks, I will.
One indication may be that the upload continued for a long time AFTER the "Bandwidth xxxxxx" display showed on Console F4.

Once the upload completed, it showed "Starting new task".
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 36
Message 3016 - Posted: 25 Apr 2016, 14:10:41 UTC - in response to Message 3015.  

Maybe you saw the upload and, directly after it, the download for the new job.
As far as I have noticed, the download is 53.3MB per job.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3017 - Posted: 25 Apr 2016, 14:19:43 UTC - in response to Message 3016.  

Thanks for the tip.
It was JUST the upload.
Task manager showed max upload on the graph (1Mbit/s) for 18min.
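
As a rough sanity check on these figures, the volume implied by a saturated link can be worked out in a few lines of Python. This is only a sketch: it assumes 1 Mbit = 10^6 bits, 1 MB = 10^6 bytes, and that the link really was full for the whole 18 minutes. `transferred_mb` is a hypothetical helper name.

```python
def transferred_mb(rate_mbit_per_s, minutes):
    """Data volume in MB for a link running at a constant rate for a given time."""
    bits = rate_mbit_per_s * 1_000_000 * minutes * 60  # total bits sent
    return bits / 8 / 1_000_000                        # bits -> bytes -> MB


upload_mb = transferred_mb(1, 18)
print(upload_mb)  # 135.0
```

135 MB falls inside the 120-150MB range reported earlier in the thread, so the task-manager graph and the measured upload size agree with each other.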
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 3019 - Posted: 25 Apr 2016, 18:38:17 UTC - in response to Message 3016.  
Last modified: 25 Apr 2016, 18:38:39 UTC

CP is right. Search for the curl command to see the input file and look for it here.

http://data-bridge-test.cern.ch/myfed/atlas-boinc/input/

So with just the input and output files there is ~75MB per job.
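
That estimate can be cross-checked against the two numbers already posted in this thread: the 53.3MB download per job and the 19992536-byte result.tar.gz from the quoted stdout.log line. A small Python sketch, assuming 1 MB = 10^6 bytes and that these single observations are typical (they may well vary from job to job):

```python
INPUT_MB = 53.3            # download size per job, as reported in this thread
OUTPUT_BYTES = 19_992_536  # result.tar.gz size from the quoted stdout.log line


def per_job_mb(input_mb, output_bytes):
    """Expected traffic per job in MB: input download plus output upload."""
    return input_mb + output_bytes / 1_000_000


total_mb = per_job_mb(INPUT_MB, OUTPUT_BYTES)
print(f"{total_mb:.1f} MB per job")
```

This comes out at a little over 73 MB, in line with the ~75MB figure, and well short of an upload of more than 100MB on its own.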
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3029 - Posted: 26 Apr 2016, 9:51:41 UTC

The task ended after 7 min.

Out of jobs again?
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 3033 - Posted: 26 Apr 2016, 11:23:29 UTC - in response to Message 3029.  

More jobs submitted.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3034 - Posted: 26 Apr 2016, 11:27:43 UTC

Thanks, Laurence.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3041 - Posted: 26 Apr 2016, 18:39:29 UTC
Last modified: 26 Apr 2016, 19:12:03 UTC

Thanks for adding the finished x.log files.
Now I can see whether a job passed or failed.

EDIT: Sorry, wrong thread.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3043 - Posted: 26 Apr 2016, 20:56:46 UTC
Last modified: 26 Apr 2016, 21:04:42 UTC

I did some more testing.
When the job finished (after about 4h50min), it started uploading.
The first 25 or so MB were transmitted to cephrgw10.cern.ch.
Then it continued transmitting to alicondorce01.cern.ch for another 110MB.
The main IP addresses were: 188.184.129.127:9618
and
188.184.187.167:9618.

Why is it transmitting, and what?

This is way too specific to be an accident.

EDIT: the job terminated shortly after the end of the upload.

http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=159615
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 3044 - Posted: 26 Apr 2016, 21:30:27 UTC - in response to Message 3043.  
Last modified: 26 Apr 2016, 21:31:20 UTC


Then it continued transmitting to alicondorce01.cern.ch for another 110MB.


That's not good. Shouldn't be doing that. It looks like it is transferring the whole scratch directory back. Will investigate tomorrow.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3051 - Posted: 27 Apr 2016, 14:24:22 UTC

That's not good. Shouldn't be doing that. It looks like it is transferring the whole scratch directory back. Will investigate tomorrow.


Any progress?
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 3055 - Posted: 27 Apr 2016, 15:18:53 UTC - in response to Message 3051.  

Sorry, got a little sidetracked with the day job as Ivan would say :)

There was only one running job left in the queue, so I submitted some more. These should not transfer the output back to alicondorxxx.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3061 - Posted: 27 Apr 2016, 20:06:05 UTC

These should not transfer the output back to alicondorxxx.


Not working. The upload size is still >100MB, and it still goes to the above host.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 3062 - Posted: 27 Apr 2016, 20:54:46 UTC - in response to Message 3061.  

Try again. I hope you don't have a data cap from your ISP!
Steve Hawker*
Joined: 6 Mar 15
Posts: 19
Credit: 142,109
RAC: 0
Message 3093 - Posted: 29 Apr 2016, 3:39:03 UTC

Estimated duration ~75 hours
Actual duration ~ 3 minutes

Tasks say "Running, High Priority" but show no elapsed time and stay stuck at 0% for at least a few minutes.
Sergey Kovalchuk
Joined: 11 Mar 16
Posts: 23
Credit: 68,680
RAC: 0
Message 3854 - Posted: 29 Jul 2016, 9:56:43 UTC

Should we expect the ATLAS application to resume here, or will the application be tested on your own test server?