Message boards : Number crunching : issue of the day
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 19
Message 1201 - Posted: 8 Oct 2015, 22:18:45 UTC - in response to Message 1200.  
Last modified: 8 Oct 2015, 23:17:14 UTC


Yeah, thanks, but now I've got a chance to wind down a bit after a hectic day, how did you get there? Dashboard is a bit like the old game of Colossal Cave Adventure: "a maze of twisty little passages, all alike!"


Indeed.

The first step of this has already been posted, but this is the whole thing...

1. Starting here http://dashboard.cern.ch/, click CMS, Crab3 user summary, 7 days
period (at the top of the page, anything longer than 1 day works), navigate to Ivan Reid.
This gets you your (Ivan's) task list in a new window.

2. Click on the task name you want. You can select (for example) failed jobs here by
clicking instead on the red "failed" block. This (eventually) gets you the list of jobs, in
another new window.

3. Navigate to the job you want; you need to know its number. The search boxes
are a bit weird: they filter the list incrementally after the first
character, so if you are typing a 4-digit job number, type quickly so that it doesn't
start searching in between digits. Only the first digit will appear initially,
but just type them all and it will work. You need the data view (the default).

4. Click the + sign at the left of the line. This gets you the state of the job.

5. Click on the number of the attempt you want... you're done... in yet
another new window.

It's a lot easier than it looks from this, honest. In Windows it's easiest to set up a
shortcut to the task list. If I pick the 1 day period (the default) at step 1 my browser
goes wild with continual refreshes. I don't know if it's requesting these from CERN;
it doesn't stop if I disconnect the network, so hopefully not.

[pedantMode]Oh yes, it's got columns labelled "retries", surely they're "tries"[/pedantMode]

Edit:-
Different browsers may behave differently; this is for Firefox and NoScript with
as many bells and whistles turned off as possible.

There are lots of useful looking things here, some work but some don't. The logs just time
out for me and some, but not all, of the help files need a CERN login... very helpful.
ID: 1201 · Rating: 0
tullio

Joined: 17 Aug 15
Posts: 62
Credit: 296,695
RAC: 0
Message 1202 - Posted: 8 Oct 2015, 22:46:22 UTC

I played Adventure on an ONYX computer with UNIX in the Eighties. Then I found it described by Tracy Kidder in his book "The Soul of a New Machine", which I translated for Mondadori in 1983.
Tullio
ID: 1202 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 1204 - Posted: 9 Oct 2015, 9:53:49 UTC - in response to Message 1201.  

1. Starting here http://dashboard.cern.ch/, click CMS, Crab3 user summary, 7 days
period (at the top of the page, anything longer than 1 day works), navigate to Ivan Reid.
This gets you your (Ivan's) task list in a new window.

Very interesting site. I'm wondering what it means that 25% of the jobs have the status "Unknown"?

ID: 1204 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 19
Message 1205 - Posted: 9 Oct 2015, 10:36:23 UTC - in response to Message 1204.  
Last modified: 9 Oct 2015, 10:45:47 UTC

Very interesting site. I'm wondering what it means that 25% of the jobs have the status "Unknown"?


I think that most "unknowns" haven't been started yet. You can compare the state of a given job at your end with that in dashboard to work out how it goes.

There's an amazing amount of stuff there. Most of the CMS things work.
[rant] I wish it was as easy to find individual job details from ATLAS... I could be failing 90% of the jobs for all I know... maybe not...[/rant]
Better than interesting, it looks like it could be useful. E.g. when there are not too many failures, you can
check the job details of each one and make sure that your IPs aren't there.
It would be nice to be able to sort out worker IPs without having to go to each job separately; any ideas... any experts out there? There's probably a CLI somewhere. I don't want to poke around too much lest CERN IT or someone thinks
that further restricting public access is a good fix for something.
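
On sorting out worker IPs without opening each job separately: a minimal Python sketch, assuming the job list has somehow been saved locally as a list of records. This is not a Dashboard API or CLI. The record layout and sample values are made up for illustration (the IPs are documentation-range addresses), and the field name `WNIp` is simply the label mentioned elsewhere in this thread for the worker node IP.

```python
from collections import defaultdict

# Hypothetical job records (assumed layout, not Dashboard output).
jobs = [
    {"job": 101, "state": "failed", "WNIp": "192.0.2.10"},
    {"job": 102, "state": "success", "WNIp": "198.51.100.7"},
    {"job": 103, "state": "failed", "WNIp": "192.0.2.10"},
    {"job": 104, "state": "unknown", "WNIp": None},
]


def jobs_by_ip(records, state=None):
    """Group job numbers by worker IP, optionally keeping only one state."""
    grouped = defaultdict(list)
    for rec in records:
        if state is not None and rec["state"] != state:
            continue
        # Jobs with no recorded IP are collected under "unknown".
        grouped[rec["WNIp"] or "unknown"].append(rec["job"])
    return dict(grouped)


print(jobs_by_ip(jobs, state="failed"))
```

With the failures grouped like this, checking whether your own IP appears is a single dictionary lookup rather than one click per job.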
ID: 1205 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 1208 - Posted: 9 Oct 2015, 10:43:39 UTC - in response to Message 1205.  

[rant] I wish it was as easy to find individual job details from ATLAS... I could be failing 90% of the jobs for all I know... maybe not...[/rant]

It's very easy: Each VM-Job is only ONE Atlas-Job, so, if your Atlas-Job is failing you see it here: (Couldn't find user m at Atlas, so I took my account):

http://atlasathome.cern.ch/results.php?userid=1735&offset=0&show_names=0&state=6&appid=
and

http://atlasathome.cern.ch/results.php?userid=1735&offset=0&show_names=0&state=5&appid=
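
The two links differ only in the `state` value, so they can be generated for any account. A small Python sketch, assuming the parameter names and order seen in the links above are all that is needed; what each `state` number means is not stated here, so the values 6 and 5 are just copied from the links.

```python
from urllib.parse import urlencode

BASE = "http://atlasathome.cern.ch/results.php"


def results_url(userid, state, offset=0, show_names=0, appid=""):
    """Build a results.php URL with the same parameters as the links above."""
    query = urlencode({
        "userid": userid,
        "offset": offset,
        "show_names": show_names,
        "state": state,
        "appid": appid,  # left empty, as in the original links
    })
    return f"{BASE}?{query}"


for state in (6, 5):
    print(results_url(1735, state))
```

Swap 1735 for your own user ID to get the same two views of your account.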
ID: 1208 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 19
Message 1210 - Posted: 9 Oct 2015, 10:58:30 UTC - in response to Message 1208.  

[rant] I wish it was as easy to find individual job details from ATLAS... I could be failing 90% of the jobs for all I know... maybe not...[/rant]

It's very easy: Each VM-Job is only ONE Atlas-Job, so, if your Atlas-Job is failing you see it here:

Yes, I got carried away a bit, there. It's just that ATLAS seem a bit less open
than T4T or CMS. Perhaps we've been spoilt.
(Couldn't find user m at Atlas, so I took my account):
Just click the m.
ID: 1210 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 1211 - Posted: 9 Oct 2015, 11:02:44 UTC

Back to CMS here, I will send you a PM
ID: 1211 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,973,351
RAC: 1,548
Message 1212 - Posted: 9 Oct 2015, 14:18:33 UTC - in response to Message 1201.  


Yeah, thanks, but now I've got a chance to wind down a bit after a hectic day, how did you get there? Dashboard is a bit like the old game of Colossal Cave Adventure: "a maze of twisty little passages, all alike!"
1. Starting here http://dashboard.cern.ch/, click CMS, Crab3 user summary, 7 days
period (at the top of the page, anything longer than 1 day works), navigate to Ivan Reid.
This gets you your (Ivan's) task list in a new window.

Actually, 7 days mightn't work now, since the batch is over a week old, but you can use the Filter at the top of the tasks page to change the period from some presets, or specify start and end dates.

5. Click on the number of the attempt you want... you're done... in yet
another new window.

Ah, that was the bit I was missing, I didn't realise that was clickable -- typical Dashboard!

Thanks.
ID: 1212 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 19
Message 1213 - Posted: 9 Oct 2015, 14:35:08 UTC - in response to Message 1212.  
Last modified: 9 Oct 2015, 14:48:01 UTC


Actually, 7 days mightn't work now, since the batch is over a week old, but you can use the Filter at the top of the tasks page to change the period from some presets, or specify start and end dates.


You're right, just edit the bookmark/shortcut to show the last two weeks. Fifteen entries is about all I've got space for on the screen... why do designers tend to assume their creations will be viewed on cinema size displays???


5. Click on the number of the attempt you want... you're done... in yet
another new window.


Ah, that was the bit I was missing, I didn't realise that was clickable -- typical Dashboard!


That should show up as a link in your browser. The ones that don't are the coloured blobs for the job categories (unknown, success, failed, whatever);
these are useful because you can quickly see the failed jobs.
ID: 1213 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 19
Message 1214 - Posted: 9 Oct 2015, 14:45:22 UTC - in response to Message 1211.  

OK, thanks, Yeti.
ID: 1214 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 19
Message 1215 - Posted: 10 Oct 2015, 10:34:28 UTC
Last modified: 10 Oct 2015, 10:39:56 UTC

Rather by accident I've come across another "phantom" job similar to
that described here.

CMS job 7398 was abandoned by me shutting down BOINC whilst it was in progress. On restart the machine then went on to run and finish job 7422.

Dashboard shows 7398 as finished with my IP. Unfortunately I have no details of the controlling BOINC task. I don't even remember which machine it was.
However, the times shown in Dashboard (start 8/10 22.33.15, stop 8/10 23.26.35) don't correspond with the original aborted run of 7398 about 12 hours earlier.
Normally machines here start automatically and none were running at that time so it wasn't run on another machine at the same IP.
Dashboard shows correct times for job 7422.
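
For reference, the length of the run Dashboard recorded for 7398 can be worked out from the two times quoted above. A short Python sketch, assuming they read as day/month with dot-separated hours, minutes and seconds, and that the year is 2015 (taken from the post date, not from Dashboard).

```python
from datetime import datetime


def parse_dashboard_time(text, year=2015):
    """Parse 'day/month hh.mm.ss' as quoted in the post; the year is assumed."""
    return datetime.strptime(f"{year} {text}", "%Y %d/%m %H.%M.%S")


start = parse_dashboard_time("8/10 22.33.15")
stop = parse_dashboard_time("8/10 23.26.35")

# Duration of the run that Dashboard attributes to job 7398.
print(stop - start)
```

That comes to 53 minutes 20 seconds for the recorded run.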

With the benefit of not knowing how any of this works, it appears as if the job was rescheduled, and finished, on another machine ('cos whatever finished it, it wasn't mine)
but Dashboard wrote the details of this in the original job slot.

Curious indeed.
ID: 1215 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 19
Message 1216 - Posted: 10 Oct 2015, 13:40:33 UTC - in response to Message 1215.  
Last modified: 10 Oct 2015, 14:06:37 UTC


With the benefit of not knowing how any of this works, it appears as if the job was rescheduled, and finished, on another machine ('cos whatever finished it, it wasn't mine)
but Dashboard wrote the details of this in the original job slot.

Curious indeed.


Maybe not so curious. Maybe by design. The "help file" contains:-

" In case when the job was resubmitted multiple times, clicking at the number at the "Submission Attempts" column allows you to see all resubmissions corresponding to a given job
inside the task. Here we are referring not to the resubmissions triggered by resource broker, but to resubmissions done by the user. "

The user, after all, neither knows nor cares where the job was run, only that it finished. The job details page for a "retried" job seems to show where it was initially sent (Worker node IP)
rather than the one that actually finished it, although many seem to be unknown. A little light reading of the Tutorial is needed maybe...
ID: 1216 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,973,351
RAC: 1,548
Message 1219 - Posted: 10 Oct 2015, 23:07:09 UTC - in response to Message 1216.  

Sounds plausible.
ID: 1219 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1222 - Posted: 11 Oct 2015, 22:33:37 UTC

I have monitored the end of a CMS task.
It was in the middle of a job (25 events).
At the end of the 24h task I did not see any network data upload for the running job.
Neither boinc.exe nor vboxheadless.exe transmitted any data at the end of the CMS task (apart from a handful of bytes, which I discarded).
Does that mean that whatever job is running when the task ends is not uploaded/used?
This seems very wasteful.
Please confirm or deny my findings.
ID: 1222 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 1224 - Posted: 12 Oct 2015, 9:49:47 UTC

Requesting an X509 credential
Proxy error
Going to sleep for 1 hour
...
This seems to repeat every minute
ID: 1224 · Rating: 0
PDW

Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 6,886
Message 1225 - Posted: 12 Oct 2015, 10:17:50 UTC - in response to Message 1224.  

Requesting an X509 credential
Proxy error
Going to sleep for 1 hour
...
This seems to repeat every minute

They were told this over a month ago...
ID: 1225 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,973,351
RAC: 1,548
Message 1288 - Posted: 22 Oct 2015, 18:53:22 UTC - in response to Message 1189.  

I haven't found anything yet in the logs I can access to identify which user/machine it ran on.

The best I could find is the dashboard job details page which has the public IP of the machine so for those who have fixed IPs might this not help show the user?

ps.If jobs could be sorted on this (WNIp) I could see all the ones I had done... or not. Very useful if only I knew how...

edit:-
5143 shows
StartedRunningTimeStamp 2015-10-07 01:13:57
FinishedTimeStamp 2015-10-07 01:37:40

Ah, I was pointed at something that does what you want, and seems to work from a machine where I haven't loaded any CMS credentials.
Go to the Dashboard Home Page and click on the NEW: Interactive view link. Drop down the "Filters" menu and fill in a suitable time interval, and put "T3_CH_Volunteer" in the "Site" box. Change any other option as you wish and click the "Submit" button. You should get a broad graph of job states.
Now change the tab selection to "Table" and you will get a broad table of job states. Click on the link for any of the numbers in the table, e.g. in the "All Fail" column and you get all the jobs that fall into that category. Note you can sort on these columns, including the IP address, by clicking on the up/down arrows at the top of each column.
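
If the table rows were copied out of the page, the same site filter and IP sort could be reproduced offline. A Python sketch under that assumption: the row layout, the second site name `T2_XX_Example` and all sample values are invented for illustration, and only `T3_CH_Volunteer` and the `WNIp` label come from this thread. It sorts IPs numerically rather than as text, which matters because plain text sorting would put 192.0.2.10 before 192.0.2.9.

```python
import ipaddress

# Hypothetical rows (assumed layout, documentation-range IPs).
rows = [
    {"job": 1, "site": "T3_CH_Volunteer", "WNIp": "198.51.100.7"},
    {"job": 2, "site": "T3_CH_Volunteer", "WNIp": "192.0.2.10"},
    {"job": 3, "site": "T2_XX_Example", "WNIp": "203.0.113.5"},
    {"job": 4, "site": "T3_CH_Volunteer", "WNIp": "192.0.2.9"},
]


def volunteer_rows_by_ip(table, site="T3_CH_Volunteer"):
    """Keep rows for one site and sort them by numeric IP address."""
    selected = [row for row in table if row["site"] == site]
    return sorted(selected, key=lambda row: ipaddress.ip_address(row["WNIp"]))


for row in volunteer_rows_by_ip(rows):
    print(row["WNIp"], row["job"])
```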
ID: 1288 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 19
Message 1296 - Posted: 22 Oct 2015, 21:49:44 UTC - in response to Message 1288.  
Last modified: 22 Oct 2015, 21:55:42 UTC

I haven't found anything yet in the logs I can access to identify which user/machine it ran on.

The best I could find is the dashboard job details page which has the public IP of the machine so for those who have fixed IPs might this not help show the user?

ps.If jobs could be sorted on this (WNIp) I could see all the ones I had done... or not. Very useful if only I knew how...

edit:-
5143 shows
StartedRunningTimeStamp 2015-10-07 01:13:57
FinishedTimeStamp 2015-10-07 01:37:40

Ah, I was pointed at something that does what you want, and seems to work from a machine where I haven't loaded any CMS credentials.
Go to the Dashboard Home Page and click on the NEW: Interactive view link. Drop down the "Filters" menu and fill in a suitable time interval, and put "T3_CH_Volunteer" in the "Site" box. Change any other option as you wish and click the "Submit" button. You should get a broad graph of job states.
Now change the tab selection to "Table" and you will get a broad table of job states. Click on the link for any of the numbers in the table, e.g. in the "All Fail" column and you get all the jobs that fall into that category. Note you can sort on these columns, including the IP address, by clicking on the up/down arrows at the top of each column.


Thanks. Rather than scrolling around the list, click on the IP (or any other) column name (the actual text, not the arrows) and you can enter the value you want to find in the search box. The search process is slow and the table is three times the width of my screen... awkward but it works. Now, how do I work out which machine was used? Thanks again.
ID: 1296 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,973,351
RAC: 1,548
Message 1297 - Posted: 22 Oct 2015, 22:18:16 UTC - in response to Message 1296.  

Thanks. Rather than scrolling around the list, click on the IP (or any other) column name (the actual text, not the arrows) and you can enter the value you want to find in the search box. The search process is slow and the table is three times the width of my screen... awkward but it works. Now, how do I work out which machine was used? Thanks again.
OK, as you'll have noticed, most machines are listed as localhost.localdomain (IIRC). Now there should be ways to set that in the VM, but whether the VM can pick up enough info from the host... Well, I don't know (yet!). At the point that could be set, I believe we know the volunteer ID, but of course that doesn't help; it's the same info as the external IP (unless he's using several NAT routers).
ID: 1297 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1367 - Posted: 29 Oct 2015, 10:14:07 UTC

I have monitored the end of a CMS task.
It was in the middle of a job.
At the end of the 24h task I did not see any network data upload for the running job.
Neither boinc.exe nor vboxheadless.exe transmitted any data at the end of the CMS task (apart from a handful of bytes, which I discarded).
Does that mean that whatever job is running when the task ends is not uploaded/used?
This seems very wasteful.
Please confirm or deny my findings.

I have posted this before and did not get an answer.
ID: 1367 · Rating: 0


©2024 CERN