issue of the day
Author | Message |
---|---|
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
Indeed. The first step of this has already been posted, but this is the whole thing...
1. Starting at http://dashboard.cern.ch/, click CMS, then Crab3 user summary, choose the 7 days period (at the top of the page; anything longer than 1 day works) and navigate to Ivan Reid. This gets you your (Ivan's) task list in a new window.
2. Click on the task name you want. You can instead select (for example) failed jobs by clicking on the red "failed" block. This (eventually) gets you the list of jobs, in another new window.
3. Navigate to the job you want; you need to know its number. The search boxes are a bit weird: they filter the list incrementally from the first character, so if you are entering a 4-digit job number, type quickly so that it doesn't start searching between digits. Only the first digit will appear initially, but just type them all and it will work. You need the data view (the default).
4. Click the + sign at the left of the line. This gets you the state of the job.
5. Click on the number of the attempt you want... you're done... in yet another new window.
It's a lot easier than it looks from this, honest. In Windows it's easiest to set up a shortcut to the task list. If I pick the 1 day period (the default) at step 1, my browser goes wild with continual refreshes. I don't know if it's requesting these from CERN; it doesn't stop if I disconnect the network, so hopefully not.
[pedantMode]Oh yes, it's got columns labelled "retries"; surely they're "tries".[/pedantMode]
Edit: Different browsers may behave differently; this is for Firefox and NoScript with as many bells and whistles turned off as possible. There are lots of useful-looking things here; some work but some don't. The logs just time out for me and some, but not all, of the help files need a CERN login... very helpful. |
Send message Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0 |
I played Adventure on an ONYX computer running UNIX in the Eighties. Then I found it described by Tracy Kidder in his book "The Soul of a New Machine", which I translated for Mondadori in 1983. Tullio |
Send message Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
> 1. Starting here http://dashboard.cern.ch/, click CMS, Crab3 user summary, 7days

Very interesting site. I'm wondering what it means that 25% of the jobs have the status "Unknown". |
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
> Very interesting site. I'm wondering what it means that 25% of the jobs have the status "Unknown".

I think that most "unknowns" haven't been started yet. You can compare the state of a given job at your end with that in Dashboard to work out how it goes. There's an amazing amount of stuff there. Most of the CMS things work. [rant] I wish it was as easy to find individual job details from ATLAS... I could be failing 90% of the jobs for all I know... maybe not...[/rant] Better than interesting, it looks like it could be useful. For example, when there are not too many, you can check the job details of the failures and make sure that your IPs aren't there. It would be nice to be able to sort out worker IPs without having to go to each job separately; any ideas... any experts out there? There's probably a CLI somewhere. I don't want to poke around too much lest CERN IT or someone thinks that further restricting public access is a good fix for something. |
Send message Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
> [rant] I wish it was as easy to find individual job details from ATLAS... I could be failing 90% of the jobs for all I know... maybe not...[/rant]

It's very easy: each VM job is only ONE ATLAS job, so if your ATLAS job is failing you will see it here (I couldn't find user m at ATLAS, so I used my own account): http://atlasathome.cern.ch/results.php?userid=1735&offset=0&show_names=0&state=6&appid= and http://atlasathome.cern.ch/results.php?userid=1735&offset=0&show_names=0&state=5&appid= |
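The two links in the post above differ only in the `state` query parameter. As a rough sketch, the same URLs can be rebuilt for another account like this. The function name `results_url` is made up for illustration, and the thread does not say what the `state` values 5 and 6 stand for, so the code simply passes them through.

```python
from urllib.parse import urlencode

# Base address taken from the links posted above.
BASE_URL = "http://atlasathome.cern.ch/results.php"


def results_url(userid, state, offset=0, show_names=0, appid=""):
    """Build a results-page URL with the same query parameters as the posted links.

    The meaning of each `state` value is not given in the thread;
    it is passed through unchanged.
    """
    params = {
        "userid": userid,
        "offset": offset,
        "show_names": show_names,
        "state": state,
        "appid": appid,  # left empty, as in the posted links
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Reproduce the two links from the post (user ID 1735, states 6 and 5).
    for state in (6, 5):
        print(results_url(1735, state))
```

Swapping in your own user ID should give the equivalent pages for your account, assuming the site still accepts these parameters.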
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
> [rant] I wish it was as easy to find individual job details from ATLAS... I could be failing 90% of the jobs for all I know... maybe not...[/rant]

Yes, I got carried away a bit there. It's just that ATLAS seems a bit less open than T4T or CMS. Perhaps we've been spoilt.

> (I couldn't find user m at ATLAS, so I used my own account):

Just click the m. |
Send message Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
Back to CMS here, I will send you a PM |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 5 |
> 1. Starting here http://dashboard.cern.ch/, click CMS, Crab3 user summary, 7days

Actually, 7 days mightn't work now, since the batch is over a week old, but you can use the Filter at the top of the tasks page to pick the period from some presets, or specify start and end dates.
Ah, that was the bit I was missing, I didn't realise that was clickable -- typical Dashboard! Thanks. |
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
You're right, just edit the bookmark/shortcut to show the last two weeks. Fifteen entries is about all I've got space for on the screen... why do designers tend to assume their creations will be viewed on cinema size displays???
> Ah, that was the bit I was missing, I didn't realise that was clickable -- typical Dashboard!

That should show up as a link in your browser. The ones that don't are the coloured blobs for the job categories (unknown, success, failed, whatever); these are useful because you can quickly see the failed jobs. |
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
OK, thanks, Yeti. |
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
Rather by accident I've come across another "phantom" job similar to that described here. CMS job 7398 was abandoned when I shut down BOINC whilst it was in progress. On restart the machine went on to run and finish job 7422. Dashboard shows 7398 as finished with my IP. Unfortunately I have no details of the controlling BOINC task. I don't even remember which machine it was. However, the times shown in Dashboard (start 8/10 22.33.15, stop 8/10 23.26.35) don't correspond with the original aborted run of 7398 about 12 hours earlier. Normally machines here start automatically and none were running at that time, so it wasn't run on another machine at the same IP. Dashboard shows correct times for job 7422. With the benefit of not knowing how any of this works, it appears as if the job was rescheduled, and finished, on another machine ('cos whatever finished it, it wasn't mine) but Dashboard wrote the details of this in the original job slot. Curious indeed. |
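The timing argument in the post above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the year (2015) and the day/month reading of "8/10" are guesses, and the local run is a placeholder placed 12 hours earlier with an invented one-hour length, since the post only says "about 12 hours earlier".

```python
from datetime import datetime, timedelta

YEAR = 2015  # assumption: the post does not give the year


def parse_dashboard_time(text, year=YEAR):
    """Parse a 'day/month HH.MM.SS' string such as '8/10 22.33.15'.

    Reading '8/10' as day/month is an assumption.
    """
    return datetime.strptime(f"{year} {text}", "%Y %d/%m %H.%M.%S")


def overlaps(start_a, stop_a, start_b, stop_b):
    """Return True if the two time intervals share any instant."""
    return start_a <= stop_b and start_b <= stop_a


# Times Dashboard reported for job 7398, as quoted in the post.
dash_start = parse_dashboard_time("8/10 22.33.15")
dash_stop = parse_dashboard_time("8/10 23.26.35")

# Hypothetical local run: 12 hours before the Dashboard start, one hour long.
local_start = dash_start - timedelta(hours=12)
local_stop = local_start + timedelta(hours=1)

if __name__ == "__main__":
    print("Dashboard run length:", dash_stop - dash_start)
    print("Overlaps local run:", overlaps(dash_start, dash_stop, local_start, local_stop))
```

With real start and stop times from the local BOINC logs in place of the placeholders, the same check would show whether the Dashboard entry could be the aborted run at all.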
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
Maybe not so curious. Maybe by design. The "help file" contains:

> In case when the job was resubmitted multiple times, clicking at the number at the "Submission Attempts" column allows you to see all resubmissions corresponding to a given job inside the task. Here we are referring not to the resubmissions triggered by resource broker, but to resubmissions done by the user.

The user, after all, neither knows nor cares where the job was run, only that it finished. The job details page for a "retried" job seems to show where it was initially sent (worker node IP) rather than the node that actually finished it, although many seem to be unknown. A little light reading of the Tutorial is needed, maybe... |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 5 |
Sounds plausible. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I have monitored the end of a CMS task. It was in the middle of a job (25 events). At the end of the 24h task I did not see any network data upload for the running job. Neither boinc.exe nor vboxheadless.exe transmitted any data at the end of the CMS task (apart from a handful of bytes, which I discarded). Does that mean that whatever job is running when the task ends is not uploaded/used? This seems very wasteful. Please confirm or deny my findings. |
Send message Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
Requesting an X509 credential
Proxy error
Going to sleep for 1 hour ...

This seems to recur every minute. |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 67 |
> Requesting an X509 credential

They were told this over a month ago... |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 5 |
> I haven't found anything yet in the logs I can access to identify which user/machine it ran on.

Ah, I was pointed at something that does what you want, and seems to work from a machine where I haven't loaded any CMS credentials. Go to the Dashboard Home Page and click on the NEW: Interactive view link. Drop down the "Filters" menu, fill in a suitable time interval, and put "T3_CH_Volunteer" in the "Site" box. Change any other option as you wish and click the "Submit" button. You should get a broad graph of job states. Now change the tab selection to "Table" and you will get a broad table of job states. Click on the link for any of the numbers in the table, e.g. in the "All Fail" column, and you get all the jobs that fall into that category. Note you can sort on these columns, including the IP address, by clicking on the up/down arrows at the top of each column. |
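If the rows from that table were copied out by hand, grouping the failures by worker IP would take only a few lines. The sketch below is not based on any Dashboard export or API: the field names, the job numbers, the site name `T2_EXAMPLE` and the IP addresses (taken from the reserved documentation ranges) are all invented for illustration. Only the site name `T3_CH_Volunteer` comes from the post.

```python
from collections import Counter

# Hypothetical rows standing in for what the "Table" tab displays.
# Field names and values are illustrative, not Dashboard's actual schema.
jobs = [
    {"job": 1, "site": "T3_CH_Volunteer", "status": "failed", "ip": "192.0.2.10"},
    {"job": 2, "site": "T3_CH_Volunteer", "status": "finished", "ip": "192.0.2.10"},
    {"job": 3, "site": "T3_CH_Volunteer", "status": "failed", "ip": "198.51.100.7"},
    {"job": 4, "site": "T3_CH_Volunteer", "status": "failed", "ip": "192.0.2.10"},
    {"job": 5, "site": "T2_EXAMPLE", "status": "failed", "ip": "203.0.113.5"},
]


def failures_by_ip(rows, site="T3_CH_Volunteer"):
    """Count failed jobs per worker IP for one site, most failures first."""
    counts = Counter(
        row["ip"]
        for row in rows
        if row["site"] == site and row["status"] == "failed"
    )
    return counts.most_common()


def my_failures(rows, my_ips, site="T3_CH_Volunteer"):
    """Return only the (ip, count) pairs that belong to the given set of IPs."""
    return [(ip, n) for ip, n in failures_by_ip(rows, site) if ip in my_ips]


if __name__ == "__main__":
    for ip, count in failures_by_ip(jobs):
        print(ip, count)
    print(my_failures(jobs, {"198.51.100.7"}))
```

This does the same job as sorting on the IP column in the browser, just in one pass over the whole list.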
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
> I haven't found anything yet in the logs I can access to identify which user/machine it ran on.

Thanks. Rather than scrolling around the list, click on the IP (or any) column name (the actual text, not the arrows) and you can enter the value you want to find in the search box. The search process is slow and the table is three times the width of my screen... awkward but it works. Now, how do I work out which machine was used? Thanks again. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 5 |
> Thanks. Rather than scrolling around the list, click on the IP (or any) column name (the actual text, not the arrows) and you can enter the value you want to find in the search box. The search process is slow and the table is three times the width of my screen... awkward but it works. Now, how do I work out which machine was used? Thanks again.

OK, as you'll have noticed, most machines are listed as localhost.localdomain (IIRC). Now there should be ways to set that in the VM, but whether the VM can pick up enough info from the host... well, I don't know (yet!). At the point where that could be set, I believe we know the volunteer ID, but of course that doesn't help; it's the same info as the external IP (unless he's using several NAT routers). |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
> I have monitored the end of a CMS task. It was in the middle of a job. At the end of the 24h task I did not see any network data upload for the running job. Neither boinc.exe nor vboxheadless.exe transmitted any data at the end of the CMS task (apart from a handful of bytes, which I discarded). Does that mean that whatever job is running when the task ends is not uploaded/used? This seems very wasteful. Please confirm or deny my findings.

I have posted this before and did not get an answer. |
©2025 CERN