Message boards : Number crunching : issue of the day

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 331,462
RAC: 146
Message 855 - Posted: 26 Aug 2015, 20:32:21 UTC - in response to Message 854.  

I think we are out of jobs again.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 856 - Posted: 26 Aug 2015, 20:41:39 UTC - in response to Message 855.  

Thanks.
It would be nice to post a message when such things occur.
That is not too much to ask, is it?
I like to help, but I also do not want to waste my resources for no reason.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,973,351
RAC: 1,548
Message 857 - Posted: 27 Aug 2015, 0:13:54 UTC - in response to Message 856.  

Sorry, I thought I'd made it clear. We're short of jobs at the moment. I can submit new ones, but we're chasing a bug where neither the Condor server nor the Dashboard reporting service gets the message that a job has successfully completed and written its output to storage. So each job gets run three times, which wastes your computers' time. We made progress on other problems yesterday, but this one is eluding us. I'm running a few short jobs from time to time to get logs for Laurence et al., but I don't have the expertise or the access that they have to attack the problem.
I could, and possibly may, just run a large batch of jobs and ignore the fact that most of them get run three times; you get the same credit either way. But we do need to get this sorted before we go anywhere near a beta release.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 889 - Posted: 28 Aug 2015, 8:58:59 UTC

Strange directory usage!
First, it generates run-1, run-2, etc. (no work at that time).
Then, at run-33, I got some jobs. At some point during run-33, it started generating additional logs in run-1 again, up to run-30. Then it started putting new logs into run-1 again and is currently up to run-7. If there is any logic to this, it escapes me.
I hope there is useful info in the results.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,973,351
RAC: 1,548
Message 890 - Posted: 28 Aug 2015, 9:05:23 UTC - in response to Message 889.  

Yeah, job availability is sporadic at the moment. The generation of new log directories every 15(?) minutes is a consequence of there being no work; it looks like a timeout we could probably adjust. I'm just waiting to hear from the development team on their conclusions from last night's mini-blitz; I'm very tempted to just go for a large-scale test over the weekend (it's a three-day holiday weekend here).
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 891 - Posted: 28 Aug 2015, 9:23:50 UTC - in response to Message 890.  

I understand that there are only a few jobs. If this messy directory structure is not a problem, fine with me.
I just thought I'd mention it.
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 6,886
Message 892 - Posted: 28 Aug 2015, 9:49:17 UTC - in response to Message 891.  

The log does say "Resetting run number to rotate logs", so it now goes from 1 to 30.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,973,351
RAC: 1,548
Message 901 - Posted: 28 Aug 2015, 13:41:29 UTC

I just killed a bunch of jobs to submit some new ideas from the team. This will probably show some weirdness in your VMs but as I just said to Ben, "DON'T PANIC!"
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 850,198
RAC: 282
Message 1186 - Posted: 6 Oct 2015, 16:35:19 UTC

I paused the VM just after the start of JobNumber 5143 and resumed the VM 2 hours later.
A new job (5250) started; 5143 never returned, and no error was reported either.
In CMS-Dashboard, 5143 is still in the running state and will probably time out after 24 hours.

cmsRun-stdout.log of job 5143:

Beginning CMSSW wrapper script
slc6_amd64_gcc472 scramv1 CMSSW
Performing SCRAM setup...
Completed SCRAM setup
Retrieving SCRAM project...
Untarring /home/boinc/CMSRun/glide_hY3D6I/execute/dir_14417/sandbox.tar.gz
Completed SCRAM project
Executing CMSSW
cmsRun -j FrameworkJobReport.xml PSet.py



From StarterLog:

10/06/15 15:09:49 (pid:14417) Using wrapper /home/boinc/CMSRun/glide_hY3D6I/condor_job_wrapper.sh to exec /home/boinc/CMSRun/glide_hY3D6I/execute/dir_14417/condor_exec.exe -a sandbox.tar.gz --sourceURL=https://cmsweb-testbed.cern.ch/crabcache --jobNumber=5143 --cmsswVersion=CMSSW_6_2_0_SLHC26_patch3 --scramArch=slc6_amd64_gcc472 --inputFile=job_input_file_list_5143.txt --runAndLumis=job_lumis_5143.json --lheInputFiles=False --firstEvent=128551 --firstLumi=5143 --lastEvent=128576 --firstRun=1 --seeding=AutomaticSeeding --scriptExe=None --eventsPerLumi=100 --scriptArgs=[] -o {}
10/06/15 15:09:49 (pid:14417) Running job as user (null)
10/06/15 15:09:49 (pid:14417) Create_Process succeeded, pid=14424
10/06/15 17:15:43 (pid:14777) FILETRANSFER: "/home/boinc/CMSRun/glide_hY3D6I/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
10/06/15 17:15:43 (pid:14777) FILETRANSFER: failed to add plugin "/home/boinc/CMSRun/glide_hY3D6I/main/condor/libexec/curl_plugin" because: FILETRANSFER:1:"/home/boinc/CMSRun/glide_hY3D6I/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
10/06/15 17:15:43 (pid:14777) WARNING: Initializing plugins returned: FILETRANSFER:1:"/home/boinc/CMSRun/glide_hY3D6I/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
10/06/15 17:16:08 (pid:14796) ******************************************************
10/06/15 17:16:08 (pid:14796) ** condor_starter (CONDOR_STARTER) STARTING UP
10/06/15 17:16:08 (pid:14796) ** /home/boinc/CMSRun/glide_hY3D6I/main/condor/sbin/condor_starter
10/06/15 17:16:08 (pid:14796) ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
10/06/15 17:16:08 (pid:14796) ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
10/06/15 17:16:08 (pid:14796) ** $CondorVersion: 8.2.3 Sep 30 2014 BuildID: 274619 $
10/06/15 17:16:08 (pid:14796) ** $CondorPlatform: x86_64_RedHat5 $
10/06/15 17:16:08 (pid:14796) ** PID = 14796
10/06/15 17:16:08 (pid:14796) ** Log last touched 10/6 17:15:43
10/06/15 17:16:08 (pid:14796) ******************************************************
10/06/15 17:16:08 (pid:14796) Using config source: /home/boinc/CMSRun/glide_hY3D6I/condor_config
10/06/15 17:16:08 (pid:14796) config Macros = 212, Sorted = 212, StringBytes = 10694, TablesBytes = 7672
10/06/15 17:16:08 (pid:14796) CLASSAD_CACHING is OFF
10/06/15 17:16:08 (pid:14796) Daemon Log is logging: D_ALWAYS D_ERROR
10/06/15 17:16:08 (pid:14796) DaemonCore: command socket at <10.0.2.15:58763?noUDP>
10/06/15 17:16:08 (pid:14796) DaemonCore: private command socket at <10.0.2.15:58763>
10/06/15 17:16:09 (pid:14796) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9620 as ccbid 130.246.180.120:9620#89568
10/06/15 17:16:09 (pid:14796) Communicating with shadow <130.246.180.120:9818?noUDP&sock=20016_d29f_56892>
10/06/15 17:16:09 (pid:14796) Submitting machine is "lcggwms02.gridpp.rl.ac.uk"
10/06/15 17:16:09 (pid:14796) setting the orig job name in starter
10/06/15 17:16:09 (pid:14796) setting the orig job iwd in starter
10/06/15 17:16:09 (pid:14796) Chirp config summary: IO false, Updates false, Delayed updates true.
10/06/15 17:16:09 (pid:14796) Initialized IO Proxy.
10/06/15 17:16:09 (pid:14796) Done setting resource limits
10/06/15 17:16:09 (pid:14796) FILETRANSFER: "/home/boinc/CMSRun/glide_hY3D6I/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
10/06/15 17:16:09 (pid:14796) FILETRANSFER: failed to add plugin "/home/boinc/CMSRun/glide_hY3D6I/main/condor/libexec/curl_plugin" because: FILETRANSFER:1:"/home/boinc/CMSRun/glide_hY3D6I/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
10/06/15 17:16:10 (pid:14796) File transfer completed successfully.
10/06/15 17:16:10 (pid:14796) Job 138242.0 set to execute immediately
10/06/15 17:16:10 (pid:14796) Starting a VANILLA universe job with ID: 138242.0
10/06/15 17:16:10 (pid:14796) IWD: /home/boinc/CMSRun/glide_hY3D6I/execute/dir_14796
10/06/15 17:16:10 (pid:14796) Output file: /home/boinc/CMSRun/glide_hY3D6I/execute/dir_14796/_condor_stdout
10/06/15 17:16:10 (pid:14796) Error file: /home/boinc/CMSRun/glide_hY3D6I/execute/dir_14796/_condor_stderr
10/06/15 17:16:11 (pid:14796) Renice expr "0" evaluated to 0
10/06/15 17:16:11 (pid:14796) Using wrapper /home/boinc/CMSRun/glide_hY3D6I/condor_job_wrapper.sh to exec /home/boinc/CMSRun/glide_hY3D6I/execute/dir_14796/condor_exec.exe -a sandbox.tar.gz --sourceURL=https://cmsweb-testbed.cern.ch/crabcache --jobNumber=5250 --cmsswVersion=CMSSW_6_2_0_SLHC26_patch3 --scramArch=slc6_amd64_gcc472 --inputFile=job_input_file_list_5250.txt --runAndLumis=job_lumis_5250.json --lheInputFiles=False --firstEvent=131226 --firstLumi=5250 --lastEvent=131251 --firstRun=1 --seeding=AutomaticSeeding --scriptExe=None --eventsPerLumi=100 --scriptArgs=[] -o {}
10/06/15 17:16:11 (pid:14796) Running job as user (null)
10/06/15 17:16:11 (pid:14796) Create_Process succeeded, pid=14800
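To see which jobs a VM's starter actually launched, and when, the jobNumber and event-range arguments can be pulled out of the "Using wrapper" lines of a StarterLog. Below is a minimal Python sketch, assuming the line format shown above. wrapper_launches, Launch and the trimmed sample lines (paths and most arguments removed) are illustrative only, not part of any CMS or HTCondor tool.

```python
import re
from typing import NamedTuple

# Prefix of a StarterLog line that launches a job through the wrapper script.
WRAPPER_RE = re.compile(
    r"^(?P<stamp>\d\d/\d\d/\d\d \d\d:\d\d:\d\d) \(pid:(?P<pid>\d+)\) Using wrapper "
)
# Only the three arguments of interest; --firstLumi, --firstRun etc. are ignored.
ARG_RE = re.compile(r"--(?P<key>jobNumber|firstEvent|lastEvent)=(?P<value>\d+)")
NEEDED = {"jobNumber", "firstEvent", "lastEvent"}


class Launch(NamedTuple):
    stamp: str
    starter_pid: int
    job_number: int
    first_event: int
    last_event: int


def wrapper_launches(lines):
    """Yield a Launch for every 'Using wrapper' line that carries all three arguments."""
    for line in lines:
        m = WRAPPER_RE.match(line)
        if not m:
            continue
        args = {a["key"]: int(a["value"]) for a in ARG_RE.finditer(line)}
        if not NEEDED.issubset(args):
            continue
        yield Launch(m["stamp"], int(m["pid"]), args["jobNumber"],
                     args["firstEvent"], args["lastEvent"])


# Trimmed copies of the two wrapper lines from the StarterLog above.
SAMPLE = [
    "10/06/15 15:09:49 (pid:14417) Using wrapper condor_job_wrapper.sh to exec condor_exec.exe"
    " --jobNumber=5143 --firstEvent=128551 --firstLumi=5143 --lastEvent=128576",
    "10/06/15 17:16:08 (pid:14796) ** PID = 14796",
    "10/06/15 17:16:11 (pid:14796) Using wrapper condor_job_wrapper.sh to exec condor_exec.exe"
    " --jobNumber=5250 --firstEvent=131226 --firstLumi=5250 --lastEvent=131251",
]

if __name__ == "__main__":
    for launch in wrapper_launches(SAMPLE):
        print(launch)
```

On the log above, this would list 5143 launched at 15:09:49 by starter pid 14417 and 5250 at 17:16:11 by starter pid 14796, which matches the pause/resume gap described in the post.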
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,973,351
RAC: 1,548
Message 1188 - Posted: 7 Oct 2015, 9:15:09 UTC - in response to Message 1186.  

Curious. 5143 appears to have finished (its output is on the DataBridge and its stdout is complete). The Condor node_state.txt doesn't show it was resubmitted:
[
Type = "NodeStatus";
Node = "Job5143";
NodeStatus = 5; /* "STATUS_DONE" */
StatusDetails = "";
RetryCount = 0;
JobProcsQueued = 0;
JobProcsHeld = 0;
]
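With many nodes in node_state.txt, one quick way to look for resubmissions is to parse each [ ... ] record and check RetryCount. This rough Python sketch assumes one "Attr = value;" per line as above and no semicolons inside strings. It is not a full ClassAd parser, and parse_node_states is a hypothetical name.

```python
import re

# One "Attr = value;" per line, optionally followed by a /* comment */.
ATTR_RE = re.compile(r"^\s*(\w+)\s*=\s*(.*?);\s*(?:/\*.*\*/)?\s*$")
INT_RE = re.compile(r"-?\d+")


def parse_node_states(text):
    """Return one dict per [ ... ] record in node_state.txt-style text."""
    records, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "[":
            current = {}                      # start of a record
        elif stripped == "]":
            if current is not None:
                records.append(current)       # end of a record
            current = None
        elif current is not None:
            m = ATTR_RE.match(line)
            if not m:
                continue                      # skip anything not understood
            key, raw = m.groups()
            if len(raw) >= 2 and raw[0] == raw[-1] == '"':
                current[key] = raw[1:-1]      # string literal
            elif INT_RE.fullmatch(raw):
                current[key] = int(raw)       # integer literal
            else:
                current[key] = raw            # keep other expressions verbatim
    return records


SAMPLE = """[
  Type = "NodeStatus";
  Node = "Job5143";
  NodeStatus = 5; /* "STATUS_DONE" */
  StatusDetails = "";
  RetryCount = 0;
  JobProcsQueued = 0;
  JobProcsHeld = 0;
]"""

if __name__ == "__main__":
    nodes = parse_node_states(SAMPLE)
    retried = [n["Node"] for n in nodes if n.get("RetryCount", 0) > 0]
    print(nodes)
    print("Nodes with retries:", retried)
```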

Its successful incarnation started at 0114 UTC:
gWMS-CMSRunAnalysis.sh STARTING at Wed Oct 7 01:13:55 GMT 2015 on 246-563-16291
and finished at 0145:
gWMS-CMSRunAnalysis.sh FINISHING at Wed Oct 7 01:44:52 GMT 2015 on 246-563-16291 with (short) status 0
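The wall-clock time of that successful incarnation can be read off the two wrapper lines. A small Python sketch, assuming the "... at Wed Oct 7 01:13:55 GMT 2015 ..." format shown; wrapper_times is a hypothetical helper, and the times are kept as naive GMT values.

```python
import re
from datetime import datetime

# Matches e.g. "gWMS-CMSRunAnalysis.sh STARTING at Wed Oct 7 01:13:55 GMT 2015 on ..."
STAMP_RE = re.compile(
    r"(STARTING|FINISHING) at \w{3} (\w{3}) +(\d{1,2}) (\d\d:\d\d:\d\d) GMT (\d{4})"
)


def wrapper_times(lines):
    """Map 'STARTING'/'FINISHING' to the (GMT) datetimes found in wrapper log lines."""
    times = {}
    for line in lines:
        m = STAMP_RE.search(line)
        if m:
            event, month, day, clock, year = m.groups()
            times[event] = datetime.strptime(f"{day} {month} {year} {clock}",
                                             "%d %b %Y %H:%M:%S")
    return times


SAMPLE = [
    "gWMS-CMSRunAnalysis.sh STARTING at Wed Oct 7 01:13:55 GMT 2015 on 246-563-16291",
    "gWMS-CMSRunAnalysis.sh FINISHING at Wed Oct 7 01:44:52 GMT 2015 on 246-563-16291"
    " with (short) status 0",
]

if __name__ == "__main__":
    t = wrapper_times(SAMPLE)
    print("Wall time:", t["FINISHING"] - t["STARTING"])
```

For these two lines the difference is 0:30:57.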

I haven't found anything yet in the logs I can access to identify which user/machine it ran on.
m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 19
Message 1189 - Posted: 7 Oct 2015, 12:07:09 UTC - in response to Message 1188.  
Last modified: 7 Oct 2015, 12:40:41 UTC

Quoting ivan: "I haven't found anything yet in the logs I can access to identify which user/machine it ran on."

The best I could find is the Dashboard job details page, which shows the public IP of the machine. For those who have fixed IPs, might this not help identify the user?

P.S. If jobs could be sorted on this (WNIp), I could see all the ones I had done... or not. Very useful, if only I knew how...

edit:-
5143 shows
StartedRunningTimeStamp 2015-10-07 01:13:57
FinishedTimeStamp 2015-10-07 01:37:40
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,973,351
RAC: 1,548
Message 1190 - Posted: 7 Oct 2015, 15:01:35 UTC - in response to Message 1189.  

Quoting m: "The best I could find is the dashboard job details page which has the public IP of the machine so for those who have fixed IPs might this not help show the user?"

Which page is that exactly?
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1191 - Posted: 7 Oct 2015, 15:03:58 UTC
Last modified: 7 Oct 2015, 15:28:36 UTC

Once or twice during a 24-hour task I get an empty directory xxxx.
Condor does not seem to start.

StarterLog:
10/07/15 13:53:59 (pid:30125) Running job as user (null)
10/07/15 13:53:59 (pid:30125) Create_Process succeeded, pid=30131
10/07/15 15:24:56 (pid:30125) Process exited, pid=30131, status=0
10/07/15 15:24:58 (pid:30125) Got SIGQUIT. Performing fast shutdown.
10/07/15 15:24:58 (pid:30125) ShutdownFast all jobs.
10/07/15 15:24:58 (pid:30125) **** condor_starter (condor_STARTER) pid 30125 EXITING WITH STATUS 0
10/07/15 15:24:59 (pid:1773) ******************************************************
10/07/15 15:24:59 (pid:1773) ** condor_starter (CONDOR_STARTER) STARTING UP
10/07/15 15:24:59 (pid:1773) ** /home/boinc/CMSRun/glide_HKmyZt/main/condor/sbin/condor_starter
10/07/15 15:24:59 (pid:1773) ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
10/07/15 15:24:59 (pid:1773) ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
10/07/15 15:24:59 (pid:1773) ** $CondorVersion: 8.2.3 Sep 30 2014 BuildID: 274619 $
10/07/15 15:24:59 (pid:1773) ** $CondorPlatform: x86_64_RedHat5 $
10/07/15 15:24:59 (pid:1773) ** PID = 1773
10/07/15 15:24:59 (pid:1773) ** Log last touched 10/7 15:24:58
10/07/15 15:24:59 (pid:1773) ******************************************************
10/07/15 15:24:59 (pid:1773) Using config source: /home/boinc/CMSRun/glide_HKmyZt/condor_config
10/07/15 15:24:59 (pid:1773) config Macros = 212, Sorted = 212, StringBytes = 10697, TablesBytes = 7672
10/07/15 15:24:59 (pid:1773) CLASSAD_CACHING is OFF
10/07/15 15:24:59 (pid:1773) Daemon Log is logging: D_ALWAYS D_ERROR
10/07/15 15:24:59 (pid:1773) DaemonCore: command socket at <10.0.2.15:40973?noUDP>
10/07/15 15:24:59 (pid:1773) DaemonCore: private command socket at <10.0.2.15:40973>
10/07/15 15:25:00 (pid:1773) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9621 as ccbid 130.246.180.120:9621#83955
10/07/15 15:25:00 (pid:1773) Communicating with shadow <130.246.180.120:9818?noUDP&sock=20016_d29f_57837>
10/07/15 15:25:00 (pid:1773) Submitting machine is "lcggwms02.gridpp.rl.ac.uk"
10/07/15 15:25:01 (pid:1773) setting the orig job name in starter
10/07/15 15:25:01 (pid:1773) setting the orig job iwd in starter
10/07/15 15:25:01 (pid:1773) Chirp config summary: IO false, Updates false, Delayed updates true.
10/07/15 15:25:01 (pid:1773) Initialized IO Proxy.
10/07/15 15:25:01 (pid:1773) Done setting resource limits
10/07/15 15:25:01 (pid:1773) FILETRANSFER: "/home/boinc/CMSRun/glide_HKmyZt/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
10/07/15 15:25:01 (pid:1773) FILETRANSFER: failed to add plugin "/home/boinc/CMSRun/glide_HKmyZt/main/condor/libexec/curl_plugin" because: FILETRANSFER:1:"/home/boinc/CMSRun/glide_HKmyZt/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
10/07/15 15:25:03 (pid:1773) Got SIGTERM. Performing graceful shutdown.
10/07/15 15:25:03 (pid:1773) ShutdownGraceful all jobs.
10/07/15 15:25:04 (pid:1773) ERROR "FileTransfer::UpLoadFiles called during active transfer!
" at line 1159 in file /slots/12/dir_4417/userdir/src/condor_utils/file_transfer.cpp
10/07/15 15:25:04 (pid:1773) ShutdownFast all jobs.
10/07/15 15:25:05 (pid:1802) ******************************************************
10/07/15 15:25:05 (pid:1802) ** condor_starter (CONDOR_STARTER) STARTING UP
10/07/15 15:25:05 (pid:1802) ** /home/boinc/CMSRun/glide_HKmyZt/main/condor/sbin/condor_starter
10/07/15 15:25:05 (pid:1802) ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
10/07/15 15:25:05 (pid:1802) ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
10/07/15 15:25:05 (pid:1802) ** $CondorVersion: 8.2.3 Sep 30 2014 BuildID: 274619 $
10/07/15 15:25:05 (pid:1802) ** $CondorPlatform: x86_64_RedHat5 $
10/07/15 15:25:05 (pid:1802) ** PID = 1802
10/07/15 15:25:05 (pid:1802) ** Log last touched 10/7 15:25:04
10/07/15 15:25:05 (pid:1802) ******************************************************
10/07/15 15:25:05 (pid:1802) Using config source: /home/boinc/CMSRun/glide_HKmyZt/condor_config
10/07/15 15:25:05 (pid:1802) config Macros = 212, Sorted = 212, StringBytes = 10697, TablesBytes = 7672
10/07/15 15:25:05 (pid:1802) CLASSAD_CACHING is OFF
10/07/15 15:25:05 (pid:1802) Daemon Log is logging: D_ALWAYS D_ERROR
10/07/15 15:25:05 (pid:1802) DaemonCore: command socket at <10.0.2.15:56183?noUDP>
10/07/15 15:25:05 (pid:1802) DaemonCore: private command socket at <10.0.2.15:56183>
10/07/15 15:25:05 (pid:1802) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9621 as ccbid 130.246.180.120:9621#83956
10/07/15 15:25:07 (pid:1802) Communicating with shadow <130.246.180.120:9818?noUDP&sock=20016_d29f_57838>
10/07/15 15:25:07 (pid:1802) Submitting machine is "lcggwms02.gridpp.rl.ac.uk"
10/07/15 15:25:07 (pid:1802) setting the orig job name in starter
10/07/15 15:25:07 (pid:1802) setting the orig job iwd in starter
10/07/15 15:25:07 (pid:1802) Chirp config summary: IO false, Updates false, Delayed updates true.
10/07/15 15:25:07 (pid:1802) Initialized IO Proxy.
10/07/15 15:25:07 (pid:1802) Done setting resource limits
10/07/15 15:25:07 (pid:1802) FILETRANSFER: "/home/boinc/CMSRun/glide_HKmyZt/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
10/07/15 15:25:07 (pid:1802) FILETRANSFER: failed to add plugin "/home/boinc/CMSRun/glide_HKmyZt/main/condor/libexec/curl_plugin" because: FILETRANSFER:1:"/home/boinc/CMSRun/glide_HKmyZt/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
10/07/15 15:25:07 (pid:1802) Got SIGTERM. Performing graceful shutdown.
10/07/15 15:25:07 (pid:1802) ShutdownGraceful all jobs.
10/07/15 15:25:08 (pid:1802) ERROR "FileTransfer::UpLoadFiles called during active transfer!
" at line 1159 in file /slots/12/dir_4417/userdir/src/condor_utils/file_transfer.cpp

10/07/15 15:25:08 (pid:1802) ShutdownFast all jobs.
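To make restart loops like this one easier to spot in a long StarterLog, the lines can be grouped by starter pid, recording when each starter came up, which signals it received, and any ERROR lines. A Python sketch, assuming every line begins "MM/DD/YY HH:MM:SS (pid:N)"; continuation lines of multi-line messages are skipped. summarize_starters is a hypothetical helper and the sample is an abbreviated excerpt of the log above.

```python
import re
from collections import defaultdict

# Every StarterLog line starts with "MM/DD/YY HH:MM:SS (pid:N) ".
LINE_RE = re.compile(r"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) \(pid:(\d+)\) (.*)$")
SIGNAL_RE = re.compile(r"^Got (SIG[A-Z]+)\.")


def summarize_starters(lines):
    """Group StarterLog events by starter pid: start time, signals, ERROR lines."""
    summary = defaultdict(lambda: {"started": None, "signals": [], "errors": []})
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue                          # continuation of a multi-line message
        stamp, pid, msg = m.group(1), int(m.group(2)), m.group(3)
        entry = summary[pid]
        if "condor_starter (CONDOR_STARTER) STARTING UP" in msg:
            entry["started"] = stamp
        sig = SIGNAL_RE.match(msg)
        if sig:
            entry["signals"].append((stamp, sig.group(1)))
        if msg.startswith("ERROR"):
            entry["errors"].append((stamp, msg))
    return dict(summary)


SAMPLE = [
    "10/07/15 15:24:58 (pid:30125) Got SIGQUIT. Performing fast shutdown.",
    "10/07/15 15:24:59 (pid:1773) ** condor_starter (CONDOR_STARTER) STARTING UP",
    "10/07/15 15:25:03 (pid:1773) Got SIGTERM. Performing graceful shutdown.",
    '10/07/15 15:25:04 (pid:1773) ERROR "FileTransfer::UpLoadFiles called during active transfer!',
    '" at line 1159 in file /slots/12/dir_4417/userdir/src/condor_utils/file_transfer.cpp',
    "10/07/15 15:25:05 (pid:1802) ** condor_starter (CONDOR_STARTER) STARTING UP",
    "10/07/15 15:25:07 (pid:1802) Got SIGTERM. Performing graceful shutdown.",
]

if __name__ == "__main__":
    for pid, info in summarize_starters(SAMPLE).items():
        print(pid, info)
```

Run on the full log above, this would show starters 1773 and 1802 each receiving SIGTERM within a few seconds of starting up and then hitting the UpLoadFiles error.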
m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 19
Message 1192 - Posted: 7 Oct 2015, 17:36:59 UTC - in response to Message 1190.  

Quoting ivan:
The best I could find is the dashboard job details page which has the public IP of the machine so for those who have fixed IPs might this not help show the user?

Which page is that exactly?


This is an example of another job
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 850,198
RAC: 282
Message 1193 - Posted: 7 Oct 2015, 20:26:41 UTC - in response to Message 1188.  

Quoting ivan:
Curious. 5143 appears to have finished (its output is on the DataBridge and its stdout is complete). The Condor node_state.txt doesn't show it was resubmitted:
[...]
Its successful incarnation started at 0114 UTC:
gWMS-CMSRunAnalysis.sh STARTING at Wed Oct 7 01:13:55 GMT 2015 on 246-563-16291
and finished at 0145:
gWMS-CMSRunAnalysis.sh FINISHING at Wed Oct 7 01:44:52 GMT 2015 on 246-563-16291 with (short) status 0

I haven't found anything yet in the logs I can access to identify which user/machine it ran on.

This is the BOINC task that served the jobs (including job 5143):
Task 66911 · Work unit 62053 · Computer 37 · Sent: 6 Oct 2015, 10:31:55 UTC · Reported: 6 Oct 2015, 21:37:09 UTC · Status: Completed and validated

It was completed and reported hours before the Dashboard's STARTING and FINISHING times for job 5143.
What baffles me is that Dashboard has stored my WNIp and shows the job needing only 24 minutes,
whereas my machine needs about 1 hour for this kind of job when running the CPU at 100%.
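The timing discrepancy is easy to quantify with the timestamps quoted in this thread: the BOINC report time above, the wrapper's STARTING time from ivan's post, and m's Dashboard StartedRunningTimeStamp/FinishedTimeStamp. A short Python sketch; all four values are assumed to be UTC.

```python
from datetime import datetime, timezone


def utc(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' as a UTC timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


boinc_reported = utc("2015-10-06 21:37:09")   # BOINC task 66911, time reported
wrapper_start = utc("2015-10-07 01:13:55")    # gWMS-CMSRunAnalysis.sh STARTING (job 5143)
dash_start = utc("2015-10-07 01:13:57")       # Dashboard StartedRunningTimeStamp
dash_finish = utc("2015-10-07 01:37:40")      # Dashboard FinishedTimeStamp

gap_after_report = wrapper_start - boinc_reported
dashboard_runtime = dash_finish - dash_start

if __name__ == "__main__":
    print("Job 5143 started", gap_after_report, "after the BOINC task was reported")
    print("Dashboard run time:", dashboard_runtime)
```

This gives a gap of 3:36:46 between the BOINC task being reported and job 5143 starting, and a Dashboard run time of 0:23:43.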
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,973,351
RAC: 1,548
Message 1195 - Posted: 8 Oct 2015, 12:55:05 UTC - in response to Message 1193.  

Following up on m's post, I found this page for job 5143, which ties in with what I found on the Condor server. This gets more and more curious.
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 6,886
Message 1197 - Posted: 8 Oct 2015, 14:56:46 UTC - in response to Message 1195.  

Should those 1970 time stamps be set to something more recent?
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 850,198
RAC: 282
Message 1198 - Posted: 8 Oct 2015, 16:48:23 UTC - in response to Message 1197.  

Quoting PDW: "Should those 1970 time stamps be set to something more recent?"

1970-01-01 00:00:00 means zero or not (yet) set.

Unix time counts the seconds since 1970-01-01 00:00:00 UTC.
If you want to join the next big 2,000,000,000 celebration party, make a note in your calendar for 18 May 2033 03:33:20 UTC.
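The epoch arithmetic can be checked with Python's datetime module. from_unix is a hypothetical helper; treating 0 as "not set" follows the reading of the 1970 timestamps in this thread and is an assumption about Dashboard, not documented behaviour.

```python
from datetime import datetime, timezone


def from_unix(seconds):
    """Convert Unix time to an aware UTC datetime; 0 is treated as 'not set'.

    Treating 0 as unset is an assumption based on the 1970 timestamps discussed here.
    """
    if seconds == 0:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(datetime.fromtimestamp(0, tz=timezone.utc))   # the epoch itself
    print(from_unix(2_000_000_000))                     # the "party" moment
```

The two prints give 1970-01-01 00:00:00+00:00 and 2033-05-18 03:33:20+00:00.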
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 6,886
Message 1199 - Posted: 8 Oct 2015, 17:00:37 UTC - in response to Message 1198.  

Quoting Crystal Pellet:
Should those 1970 time stamps be set to something more recent?

1970-01-01 00:00:00 means zero or not (yet) set.

Unix time counts the seconds since 1970-01-01 00:00:00 UTC.
If you want to join the next big 2,000,000,000 celebration party, make a note in your calendar for 18 May 2033 03:33:20 UTC.


I know, and I can understand the ScheduledTimeStamp not getting set, but I thought the GridFinishedTimeStamp might have had a value.

I've put it in my calendar, assume we are all coming to your house for the party ;-)
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,973,351
RAC: 1,548
Message 1200 - Posted: 8 Oct 2015, 21:16:13 UTC - in response to Message 1192.  

Quoting m:
The best I could find is the dashboard job details page which has the public IP of the machine so for those who have fixed IPs might this not help show the user?

Which page is that exactly?


This is an example of another job

Yeah, thanks. Now that I've got a chance to wind down a bit after a hectic day: how did you get there? Dashboard is a bit like the old game of Colossal Cave Adventure: "a maze of twisty little passages, all alike!"


©2024 CERN