Message boards : Number crunching : issue of the day

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 61
Message 1754 - Posted: 31 Jan 2016, 14:44:58 UTC - in response to Message 1751.  
Last modified: 31 Jan 2016, 15:02:17 UTC

However, it may also be because of the issue I have.
This morning I fetched a new task, and a new VM was created and booted.
That was all! Only a boot.log was created. No boinc user jobs, only user root.

Resetting the project and getting all fresh project files did not solve my problem.
I meanwhile discovered that the project launched a new version of the application (v4622) on 28 Jan 2016, 12:05:09 UTC.
This was not announced in the News afaik.
The only difference is a new project xml-file. The VM-vdi is unchanged.
Two things are new in the XML file: init_data.xml is now copied to the shared project directory in the task slot, and the runtime is extended from 24 hours to 36 hours.
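One way to confirm a change like this is to compare the old and new job files element by element. The sketch below is only an illustration: `copy_to_shared` and `job_duration` are, as far as I know, vboxwrapper job-file options, but the two XML snippets and all their values are assumptions written to mirror the change described above, not copies of the project's real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical old and new job files; contents are assumptions for illustration.
OLD_XML = """<vbox_job>
    <os_name>Linux26_64</os_name>
    <job_duration>86400</job_duration>
</vbox_job>"""

NEW_XML = """<vbox_job>
    <os_name>Linux26_64</os_name>
    <job_duration>129600</job_duration>
    <copy_to_shared>init_data.xml</copy_to_shared>
</vbox_job>"""


def flatten(xml_text):
    """Return sorted (tag, text) pairs for the direct children of the root."""
    root = ET.fromstring(xml_text)
    return sorted((child.tag, (child.text or "").strip()) for child in root)


def diff_jobs(old_text, new_text):
    """Return (removed, added) lists of (tag, text) pairs."""
    old, new = set(flatten(old_text)), set(flatten(new_text))
    return sorted(old - new), sorted(new - old)


removed, added = diff_jobs(OLD_XML, NEW_XML)
for tag, text in removed:
    print(f"- {tag}: {text}")
for tag, text in added:
    print(f"+ {tag}: {text}")
```

With these made-up inputs, 86400 seconds (24 hours) shows up as removed and 129600 seconds (36 hours) plus the `copy_to_shared` entry show up as added.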
ID: 1754 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1755 - Posted: 31 Jan 2016, 14:55:14 UTC - in response to Message 1754.  

I detached from the project and reattached.
It still does not work.
The boot log is the only file in the logs.
Investigating...
ID: 1755 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1756 - Posted: 31 Jan 2016, 15:58:53 UTC

My guess is that when vlhc users were disallowed, cms-dev users were disallowed as well.
Once a cms-dev BOINC task ends, you cannot get in again.
ID: 1756 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1139
Credit: 8,310,612
RAC: 270
Message 1757 - Posted: 31 Jan 2016, 18:57:32 UTC - in response to Message 1754.  

I meanwhile discovered that the project launched a new version of the application (v4622) on 28 Jan 2016, 12:05:09 UTC.
This was not announced in the News afaik.

Hmm, that's news to me as well. Will follow up tomorrow.
ID: 1757 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2275 - Posted: 9 Mar 2016, 13:46:06 UTC
Last modified: 9 Mar 2016, 13:54:42 UTC

edit: vLHCathome-dev 3/9/2016 2:51:20 PM [error] No start tag in scheduler reply

The above is the message from BOINC when attempting to report a task.


After the switch-over, the credentials do not seem to work.

It goes on and on and on....
00:04:06.324627 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:04:06.974927 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:04:27.871165 VMMDev: Guest Log: [INFO] Starting CMS Application - Run 4
00:04:27.908388 VMMDev: Guest Log: [INFO] Reading the BOINC volunteer's information
00:04:28.002609 VMMDev: Guest Log: [INFO] Volunteer: Rasputin42 (xxx) Host: xxx
00:04:28.050134 VMMDev: Guest Log: [INFO] VMID: xxxxxxxxxxxxxxxxxxxxxxxx
00:04:28.103016 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:04:28.831879 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:04:30.671557 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:04:31.513885 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:04:36.858739 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:04:37.520845 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:04:37.607688 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:04:38.372696 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:04:53.174549 NAT: old socket rcv size: 64KB
00:04:53.174581 NAT: old socket snd size: 64KB
00:04:59.372916 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:05:00.158923 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:05:02.053716 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:05:02.651167 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:05:08.205944 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:05:08.866343 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:05:08.919476 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:05:09.573582 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:05:27.833479 VMMDev: Guest Log: [INFO] Starting CMS Application - Run 5
00:05:27.861517 VMMDev: Guest Log: [INFO] Reading the BOINC volunteer's information
00:05:27.949528 VMMDev: Guest Log: [INFO] Volunteer: Rasputin42 (xxx) Host: xxx
00:05:27.983997 VMMDev: Guest Log: [INFO] VMID: xxxxxxxxxxxxxxxxxxxxxxxx
00:05:28.031234 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:05:28.664188 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:05:30.684688 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:05:31.302647 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:05:33.209132 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:05:33.858069 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:05:39.441975 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:05:40.137187 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:05:40.203887 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:05:40.920251 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:05:59.176726 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:05:59.951849 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:06:01.840979 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:06:02.770356 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:06:04.376781 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:06:05.292544 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:06:10.776294 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:06:11.374606 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:06:11.436696 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev
00:06:12.097996 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:06:27.835997 VMMDev: Guest Log: [INFO] Starting CMS Application - Run 6
00:06:27.867779 VMMDev: Guest Log: [INFO] Reading the BOINC volunteer's information
00:06:27.953261 VMMDev: Guest Log: [INFO] Volunteer: Rasputin42 (xxx) Host: xxx
00:06:27.994671 VMMDev: Guest Log: [INFO] VMID: xxxxxxxxxxxxxxxxxxxxxxx
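To see how often each run retries the credential request, the log can be summarised per run. This is a minimal sketch that assumes the line format shown above; the function name `requests_per_run` is made up for this example, and the sample is a short excerpt of the log, not the whole file.

```python
import re
from collections import Counter

RUN_RE = re.compile(r"Starting CMS Application - Run (\d+)")
CRED_RE = re.compile(r"Requesting an X509 credential from (\S+)")


def requests_per_run(lines):
    """Map run number -> Counter of credential sources requested in that run."""
    per_run = {}
    current = None
    for line in lines:
        run = RUN_RE.search(line)
        if run:
            current = int(run.group(1))
            per_run[current] = Counter()
            continue
        cred = CRED_RE.search(line)
        # Requests seen before the first "Starting" line are ignored.
        if cred and current is not None:
            per_run[current][cred.group(1)] += 1
    return per_run


sample = [
    "00:04:27.871165 VMMDev: Guest Log: [INFO] Starting CMS Application - Run 4",
    "00:04:28.103016 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev",
    "00:04:28.831879 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home",
    "00:04:30.671557 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev",
    "00:05:27.833479 VMMDev: Guest Log: [INFO] Starting CMS Application - Run 5",
    "00:05:28.031234 VMMDev: Guest Log: [INFO] Requesting an X509 credential from CMS-Dev",
]

summary = requests_per_run(sample)
for run_number, counts in sorted(summary.items()):
    print(run_number, dict(counts))
```

Feeding the full VBox.log through the same function would show whether every run repeats the same request pattern before the next run starts a minute later.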
ID: 2275 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2312 - Posted: 10 Mar 2016, 22:58:08 UTC

A 2nd(?) run started. Has this been fixed?



I found this in cron_stdout:

16:51:01 +0100 2016-03-10 [INFO] Starting CMS Application - Run 1
16:51:02 +0100 2016-03-10 [INFO] Reading the BOINC volunteer's information
16:51:02 +0100 2016-03-10 [INFO] Volunteer: Rasputin42 (277) Host: 617
16:51:02 +0100 2016-03-10 [INFO] VMID: e9a40930-863c-4e95-b27a-44abf7940b9c
16:51:02 +0100 2016-03-10 [INFO] Requesting an X509 credential from CMS-Dev
subject : /O=Volunteer Computing/O=CERN/CN=Rasputin42 277/CN=xxxxxxxx
issuer : /O=Volunteer Computing/O=CERN/CN=Rasputin42 277
identity : /O=Volunteer Computing/O=CERN/CN=Rasputin42 277
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 130:00:00 (5.4 days)
16:51:03 +0100 2016-03-10 [INFO] Downloading glidein
16:51:05 +0100 2016-03-10 [INFO] Running glidein (check logs)
23:47:01 +0100 2016-03-10 [INFO] CMS glidein Run 1 ended
Copying 354466 bytes file:///home/boinc/wu_1457619778_8_0_1.tgz => https://data-bridge-test.cern.ch/myfed/moutputs/wu_1457619778_8_0_1.tgz
Short exit status: 0
Short exit status: 0
Short exit status: 0
Short exit status: 0

23:48:01 +0100 2016-03-10 [INFO] Starting CMS Application - Run 2
23:48:01 +0100 2016-03-10 [INFO] Reading the BOINC volunteer's information
23:48:01 +0100 2016-03-10 [INFO] Volunteer: Rasputin42 (277) Host: 617
23:48:01 +0100 2016-03-10 [INFO] VMID: e9a40930-863c-4e95-b27a-44abf7940b9c
23:48:01 +0100 2016-03-10 [INFO] Requesting an X509 credential from CMS-Dev
subject : /O=Volunteer Computing/O=CERN/CN=Rasputin42 277/CN=xxxxxxx
issuer : /O=Volunteer Computing/O=CERN/CN=Rasputin42 277
identity : /O=Volunteer Computing/O=CERN/CN=Rasputin42 277
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 129:59:59 (5.4 days)
23:48:03 +0100 2016-03-10 [INFO] Downloading glidein
23:48:03 +0100 2016-03-10 [INFO] Running glidein (check logs)
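The timestamps in cron_stdout make it easy to work out how long Run 1 lasted. This is a small sketch that assumes every log line starts with a stamp in the `HH:MM:SS +ZZZZ YYYY-MM-DD` form seen above; `parse_stamp` is a helper name invented for the example.

```python
from datetime import datetime


def parse_stamp(line):
    """Parse the leading 'HH:MM:SS +ZZZZ YYYY-MM-DD' stamp of a cron_stdout line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(stamp, "%H:%M:%S %z %Y-%m-%d")


start = parse_stamp("16:51:01 +0100 2016-03-10 [INFO] Starting CMS Application - Run 1")
end = parse_stamp("23:47:01 +0100 2016-03-10 [INFO] CMS glidein Run 1 ended")
duration = end - start
print(duration)  # 6:56:00

# The proxy reports "timeleft : 130:00:00"; convert those hours to days.
timeleft_days = round(130 / 24, 1)
print(timeleft_days)  # 5.4
```

So Run 1 ran for just under seven hours, and Run 2 started one minute after it ended.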
ID: 2312 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 2315 - Posted: 11 Mar 2016, 8:52:17 UTC - in response to Message 2312.  

Thanks for pointing this out. This is copying the log files for that run.

http://lhcathomedev.cern.ch/vLHCathome-dev/forum_thread.php?id=139&postid=2261#2261


By collecting them all we will be in a better position to debug issues after they have occurred.
ID: 2315 · Rating: 0
PDW
Joined: 20 May 15
Posts: 217
Credit: 6,191,283
RAC: 3,172
Message 2329 - Posted: 11 Mar 2016, 11:34:34 UTC - in response to Message 2315.  

I've also just had one complete that had done 2 runs.
ID: 2329 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2332 - Posted: 11 Mar 2016, 12:38:32 UTC

I have a large number of jobs that have ALL been started and aborted by ONE host.
The IP is available if needed.

This guy must produce a lot of these.
ID: 2332 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 2336 - Posted: 11 Mar 2016, 12:51:18 UTC - in response to Message 2332.  

I would just need the task number, WU name, or host ID.
ID: 2336 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2337 - Posted: 11 Mar 2016, 13:00:05 UTC - in response to Message 2336.  

Jobs:
3895, 4252, 5070, 5539, 5455, and more.
ID: 2337 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 2338 - Posted: 11 Mar 2016, 13:24:00 UTC - in response to Message 2337.  

Are the job numbers the task numbers? If so, this one seems to be fine:

http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=5455
ID: 2338 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2339 - Posted: 11 Mar 2016, 13:35:23 UTC - in response to Message 2338.  

I think you misunderstood.

The point is that one host starts a lot of jobs, which are abandoned and then picked up by others.
They do not produce errors, but the host should be checked to find out why it produces such a large number of abandoned tasks.
The jobs I listed were picked up by me and finished.

I can only guess how many abandoned jobs were produced by this single host if I am picking up that many.
ID: 2339 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 2340 - Posted: 11 Mar 2016, 14:10:09 UTC - in response to Message 2339.  

When you refer to jobs, do you mean CMS jobs or BOINC tasks? How do you know that you are getting abandoned jobs?
ID: 2340 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 2341 - Posted: 11 Mar 2016, 14:13:04 UTC - in response to Message 2340.  

I only see 29 abandoned BOINC tasks in the past 7 days.
ID: 2341 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2342 - Posted: 11 Mar 2016, 14:20:02 UTC - in response to Message 2340.  
Last modified: 11 Mar 2016, 14:30:46 UTC

I am talking about CMS jobs.
I know they are abandoned because they have an IP address associated with them that is not mine.

http://dashb-cms-job-task.cern.ch/dashboard/templates/task-analysis/#user=ivan+reid&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=all&site=&tid=160226_150549%3Aireid_crab_CMS_at_Home_MinBias_250evE

If you type a job number into the search box, click on the + sign to the far left of the job number, and then click on the attempt number, you can see (amongst other things) the IP address the job was originally assigned to.


Non-abandoned jobs would have my IP on them. EDIT: (Of course, only jobs I have been calculating.)
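If the job numbers and the IP addresses from the dashboard are noted down by hand, the check can be tallied in a few lines. This is only a sketch: the dashboard is not queried here, and every job number and address below is made up (the addresses come from the ranges reserved for documentation), as are the names `MY_IP` and `foreign_ip_counts`.

```python
from collections import Counter

MY_IP = "192.0.2.10"  # placeholder for the volunteer's own address

# (job number, IP shown for the first attempt); all values are hypothetical.
attempts = [
    (1001, "198.51.100.7"),
    (1002, "198.51.100.7"),
    (1003, "192.0.2.10"),
    (1004, "198.51.100.7"),
    (1005, "203.0.113.5"),
]


def foreign_ip_counts(records, my_ip):
    """Count jobs whose first attempt came from an address other than my_ip."""
    return Counter(ip for _, ip in records if ip != my_ip)


counts = foreign_ip_counts(attempts, MY_IP)
for ip, n in counts.most_common():
    print(f"{ip}: {n} job(s) first assigned elsewhere")
```

One address dominating the tally is the pattern being reported in this thread.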
ID: 2342 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 2343 - Posted: 11 Mar 2016, 15:35:33 UTC - in response to Message 2342.  

OK, here's what's happening.

Ivan's proxy expires while the job is in the queue on the server. A VM requests a new job and the job fails. It does not even start, as the site is recorded as unknown. The job is resubmitted, and Ivan's script eventually renews the proxy. You then get the good job.

There is nothing wrong with the volunteer side of things; it is just noise created by the proxy expiring.
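The sequence above can be pictured as a toy timeline. This is not how the server is implemented; it is a sketch under the assumption that an attempt can only start if the proxy is still valid at the moment the job is handed to a VM. All times are invented and given in minutes.

```python
def attempt_outcome(match_time, proxy_valid_until):
    """An attempt only starts if the proxy is still valid when it is matched."""
    return "ok" if match_time < proxy_valid_until else "failed: proxy expired"


# Hypothetical timeline in minutes; every number is an assumption.
proxy_valid_until = 100          # the proxy expires while the job is queued
first_match = 120                # a VM asks for work after the expiry
renewed_valid_until = 130 + 600  # the renewal script runs at t=130
second_match = 140               # the resubmitted job is matched again

outcomes = [
    attempt_outcome(first_match, proxy_valid_until),
    attempt_outcome(second_match, renewed_valid_until),
]
print(outcomes)  # ['failed: proxy expired', 'ok']
```

The first attempt is the one that leaves a record behind without ever starting; the second is the "good job" a volunteer ends up running.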
ID: 2343 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2344 - Posted: 11 Mar 2016, 16:37:22 UTC

Thanks, Laurence.
Where does the job I am picking up get its IP from, if not from a volunteer who did not finish it?
This by itself is not remarkable.
It becomes remarkable when the IP is the same for a large number of jobs.

A "fresh" job does not have an IP.

The jobs in question did not fail; they never left the queue. They were abandoned and reassigned (to me).

In any case, if you do not have a problem with that, why should I?
ID: 2344 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 2345 - Posted: 11 Mar 2016, 16:49:30 UTC

I just re-enabled one of my boxes to fetch work from this Project.

The machine was added 2 months ago, so before your project rename. I only allowed it to get work, and work was fetched; this was good.

Now run 1 has finished and the box is sitting there idle.

I just checked around and found in http://localhost:57156/logs/run-1/glide_UES5to/MasterLog:

03/11/16 14:52:31 (pid:7863) ******************************************************
03/11/16 14:52:31 (pid:7863) ** condor_master (CONDOR_MASTER) STARTING UP
03/11/16 14:52:31 (pid:7863) ** /home/boinc/CMSRun/glide_UES5to/main/condor/sbin/condor_master
03/11/16 14:52:31 (pid:7863) ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
03/11/16 14:52:31 (pid:7863) ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
03/11/16 14:52:31 (pid:7863) ** $CondorVersion: 8.2.3 Sep 30 2014 BuildID: 274619 $
03/11/16 14:52:31 (pid:7863) ** $CondorPlatform: x86_64_RedHat5 $
03/11/16 14:52:31 (pid:7863) ** PID = 7863
03/11/16 14:52:31 (pid:7863) ** Log last touched time unavailable (No such file or directory)
03/11/16 14:52:31 (pid:7863) ******************************************************
03/11/16 14:52:31 (pid:7863) Using config source: /home/boinc/CMSRun/glide_UES5to/condor_config
03/11/16 14:52:31 (pid:7863) config Macros = 212, Sorted = 212, StringBytes = 10636, TablesBytes = 7672
03/11/16 14:52:31 (pid:7863) CLASSAD_CACHING is OFF
03/11/16 14:52:31 (pid:7863) Daemon Log is logging: D_ALWAYS D_ERROR
03/11/16 14:52:31 (pid:7863) DaemonCore: command socket at <10.0.2.15:58692?noUDP>
03/11/16 14:52:31 (pid:7863) DaemonCore: private command socket at <10.0.2.15:58692>
03/11/16 14:52:32 (pid:7863) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9619 as ccbid 130.246.180.120:9619#131760
03/11/16 14:52:32 (pid:7863) Master restart (GRACEFUL) is watching /home/boinc/CMSRun/glide_UES5to/main/condor/sbin/condor_master (mtime:1457704337)
03/11/16 14:52:32 (pid:7863) Started DaemonCore process "/home/boinc/CMSRun/glide_UES5to/main/condor/sbin/condor_startd", pid and pgroup = 7866
03/11/16 15:03:35 (pid:7863) condor_write(): Socket closed when trying to write 2896 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 15:03:35 (pid:7863) Buf::write(): condor_write() failed
03/11/16 15:14:33 (pid:7863) condor_write(): Socket closed when trying to write 2897 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 15:14:33 (pid:7863) Buf::write(): condor_write() failed
03/11/16 15:25:31 (pid:7863) condor_write(): Socket closed when trying to write 2897 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 15:25:31 (pid:7863) Buf::write(): condor_write() failed
03/11/16 15:36:29 (pid:7863) condor_write(): Socket closed when trying to write 2914 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 15:36:29 (pid:7863) Buf::write(): condor_write() failed
03/11/16 15:47:27 (pid:7863) condor_write(): Socket closed when trying to write 2898 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 15:47:27 (pid:7863) Buf::write(): condor_write() failed
03/11/16 15:52:51 (pid:7863) CCBListener: failed to receive message from CCB server lcggwms02.gridpp.rl.ac.uk:9619
03/11/16 15:52:51 (pid:7863) CCBListener: connection to CCB server lcggwms02.gridpp.rl.ac.uk:9619 failed; will try to reconnect in 60 seconds.
03/11/16 15:53:52 (pid:7863) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9619 as ccbid 130.246.180.120:9619#131774
03/11/16 15:58:25 (pid:7863) condor_write(): Socket closed when trying to write 2915 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 15:58:25 (pid:7863) Buf::write(): condor_write() failed
03/11/16 16:09:23 (pid:7863) condor_write(): Socket closed when trying to write 2898 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 16:09:23 (pid:7863) Buf::write(): condor_write() failed
03/11/16 16:20:21 (pid:7863) condor_write(): Socket closed when trying to write 2915 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 16:20:21 (pid:7863) Buf::write(): condor_write() failed
03/11/16 16:31:20 (pid:7863) condor_write(): Socket closed when trying to write 2898 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 16:31:20 (pid:7863) Buf::write(): condor_write() failed
03/11/16 16:42:18 (pid:7863) condor_write(): Socket closed when trying to write 2898 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 16:42:18 (pid:7863) Buf::write(): condor_write() failed
03/11/16 16:53:16 (pid:7863) condor_write(): Socket closed when trying to write 2896 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 16:53:16 (pid:7863) Buf::write(): condor_write() failed
03/11/16 16:54:11 (pid:7863) CCBListener: failed to receive message from CCB server lcggwms02.gridpp.rl.ac.uk:9619
03/11/16 16:54:11 (pid:7863) CCBListener: connection to CCB server lcggwms02.gridpp.rl.ac.uk:9619 failed; will try to reconnect in 60 seconds.
03/11/16 16:55:12 (pid:7863) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9619 as ccbid 130.246.180.120:9619#131787
03/11/16 17:04:14 (pid:7863) condor_write(): Socket closed when trying to write 2898 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 17:04:14 (pid:7863) Buf::write(): condor_write() failed
03/11/16 17:15:12 (pid:7863) condor_write(): Socket closed when trying to write 2915 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 17:15:12 (pid:7863) Buf::write(): condor_write() failed
03/11/16 17:26:10 (pid:7863) condor_write(): Socket closed when trying to write 2898 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 17:26:10 (pid:7863) Buf::write(): condor_write() failed
03/11/16 17:37:08 (pid:7863) condor_write(): Socket closed when trying to write 2898 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10
03/11/16 17:37:08 (pid:7863) Buf::write(): condor_write() failed

What can I do, or what has to be done on your side?
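To check whether the write failures in the MasterLog follow a fixed schedule, the gaps between them can be computed. This is a sketch that assumes the `MM/DD/YY HH:MM:SS (pid:N)` line prefix shown above; `failure_gaps` is a name made up for the example, and the sample holds only the first few lines of the log.

```python
import re
from datetime import datetime

# Matches only the "Socket closed" lines, not the Buf::write() follow-ups.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(pid:\d+\) "
    r"condor_write\(\): Socket closed"
)


def failure_gaps(lines):
    """Return the gaps in seconds between consecutive condor_write failures."""
    times = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


sample = [
    "03/11/16 15:03:35 (pid:7863) condor_write(): Socket closed when trying to write 2896 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10",
    "03/11/16 15:03:35 (pid:7863) Buf::write(): condor_write() failed",
    "03/11/16 15:14:33 (pid:7863) condor_write(): Socket closed when trying to write 2897 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10",
    "03/11/16 15:14:33 (pid:7863) Buf::write(): condor_write() failed",
    "03/11/16 15:25:31 (pid:7863) condor_write(): Socket closed when trying to write 2897 bytes to collector lcggwms02.gridpp.rl.ac.uk:9619, fd is 10",
]

gaps = failure_gaps(sample)
print(gaps)  # [658, 658]
```

In this excerpt the failures are 658 seconds apart, so roughly every 11 minutes, which looks more like a periodic update to the collector failing each time than like random network drops. That reading is a guess from the timing alone.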
ID: 2345 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2346 - Posted: 11 Mar 2016, 17:16:50 UTC - in response to Message 2345.  

I suggest detaching and reattaching to:

http://lhcathomedev.cern.ch/vLHCathome-dev
ID: 2346 · Rating: 0