Message boards : Theory Application : New Version 3.08

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 328,405
RAC: 158
Message 5440 - Posted: 25 Jun 2018, 12:26:36 UTC

This version updates the CernVM cache and now uses OpenHTC.io for CVMFS.

maeax
Joined: 22 Apr 16
Posts: 664
Credit: 1,807,614
RAC: 2,394
Message 5441 - Posted: 25 Jun 2018, 13:45:13 UTC

Thanks for updating.
What about putting this multicore Theory into production?
Is there a timeline?

maeax
Joined: 22 Apr 16
Posts: 664
Credit: 1,807,614
RAC: 2,394
Message 5442 - Posted: 25 Jun 2018, 16:31:23 UTC
Last modified: 25 Jun 2018, 16:32:46 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=1823638
shows 3.07 and not 3.08.
The task was initialised on 18/6/21.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 751
Credit: 11,609,314
RAC: 1,490
Message 5443 - Posted: 25 Jun 2018, 16:50:26 UTC - in response to Message 5442.  

https://openhtc.io/

I will d/l this version later tonight on my dev computers, but I wish I knew how big the vdi is going to be before doing this.

And yeah I have been wondering why these multi-cores have not already been moved to LHC since I have about 4000 Valids myself here.

They never fail.
Mad Scientist For Life

maeax
Joined: 22 Apr 16
Posts: 664
Credit: 1,807,614
RAC: 2,394
Message 5444 - Posted: 25 Jun 2018, 21:55:56 UTC

vdi 293 MByte, 43 sec. download ;-)
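
For scale, those two figures work out to roughly 6.8 MByte/s. A minimal sketch of the arithmetic, assuming "MByte" means 10^6 bytes (the post does not say whether decimal or binary megabytes are meant):

```python
# Figures quoted in the post above; the decimal-megabyte reading is an assumption.
size_mbyte = 293
seconds = 43

rate_mbyte_s = size_mbyte / seconds   # average throughput in MByte/s
rate_mbit_s = rate_mbyte_s * 8        # the same figure expressed in Mbit/s

print(f"{rate_mbyte_s:.1f} MByte/s, about {rate_mbit_s:.0f} Mbit/s")
```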

maeax
Joined: 22 Apr 16
Posts: 664
Credit: 1,807,614
RAC: 2,394
Message 5445 - Posted: 26 Jun 2018, 7:27:09 UTC

This Sherpa is now in a loop:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=1827692
Event 700 ( 1m 26s elapsed / 3h 23m 22s left ) -> ETA: Tue Jun 26 11:52
700 events processed
dumping histograms...
Event 800 ( 1m 35s elapsed / 3h 17m 50s left ) -> ETA: Tue Jun 26 11:46
800 events processed
dumping histograms...
Updating display...
Display update finished (9 histograms, 800 events).
Error in Splitting_Tools::ConstructKinematics(kt = -nan, z = 0.595853, y = 0.50943).
Error in Splitting_Tools::ConstructKinematics(kt = -nan, z = 0.610011, y = 0.495165).
Event 900 ( 1m 47s elapsed / 3h 17m 35s left ) -> ETA: Tue Jun 26 11:46
900 events processed
dumping histograms...
Updating display...
Display update finished (9 histograms, 900 events).
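
One way to tell a stalled run from a slow one is to compare the event counter and elapsed time across the progress lines, and to count the kinematics errors in between. The sketch below is only an illustration: the line format is taken from the excerpt above and may differ in other Sherpa versions, and the function names (`to_seconds`, `progress_samples`, `is_advancing`) are hypothetical helpers, not part of Sherpa or BOINC.

```python
import re

# Matches progress lines such as
# "Event 700 ( 1m 26s elapsed / 3h 23m 22s left ) -> ETA: ..."
PROGRESS = re.compile(r"Event (\d+) \( (.+?) elapsed")
DURATION_PART = re.compile(r"(\d+)([hms])")
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text):
    """Convert a duration like '1m 26s' or '3h 23m 22s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in DURATION_PART.findall(text))


def progress_samples(log_text):
    """Return (event_number, elapsed_seconds) for every progress line."""
    return [(int(m.group(1)), to_seconds(m.group(2)))
            for m in PROGRESS.finditer(log_text)]


def is_advancing(samples):
    """True if the event counter strictly increases from sample to sample."""
    return all(b[0] > a[0] for a, b in zip(samples, samples[1:]))


# A few lines copied from the log excerpt above.
sample_log = """\
Event 700 ( 1m 26s elapsed / 3h 23m 22s left ) -> ETA: Tue Jun 26 11:52
Event 800 ( 1m 35s elapsed / 3h 17m 50s left ) -> ETA: Tue Jun 26 11:46
Error in Splitting_Tools::ConstructKinematics(kt = -nan, z = 0.595853, y = 0.50943).
Event 900 ( 1m 47s elapsed / 3h 17m 35s left ) -> ETA: Tue Jun 26 11:46
"""

samples = progress_samples(sample_log)
error_count = sample_log.count("Error in Splitting_Tools::ConstructKinematics")
print(samples, is_advancing(samples), error_count)
```

Run against a longer stretch of the console output, a counter that stops increasing (or an elapsed time that resets) would point to a real loop rather than a slow job.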

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 328,405
RAC: 158
Message 5446 - Posted: 26 Jun 2018, 8:04:44 UTC - in response to Message 5441.  

I will put it in production Wednesday or Thursday, once we have verified there are no issues here.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 328,405
RAC: 158
Message 5447 - Posted: 26 Jun 2018, 8:05:00 UTC - in response to Message 5442.  

Forgot to restart the server. Done now.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 751
Credit: 11,609,314
RAC: 1,490
Message 5448 - Posted: 26 Jun 2018, 17:50:35 UTC

I only have 3 so far but maybe after the rest of the 3.07's are done I will get the rest of the new version.

All of my -dev computers are always set to auto-load so they would have had all with the new version if the server ........
Mad Scientist For Life

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 751
Credit: 11,609,314
RAC: 1,490
Message 5449 - Posted: 27 Jun 2018, 1:18:22 UTC

Ok I connected all my computers to Axel's ISP so I have the new version on all my -dev pc's now.

maeax
Joined: 22 Apr 16
Posts: 664
Credit: 1,807,614
RAC: 2,394
Message 5450 - Posted: 27 Jun 2018, 5:19:41 UTC

10k miles aircable RJ45 are needed.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 751
Credit: 11,609,314
RAC: 1,490
Message 5451 - Posted: 27 Jun 2018, 20:17:41 UTC - in response to Message 5450.  

10k miles aircable RJ45 are needed.


Well, I thought of doing that, since it takes about as long to d/l those Atlas tasks as it would for me to roll out the 10k miles of cable. But with these smaller Theory vdi's I decided to test my new Tesla wireless internet, which bounces the signal off the clouds back down to Earth, where it hits hard ground and bounces back up to the clouds, and so travels those 10k miles at close to the speed of light.

I was going to try my homemade linear particle accelerator, but I thought that might melt all of our computers.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 751
Credit: 11,609,314
RAC: 1,490
Message 5452 - Posted: 28 Jun 2018, 4:51:08 UTC
Last modified: 28 Jun 2018, 4:56:27 UTC

Not much luck with these so far compared to the previous version.

Valids 9 - Errors 11 so far. (and 4 of those Valids are short one hour tasks each)

Most are the usual *VM Completion Message: Condor exited after 1036s without running a job*

maeax
Joined: 22 Apr 16
Posts: 664
Credit: 1,807,614
RAC: 2,394
Message 5453 - Posted: 28 Jun 2018, 5:54:05 UTC
Last modified: 28 Jun 2018, 6:16:18 UTC

Have one with defeats Status. Now running again.
There were network problems with Atlas downloads last night.
5 hours of downloading, and then it stalled.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 328,405
RAC: 158
Message 5454 - Posted: 28 Jun 2018, 9:28:43 UTC - in response to Message 5453.  

This is now available on the production project.

maeax
Joined: 22 Apr 16
Posts: 664
Credit: 1,807,614
RAC: 2,394
Message 5455 - Posted: 28 Jun 2018, 10:30:13 UTC

Laurence,
thank you, and thanks also to the team.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 751
Credit: 11,609,314
RAC: 1,490
Message 5456 - Posted: 1 Jul 2018, 7:27:34 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/results.php?userid=192

They have been working but just started getting these heartbeat Errors again just around midnight.

Hope they don't all end up with these while I am asleep

maeax
Joined: 22 Apr 16
Posts: 664
Credit: 1,807,614
RAC: 2,394
Message 5457 - Posted: 1 Jul 2018, 9:43:14 UTC - in response to Message 5456.  

Your link is user-specific: access denied!
But the task list can be reached via the computer list.
You had no jobs for these Theory tasks. Linux was started and the Condor ping was successful.
A benchmark of 17.7 HPSEC. Wow.
My Ryzen has 10 HPSEC.
Your VirtualBox is 5.1.22. I have 5.2.12 and BOINC 7.10.2.
Oracle has ended support for 5.1.xx.
I will tune my Ryzen to see 17 HPSEC!!

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 751
Credit: 11,609,314
RAC: 1,490
Message 5458 - Posted: 1 Jul 2018, 17:54:50 UTC
Last modified: 1 Jul 2018, 18:02:25 UTC

Yeah, I always forget these URLs only work for members after they log in... I sure wish we could just use one URL to show a member's tasks page.

I just want to show how the 4 hosts I use here have task errors (I see a couple Valids this morning but there should be more)

I'd rather not post 10 URLs to show 10 separate task stderrs.

But mine aren't hidden so I guess I will just post one and say they are (or were) all doing this.


https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=1901631

5 Valids on July 1st so maybe some are just not finished yet today
(hey, it is morning and I haven't gone up the stairs to look at the desktops yet)

The previous version was not doing this all the time.
(I also haven't checked my LHC hosts yet this morning)

Here I have VB Version: 5.2.2 - 5.1.22
Mad Scientist For Life

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 238
Message 5459 - Posted: 4 Jul 2018, 18:54:49 UTC
Last modified: 4 Jul 2018, 19:30:28 UTC

This version does not handle resuming the VM from a snapshot very well.
I have now seen this here at -dev and also on the production LHC@home.
Console Alt-F2 shows e.g. 52000 events processed, but no (nobody) processes are busy (the VM is idling),
and the VM is also not killed by the shutdown file.

07/03/18 19:37:40 ******************************************************
07/03/18 19:37:40 ** condor_startd (CONDOR_STARTD) STARTING UP
07/03/18 19:37:40 ** /usr/sbin/condor_startd
07/03/18 19:37:40 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
07/03/18 19:37:40 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
07/03/18 19:37:40 ** $CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $
07/03/18 19:37:40 ** $CondorPlatform: x86_64_RedHat6 $
07/03/18 19:37:40 ** PID = 4089
07/03/18 19:37:40 ** Log last touched time unavailable (No such file or directory)
07/03/18 19:37:40 ******************************************************
07/03/18 19:37:40 Using config source: /etc/condor/condor_config
07/03/18 19:37:40 Using local config sources:
07/03/18 19:37:40 /etc/condor/config.d/10_security.config
07/03/18 19:37:40 /etc/condor/config.d/14_network.config
07/03/18 19:37:40 /etc/condor/config.d/20_workernode.config
07/03/18 19:37:40 /etc/condor/config.d/30_lease.config
07/03/18 19:37:40 /etc/condor/config.d/35_theory.config
07/03/18 19:37:40 /etc/condor/config.d/40_ccb.config
07/03/18 19:37:40 /etc/condor/config.d/62-benchmark.conf
07/03/18 19:37:40 /etc/condor/condor_config.local
07/03/18 19:37:40 config Macros = 160, Sorted = 160, StringBytes = 5549, TablesBytes = 5864
07/03/18 19:37:40 CLASSAD_CACHING is ENABLED
07/03/18 19:37:40 Daemon Log is logging: D_ALWAYS D_ERROR
07/03/18 19:37:40 Daemoncore: Listening at <10.0.2.15:37368> on TCP (ReliSock).
07/03/18 19:37:40 DaemonCore: command socket at <10.0.2.15:37368?addrs=10.0.2.15-37368&noUDP>
07/03/18 19:37:40 DaemonCore: private command socket at <10.0.2.15:37368?addrs=10.0.2.15-37368>
07/03/18 19:37:41 WARNING: forward resolution of 167.142.142.128.in-addr.arpa doesn't match 128.142.142.167!
07/03/18 19:37:41 CCBListener: registered with CCB server vccondor01.cern.ch as ccbid 128.142.142.167:9618?addrs=128.142.142.167-9618+[2001-1458-301-98--100-99]-9618#40450682
07/03/18 19:37:42 HibernationSupportedStates invalid '' in ad from hibernation plugin /usr/libexec/condor/condor_power_state
07/03/18 19:37:42 VM-gahp server reported an internal error
07/03/18 19:37:42 VM universe will be tested to check if it is available
07/03/18 19:37:42 History file rotation is enabled.
07/03/18 19:37:42 Maximum history file size is: 20971520 bytes
07/03/18 19:37:42 Number of rotated history files is: 2
07/03/18 19:37:42 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1.000000, Memory: 1500, Swap: 100.00%, Disk: 100.00%
07/03/18 19:37:42 New machine resource allocated
07/03/18 19:37:42 Setting up slot pairings
07/03/18 19:37:42 CronJobList: Adding job 'multicore'
07/03/18 19:37:42 CronJob: Initializing job 'multicore' (/usr/local/bin/multicore-shutdown)
07/03/18 19:37:42 CronJobList: Adding job 'mips'
07/03/18 19:37:42 CronJobList: Adding job 'kflops'
07/03/18 19:37:42 CronJob: Initializing job 'mips' (/usr/libexec/condor/condor_mips)
07/03/18 19:37:42 CronJob: Initializing job 'kflops' (/usr/libexec/condor/condor_kflops)
07/03/18 19:37:42 State change: IS_OWNER is false
07/03/18 19:37:42 Changing state: Owner -> Unclaimed
07/03/18 19:37:42 State change: RunBenchmarks is TRUE
07/03/18 19:37:42 Changing activity: Idle -> Benchmarking
07/03/18 19:37:42 BenchMgr:StartBenchmarks()
07/03/18 19:37:45 Initial update sent to collector(s)
07/03/18 19:37:45 Sending DC_SET_READY message to master <10.0.2.15:55685?addrs=10.0.2.15-55685>
07/03/18 19:37:46 WARNING: forward resolution of 167.142.142.128.in-addr.arpa doesn't match 128.142.142.167!
07/03/18 19:37:57 State change: benchmarks completed
07/03/18 19:37:57 Changing activity: Benchmarking -> Idle
07/03/18 19:38:16 Request accepted.
07/03/18 19:38:16 Remote owner is test4theory@cern.ch
07/03/18 19:38:16 State change: claiming protocol successful
07/03/18 19:38:16 Changing state: Unclaimed -> Claimed
07/03/18 19:38:17 Got activate_claim request from shadow (188.184.94.254)
07/03/18 19:38:17 Remote job ID is 387629.129
07/03/18 19:38:17 Got universe "VANILLA" (5) from request classad
07/03/18 19:38:17 State change: claim-activation protocol successful
07/03/18 19:38:17 Changing activity: Idle -> Busy
07/03/18 20:29:26 CCBListener: no activity from CCB server in 2169s; assuming connection is dead.
07/03/18 20:29:26 CCBListener: connection to CCB server vccondor01.cern.ch failed; will try to reconnect in 60 seconds.
07/03/18 20:29:59 condor_write(): Socket closed when trying to write 4096 bytes to collector vccondor01.cern.ch, fd is 8, errno=104 Connection reset by peer
07/03/18 20:29:59 Buf::write(): condor_write() failed
07/03/18 20:30:27 CCBListener: registered with CCB server vccondor01.cern.ch as ccbid 128.142.142.167:9618?addrs=128.142.142.167-9618+[2001-1458-301-98--100-99]-9618#40450682
07/03/18 20:36:09 condor_read() failed: recv(fd=9) returned -1, errno = 104 Connection reset by peer, reading 21 bytes from collector vccondor01.cern.ch.
07/03/18 20:36:09 IO: Failed to read packet header
07/03/18 20:36:09 CCBListener: failed to receive message from CCB server vccondor01.cern.ch
07/03/18 20:36:09 CCBListener: connection to CCB server vccondor01.cern.ch failed; will try to reconnect in 60 seconds.
07/03/18 20:36:12 condor_write(): Socket closed when trying to write 4096 bytes to collector vccondor01.cern.ch, fd is 6, errno=104 Connection reset by peer
07/03/18 20:36:12 Buf::write(): condor_write() failed
07/03/18 20:37:10 CCBListener: registered with CCB server vccondor01.cern.ch as ccbid 128.142.142.167:9618?addrs=128.142.142.167-9618+[2001-1458-301-98--100-99]-9618#40450682
07/03/18 20:37:40 PERMISSION DENIED to condor@38-37-9348 from host 10.0.2.15 for command 448 (GIVE_STATE), access level READ: reason: READ authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15,10.0.2.15, hostname size = 1, original ip address = 10.0.2.15
07/03/18 20:37:40 DC_AUTHENTICATE: Command not authorized, done!
07/04/18 06:51:19 CCBListener: no activity from CCB server in 36249s; assuming connection is dead.
07/04/18 06:51:19 CCBListener: connection to CCB server vccondor01.cern.ch failed; will try to reconnect in 60 seconds.
07/04/18 06:51:22 condor_write(): Socket closed when trying to write 4096 bytes to collector vccondor01.cern.ch, fd is 6, errno=104 Connection reset by peer
07/04/18 06:51:22 Buf::write(): condor_write() failed
07/04/18 06:52:02 Starter pid 4124 exited with status 2
07/04/18 06:52:02 State change: starter exited
07/04/18 06:52:02 Changing activity: Busy -> Idle
07/04/18 06:52:02 State change: claim lease expired (condor_schedd gone?), evicting claim
07/04/18 06:52:02 Changing state and activity: Claimed/Idle -> Preempting/Killing
07/04/18 06:52:02 State change: No preempting claim, returning to owner
07/04/18 06:52:02 Changing state and activity: Preempting/Killing -> Owner/Idle
07/04/18 06:52:02 State change: IS_OWNER is false
07/04/18 06:52:02 Changing state: Owner -> Unclaimed
07/04/18 06:52:20 WARNING: forward resolution of 167.142.142.128.in-addr.arpa doesn't match 128.142.142.167!
07/04/18 06:52:20 CCBListener: registered with CCB server vccondor01.cern.ch as ccbid 128.142.142.167:9618?addrs=128.142.142.167-9618+[2001-1458-301-98--100-99]-9618#40450682
07/04/18 06:52:33 Request accepted.
07/04/18 06:52:33 Remote owner is test4theory@cern.ch
07/04/18 06:52:33 State change: claiming protocol successful
07/04/18 06:52:33 Changing state: Unclaimed -> Claimed
07/04/18 06:52:33 Got activate_claim request from shadow (188.184.94.254)
07/04/18 06:52:33 Remote job ID is 429810.134
07/04/18 06:52:33 Got universe "VANILLA" (5) from request classad
07/04/18 06:52:33 State change: claim-activation protocol successful
07/04/18 06:52:33 Changing activity: Idle -> Busy
07/04/18 07:03:56 Called deactivate_claim_forcibly()
07/04/18 07:03:56 Starter pid 8323 exited with status 0
07/04/18 07:03:56 State change: starter exited
07/04/18 07:03:56 Changing activity: Busy -> Idle
07/04/18 07:03:56 Got activate_claim request from shadow (188.184.94.254)
07/04/18 07:03:56 Remote job ID is 429813.13
07/04/18 07:03:56 Got universe "VANILLA" (5) from request classad
07/04/18 07:03:56 State change: claim-activation protocol successful
07/04/18 07:03:56 Changing activity: Idle -> Busy
07/04/18 07:20:40 Called deactivate_claim_forcibly()
07/04/18 07:20:40 Starter pid 9949 exited with status 0
07/04/18 07:20:40 State change: starter exited
07/04/18 07:20:40 Changing activity: Busy -> Idle
07/04/18 07:20:40 Got activate_claim request from shadow (188.184.94.254)
07/04/18 07:20:40 Remote job ID is 429813.90
07/04/18 07:20:40 Got universe "VANILLA" (5) from request classad
07/04/18 07:20:40 State change: claim-activation protocol successful
07/04/18 07:20:40 Changing activity: Idle -> Busy
07/04/18 18:06:52 CCBListener: no activity from CCB server in 36071s; assuming connection is dead.
07/04/18 18:06:52 CCBListener: connection to CCB server vccondor01.cern.ch failed; will try to reconnect in 60 seconds.
07/04/18 18:06:57 condor_write(): Socket closed when trying to write 4096 bytes to collector vccondor01.cern.ch, fd is 6, errno=104 Connection reset by peer
07/04/18 18:06:57 Buf::write(): condor_write() failed
07/04/18 18:07:52 CCBListener: registered with CCB server vccondor01.cern.ch as ccbid 128.142.142.167:9618?addrs=128.142.142.167-9618+[2001-1458-301-98--100-99]-9618#40450682
07/04/18 20:37:40 PERMISSION DENIED to condor@38-37-9348 from host 10.0.2.15 for command 448 (GIVE_STATE), access level READ: reason: READ authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15,10.0.2.15, hostname size = 1, original ip address = 10.0.2.15
07/04/18 20:37:40 DC_AUTHENTICATE: Command not authorized, done!
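
A log like the one above is easier to read once the state/activity transitions and the CCB inactivity gaps are pulled out. The following is a rough sketch, not an HTCondor tool: the line layout (`MM/DD/YY HH:MM:SS message`) and the message wording are taken from this excerpt only, and `summarize` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Line layout taken from the excerpt above: "MM/DD/YY HH:MM:SS message".
LINE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
TRANSITION = re.compile(
    r"^Changing (state|activity|state and activity): (\S+) -> (\S+)$")
CCB_GAP = re.compile(r"no activity from CCB server in (\d+)s")


def summarize(log_text):
    """Collect state/activity transitions and CCB inactivity gaps."""
    transitions, ccb_gaps = [], []
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # skip continuation lines that carry no timestamp
        when = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
        message = m.group(2)
        t = TRANSITION.match(message)
        if t:
            transitions.append((when, t.group(1), t.group(2), t.group(3)))
        g = CCB_GAP.search(message)
        if g:
            ccb_gaps.append((when, int(g.group(1))))
    return transitions, ccb_gaps


# A few lines copied from the log above.
sample = """\
slot type 0: Cpus: 1.000000, Memory: 1500, Swap: 100.00%, Disk: 100.00%
07/04/18 06:51:19 CCBListener: no activity from CCB server in 36249s; assuming connection is dead.
07/04/18 06:52:02 Changing activity: Busy -> Idle
07/04/18 06:52:02 Changing state and activity: Claimed/Idle -> Preempting/Killing
"""

transitions, ccb_gaps = summarize(sample)
for when, seconds in ccb_gaps:
    print(f"{when}: CCB silent for {seconds / 3600:.1f} h")
for when, kind, old, new in transitions:
    print(f"{when}: {kind} {old} -> {new}")
```

In the full log, the two long gaps (36249s and 36071s, about ten hours each) stand out against the 2169s gap on the first evening, which may help when matching them to the times the VM was suspended and resumed.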


©2024 CERN