Message boards : CMS Application : New version v48.30

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1021
Credit: 274,753
RAC: 0
Message 5402 - Posted: 16 Apr 2018, 11:44:30 UTC

Uses a content delivery network and updates the cache.
ID: 5402 · Rating: 0
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1010
Credit: 591,548
RAC: 0
Message 5403 - Posted: 18 Apr 2018, 6:44:28 UTC - in response to Message 5402.  

No problems: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=730341

The task even survived an overnight suspension ;)
ID: 5403 · Rating: 0
computezrmle
Joined: 28 Jul 16
Posts: 263
Credit: 232,222
RAC: 0
Message 5404 - Posted: 18 Apr 2018, 8:44:00 UTC

I ran a task here after the WMAgent update yesterday, but only with partial success.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=730340

Detected squid proxy http://<hostname_censored_by_volunteer/>:3128

This means the bootstrap script works and copies the info about the local squid into the VM.


Probing /cvmfs/grid.cern.ch... OK

Looks good, but there is no matching line for CMS in the log, i.e. no
Probing /cvmfs/cms.cern.ch...


VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2.4.4.0 3543 1 25968 5943 3 1 1115963 10240001 2 65024 0 15 100 0 0 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1

Although the local proxy info is available, CVMFS configures a CERN proxy.
It also uses cvmfs-stratum-one.cern.ch instead of s1x-cvmfs.openhtc.io, which is what I expected.
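The stat output above is a header row followed by a value row, so the HOST and PROXY columns can be picked out mechanically and compared with what was expected. A minimal sketch, assuming the table is pasted exactly as it appears in the log (whitespace-separated, one value per column); `parse_stat_table` and `routing_problems` are hypothetical helper names, and checking only the port of the local squid is an assumption, since its hostname was censored:

```python
from __future__ import annotations

# Header row and value row of the CVMFS stat table, copied from the log above.
STAT_TABLE = """\
VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2.4.4.0 3543 1 25968 5943 3 1 1115963 10240001 2 65024 0 15 100 0 0 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1
"""


def parse_stat_table(text: str) -> dict[str, str]:
    """Pair each column name in the header row with the value below it."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, values = rows[0], rows[1]
    if len(header) != len(values):
        raise ValueError(f"{len(header)} columns but {len(values)} values")
    return dict(zip(header, values))


def routing_problems(stat: dict[str, str], host_fragment: str, proxy_fragment: str) -> list[str]:
    """List mismatches between the HOST/PROXY columns and the expected values."""
    problems = []
    if host_fragment not in stat["HOST"]:
        problems.append(f"HOST is {stat['HOST']}, expected {host_fragment}")
    if proxy_fragment not in stat["PROXY"]:
        problems.append(f"PROXY is {stat['PROXY']}, expected {proxy_fragment}")
    return problems


if __name__ == "__main__":
    stat = parse_stat_table(STAT_TABLE)
    # The local squid's hostname was censored, so only its port (3128) is checked.
    for problem in routing_problems(stat, "openhtc.io", ":3128"):
        print(problem)
```

For this log both checks fail: the stratum-one host is used instead of openhtc.io, and the proxy is the CERN one on port 3125 rather than the local squid on 3128.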


Nonetheless, both slots ran a job, but neither got a follow-up job. That's why I shut it down after a few idle hours.
ID: 5404 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 538
Credit: 7,547,069
RAC: 1,614
Message 5408 - Posted: 19 Apr 2018, 0:21:46 UTC

No problems here with CMS and Theory (I will have to move the CMS tasks over to one of my hosts with more RAM).
So I decided to fire up a couple more machines from my old fleet.
ID: 5408 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 538
Credit: 7,547,069
RAC: 1,614
Message 5418 - Posted: 1 May 2018, 3:06:59 UTC

It looks like we have a new batch of CMS tasks, so I will give them a try again.
ID: 5418 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 538
Credit: 7,547,069
RAC: 1,614
Message 5419 - Posted: 1 May 2018, 4:39:42 UTC - in response to Message 5418.  

Still not working:

[ERROR] Condor exited after 1020s without running a job.
ID: 5419 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 538
Credit: 7,547,069
RAC: 1,614
Message 5420 - Posted: 3 May 2018, 9:48:42 UTC

Well, it looked like the CMS tasks were starting to work again, at least on a PC running Linux, but that machine then started getting its version of the error "VM Completion Message: No jobs were available to run".
I saw 6 valid results, so I started one on a Windows 10 machine, but lost it because I was working on the OS at the same time; after the machine froze and I had to reboot a couple of times, the task crashed. Once I've finished with this, I will try again, running a single 2-core task, and see how it works.

(and keep an eye on that other CMS machine running Linux)
ID: 5420 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 538
Credit: 7,547,069
RAC: 1,614
Message 5423 - Posted: 3 May 2018, 21:33:18 UTC - in response to Message 5421.  

Still not working here, so I am running the other task versions for now.
ID: 5423 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 431
Credit: 1,260,319
RAC: 1
Message 5752 - Posted: 20 Dec 2018, 7:11:36 UTC
Last modified: 20 Dec 2018, 7:14:06 UTC

CMS has woken up again, but... too early for us.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2744192
ID: 5752 · Rating: 0
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1010
Credit: 591,548
RAC: 0
Message 5753 - Posted: 20 Dec 2018, 8:46:58 UTC

Your VM did not start. Mine started OK but did not get a CMS job to run >> EXIT_NO_SUB_TASKS

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2744208
ID: 5753 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 538
Credit: 7,547,069
RAC: 1,614
Message 5754 - Posted: 20 Dec 2018, 17:18:25 UTC

It looks like they accidentally sent out 5 of those for you.

I see Laurence ran some of those TensorFlow tasks.

(the Theory tasks seem to be back to normal)
ID: 5754 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 538
Credit: 7,547,069
RAC: 1,614
Message 5755 - Posted: 21 Dec 2018, 21:37:59 UTC

For some reason these VDIs always take a long time to download to where I am.

My connection is not slow for anything else, but this download is running at 2.3 KB/s at best.
About 0.5% in 30 minutes, so at 1% per hour it is going to take a while.
ID: 5755 · Rating: 0
m
Volunteer tester
Joined: 20 Mar 15
Posts: 242
Credit: 856,216
RAC: 8
Message 5756 - Posted: 21 Dec 2018, 23:40:09 UTC
Last modified: 21 Dec 2018, 23:52:35 UTC

One of these has been running here for an hour or so, but it isn't actually doing any work. The setup takes quite a while; it seems to install Singularity (among much else) in the VM. A quick look at the proxy log shows about 255 MB downloaded for two tasks, one started and the other waiting. The setup appears to complete successfully, but although "cmsrun" appears at intervals in the "top" console and takes about 50% CPU, no "running job", "wrapper" or "error" output appears, and it hasn't timed out.
The host is shut down at the moment, but it will start itself up again later and run until 0700 GMT. I'll leave it to see what, if anything, happens.
ID: 5756 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 538
Credit: 7,547,069
RAC: 1,614
Message 5757 - Posted: 22 Dec 2018, 1:32:13 UTC - in response to Message 5756.  
Last modified: 22 Dec 2018, 1:32:57 UTC

Yes, the VB tasks always start slowly, even just getting the job started beyond Condor Ping:

[INFO] Condor JobID: 484151.51 in slot2
00:14:54.347983 VMMDev: Guest Log: [INFO] Condor JobID: 484151.50 in slot1
[IINNFFOO]] MMCCPPlloottss JJoobIIDD:: 4477881105049820 iinn sslloott21
[[IINNFFOO]] MMCCPPlloottss JJoobIIDD:: 4477881105049820 iinn sslloott21
[INFO] Job finished in slot1 with 0.
[INFO] New Job Starting in slot1
[INFO] Condor JobID: 483974.117 in slot1
: [INFO] MCPlots JobID: 47785745 in slot1

And if you are thousands of miles away, the connection to the server tends to come with all those text errors, but at least they start better than they did before they made it here.
This VDI is 694.05 MB (4 hours to download 10% so far, and now at 4.53 KB/s).
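A rough estimate of the remaining download time follows from these figures. A minimal sketch, assuming the rate stays constant and using decimal units (1 MB = 1000 KB), which may not match how the client reports sizes; `average_rate_kb_s` and `remaining_hours` are hypothetical helper names:

```python
def average_rate_kb_s(size_mb: float, fraction_done: float, hours: float) -> float:
    """Average download rate so far, in KB/s (decimal units assumed)."""
    return size_mb * 1000 * fraction_done / (hours * 3600)


def remaining_hours(size_mb: float, fraction_done: float, rate_kb_s: float) -> float:
    """Hours left for the rest of the file if the rate stays constant."""
    remaining_kb = size_mb * 1000 * (1 - fraction_done)
    return remaining_kb / rate_kb_s / 3600


if __name__ == "__main__":
    # Figures from the post: 694.05 MB VDI, 10% after 4 hours, currently 4.53 KB/s.
    print(f"average so far: {average_rate_kb_s(694.05, 0.10, 4.0):.2f} KB/s")
    print(f"time left at 4.53 KB/s: {remaining_hours(694.05, 0.10, 4.53):.1f} h")
```

With these numbers the average so far is about 4.82 KB/s and roughly 38 more hours remain; at the earlier pace of 1% per hour, the whole file would take about 100 hours.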

So I will just load it on this host and see how it starts after 2 am local time, since my connection is fastest then.
If it runs OK, I will load it on my faster host with the most RAM.

BUT I see maeax just got "[ERROR] Condor exited after 11212s without running a job" after a run time of 3 hours 22 min 12 sec.

So this isn't looking good (and I hope we don't have to try a new version, or get one, before I finish this current download).
ID: 5757 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 431
Credit: 1,260,319
RAC: 1
Message 5758 - Posted: 22 Dec 2018, 2:45:01 UTC - in response to Message 5757.  

2018-12-21 21:18:45 (10144): Status Report: Job Duration: '64800.000000'
2018-12-21 21:18:45 (10144): Status Report: Elapsed Time: '6000.164904'
2018-12-21 21:18:45 (10144): Status Report: CPU Time: '433.171875'
2018-12-21 22:53:05 (10144): Guest Log: [ERROR] Condor exited after 11212s without running a job.

2018-12-21 22:53:05 (10144): Guest Log: [INFO] Shutting Down.

So far, so good... sorry, no good.

No CMS jobs are available, but the task starts well and finishes.

Ivan is retired, and his work will carry on in the next generation of running...
ID: 5758 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 431
Credit: 1,260,319
RAC: 1
Message 5769 - Posted: 18 Jan 2019, 7:45:39 UTC
Last modified: 18 Jan 2019, 7:46:41 UTC

CMS tasks available, but without jobs:
207 (0x000000CF) EXIT_NO_SUB_TASKS
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2748532
ID: 5769 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 538
Credit: 7,547,069
RAC: 1,614
Message 5771 - Posted: 19 Jan 2019, 20:08:16 UTC

I tried one and got almost the same.

After 7 hours it looked like it was going to run, but 30 minutes later I got this:

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2748799

If anyone here other than Axel and me tries one, let us know, since we can't rely on the stats pages here.
ID: 5771 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 431
Credit: 1,260,319
RAC: 1
Message 5772 - Posted: 20 Jan 2019, 6:15:19 UTC

Hi Magic,
there was something working in your task shown above:
2019-01-18 14:29:40 (9472): Guest Log: [INFO] CMS application starting. Check log files.

2019-01-18 15:56:59 (9472): Status Report: Job Duration: '64800.000000'
2019-01-18 15:56:59 (9472): Status Report: Elapsed Time: '6000.000000'
2019-01-18 15:56:59 (9472): Status Report: CPU Time: '857.671875'
2019-01-18 17:37:26 (9472): Status Report: Job Duration: '64800.000000'
2019-01-18 17:37:26 (9472): Status Report: Elapsed Time: '12000.000000'
2019-01-18 17:37:26 (9472): Status Report: CPU Time: '1145.546875'
2019-01-18 19:17:41 (9472): Status Report: Job Duration: '64800.000000'
2019-01-18 19:17:41 (9472): Status Report: Elapsed Time: '18000.000000'
2019-01-18 19:17:41 (9472): Status Report: CPU Time: '1875.218750'
2019-01-18 21:00:28 (9472): Status Report: Job Duration: '64800.000000'
2019-01-18 21:00:28 (9472): Status Report: Elapsed Time: '24000.844013'
2019-01-18 21:00:28 (9472): Status Report: CPU Time: '3796.703125'

We'll have to wait until Monday for some news.
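One way to check whether anything was happening between reports is to compare the CPU time added in each interval with the elapsed time of that interval. A minimal sketch, assuming the Status Report lines are pasted as above and that Elapsed Time counts from the start of the task; `cpu_per_interval` is a hypothetical helper, and CPU time is taken as reported in the log (on a multi-core VM the ratio could exceed 1):

```python
from __future__ import annotations

import re

# Status Report lines copied from the task log above.
LOG = """\
2019-01-18 15:56:59 (9472): Status Report: Job Duration: '64800.000000'
2019-01-18 15:56:59 (9472): Status Report: Elapsed Time: '6000.000000'
2019-01-18 15:56:59 (9472): Status Report: CPU Time: '857.671875'
2019-01-18 17:37:26 (9472): Status Report: Job Duration: '64800.000000'
2019-01-18 17:37:26 (9472): Status Report: Elapsed Time: '12000.000000'
2019-01-18 17:37:26 (9472): Status Report: CPU Time: '1145.546875'
2019-01-18 19:17:41 (9472): Status Report: Job Duration: '64800.000000'
2019-01-18 19:17:41 (9472): Status Report: Elapsed Time: '18000.000000'
2019-01-18 19:17:41 (9472): Status Report: CPU Time: '1875.218750'
2019-01-18 21:00:28 (9472): Status Report: Job Duration: '64800.000000'
2019-01-18 21:00:28 (9472): Status Report: Elapsed Time: '24000.844013'
2019-01-18 21:00:28 (9472): Status Report: CPU Time: '3796.703125'
"""

ELAPSED = re.compile(r"Elapsed Time: '([0-9.]+)'")
CPU = re.compile(r"CPU Time: '([0-9.]+)'")


def cpu_per_interval(log: str) -> list[tuple[float, float, float]]:
    """For each report return (elapsed s, CPU s added since the previous
    report, CPU s added per elapsed s in that interval)."""
    elapsed = [float(v) for v in ELAPSED.findall(log)]
    cpu = [float(v) for v in CPU.findall(log)]
    rows = []
    prev_elapsed = prev_cpu = 0.0  # assumes counting starts at task start
    for e, c in zip(elapsed, cpu):
        added = c - prev_cpu
        rows.append((e, added, added / (e - prev_elapsed)))
        prev_elapsed, prev_cpu = e, c
    return rows


if __name__ == "__main__":
    for elapsed, added, share in cpu_per_interval(LOG):
        print(f"{elapsed:>8.0f} s elapsed: +{added:7.1f} s CPU ({share:.2f} CPU s per s)")
```

For this log the ratio falls to about 0.05 in the second interval and climbs to about 0.32 in the last one, which fits the reading that something had started working; it does not by itself show that a CMS job was running.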
ID: 5772 · Rating: 0
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1010
Credit: 591,548
RAC: 0
Message 5773 - Posted: 20 Jan 2019, 8:47:43 UTC - in response to Message 5771.  
Last modified: 20 Jan 2019, 9:34:56 UTC

Magic Quantum Mechanic wrote:
> After 7 hours it looked like it was going to run, but 30 minutes later I got this:
>
> https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2748799
>
> If anyone here other than Axel and me tries one, let us know, since we can't rely on the stats pages here.

I ran one 2 days ago but stopped it early, because it was obvious that it would not get a job.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2748507

I just started another one, https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2749192,
but I'm letting it run without a good feeling. We haven't heard anything from Ivan in months.
This version clearly has a very different setup, including installing Python etc., before it checks HTCondor after about 8 minutes.
I'll make a video of everything it does before it actually starts waiting for a job.

Edit: Normal 'EXIT_NO_SUB_TASKS' shutdown:

2019-01-20 10:26:08 (9564): Guest Log: [ERROR] No jobs were available to run.

2019-01-20 10:26:08 (9564): Guest Log: [INFO] Shutting Down.
ID: 5773 · Rating: 0

©2020 CERN