Message boards : CMS Application : New version v48.30
Author | Message |
---|---|
Joined: 12 Sep 14 Posts: 1118 Credit: 339,209 RAC: 29 |
This version uses a content delivery network and updates the cache. |
Joined: 13 Feb 15 Posts: 1206 Credit: 895,029 RAC: 796 |
No problems: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=730341 The task even survived an overnight suspension ;) |
Joined: 28 Jul 16 Posts: 511 Credit: 400,710 RAC: 115 |
I ran a task here after the WMAgent update yesterday, but only with partial success. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=730340 The log shows "Detected squid proxy http://<hostname_censored_by_volunteer/>:3128", which means the bootstrap script works and copies the info about the local squid into the VM. "Probing /cvmfs/grid.cern.ch... OK" looks good, but there is no "Probing /cvmfs/cms.cern.ch..." line for CMS in the log. The log also contains: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE 2.4.4.0 3543 1 25968 5943 3 1 1115963 10240001 2 65024 0 15 100 0 0 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1 Although the local proxy info is available, CVMFS configures a CERN proxy. It also uses cvmfs-stratum-one.cern.ch instead of s1x-cvmfs.openhtc.io, which is what I expected. Nonetheless both slots ran a job, but neither got a follow-up job. That's why I shut it down after a few idle hours. |
Joined: 8 Apr 15 Posts: 794 Credit: 13,547,676 RAC: 9,769 |
No problems here with CMS and Theory (though I will have to switch the CMS tasks over to one of my hosts with more RAM), so I decided to fire up a couple more of my old fleet. Mad Scientist For Life |
Joined: 8 Apr 15 Posts: 794 Credit: 13,547,676 RAC: 9,769 |
It looks like we have a new batch of CMS tasks so I will give them a try again. |
Joined: 8 Apr 15 Posts: 794 Credit: 13,547,676 RAC: 9,769 |
Still not working: [ERROR] Condor exited after 1020s without running a job. |
Joined: 8 Apr 15 Posts: 794 Credit: 13,547,676 RAC: 9,769 |
Well, it looked like the CMS tasks were starting to work again, at least on a PC running Linux, but then it started getting its own version of the error: "VM Completion Message: No jobs were available to run". I saw 6 valids, so I started one on a Windows 10 host, but lost it because I was messing with the OS at the same time; after freezing and needing a couple of reboots, it crashed. Once I am finished with this, I will try again with just a single 2-core task and see how it works (and keep watching that other CMS machine running Linux). |
Joined: 8 Apr 15 Posts: 794 Credit: 13,547,676 RAC: 9,769 |
Still not working here, so I am running the other task versions for now. |
Joined: 22 Apr 16 Posts: 710 Credit: 2,114,314 RAC: 5,123 |
CMS has woken up again, but it is too early for us: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2744192 |
Joined: 13 Feb 15 Posts: 1206 Credit: 895,029 RAC: 796 |
Your VM did not start. Mine started OK but did not get a CMS job to run, so it ended with EXIT_NO_SUB_TASKS: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2744208 |
Joined: 8 Apr 15 Posts: 794 Credit: 13,547,676 RAC: 9,769 |
It looks like they accidentally sent out 5 of those for you. I see Laurence ran some of those TensorFlow ones. (The Theory tasks seem to be back to normal.) |
Joined: 8 Apr 15 Posts: 794 Credit: 13,547,676 RAC: 9,769 |
For some reason these VDIs always take a long time to download to where I am. My connection is not slow for anything else, but this is running at 2.3KBps at best: about 0.5% in 30 minutes, so at 1% per hour it is going to take a while. |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
There's been one of these running here for an hour or so, but it isn't actually doing any work. The setup takes quite a while. It seems to install singularity (among much else) in the VM. A quick look at the proxy log shows about 255M downloaded for two tasks, one started and the other waiting. The setup appears to complete successfully but, although "cmsrun" appears at intervals in the "top" console and takes about 50% CPU, no "running job", "wrapper" or "error" output appears, and it hasn't timed out. The host is shut down at the moment but will start itself up again later and run until 0700 GMT. I'll leave it to see what, if anything, happens. |
Joined: 8 Apr 15 Posts: 794 Credit: 13,547,676 RAC: 9,769 |
Yes, the VB tasks always start slowly, even just to get the job started beyond the Condor ping: [INFO] Condor JobID: 484151.51 in slot2 00:14:54.347983 VMMDev: Guest Log: [INFO] Condor JobID: 484151.50 in slot1 [IINNFFOO]] MMCCPPlloottss JJoobIIDD:: 4477881105049820 iinn sslloott21 [[IINNFFOO]] MMCCPPlloottss JJoobIIDD:: 4477881105049820 iinn sslloott21 [INFO] Job finished in slot1 with 0. [INFO] New Job Starting in slot1 [INFO] Condor JobID: 483974.117 in slot1 : [INFO] MCPlots JobID: 47785745 in slot1 And if you are thousands of miles away from the server, the connection tends to produce all those text errors, but at least the tasks start better than they did before they made it here. This VDI is 694.05MB (4 hours to download 10% so far, and now at 4.53KBps). So I will just load it on this host and see how it starts after 2am here, since I have the fastest speed then. If they run OK I will load this on my faster host with the most RAM. BUT I see maeax just got "[ERROR] Condor exited after 11212s without running a job" after a run time of 3 hours 22 min 12 sec. So this isn't looking good (and I hope we don't have to try a new version, or get one before I finish this current download). |
Joined: 22 Apr 16 Posts: 710 Credit: 2,114,314 RAC: 5,123 |
2018-12-21 21:18:45 (10144): Status Report: Job Duration: '64800.000000' 2018-12-21 21:18:45 (10144): Status Report: Elapsed Time: '6000.164904' 2018-12-21 21:18:45 (10144): Status Report: CPU Time: '433.171875' 2018-12-21 22:53:05 (10144): Guest Log: [ERROR] Condor exited after 11212s without running a job. 2018-12-21 22:53:05 (10144): Guest Log: [INFO] Shutting Down. So far, so good... sorry, no good. No CMS jobs are available, but the task started well and finished. Ivan has retired and his work is moving on to the next generation.... |
Joined: 22 Apr 16 Posts: 710 Credit: 2,114,314 RAC: 5,123 |
CMS tasks are available, but without jobs: 207 (0x000000CF) EXIT_NO_SUB_TASKS https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2748532 |
Joined: 8 Apr 15 Posts: 794 Credit: 13,547,676 RAC: 9,769 |
I tried one and got almost the same. After 7 hours it looked like it was going to run, but 30 minutes later I got this: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2748799 If anyone here other than Axel and myself tries one, let us know, since we can't go by the stats pages here. |
Joined: 22 Apr 16 Posts: 710 Credit: 2,114,314 RAC: 5,123 |
Hi Magic, something was working in your task shown above: 2019-01-18 14:29:40 (9472): Guest Log: [INFO] CMS application starting. Check log files. 2019-01-18 15:56:59 (9472): Status Report: Job Duration: '64800.000000' 2019-01-18 15:56:59 (9472): Status Report: Elapsed Time: '6000.000000' 2019-01-18 15:56:59 (9472): Status Report: CPU Time: '857.671875' 2019-01-18 17:37:26 (9472): Status Report: Job Duration: '64800.000000' 2019-01-18 17:37:26 (9472): Status Report: Elapsed Time: '12000.000000' 2019-01-18 17:37:26 (9472): Status Report: CPU Time: '1145.546875' 2019-01-18 19:17:41 (9472): Status Report: Job Duration: '64800.000000' 2019-01-18 19:17:41 (9472): Status Report: Elapsed Time: '18000.000000' 2019-01-18 19:17:41 (9472): Status Report: CPU Time: '1875.218750' 2019-01-18 21:00:28 (9472): Status Report: Job Duration: '64800.000000' 2019-01-18 21:00:28 (9472): Status Report: Elapsed Time: '24000.844013' 2019-01-18 21:00:28 (9472): Status Report: CPU Time: '3796.703125' We have to wait until Monday for some news. |
Joined: 13 Feb 15 Posts: 1206 Credit: 895,029 RAC: 796 |
"After 7 hours it looked like it was going to run but 30mins later I got this....." I ran one 2 days ago but stopped it early, because it was obvious that it would not get a job. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2748507 I just started another one ( https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2749192 ) and will let it run, though without a good feeling. We haven't heard anything from Ivan for months. It's clear that this version has a very different setup, including installing Python etc., before checking HTCondor after about 8 minutes. I'll make a video of all the actions before it's really waiting for a job. Edit: Normal 'EXIT_NO_SUB_TASKS' shutdown: 2019-01-20 10:26:08 (9564): Guest Log: [ERROR] No jobs were available to run. 2019-01-20 10:26:08 (9564): Guest Log: [INFO] Shutting Down. |
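
Several posts above quote "Status Report" lines and read them by eye to judge whether a task was doing anything. A minimal Python sketch of that comparison follows, assuming the log lines keep exactly the layout quoted in the thread. The function name `cpu_efficiency` and the pairing of each "Elapsed Time" with the "CPU Time" that follows it are illustrative choices, not part of BOINC or vboxwrapper.

```python
import re

# Matches lines laid out like the ones quoted in the thread, e.g.
# 2019-01-18 15:56:59 (9472): Status Report: Elapsed Time: '6000.000000'
STATUS_RE = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \(\d+\): "
    r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'"
)


def cpu_efficiency(log_text):
    """Return a list of (elapsed_s, cpu_s, cpu_s / elapsed_s) tuples.

    Hypothetical helper: pairs each 'Elapsed Time' value with the
    'CPU Time' value that follows it in the log.
    """
    rows = []
    elapsed = None
    for field, value in STATUS_RE.findall(log_text):
        if field == "Elapsed Time":
            elapsed = float(value)
        elif elapsed is not None and elapsed > 0:
            cpu = float(value)
            rows.append((elapsed, cpu, cpu / elapsed))
            elapsed = None
    return rows


# First and last reports copied from the 2019-01-18 log quoted above.
SAMPLE = """\
2019-01-18 15:56:59 (9472): Status Report: Job Duration: '64800.000000'
2019-01-18 15:56:59 (9472): Status Report: Elapsed Time: '6000.000000'
2019-01-18 15:56:59 (9472): Status Report: CPU Time: '857.671875'
2019-01-18 21:00:28 (9472): Status Report: Job Duration: '64800.000000'
2019-01-18 21:00:28 (9472): Status Report: Elapsed Time: '24000.844013'
2019-01-18 21:00:28 (9472): Status Report: CPU Time: '3796.703125'
"""

if __name__ == "__main__":
    for elapsed, cpu, ratio in cpu_efficiency(SAMPLE):
        print(f"elapsed {elapsed:9.0f} s   cpu {cpu:9.1f} s   ratio {ratio:.1%}")
```

For the two reports in the sample, the ratio of CPU time to elapsed time comes out at roughly 14% and 16%. That only shows the VM was consuming some CPU; it does not by itself tell whether a CMS job was running, since the setup phase described above also uses CPU.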