Theory Application: Open Issues
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
Here is the list of open issues. If there is something that is not listed, please post to get it added.
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
I have allocated 2 cores to the task. The load average is very high (the 15-minute average sometimes reaches 1.81). I can only speculate how high it might be with just one core (maybe I'll try that next). This is not a fault as such, but an efficiency issue. Other tasks (CMS, ATLAS) are nowhere near that bad under the same conditions.
Finished jobs should show their pass/fail status in the stderr.
I would also like to see which app (pythia6.xxx, sherpa, herwig...) actually calculated the job.
EDIT: Not to forget: download tasks according to the requested time (work buffer), not a fixed number.
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
Quote: "I have allocated 2 cores to the task."
With 2 cores, 1.81 seems fine. The workload may just be adapting. I would suggest trying with 1 core.
Will extend the job analysis description.
Does this happen in the T4T production version? If so, please let me know where you see this.
Will add this.
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
No, this is just for fault finding. Certain issues may be caused only by certain apps, and we cannot tell which without knowing which app ran the job. The app names used to be in the logs a few days ago, but things change...
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
The logs should be back. A few things broke while refactoring today. If they are not in new tasks from now on, let me know.
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
Quote: "I have allocated 2 cores to the task."
I have tested it with only one core. As expected, the load average is very high (15-minute load average of 1.85-1.93). This is with the agile-runmc app on two tasks. The app names are still nowhere to be found in the logs.
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 61
My last 4 tasks ended in a computation error, for no reason that I can see.
http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=155757
2016-04-23 21:35:59 (5040): Status Report: CPU Time: '35317.362792'
2016-04-23 22:03:57 (5040): Guest Log: [ERROR] Condor exited with 0
2016-04-23 22:03:57 (5040): Guest Log: [INFO] Shutting Down.
2016-04-23 22:03:57 (5040): VM Completion File Detected.
2016-04-23 22:03:57 (5040): VM Completion Message: Condor exited with 0 .
2016-04-23 22:03:57 (5040): Powering off VM.
2016-04-23 22:03:59 (5040): Successfully stopped VM.
2016-04-23 22:04:04 (5040): Deregistering VM. (boinc_394876cc1189c4ec, slot#1)
2016-04-23 22:04:04 (5040): Removing virtual disk drive(s) from VM.
2016-04-23 22:04:04 (5040): Removing network bandwidth throttle group from VM.
2016-04-23 22:04:04 (5040): Removing storage controller(s) from VM.
2016-04-23 22:04:04 (5040): Removing VM from VirtualBox.
22:04:09 (5040): called boinc_finish(1)
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
A task finished after about 10 h with a computation error: http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=155454
Everything seemed fine, and the last job was successful. This is similar to what Crystal Pellet reported. Two more tasks failed with a computation error after 10 h to 10 h 30 min of runtime.
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
The odd thing is the first few lines in the stderr:
<core_client_version>7.6.22</core_client_version>
Function not permitted
??????
EDIT: My last valid result did not have that: http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=155532
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
Sorry about that. We should give credit in this case. We need to double-check what happens when Condor exits.
Joined: 12 Sep 14 Posts: 65 Credit: 544 RAC: 0
Quote: "I have allocated 2 cores to the task."
Theory apps are dual-threaded, but the second thread is used only for graphics generation and uses less than half a CPU. In the past (back in the days of cernvmwrapper) we tried allocating 2 cores per task but discontinued it because it wasted half a CPU on average. The numerical value of "load average" in any case doesn't map exactly to the number of CPUs loaded, so don't worry too much about it.
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
Quote: "The numerical value of 'load average' in any case doesn't map exactly to the number of CPUs loaded, so don't worry too much about it."
I am just concerned that 1 core is doing work for which nearly 2 would be needed. It is either wasting some CPU (2 cores) or slowing the task down quite a bit (1 core).
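A rough way to compare the figures reported in this thread (a 15-minute load average of up to 1.81 with 2 cores allocated, and 1.85 to 1.93 with 1 core) is to divide the load average by the number of cores given to the VM. This is only a heuristic: load average does not map exactly to busy CPUs, and on Linux it also counts tasks waiting in uninterruptible sleep (often I/O). The helper name `load_per_core` is hypothetical and not part of BOINC or the Theory app.

```python
def load_per_core(load_avg: float, cores: int) -> float:
    """Divide a load average by the number of cores allocated to the VM.

    Hypothetical helper. Values near or above 1.0 suggest the allocated
    cores are saturated and work may be queueing; rough heuristic only.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


if __name__ == "__main__":
    # 15-minute load averages quoted in this thread.
    print(f"2 cores, load 1.81: {load_per_core(1.81, 2):.3f} per core")
    print(f"1 core,  load 1.93: {load_per_core(1.93, 1):.3f} per core")
```

With 2 cores the load works out to roughly 0.9 per core; with 1 core it is close to 2 per core, which is consistent with the concern that runnable work is waiting for the single core, although I/O wait can also inflate the number.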
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
Thanks for adding the finished x.log files. Now we can see whether a job passed or failed.
Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0
Over at VirtualLHC, the 32-bit app has a base memory requirement of only 256 MB, and Challenge 64-bit runs happily with 512 MB per core. The apps here request 2 GB, which forces contributors to use app_config files to limit the number of running tasks so as not to overburden their hosts. Is there any prospect of reducing the VM memory requirement before these 64-bit apps are released to production? (The question applies equally to CMS and ATLAS.)
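The memory figures in this question (256 MB, 512 MB per core, 2 GB) can be turned into a rough count of how many single-core VMs a host can run at once, which is roughly the number a contributor would put in `max_concurrent` in BOINC's app_config.xml. A minimal sketch, assuming a hypothetical 8 GB host that keeps 2 GB back for the OS and other programs; the helper name `max_vms` and the reserve are illustrative, not BOINC defaults, and VirtualBox's own overhead is ignored.

```python
def max_vms(host_ram_mb: int, vm_ram_mb: int, reserve_mb: int = 2048) -> int:
    """Estimate how many VMs of vm_ram_mb fit in host RAM after a reserve.

    Hypothetical helper; the 2048 MB default reserve is an assumption, and
    hypervisor overhead per VM is not accounted for.
    """
    if vm_ram_mb <= 0:
        raise ValueError("vm_ram_mb must be positive")
    usable = max(host_ram_mb - reserve_mb, 0)
    return usable // vm_ram_mb


if __name__ == "__main__":
    host_mb = 8192  # hypothetical 8 GB host
    for label, vm_mb in [
        ("32-bit app, 256 MB", 256),
        ("64-bit, 512 MB per core (1 core)", 512),
        ("64-bit dev app, 2 GB", 2048),
    ]:
        print(f"{label}: {max_vms(host_mb, vm_mb)} VMs")
```

Under these assumptions the 2 GB request allows only 3 concurrent VMs where 256 MB would allow 24, which is why contributors end up limiting running tasks by hand.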
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
For Theory, yes; for CMS and ATLAS, no.
©2024 CERN