1)
Message boards :
Theory Application :
New cranky version explained
(Message 8172)
Posted 7 Sep 2023 by m Post: "New cranky version explained" I, and I'm sure others, have put a fair bit of effort into working around the lack of suspend/resume for certain applications (Theory, Atlas and others), and I am happy with the result, so I have not needed cgroups etc. The system has worked well for a long time. This latest change therefore involves a major update/reconfiguration for no useful purpose (even assuming suspend/resume will now work to disk, allowing the host to be shut down). So, unless there is some sort of fallback option to a "basic, non-suspendable" version (it looks as though any of the previous versions would do), I'm sorry to say it's a show-stopper for me, at least in the short to medium term.
Didn't work for me.
2)
Message boards :
Theory Application :
New version 5.00
(Message 6806)
Posted 9 Nov 2019 by m Post: "I am planning to put v5.18 on the production server tomorrow. Any objections?" A quick test showed v5.18 failing on Linux (with CVMFS installed) and on Windows. This is from Linux:
2019-11-08 00:45:16 (3522): Guest Log: 00:45:14 GMT +00:00 2019-11-08: cranky: [INFO] Checking CVMFS.
2019-11-08 00:45:34 (3522): Guest Log: 00:45:34 GMT +00:00 2019-11-08: cranky: [WARNING] 'cvmfs_config probe sft.cern.ch' failed.
2019-11-08 00:45:34 (3522): Guest Log: 00:45:34 GMT +00:00 2019-11-08: cranky: [INFO] Creating local CVMFS repository.
2019-11-08 00:45:34 (3522): Guest Log: sed: can't read cvmfs-mini-0.1-amd64.tgz: No such file or directory
2019-11-08 00:45:34 (3522): Guest Log: tar: option requires an argument -- 'f'
2019-11-08 00:45:34 (3522): Guest Log: Try `tar --help' or `tar --usage' for more information.
2019-11-08 00:45:34 (3522): Guest Log: /home/boinc/cranky: line 62: ./cvmfs-mini-0.1-amd64/mount_cvmfs.sh: No such file or directory
2019-11-08 00:45:34 (3522): Guest Log: 00:45:34 GMT +00:00 2019-11-08: cranky: [WARNING] 'cvmfs_config probe sft.cern.ch' failed.
...and from Windows:
2019-11-09 01:09:16 (288): Guest Log: 01:09:13 GMT +00:00 2019-11-09: cranky: [INFO] Checking CVMFS.
2019-11-09 01:09:38 (288): Guest Log: 01:09:34 GMT +00:00 2019-11-09: cranky: [WARNING] 'cvmfs_config probe sft.cern.ch' failed.
2019-11-09 01:09:38 (288): Guest Log: 01:09:34 GMT +00:00 2019-11-09: cranky: [INFO] Creating local CVMFS repository.
2019-11-09 01:09:38 (288): Guest Log: sed: can't read cvmfs-mini-0.1-amd64.tgz: No such file or directory
2019-11-09 01:09:38 (288): Guest Log: tar: option requires an argument -- 'f'
2019-11-09 01:09:38 (288): Guest Log: Try `tar --help' or `tar --usage' for more information.
2019-11-09 01:09:38 (288): Guest Log: /home/boinc/cranky: line 62: ./cvmfs-mini-0.1-amd64/mount_cvmfs.sh: No such file or directory
2019-11-09 01:09:38 (288): Guest Log: 01:09:34 GMT +00:00 2019-11-09: cranky: [WARNING] 'cvmfs_config probe sft.cern.ch' failed.
The startup messages are as in CP's post here. This one ran via the local proxy and this one didn't, so it doesn't seem to be a cache problem.
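To see where a start-up like this first goes wrong, it can help to extract only the cranky warnings and errors from the guest log. The Python sketch below assumes the "cranky: [LEVEL] message" layout seen in these v5.18 logs. Older cranky-0.0.x versions use a different layout, and plain sed/tar output is not matched. The function name is hypothetical.

```python
import re

# Assumed layout (from the v5.18 logs): "... cranky: [LEVEL] message"
CRANKY_LEVEL = re.compile(r"cranky: \[(INFO|WARNING|ERROR)\] (.*)$")


def cranky_messages(log_text, levels=("WARNING", "ERROR")):
    """Return (level, message) pairs for cranky lines at the requested levels."""
    found = []
    for line in log_text.splitlines():
        match = CRANKY_LEVEL.search(line)
        if match and match.group(1) in levels:
            found.append((match.group(1), match.group(2).strip()))
    return found


if __name__ == "__main__":
    sample = (
        "2019-11-08 00:45:16 (3522): Guest Log: 00:45:14 GMT +00:00 2019-11-08: "
        "cranky: [INFO] Checking CVMFS.\n"
        "2019-11-08 00:45:34 (3522): Guest Log: 00:45:34 GMT +00:00 2019-11-08: "
        "cranky: [WARNING] 'cvmfs_config probe sft.cern.ch' failed.\n"
    )
    for level, message in cranky_messages(sample):
        print(level, message)
```

Passing levels=("INFO", "WARNING", "ERROR") returns the whole cranky sequence, which makes it easy to compare the Linux and Windows runs line by line.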
3)
Message boards :
Theory Application :
Native Theory Application in Production
(Message 6360)
Posted 9 May 2019 by m Post: This is how the scheduling is working at the moment.
The current availability of SixTrack work has allowed a wider check. Of the three LHC subprojects available to this host, the server will send work for only one. The priority order, from highest, is Theory Native, Atlas, SixTrack; i.e. if all three are enabled and available, only Theory tasks are sent, and if Atlas and SixTrack are enabled and available, only Atlas tasks are sent. SixTrack tasks (when available) are sent only when SixTrack alone is enabled. Hosts running Theory VBox and SixTrack work as expected (they don't run Atlas).
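The observed behaviour can be summarised as a first-match rule over a fixed priority list. The Python sketch below models only what is described in this post, assuming the order never changes. It is not the project's actual scheduler code, and `subproject_sent` is a hypothetical name.

```python
# Observed priority order on this host, highest first (assumed fixed).
PRIORITY = ("Theory Native", "Atlas", "SixTrack")


def subproject_sent(enabled, available, priority=PRIORITY):
    """Return the single subproject work is sent for, or None if none qualifies.

    Models observed behaviour only: the first subproject in priority order
    that is both enabled in preferences and has tasks available wins.
    """
    for name in priority:
        if name in enabled and name in available:
            return name
    return None


if __name__ == "__main__":
    everything = {"Theory Native", "Atlas", "SixTrack"}
    print(subproject_sent(everything, everything))             # Theory Native
    print(subproject_sent({"Atlas", "SixTrack"}, everything))  # Atlas
    print(subproject_sent({"SixTrack"}, everything))           # SixTrack
```

The same rule covers a later observation that Atlas is sent only when Theory cannot be: with everything enabled but no Theory tasks available, the function returns "Atlas".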
4)
Message boards :
Theory Application :
Native Theory Application in Production
(Message 6339)
Posted 3 May 2019 by m Post:
Hosts here shut down (aka switch off) during the day and run overnight (cheaper electricity, unmetered internet, and I can use the heat). There must be many (potential) volunteers who want or need to shut their computers down overnight, over the weekend or whenever, without a lot of manual attention.
5)
Message boards :
Theory Application :
Native Theory Application in Production
(Message 6322)
Posted 3 May 2019 by m Post: "Some comments" The same applies to Theory native and Atlas (this is from the production server). The server will only send Atlas if it can't send Theory, either because of preference settings or because no Theory tasks are available (as happened a few weeks ago). This is a "VBox-free" host where there have been more than 50 Theory tasks in a row without the server sending an Atlas task. I thought the server might be trying to equalise credit between the subprojects, but this run is longer than the difference in credit would explain, unless it's trying to make up the backlog, which, for me, will take a long time unless the credit for Theory increases considerably. Also, the need to run hosts continuously is a problem for me and, I expect, for many others, especially if the project wants to widen its potential pool of volunteers. I know that with a bit of babysitting and some clever control scripts this can be overcome to some extent, but we're getting a long way from the original ideas of BOINC.
6)
Message boards :
CMS Application :
CMS jobs becoming available again
(Message 6169)
Posted 7 Mar 2019 by m Post: Welcome back, Ivan. "The good news is that a new grant has been received and I should go back to full-time work" is good news indeed, but "slough of despond" is putting it a bit strongly. "...and Laurence hadn't configured the wrapper to pick up the output messages." Perhaps that's why there is no output on the "running job", "wrapper" or "error" terminals, although cmsRun is shown as using >90% CPU. I accidentally quit "top" and don't know how to restart it, so I can't see whether it's still running. How can "top" (F3) be restarted if accidentally quit?
7)
Message boards :
Theory Application :
Windows Version
(Message 6106)
Posted 27 Feb 2019 by m Post: I've been unable to get any of these tasks to start on a couple of Win7 hosts: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=246 and https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=245
The failures have differed slightly, because Laurence has made changes at the server end (4.16 to 4.18 etc.) and I've made changes to the hosts (memory, task priorities, VBox 5.1.x to 5.2.x). At present (v4.16) the VM setup fails at the message "Started update UTMP about system Runlevel changes". At this point the RDP connection to the VM is lost. The wrapper runs on for a minute or so before failing. The VBox log shows various file-read failures from the slot with "can't find path" errors. The slot contents seem OK, although I only checked them after the failure. The VMs are left as orphans afterwards. Testing is very difficult since both hosts are now down to one task per day, although they normally get two at once. I'm sure I'm missing something obvious but have run out of ideas. I have a host shut down at the moment with two failures unreported, so the logs should still be there.
8)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5941)
Posted 18 Feb 2019 by m Post:
OK now. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752887 I don't know how often MCPlots is updated, but the results aren't there yet. Slots seem OK now, too; I just have to go and clean up the previous leftovers.
9)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5939)
Posted 18 Feb 2019 by m Post: "With the new version 4.13 (cranky-0.0.20) I get only errors." Same here: the physics app finishes OK. From this task: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752862
Generator run finished successfully
100000 events processed
dumping histograms...
Rivet.Analysis.Handler: INFO Finalising analyses
Rivet.Analysis.Handler: INFO Processed 100000 events
The MCnet usage guidelines apply to Rivet: see http://www.montecarlonet.org/GUIDELINES
Please acknowledge plots made with Rivet analyses, and cite arXiv:1003.0694 (http://arxiv.org/abs/1003.0694)
Processing histograms...
input = /shared/tmp/tmp.b9ZK1W7cQa/flat
output = /shared
./runRivet.sh: line 742: 205 Killed display_service $tmpd_dump "$beam $process $energy $params $generator $version $tune" (wd: /shared)
mc: ATLAS_2011_S9131140_d01-x01-y02.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-el-dressed/7000/pythia8/8.235/default-CD.dat
mc: ATLAS_2011_S9131140_d01-x01-y03.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-el-bare/7000/pythia8/8.235/default-CD.dat
mc: ATLAS_2011_S9131140_d02-x01-y02.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-mu-dressed/7000/pythia8/8.235/default-CD.dat
mc: ATLAS_2011_S9131140_d02-x01-y03.dat -> /shared/dat/pp/zinclusive/pT-Z-.....(snip).....ATLAS_2011_S9131140_d02-x01-y03.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-mu-bare/7000/ATLAS_2011_S9131140.dat
Disk usage: 2440 Kb
CPU usage: 12136 s
Clean tmp ...
Run finished successfully
But then the task fails:
15:35:16 2019-02-18: cranky-0.0.20: [INFO] Running Container 'runc'.
===> [runRivet] Mon Feb 18 15:35:16 UTC 2019 [boinc pp zinclusive 7000 -,-,50,130 - pythia8 8.235 default-CD 100000 19]
19:00:41 2019-02-18: cranky-0.0.20: [INFO] Preparing output.
tar: local.txt: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
19:00:42 (17293): cranky exited; CPU time 12083.959181
19:00:42 (17293): app exit status: 0x2
19:00:42 (17293): called boinc_finish(195)
The next one is at ~70000 events, so I'll let it run and see what happens.
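In the failing runs in this thread, the wrapper ends with boinc_finish(195) whatever the app's own exit status was (0x2 here after the tar error, 0xce in the runc failures). As far as I know, 195 is BOINC's EXIT_CHILD_FAILED code. The Python sketch below, with hypothetical function and pattern names, pulls both numbers out of a wrapper log so they can be compared across tasks.

```python
import re

# Wrapper lines as they appear in these logs (patterns are assumptions).
EXIT_STATUS = re.compile(r"app exit status: (0x[0-9a-fA-F]+)")
BOINC_FINISH = re.compile(r"called boinc_finish\((\d+)\)")


def exit_codes(log_text):
    """Return (app_exit_status, boinc_finish_code) as ints, None where absent."""
    status = EXIT_STATUS.search(log_text)
    finish = BOINC_FINISH.search(log_text)
    return (
        int(status.group(1), 16) if status else None,  # hex status -> int
        int(finish.group(1)) if finish else None,
    )


if __name__ == "__main__":
    tail = (
        "19:00:42 (17293): cranky exited; CPU time 12083.959181\n"
        "19:00:42 (17293): app exit status: 0x2\n"
        "19:00:42 (17293): called boinc_finish(195)\n"
    )
    print(exit_codes(tail))  # (2, 195)
```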
10)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5917)
Posted 15 Feb 2019 by m Post: Running CentOS Linux 7.6 (Core), kernel 3.10.0-957.5.1.el7.x86_64. Whatever caused this:
00:51:44 (5769): wrapper (7.7.26015): starting
00:51:44 (5769): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.14 ()
cranky-0.0.14 INFO: Starting
cranky-0.0.14 INFO: Detected Theory App
cranky-0.0.14 INFO: Checking CVMFS.
cranky-0.0.14 INFO: Checking runc.
cranky-0.0.14 INFO: Creating the filesystem.
cranky-0.0.14 INFO: Using /cvmfs/cernvm-prod.cern.ch/cvm3
cranky-0.0.14 INFO: Updating config.json.
cranky-0.0.14 INFO: Running Container 'runc'.
cranky-0.0.14 ERROR: Container 'runc' failed.
00:55:27 (5769): cranky exited; CPU time 20.553370
00:55:27 (5769): app exit status: 0xce
00:55:27 (5769): called boinc_finish(195)
was fixed between cranky 0.0.14 and 0.0.17 (I didn't try 0.0.15 or 0.0.16). Perhaps because Laurence wrote: "I have rewritten the script in bash to make it more portable between Linux systems." Now running OK. I've still got some tidying up to do...
11)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5877)
Posted 13 Feb 2019 by m Post: I got shut out before I could finish editing this. After upgrading, I get this:
[m@TeeC15 ~]$ sudo echo 100 > /proc/sys/user/max_user_namespaces
-bash: /proc/sys/user/max_user_namespaces: Permission denied
But this looks right:
$ cat /proc/sys/user/max_user_namespaces
32767
The file permissions are:
[ user]$ ls -l
total 0
-rw-r--r--. 1 root root 0 Feb 13 03:26 max_ipc_namespaces
-rw-r--r--. 1 root root 0 Feb 13 03:26 max_mnt_namespaces
-rw-r--r--. 1 root root 0 Feb 13 03:26 max_net_namespaces
-rw-r--r--. 1 root root 0 Feb 13 03:26 max_pid_namespaces
-rw-r--r--. 1 root root 0 Feb 13 01:30 max_user_namespaces
-rw-r--r--. 1 root root 0 Feb 13 03:26 max_uts_namespaces
[ user]$
Are these right? They don't look right. The task fails like this:
00:51:44 (5769): wrapper (7.7.26015): starting
00:51:44 (5769): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.14 ()
cranky-0.0.14 INFO: Starting
cranky-0.0.14 INFO: Detected Theory App
cranky-0.0.14 INFO: Checking CVMFS.
cranky-0.0.14 INFO: Checking runc.
cranky-0.0.14 INFO: Creating the filesystem.
cranky-0.0.14 INFO: Using /cvmfs/cernvm-prod.cern.ch/cvm3
cranky-0.0.14 INFO: Updating config.json.
cranky-0.0.14 INFO: Running Container 'runc'.
cranky-0.0.14 ERROR: Container 'runc' failed.
00:55:27 (5769): cranky exited; CPU time 20.553370
00:55:27 (5769): app exit status: 0xce
00:55:27 (5769): called boinc_finish(195)
Not very informative, but there is no unshare error and no runc error either. The host is running an Atlas native task at the moment, just as a check that things still work. OK so far.
12)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5876)
Posted 13 Feb 2019 by m Post:
After upgrading, I get this:
[m@TeeC15 ~]$ sudo echo 100 > /proc/sys/user/max_user_namespaces
-bash: /proc/sys/user/max_user_namespaces: Permission denied
But this indicates that it should work:
$ cat /proc/sys/user/max_user_namespaces
32767
From here https://stackoverflow.com/questions/39215025/how-to-check-if-linux-user-namespaces-are-supported-by-current-os-kernel this: "You could check if your current process' /proc/[pid]/ns/ directory has a file called user". I'm not sure how to find the pid, but the directory for the boinc process (which has been running throughout) does have this file; it's empty, though. The task fails like this:
00:51:44 (5769): wrapper (7.7.26015): starting
00:51:44 (5769): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.14 ()
cranky-0.0.14 INFO: Starting
cranky-0.0.14 INFO: Detected Theory App
cranky-0.0.14 INFO: Checking CVMFS.
cranky-0.0.14 INFO: Checking runc.
cranky-0.0.14 INFO: Creating the filesystem.
cranky-0.0.14 INFO: Using /cvmfs/cernvm-prod.cern.ch/cvm3
cranky-0.0.14 INFO: Updating config.json.
cranky-0.0.14 INFO: Running Container 'runc'.
cranky-0.0.14 ERROR: Container 'runc' failed.
00:55:27 (5769): cranky exited; CPU time 20.553370
00:55:27 (5769): app exit status: 0xce
00:55:27 (5769): called boinc_finish(195)
Not very informative, but there is no unshare error and no runc error either. The host is running an Atlas native task at the moment, just as a check that things still work. OK so far.
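Two notes on the commands above. First, the "Permission denied" from "sudo echo 100 > /proc/sys/user/max_user_namespaces" is expected. The redirection is opened by the unprivileged shell before sudo runs, so the usual forms are "echo 100 | sudo tee /proc/sys/user/max_user_namespaces" or "sudo sysctl -w user.max_user_namespaces=100". Second, /proc/[pid]/ns/user is a symbolic link to a namespace identifier rather than a regular file, which is why it looks empty. /proc/self always refers to the process that reads it, so no pid lookup is needed. The minimal Python sketch below performs both checks. The function names are hypothetical, and the proc_root parameter exists only so that the check can be pointed at a test directory.

```python
import os
from pathlib import Path


def max_user_namespaces(path="/proc/sys/user/max_user_namespaces"):
    """Return the sysctl limit as an int, or None if the file doesn't exist."""
    p = Path(path)
    if not p.exists():
        return None  # kernel doesn't expose this sysctl
    return int(p.read_text().strip())


def has_user_ns_link(pid="self", proc_root="/proc"):
    """True if <proc_root>/<pid>/ns/user exists (it is a symlink, not a file)."""
    # lexists checks the link itself without needing to follow it
    return os.path.lexists(os.path.join(proc_root, str(pid), "ns", "user"))


if __name__ == "__main__":
    print("max_user_namespaces:", max_user_namespaces())
    print("/proc/self/ns/user present:", has_user_ns_link())
    print("this process's pid:", os.getpid())
```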
13)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5875)
Posted 12 Feb 2019 by m Post: "... $ cat /proc/cmdline" Well spotted, many thanks... (bigger font... more powerful glasses... more coffee...). Unfortunately it didn't change anything. "I think this is what you need to do. From the 7.4 release notes:" Waiting for cheap internet time... hopefully it won't break anything.
14)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5859)
Posted 11 Feb 2019 by m Post:
Well, I now have
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-514.10.2.el7.x86_64 root=/dev/mapper/cl_teec15-root ro rd.lvm.lv=cl_teec15/root rd.lvm.lv=cl_teec15/swap rhgb quiet LANG=en_GB.UTF-8 user_namespace.enable=1 namespace unpriv_enable=1
and
$ cat /etc/sysctl.d/51-userns.conf
user.max_user_namespaces = 32767
but unshare still doesn't work:
$ unshare -U /bin/bash
[nfsnobody@TeeC15 ~]$
Short of updating the kernel, I don't know what else to do.
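In the command line above, "namespace unpriv_enable=1" contains a space. The kernel therefore sees two separate tokens rather than namespace.unpriv_enable=1, and a token-based check makes this kind of slip easy to spot. The Python sketch below assumes the two parameters commonly given for CentOS/RHEL 7.2 and 7.3 kernels (user_namespace.enable=1 and namespace.unpriv_enable=1); check them against your own kernel's documentation. `missing_params` is a hypothetical helper.

```python
from pathlib import Path

# Assumed required parameters (commonly cited for CentOS/RHEL 7.2-7.3 kernels).
REQUIRED = {"user_namespace.enable": "1", "namespace.unpriv_enable": "1"}


def missing_params(cmdline, required=REQUIRED):
    """Return, sorted, the required key=value parameters absent from cmdline."""
    present = {}
    for token in cmdline.split():  # the kernel splits its command line on whitespace
        key, sep, value = token.partition("=")
        if sep:
            present[key] = value
    return sorted(k for k, v in required.items() if present.get(k) != v)


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        print("missing:", missing_params(cmdline_path.read_text()) or "none")
```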
15)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5855)
Posted 11 Feb 2019 by m Post: "Let me know if you figure it out, otherwise I will investigate after looking into Opensuse." "I have found a solution. Taken from the Debian mail logs:" There is a similar situation in CentOS 7 (7.3 in my case) here:
https://groups.io/g/charliecloud/topic/charliecloud_and_centos7/13269608?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,13269608
I haven't tried it yet.
16)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5851)
Posted 11 Feb 2019 by m Post: OK, thanks, got one with this result:
nsenter: failed to unshare namespaces: Operation not permitted
container_linux.go:336: starting container process caused "process_linux.go:279: running exec setns process for init caused \"exit status 46\""
cranky-0.0.14 ERROR: Container 'runc' failed.
Test commands:
[m@TeeC15 ~]$ unshare -U /bin/bash
[nfsnobody@TeeC15 ~]$
and
[m@TeeC15 ~]$ sudo unshare -U /bin/bash
[nfsnobody@TeeC15 m]$
and
[m@TeeC15 ~]$ grep CONFIG_USER_NS /boot/config-$(uname -r)
CONFIG_USER_NS=y
[m@TeeC15 ~]$ CONFIG_USER_NS=y
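CONFIG_USER_NS=y only shows that user-namespace support is compiled into the kernel. On these kernels it can still be restricted at run time (boot parameters, max_user_namespaces). The grep above can be reproduced in Python, for example as one step of a larger pre-flight check. This sketch reads /boot/config-<release>, the same file the grep uses (platform.release() gives the same string as uname -r). The function names are hypothetical.

```python
import platform
from pathlib import Path


def kernel_option(config_text, option):
    """Return the value set for option (e.g. 'y' or 'm'), or None if unset/absent."""
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_X is not set" lines are comments and never match here
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def running_kernel_config():
    """Return the text of /boot/config-<uname -r>, or None if that file is absent."""
    path = Path("/boot/config-" + platform.release())
    return path.read_text() if path.exists() else None


if __name__ == "__main__":
    config = running_kernel_config()
    if config is None:
        print("no /boot/config file for this kernel")
    else:
        print("CONFIG_USER_NS =", kernel_option(config, "CONFIG_USER_NS"))
```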
17)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5849)
Posted 11 Feb 2019 by m Post: m wrote: It's OK, thanks. Just more fingers than keys, and more keys than brain cells, or so it seems. I would very much like to see whether the python3 problem is fixed, but there's no work at the moment.
18)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5844)
Posted 11 Feb 2019 by m Post: On CentOS 7, "python" gets you python2.7, which the rest of the system needs. After simply installing python3.X, you seem to need "python3.x" (note the dot); "python3x" doesn't work, not for me anyway.
/usr/bin/env python
Python 2.7.5 (default, Nov 6 2016, 00:28:07)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
/usr/bin/env python3
/usr/bin/env: python3: No such file or directory
/usr/bin/env python3.6
Python 3.6.7 (default, Dec 5 2018, 15:02:05)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
.... I'm trying to get an app_config to make this host download only one task at a time, but I can't even get that right now..... it says "missing start tag", but it looks OK to me; aaaaarrrgggh.
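For the app_config problem, a standard XML parser reports the line and column of a malformed tag, which can be quicker than guessing. The minimal Python sketch below assumes the usual BOINC app_config.xml layout: an <app_config> root containing <app> elements with <name> and <max_concurrent>, where max_concurrent limits how many tasks of that app run at once. The app name "Theory" in the example is only a placeholder, and `check_app_config` is hypothetical. This is a quick pre-check, not BOINC's own parser.

```python
import xml.etree.ElementTree as ET


def check_app_config(xml_text):
    """Parse app_config.xml text; return {app name: max_concurrent or None}.

    Raises ET.ParseError (with line/column) on malformed XML, and ValueError
    if the root element isn't <app_config>.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        raise ValueError("root element is <%s>, expected <app_config>" % root.tag)
    limits = {}
    for app in root.findall("app"):
        name = app.findtext("name", default="").strip()
        limit = app.findtext("max_concurrent")
        limits[name] = int(limit) if limit is not None else None
    return limits


if __name__ == "__main__":
    example = """<app_config>
  <app>
    <name>Theory</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>"""
    print(check_app_config(example))  # {'Theory': 1}
```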
19)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5834)
Posted 11 Feb 2019 by m Post: "By default the maximum number of namespaces is set to 0. To fix this run:" I think that this only applies to kernel versions 644 and up; mine is 514, so it should be OK. I had tried this:
sudo sysctl user.max_user_namespaces=15000
and got "... /proc/sys/user/max_user_namespaces: no such file or directory", so I assumed it didn't apply (yet). From here: "For background information, see this bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1340238"
runc here is version 1.0.0. If the setup is to be done from CVMFS, do I need to remove the installed version? I don't see why I can't get "python3" to go to "python3.6", but if it's to be fixed, that's great; lots to learn. The host is running an Atlas native task at the moment, so I'll need to wait for it to finish... the job starts again from the beginning if I even think about suspending it... (that's something else that needs sorting out...)
20)
Message boards :
Theory Application :
New Native App - Linux Only
(Message 5820)
Posted 11 Feb 2019 by m Post: The python3 error is back:
/usr/bin/env: python3: No such file or directory
Python 3.6 is in ./usr/bin/python3.6, and the above command works if python3 is changed to python3.6, so "python3" must be missing from an environment or path somewhere. I haven't been able to work out how (or where) to fix this properly (or to add an alias?). The existing python (which works) should provide a clue, but I can't find that either... so I'm stuck for now.
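/usr/bin/env searches PATH for an executable with exactly the given name. "python3" therefore fails unless a file or symlink called python3 exists in a PATH directory; a symlink named python3 pointing at python3.6 in such a directory would satisfy it. Python's shutil.which performs the same kind of PATH search and can also show where the working "python" lives. A minimal sketch follows; the candidate list and function name are hypothetical, and search_path is there only to allow testing against a specific directory.

```python
import shutil


def find_python3(candidates=("python3", "python3.6"), search_path=None):
    """Return (name, path) for the first candidate found on PATH, or None.

    shutil.which searches PATH (or search_path, if given) for an executable
    with exactly that name, much as /usr/bin/env does before running it.
    """
    for name in candidates:
        path = shutil.which(name, path=search_path)
        if path:
            return name, path
    return None


if __name__ == "__main__":
    print("python  ->", shutil.which("python"))
    print("python3 ->", find_python3())
```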
©2025 CERN