Message boards : Theory Application : Status
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 6163 - Posted: 7 Mar 2019, 13:48:56 UTC - in response to Message 6161.  

It should be possible to pause the container with the following command:
sudo /cvmfs/grid.cern.ch/vc/containers/runc --root /var/lib/boinc-client/slots/0/cernvm/ pause Theory_859210_1543416190.499432_0

But I am getting the following error:
no such directory for freezer.state


If anyone has any ideas, please let me know.
Did you give the above command as user Laurence or as user boinc?
runc pause may fail if you don't have full access to cgroups.
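
For reference, the pause call quoted above and its resume counterpart could be wrapped roughly as in the sketch below. The runc binary path, the --root directory and the container ID are copied from the command in this thread; the helper name runc_command is hypothetical, and nothing here works around the freezer.state error.

```python
import shlex

# Path to runc as used in the command quoted in this thread.
RUNC = "/cvmfs/grid.cern.ch/vc/containers/runc"


def runc_command(action, root, container_id, runc=RUNC):
    """Build the argument list for `runc --root ROOT pause|resume ID`."""
    if action not in ("pause", "resume"):
        raise ValueError("action must be 'pause' or 'resume'")
    return [runc, "--root", root, action, container_id]


if __name__ == "__main__":
    argv = runc_command(
        "pause",
        "/var/lib/boinc-client/slots/0/cernvm/",
        "Theory_859210_1543416190.499432_0",
    )
    # Print the command instead of running it; to execute it for real, pass
    # argv to subprocess.run(argv, check=True) as a user with cgroup access.
    print(shlex.join(argv))
```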


I have tried as myself and root. Have contacted the developers.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 6172 - Posted: 8 Mar 2019, 10:58:14 UTC - in response to Message 6163.  

I have tried as myself and root. Have contacted the developers.

Looks like it is a cgroup configuration issue.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 6174 - Posted: 8 Mar 2019, 12:54:20 UTC - in response to Message 6172.  

I have tried as myself and root. Have contacted the developers.

Looks like it is a cgroup configuration issue.

Problem solved! Need to create a cgroup and add the following to the config.json
	"cgroupsPath": "/boinc",
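
As a sketch of where that line might go: in the OCI runtime specification, cgroupsPath is a member of the top-level "linux" object of config.json, not of the process environment. The snippet below sets it on an already-parsed config. The helper name and the sample config are made up for illustration, and the thread does not show the exact file used by the Theory app.

```python
import json


def set_cgroups_path(config, cgroups_path="/boinc"):
    """Return a copy of an OCI config dict with linux.cgroupsPath set."""
    updated = dict(config)
    # Copy the "linux" section so the caller's dict is left untouched.
    linux = dict(updated.get("linux", {}))
    linux["cgroupsPath"] = cgroups_path
    updated["linux"] = linux
    return updated


if __name__ == "__main__":
    # Hypothetical, heavily trimmed config.json content.
    example = {
        "ociVersion": "1.0.0",
        "process": {"env": ["TERM=xterm"]},
        "linux": {"namespaces": [{"type": "pid"}]},
    }
    print(json.dumps(set_cgroups_path(example), indent=4))
```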
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,858
RAC: 1,746
Message 6176 - Posted: 8 Mar 2019, 13:49:02 UTC - in response to Message 6174.  

Problem solved! Need to create a cgroup and add the following to the config.json
	"cgroupsPath": "/boinc",

If every ordinary Linux user has to do that themselves, I think it will drastically reduce the number of BOINC users running the Theory native application.

Could you provide more detailed information on how to create the cgroup, and with which parameters?
Something like this?
groupadd -g group-ID cgroup
Which ID?
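
A hedged sketch for the question above: groupadd manages Unix user groups, which are unrelated to control groups (cgroups). On a cgroup v1 system, a cgroup is created by making a directory under a controller's mount point, normally as root. The example assumes the common /sys/fs/cgroup/<controller> layout and the freezer controller (suggested by the freezer.state error earlier in this thread); the function name is hypothetical, and the demo runs against a temporary directory so it does not touch the real hierarchy.

```python
import os
import tempfile


def create_cgroup(name, controllers=("freezer",), cgroup_root="/sys/fs/cgroup"):
    """Create directory <cgroup_root>/<controller>/<name> for each controller.

    On a real cgroup v1 mount the kernel populates each new directory with
    control files such as freezer.state; this normally requires root.
    """
    created = []
    for controller in controllers:
        path = os.path.join(cgroup_root, controller, name.strip("/"))
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Dry run against a scratch directory instead of /sys/fs/cgroup.
    with tempfile.TemporaryDirectory() as scratch:
        for path in create_cgroup("/boinc", cgroup_root=scratch):
            print("created", path)
```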

And where in config.json should that line be inserted? In the environment section, maybe?
       "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/shared/bin",
            "TERM=xterm",
            "cgroupsPath": "/boinc",
        ],
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 6177 - Posted: 8 Mar 2019, 18:58:40 UTC - in response to Message 6176.  

If every ordinary Linux user has to do that themselves, I think it will drastically reduce the number of BOINC users running the Theory native application.

I don't expect users to do this. I have to package this up and make it automatic. I was trying to get something ready today but stopped myself, as last-minute Friday afternoon specials are never a good idea.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 6202 - Posted: 12 Mar 2019, 11:09:42 UTC - in response to Message 6083.  

The current priority is to get the native Theory app production ready. The recent experience on dev suggests that it should be a separate app from the VM apps, as they have different requirements, at least for memory and disk. The two main improvements needed are:

  1. Fix suspend/resume
  2. Detect bad hosts and restrict the number of jobs sent.


If there is anything else, please let me know. We can revisit the VM apps once the native app is solid.



Both of the above improvements have been done. Is there anything else that needs addressing before we can move this to production? Note that I will add this as a separate app as it has different disk, memory and runtime requirements than the VM app.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 478
Credit: 394,720
RAC: 318
Message 6204 - Posted: 12 Mar 2019, 12:03:31 UTC

As more and more apps with different and partly conflicting requirements become active it may be a good idea to increase the number of venues.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 6206 - Posted: 12 Mar 2019, 12:43:02 UTC - in response to Message 6204.  

As more and more apps with different and partly conflicting requirements become active it may be a good idea to increase the number of venues.


This is hard-coded in the BOINC software. I would suggest that you open an issue in the BOINC issue tracker.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,858
RAC: 1,746
Message 6207 - Posted: 12 Mar 2019, 13:25:02 UTC - in response to Message 6202.  

Both of the above improvements have been done. Is there anything else that needs addressing before we can move this to production? Note that I will add this as a separate app as it has different disk, memory and runtime requirements than the VM app.

Only Linux Theory native, I suppose.
Minor issue: could you reduce these 2 lines to 1?
11:40:13 (10717): wrapper (7.15.26016): starting
11:40:13 (10717): wrapper (7.15.26016): starting

When going into production, inform the users that
- When a task is suspended, the task stays in memory ignoring the setting of "Leave non-GPU tasks in memory while suspended".
- When BOINC is restarted the task will start from the very beginning (no checkpointing).
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 6208 - Posted: 12 Mar 2019, 15:23:01 UTC - in response to Message 6207.  

Only Linux Theory native, I suppose.

Yes.
Minor issue: could you reduce these 2 lines to 1?
11:40:13 (10717): wrapper (7.15.26016): starting
11:40:13 (10717): wrapper (7.15.26016): starting


This is an upstream BOINC issue. The message is printed here and here.
I will open an issue and make a PR alongside updating our wrapper.

When going into production, inform the users that
- When a task is suspended, the task stays in memory ignoring the setting of "Leave non-GPU tasks in memory while suspended".
- When BOINC is restarted the task will start from the very beginning (no checkpointing).

Yes.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,858
RAC: 1,746
Message 6209 - Posted: 12 Mar 2019, 15:55:08 UTC - in response to Message 6208.  

Minor issue: could you reduce these 2 lines to 1?
11:40:13 (10717): wrapper (7.15.26016): starting
11:40:13 (10717): wrapper (7.15.26016): starting
This is an upstream BOINC issue. The message is printed here and here.
I will open an issue and make a PR alongside updating our wrapper.
Strange.
With version 4.18, that line was printed only once.
Example: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2759454

The double printing was introduced no later than version 4.21.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 6212 - Posted: 12 Mar 2019, 22:04:28 UTC - in response to Message 6209.  

Here is a reply that explains why it is printed twice.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 6219 - Posted: 13 Mar 2019, 10:29:07 UTC - in response to Message 6202.  

Both of the above improvements have been done. Is there anything else that needs addressing before we can move this to production? Note that I will add this as a separate app as it has different disk, memory and runtime requirements than the VM app.

Unless anyone provides any objections, on Monday I will enable this on the production project.
maeax

Joined: 22 Apr 16
Posts: 672
Credit: 1,899,034
RAC: 5,700
Message 6220 - Posted: 13 Mar 2019, 10:57:57 UTC - in response to Message 6219.  

A small note.
Today I closed BOINC with ALT+F4 (the X button). After starting BOINC again, all 5 tasks ran from the beginning.
My fault.
When you move it to production, will the native app become a selection in the preferences for testing, or will it be the only application for Linux (no VM)?
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 6221 - Posted: 13 Mar 2019, 11:02:25 UTC - in response to Message 6220.  

A small note.
Today I closed BOINC with ALT+F4 (the X button). After starting BOINC again, all 5 tasks ran from the beginning.
My fault.
When you move it to production, will the native app become a selection in the preferences for testing, or will it be the only application for Linux (no VM)?

It is only the native Linux application.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,858
RAC: 1,746
Message 6225 - Posted: 13 Mar 2019, 13:51:19 UTC - in response to Message 6219.  

Both of the above improvements have been done. Is there anything else that needs addressing before we can move this to production? Note that I will add this as a separate app as it has different disk, memory and runtime requirements than the VM app.

Unless anyone provides any objections, on Monday I will enable this on the production project.
No objections, but because it will not run out of the box for new users who have not installed the needed packages,
you may want to review the instructions in your Native Setup Linux thread and extend them with
the instructions and remarks made afterwards, keeping in mind that ATLAS native also has to run on those machines.
maeax

Joined: 22 Apr 16
Posts: 672
Credit: 1,899,034
RAC: 5,700
Message 6234 - Posted: 19 Mar 2019, 12:57:26 UTC - in response to Message 6083.  

The current priority is to get the native Theory app production ready. The recent experience on dev suggests that it should be a separate app from the VM apps, as they have different requirements, at least for memory and disk. The two main improvements needed are:

  1. Fix suspend/resume
  2. Detect bad hosts and restrict the number of jobs sent.


If there is anything else, please let me know. We can revisit the VM apps once the native app is solid.


Laurence,
more than 50 tasks have now run in production for Theory native without any problem so far.
Thanks for your good work.
How is it possible to combine ATLAS and Theory, both native, in one preference set (Home, School or Work)?
The CPU setting is only available as a single parameter (1 or more CPUs).
When Theory uses 1 CPU, how can ATLAS be configured to use 4 CPUs?
Crystal Pellet
Volunteer tester

Send message
Joined: 13 Feb 15
Posts: 1185
Credit: 848,858
RAC: 1,746
Message 6235 - Posted: 19 Mar 2019, 13:41:26 UTC - in response to Message 6234.  

When Theory uses 1 CPU, how can ATLAS be configured to use 4 CPUs?
app_config.xml
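
A minimal sketch of such an app_config.xml, generated here with Python for illustration. The app_version, app_name, plan_class and avg_ncpus elements follow the BOINC client configuration format. The app name "ATLAS" is a placeholder (the real short name should be taken from the project's entries in client_state.xml), and whether the native ATLAS app needs extra options to actually use 4 threads is not covered in this thread.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, ncpus, plan_class=None):
    """Return an <app_config> element that sets avg_ncpus for one app."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    if plan_class is not None:
        ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    return root


if __name__ == "__main__":
    # "ATLAS" is a placeholder app name, not confirmed by this thread.
    config = build_app_config("ATLAS", 4)
    print(ET.tostring(config, encoding="unicode"))
```

The resulting file would go in the project's directory under the BOINC data directory, and the client has to re-read its config files (or be restarted) to pick it up.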