Message boards : News : Workplan
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 328,405
RAC: 184
Message 1925 - Posted: 9 Feb 2016, 21:45:59 UTC

This thread will be used to provide information on the status of issues/improvements. Strikethrough will be used to show items done (check for issues), bold for in progress and italics for things on the to-do list.

Profile Laurence
Project administrator
Project developer
Project tester
Message 1928 - Posted: 9 Feb 2016, 22:01:51 UTC
Last modified: 9 Feb 2016, 22:03:41 UTC


  • Graceful shutdown
  • Shutdown on error instead of sleep
  • Error messages to task's stderr
  • WMAgent Jobs
  • Zombie Tasks
  • Shutdown if VM fails to boot, e.g. kernel panic or hangs
  • Suspend-resume behaviour
  • Shutdown if no jobs
  • LHCb app

Profile Laurence
Project administrator
Project developer
Project tester
Message 1959 - Posted: 11 Feb 2016, 8:56:14 UTC - in response to Message 1928.  
Last modified: 11 Feb 2016, 21:57:17 UTC


  • Zombie Tasks
  • Shutdown if VM fails to boot, e.g. kernel panic or hangs
  • WMAgent Jobs
  • Suspend-resume behaviour
  • Job Output to BOINC log
  • Shutdown if no jobs
  • CVMFS Proxies
  • LHCb app

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1978 - Posted: 13 Feb 2016, 10:13:56 UTC

At some point, all the thousands of "ghosts" need to be cleared out of the user accounts.

Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,173
RAC: 266
Message 1987 - Posted: 13 Feb 2016, 11:41:21 UTC - in response to Message 1978.  

At some point, all the thousands of "ghosts" need to be cleared out of the user accounts.

I think they'll disappear when the time limit is reached. I believe it was one week for most, so just a few days to go...

Profile Laurence
Project administrator
Project developer
Project tester
Message 2010 - Posted: 15 Feb 2016, 20:50:53 UTC - in response to Message 1959.  


  • Job Output to BOINC log
  • WMAgent Jobs
  • Suspend-resume behaviour
  • VM terminating due to missing heartbeat file
  • Blackhole VMs
  • Shutdown if no jobs
  • CVMFS Proxies
  • LHCb app

Profile Laurence
Project administrator
Project developer
Project tester
Message 2187 - Posted: 3 Mar 2016, 14:43:08 UTC - in response to Message 2010.  


  • VM terminating due to missing heartbeat file
  • CVMFS Proxies
  • LHCb app
  • Shutdown if no jobs
  • WMAgent Jobs
  • Suspend-resume behaviour
  • Blackhole VMs



Profile Laurence
Project administrator
Project developer
Project tester
Message 2216 - Posted: 4 Mar 2016, 11:15:35 UTC - in response to Message 2187.  


  • Shutdown if no jobs
  • Blackhole VMs
  • WMAgent Jobs
  • Suspend-resume behaviour

Rasputin42
Volunteer tester
Message 2390 - Posted: 15 Mar 2016, 10:24:41 UTC
Last modified: 15 Mar 2016, 10:27:03 UTC

Tasks end after first run

Profile Laurence
Project administrator
Project developer
Project tester
Message 2391 - Posted: 15 Mar 2016, 13:54:31 UTC - in response to Message 2390.  

Will keep it as it is for now as things are working well and it will speed up the iterations if we need a new VM to update a configuration parameter.

Rasputin42
Volunteer tester
Message 2392 - Posted: 15 Mar 2016, 14:07:37 UTC - in response to Message 2391.  
Last modified: 15 Mar 2016, 14:08:51 UTC

Maybe it really is better to keep it this way, as there is no time saved by having a 2nd run versus a new task.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 238
Message 2395 - Posted: 15 Mar 2016, 16:01:40 UTC - in response to Message 2391.  

Will keep it as it is for now as things are working well and it will speed up the iterations if we need a new VM to update a configuration parameter.

It's running fine now, with only a bit more overhead when BOINC gets a new task and has to create a new VM, and
that new VM has to load data that would (or could) otherwise have been used for 4 runs (24 hours).

One point if you keep this single run 'for ever': the max duration of the last 100 BOINC tasks is 26.37 hours.
Do you plan to create a new CMS_2016_Month_Date.xml to reduce BOINC's job_duration to, let's say, 12 or 18 hours?

Profile Laurence
Project administrator
Project developer
Project tester
Message 2397 - Posted: 15 Mar 2016, 16:13:06 UTC - in response to Message 2395.  

I am open to suggestions. The value of 24 hours per task came from test4theory and, as far as I understand, it is a compromise between the overhead of restarting a VM and the desire of volunteers to see credit/task progress. We can change the length of the task and also the length of the run. So instead of having 4 runs in 24 hours, we could have 1 run that executes 16 jobs instead of 4. What task length would you suggest works best, and is there any advantage in having multiple runs, or should we just increase the number of jobs that a run does?
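
To make the arithmetic of the two layouts concrete, here is a rough sketch (not project code). The 24-hour task, the 4 runs and the 4 or 16 jobs per run come from the figures above; the names `TaskLayout` and `hours_per_job` are made up for illustration, and VM start-up overhead is ignored, so the per-job time is only an upper-bound average.

```python
from dataclasses import dataclass


@dataclass
class TaskLayout:
    """Hypothetical description of how one BOINC task is split up."""
    runs_per_task: int
    jobs_per_run: int

    @property
    def jobs_per_task(self) -> int:
        # Total jobs processed during one BOINC task.
        return self.runs_per_task * self.jobs_per_run


def hours_per_job(task_hours: float, layout: TaskLayout) -> float:
    # Average wall-clock time available to each job, ignoring all overhead.
    return task_hours / layout.jobs_per_task


# Layout discussed above: 4 runs of 4 jobs each in a 24-hour task.
four_runs = TaskLayout(runs_per_task=4, jobs_per_run=4)
# Alternative discussed above: 1 run that executes 16 jobs.
single_run = TaskLayout(runs_per_task=1, jobs_per_run=16)

for name, layout in (("4 runs x 4 jobs", four_runs), ("1 run x 16 jobs", single_run)):
    print(name, layout.jobs_per_task, "jobs,", hours_per_job(24.0, layout), "h per job")
```

Both layouts process the same 16 jobs per task, so under these assumptions the choice only affects how often the run is restarted inside the VM, not the amount of work done.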

Crystal Pellet
Volunteer tester
Message 2398 - Posted: 15 Mar 2016, 16:42:33 UTC - in response to Message 2397.  

There is another 'unknown' factor. The scientist decides how many events should be processed in 1 job.

I prefer jobs with a duration of about 1 hour on an average computer.
We don't have to decide how many jobs should be done in 1 glidein run. That depends on the computer speed.
When the one and only glidein has run for 12 hours, it could send the shutdown (kill) to vboxwrapper after the last job has finished, and,
just to be safe, BOINC's job_duration could be set to 18 hours in case BOINC does not obey the shutdown.
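
The proposal above can be sketched as a small simulation. This is only an illustration under assumptions, not how the glidein or vboxwrapper is implemented: the function name `simulate_task`, the result strings and the example job lengths are invented, and only the 12-hour glidein limit and the 18-hour job_duration come from the suggestion itself.

```python
GLIDEIN_LIMIT_H = 12.0       # glidein stops taking new jobs after this (from the post)
BOINC_JOB_DURATION_H = 18.0  # hard backstop enforced by BOINC (from the post)


def simulate_task(job_hours, glidein_limit=GLIDEIN_LIMIT_H,
                  hard_limit=BOINC_JOB_DURATION_H):
    """Return (elapsed_hours, reason) for a hypothetical sequence of job lengths."""
    elapsed = 0.0
    for duration in job_hours:
        # Checked between jobs only, so a running job is never interrupted here.
        if elapsed >= glidein_limit:
            return elapsed, "glidein shutdown"
        # A job that would run past the backstop gets cut off by BOINC instead.
        if elapsed + duration > hard_limit:
            return hard_limit, "BOINC job_duration reached"
        elapsed += duration
    return elapsed, "ran out of jobs"


print(simulate_task([1.0] * 20))   # about 1 hour per job on an average computer
print(simulate_task([5.0] * 5))    # slower computer, long jobs
print(simulate_task([10.0] * 3))   # very slow computer, backstop kicks in
```

With 1-hour jobs the task ends right at the 12-hour mark; with longer jobs it overshoots until the current job finishes, which is why the backstop sits some hours above the glidein limit.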

Profile Laurence
Project administrator
Project developer
Project tester
Message 2405 - Posted: 16 Mar 2016, 9:51:22 UTC - in response to Message 2398.  

Profile Laurence
Project administrator
Project developer
Project tester
Message 2478 - Posted: 21 Mar 2016, 14:45:41 UTC - in response to Message 2405.  


  • 12 hour glidein
  • Check for failures between jobs
  • WMAgent Jobs
  • Suspend-resume behaviour

Profile Laurence
Project administrator
Project developer
Project tester
Message 2564 - Posted: 24 Mar 2016, 23:51:21 UTC - in response to Message 2478.  


  • 12 hour glidein
  • Check for failures between jobs
  • WMAgent Jobs
  • Suspend-resume behaviour
  • Validate CVMFS before running jobs
