Multi-core jobs available for CMS@Home-dev
We are currently testing multi-core jobs for CMS@Home. Note that these will only run in -dev as the main project does not currently allow you to select multi-core VMs. We currently have 2-core and 4-core tasks in the queue, so please try selecting 4-core in your machine preferences, and let us know how it works.
25 Mar 2024, 17:09:12 UTC
· Discuss
Database upgrade
The MySQL database for LHC@home-dev has been upgraded to MySQL 8. Normally this should not affect any of the functionality of the BOINC applications.
30 Aug 2023, 9:31:28 UTC
· Discuss
Server Release 1.4.0
The server has been upgraded to the 1.4.0 tag for evaluation before the official release. Please let us know if there are any issues.
9 Nov 2022, 14:20:05 UTC
· Discuss
CMS job queue to drain this weekend (21/08/2021)
CMS is about to release a new version of WMAgent based entirely on Python 3. They have asked that they be able to update our agent by Monday evening (23/08), so I will not inject any new workflows before the upgrade. I expect the job queue to drain by late on Sunday.
Please set your CMS application to no new tasks by then.
20 Aug 2021, 15:32:25 UTC
· Discuss
Server upgrade
The LHC@home dev server has been upgraded to server release 1.2.1. (In practice it was running this code already, but now the project has undergone the usual server upgrade process.)
14 Apr 2020, 6:48:17 UTC
· Discuss
CMS@Home disruption this week
It appears that a database intervention at CERN went badly, leaving our data tables empty and us not being able to submit new CMS@Home jobs. Advice is that it will take several days to recover -- and as well as that some of the major players are in the USA, which has holidays for the rest of this week. I'll keep an eye on it, but I'm doubtful we'll be running again this week. Sorry 'bout that!
Happy Thanksgiving...
27 Nov 2019, 8:22:38 UTC
· Discuss
CMS job shortage Wednesday 13th November
CMS IT will be installing a new version of WMAgent on Wednesday. This will impact job availability for the duration of the intervention. We might be able to eliminate the little gremlin that's been plaguing us for the last few weeks, too.
So, please set your CMS processors to No New Tasks sometime tomorrow, Tuesday 12th, so that current tasks will stop requesting new jobs before the queues get cut. I'll let you know when jobs are available again.
Thanks.
11 Nov 2019, 15:50:55 UTC
· Discuss
Updated server code
We have updated the lhcathome-dev server code to the latest BOINC server release, 1.1.
Please let us know if you should spot any new bug or unexpected behaviour.
6 Sep 2019, 7:03:26 UTC
· Discuss
CMS@Home: Disruption to our condor server next Monday
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5087#39376
17 Jul 2019, 13:16:37 UTC
· Discuss
Using a local proxy to reduce network traffic for CMS
Thanks to computezrmle, with additional work from Laurence and a couple of CMS experts (and my adding one line to the site-local-config file), there is now a way to set up a local caching proxy to greatly reduce your network traffic. Each job instance that runs within a CMS BOINC task must retrieve a lot of set-up data from our database. This data doesn't change very often, so if you keep a local copy the job can access that rather than going over the network every time.
Instructions on how to do this are available at https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=475&postid=6396 or https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5052&postid=39072
7 Jun 2019, 14:22:51 UTC
· Discuss
CMS -- Please set "no new tasks"
Hi to all CMS-ers. We need to drain the job queue so that a new version of the WMAgent can be installed.
Can you please set No New Tasks so that your current tasks can run out and no new jobs start? If you have any tasks waiting to run, please suspend or abort them.
Thanks, I'll let you know as soon as the change is done.
14 May 2019, 14:59:27 UTC
· Discuss
Problem writing CMS job results; please avoid CMS tasks until we find the reason
Since some time last night CMS jobs appear to have problems writing results to CERN storage (DataBridge). It's not affecting BOINC tasks as far as I can see; they keep running and credit is given. However, Dashboard does see the jobs as failing, hence the large red areas on the job plots.
Until we find out where the problem lies, it's best to set No New Tasks or otherwise avoid CMS jobs. I'll let you know when things are back to normal again.
18 Apr 2019, 15:46:05 UTC
· Discuss
CMS jobs
The batch I submitted last night is now showing on the monitor, so you can resume tasks at will.
23 Mar 2019, 17:55:46 UTC
· Discuss
Warning: possible shortage of CMS jobs - set No New Tasks as a precaution
There was an intervention (i.e. upgrade) yesterday afternoon[1] on the cmsweb-testbed system we use to submit CMS workflows that left things a bit confused. One problem was fixed, and the monitor shows all good. However, we are running out of CMS jobs -- maybe 10 hours left -- but the new batch I submitted yesterday isn't showing up on the testbed monitor. I submitted another last night but still neither is being shown this morning, so I submitted yet another batch.
At the moment I don't know whether the submission has failed or whether the monitor hasn't picked up the new batches. As a precaution, set No New Tasks on your CMS project(s) to avoid tasks crashing due to lack of jobs. I'll let you know as soon as I'm sure jobs are available again.
[1] How many times do I have to tell people not to touch critical systems on a Friday -- especially Friday afternoon!?
23 Mar 2019, 11:32:45 UTC
· Discuss
Dev server updated
We have updated our development server to the latest BOINC server release.
Please let us know if you spot any issues.
The server may go down for a while if we need iterations on this.
14 Nov 2018, 12:40:52 UTC
· Discuss
ATLAS load tests
For information: We have increased the per-host limits on the dev project as part of a campaign to test our new storage backend.
Expect most new ATLAS tasks to be pulled by our local cluster hosts and the LCG cluster in Beijing. But there are tasks for other applications too for those interested. There might be interruptions too, as this is our development and testbed. :-)
27 Feb 2018, 15:49:29 UTC
· Discuss
LHCathome-dev server interruptions
There will be some interruptions on the dev project this week while we test a new BOINC version and migrate to another host.
23 Jan 2018, 12:36:09 UTC
· Discuss
New logo for LHC@home?
We are planning to update our web presence for LHC@home and in this context we got a couple of proposals for a new LHC@home logo from our graphics team.
Please vote for your preferred logo on this Doodle poll page.
Please note that this is work in progress, and that these images may be adjusted. Also if you have other proposals of your own, please do not hesitate to comment and display/link alternatives here in the forum. :-)
Many thanks in advance for your help and feedback!
..The LHC@home team
13 Nov 2017, 13:55:18 UTC
· Discuss
Development project "face lift"
The development project has been upgraded to use the latest BOINC upstream code. Thus, it now has a new look based on Bootstrap in accordance with other projects like SETI@home.
Your feedback and suggestions are, as always, appreciated.
11 Oct 2017, 8:16:29 UTC
· Discuss
Server Migration
Over the next week, some of the servers will be migrated in order to upgrade from SL6 to CC7. This will hopefully be transparent but if there are issues, this may be the reason.
6 Oct 2017, 8:30:12 UTC
· Discuss
Server Upgrade
The server will be upgraded today to use the latest version of the Web interface. We anticipate that things may break so expect some instability over the next few days. What we discover by doing this will hopefully make the upgrade of the production server smoother.
3 Oct 2017, 7:45:01 UTC
· Discuss
Reminder -- CMS jobs unavailable Weds 27th September
An upgrade to the CMS@Home workflow management system (WMAgent) is planned for tomorrow (Wed Sep 27th). This needs the current batch of jobs to be stopped so that the queue is empty. I plan to do this about 0700-0800 UTC on Wednesday.
To avoid "error while computing" task failures and the resulting back-off of your daily quotas, we suggest you set all your CMS machines to No New Tasks at least 12 hours beforehand to allow current tasks to time out in the normal way. You can stop BOINC once all your tasks are finished, if you wish.
Exactly how long the intervention will take is unclear, and there will be a delay of up to an hour to get a new batch of jobs queued afterwards. I will post here when jobs are available again, hopefully before the end of the day European time.
26 Sep 2017, 12:25:22 UTC
· Discuss
Multicore Shutdown
A new feature has been implemented to shut down multicore VMs at the end of their life when the idle time > busy time. This means that if we are wasting more time with empty cores than we would lose by killing the remaining jobs, those jobs are killed. This should avoid the situation where a looping Theory job keeps the VM alive while wasting idle cores. If this works the following message should be seen in the task output.
Multicore Shutdown: Idle > Busy ( 1439s > 1382s )
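As an illustration only, the rule described above can be sketched in a few lines of Python. The function name and the way idle and busy seconds are passed in are assumptions made for this example; the post does not say how the VM actually measures them. Only the comparison and the message format are taken from the announcement.

```python
def multicore_shutdown_message(idle_seconds, busy_seconds):
    """Return the shutdown log line when idle time exceeds busy time, else None.

    Hypothetical helper: how the VM accumulates idle and busy core time
    is not described in the announcement, so both are plain inputs here.
    """
    if idle_seconds > busy_seconds:
        # Message format copied from the example line in the announcement.
        return f"Multicore Shutdown: Idle > Busy ( {idle_seconds}s > {busy_seconds}s )"
    return None


# More time wasted on empty cores than would be lost by killing the jobs.
print(multicore_shutdown_message(1439, 1382))
# Still mostly busy, so the VM keeps running.
print(multicore_shutdown_message(600, 1382))
```

With the figures from the example message (1439 s idle, 1382 s busy) the first call returns the shutdown line; the second returns None.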
Server upgrade
We will migrate the lhcathome-dev project to a Centos7 server and upgrade the BOINC server components.
The lhcathome-dev project server will be unreachable for a while later today during the upgrade.
11 Aug 2017, 8:22:39 UTC
· Discuss
CMS application job queue is being run down.
We want to update the WMAgent job controller, so I've stopped the next batch (I hope). We should run out of jobs in 10-12 hours, so set any machine running CMS tasks to No New Tasks as soon as practicable. Should be up again tomorrow.
26 Jun 2017, 16:00:39 UTC
· Discuss
CGI testing
We are trying to improve the job limiter that is currently active, so you might experience some instabilities for the rest of the day.
Thank you for your understanding.
23 Jun 2017, 8:53:53 UTC
· Discuss
Dev server downtime Tue 6 June
We need to rebuild the dev VM as it runs out of memory. Thus the LHC@home-dev project server will be unreachable for part of the day today.
6 Jun 2017, 7:29:39 UTC
· Discuss
Dev server was down today
We had an issue with our dev server and had to rebuild it today. Should be back in business now. Sorry for the trouble.
23 May 2017, 15:23:14 UTC
· Discuss
Screenshot display
A new feature has been added. The screenshot that is captured in the event of an error during the job execution is now displayed at the very bottom of the result page. So, if a task errors out, the state of the VM at the time of the error can be seen in the task's results.
We would greatly appreciate any feedback on this new feature.
Thank you for your invaluable support!
25 Apr 2017, 12:55:34 UTC
· Discuss
New native Linux ATLAS application
Hi all,
If you don't use Linux you can ignore the rest of this post. If you do, you may be interested in trying the experimental ATLAS app, which doesn't use VirtualBox but runs natively on Linux.
IMPORTANT!! To run this app you must install CVMFS, the CERN VM File System, and configure it for ATLAS. This file system contains all the software for ATLAS WU and is normally inside the virtual image (the same as for all LHC vbox apps).
A simple installation guide can be found here: https://cernvm.cern.ch/portal/filesystem/quickstart
You should set up the repositories as shown in the example for ATLAS. If you have a squid proxy handy you can specify it there - if not I'm not sure whether it will work or not without configuring one.
Our target for this app is CERN or ATLAS-related institutes who have idle machines with CVMFS already installed, and we do not expect the average volunteer to install CVMFS and run this app. But I think all of you here are above-average volunteers :) and you may be interested in trying it.
Please give feedback on the ATLAS forums. Unfortunately there is no way to check for CVMFS on the client before requesting tasks, so if you don't have CVMFS you can still get these tasks and they will fail straight away. So better to uncheck the ATLAS app if you don't want to run it.
23 Feb 2017, 15:48:35 UTC
· Discuss
Draining the CMS job queue
Because of an upgrade to the WMAgent server, we need to drain the CMS job queue. So, I'm not submitting any more batches at present and we should start running out over the weekend. If you see that you are not getting any CMS jobs (not tasks...) please set No New Jobs or stop BOINC.
I expect that the intervention will take place Monday morning, and hopefully we'll have new jobs again later that day.
17 Feb 2017, 10:58:22 UTC
· Discuss
Good news for the CMS@Home application
We've now demonstrated that we can perform all the steps required to bring CMS@Home into a production tool for CMS Monte-Carlo data production. Please see this announcement at LHC@Home for more details.
27 Jan 2017, 21:03:51 UTC
· Discuss
SSL on lhcathomedev.cern.ch
We have now got a proper certificate for our DEV server, and are using the opportunity to change the name to LHCathome-dev, so please use this URL from now on for the LHC@home-dev project:
https://lhcathomedev.cern.ch/lhcathome-dev
The former test SSL URL to this project on the production server will be stopped and redirected here.
Thanks for your continued support and help!
17 Jan 2017, 10:14:02 UTC
· Discuss
CMS Servers up again
https://www.neowin.net/news/dirty-cow-flaw-lets-hackers-gain-control-of-linux-systems-every-single-time
YEP Linux is just the greatest and most secure OS ever 😎
.....I didn't do it.......and I never liked a Dirty Cow
(OK I won't restart the OS war)
24 Oct 2016, 15:50:09 UTC
· Discuss
Server Consolidation
As mentioned previously, we would like to consolidate the existing production servers (Sixtrack, vLHC and ATLAS) into a single service. We hope that by doing this we can improve the support and reduce the confusion. One benefit for all is that there will be a single forum so both us and our volunteer moderators can be more effective.
The transition will have two phases, commissioning and decommissioning. First a new server will be prepared with a similar configuration as this dev project but based on the Sixtrack DB. This is because they have the most users and 50% of the active users from vLHC and ATLAS are already there, hence it should minimize the impact. Once this new server is ready, it will be opened up for use in parallel with the existing three servers.
Next comes the decommissioning. For Sixtrack this should be straightforward: the URLs for the old host will be redirected to the new host. For vLHC and ATLAS things will be a little more complicated. Those users who are already registered with Sixtrack will be encouraged to move to the new server. Those who are not registered can either register themselves and move, or we can do a bulk registration. Tasks can then be stopped and the URLs redirected.
Finally there is the issue of credit. It should be possible to migrate the credit from the old servers to the new server. This can only really be done once the servers are no longer used. There is no time-critical aspect; until this is done, only the new credit will be seen.
Comments and feedback on this proposal are welcome.
P.S The dev project will stay around as it is.
5 Oct 2016, 12:56:22 UTC
· Discuss
Migration to SSL
The scheduler and web pages of this dev project are now also published on the URL:
https://lhcathome.cern.ch/vLHCathome-dev
Please detach and re-attach to this new URL with your BOINC clients. The old project server is still running, and also the file upload and validation daemons run on the old server for now.
Later on after a test period, we will redirect the old URL. Then we will proceed in a similar way with our production project.
16 Sep 2016, 14:18:50 UTC
· Discuss
Task Tracker
I have added a task tracker to the top left of the page so everyone can see what issues we know about, which ones are being worked on and what is being done right now. It still needs populating with a few items.
3 Aug 2016, 10:08:09 UTC
· Discuss
Task and CPU limiter
The server has just been updated to add the feature that limits Tasks and CPUs per user. This limit can be controlled in the project preferences.
Together with my changes to the scheduler, per-project limits on jobs in progress and #CPUs should now work. But I haven't actually tested this. Laurence, please try it and tell me if it doesn't work.
-- David
Server Upgrade Tomorrow Morning
The server will be upgraded tomorrow morning.
28 Jul 2016, 14:07:33 UTC
· Discuss
Update
David Anderson, Rom Walton and I had a conference call yesterday where we discussed limiting tasks per user, why we want to do this, multi-core VMs and the VT-x issues.
One of the reasons why we would like to limit tasks is that machines can be assigned more tasks than they can handle or than is desired, which leads to problems. David pointed out that BOINC should respect the resource constraints and if not the issue needs to be looked into. Feedback on this would be welcome so I have created a new thread where you can paste any scenarios where BOINC is not respecting the constraints.
Implementing a task limit per user should be straightforward. I will provide an updated php file for the project preferences and David will update the scheduler code to take this into consideration.
Similarly for multi-core, we can set a flag in the project preferences on whether or not you would like BOINC to provide multi-core VMs. This is an area where we probably still need to experiment.
Finally, the VT-x issue was discussed as over 30% of our failed tasks are VMs that fail to boot due to this setting not being available or enabled. It was pointed out that tasks should not be provided if the machine is not capable of running them. This will also be investigated.
28 Jul 2016, 13:03:56 UTC
· Discuss
Scheduler and vbox update to detect 64-bit enabled computers
The BOINC scheduler has been updated to detect 64 bit machines that do not have the virtualization hardware extensions enabled. Also vboxwrapper has been updated for better error handling, and vboxwrapper 26193 is now deployed for Windows and Linux for the Theory application.
Many thanks to RomW for providing these changes!
20 Jun 2016, 8:45:59 UTC
· Discuss
CMS Jobs Available Again
There was a fault with a server at CERN last night, which meant that we could not submit new CMS jobs, so we ran out.
However, the problem has now been fixed and CMS jobs are available again. Many thanks to the staff who worked Saturday night and Sunday to fix the problem.
5 Jun 2016, 16:48:31 UTC
· Discuss
Infrastructure Update
The authentication server used to get the proxy has been changed. New tasks from now on will use the new server. This change should be transparent but in case everything breaks in the next few hours, this will be why.
24 May 2016, 13:31:00 UTC
· Discuss
CERN Bulletin
This article was published in the CERN Bulletin yesterday.
http://cds.cern.ch/journal/CERNBulletin/2016/20/News%20Articles/2151943?ln=en
18 May 2016, 11:20:48 UTC
· Discuss
Project Configuration Update 2
Some project configuration parameters have been changed to help avoid hosts being swamped with tasks and to back off problem hosts.
<daily_result_quota> 2 </daily_result_quota>
<max_wus_in_progress> 1 </max_wus_in_progress>
<max_wus_to_send> 1 </max_wus_to_send>
<min_sendwork_interval> 60 </min_sendwork_interval>
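For anyone who wants to inspect these limits programmatically, here is a minimal Python sketch that reads the four options listed above. The wrapping `<config>` root element and the function name are assumptions added so that the fragment parses on its own; they are not a statement about how the project's real configuration file is laid out.

```python
import xml.etree.ElementTree as ET

# The four options exactly as listed in the announcement.
FRAGMENT = """
<daily_result_quota> 2 </daily_result_quota>
<max_wus_in_progress> 1 </max_wus_in_progress>
<max_wus_to_send> 1 </max_wus_to_send>
<min_sendwork_interval> 60 </min_sendwork_interval>
"""


def parse_limits(fragment):
    """Parse a flat list of integer options into a dict (hypothetical helper)."""
    # Assumption: wrap the fragment in a root element so it is well-formed XML.
    root = ET.fromstring(f"<config>{fragment}</config>")
    # int() tolerates the surrounding spaces inside each element.
    return {child.tag: int(child.text) for child in root}


limits = parse_limits(FRAGMENT)
for name, value in limits.items():
    print(f"{name} = {value}")
```

The resulting dictionary maps each option name to its integer value, for example `limits["min_sendwork_interval"]` is 60.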
Error Codes
At the beginning of next week work will start on providing consistent error codes and behaviour for all applications. Three error codes will be used:
Any of these errors should cause the BOINC client to back-off. If you see any errors and one of these codes is not used, please let us know.
EDIT: To get this into the upstream release the codes have changed to:
30 Apr 2016, 17:51:59 UTC
· Discuss
Refactoring
The bootstrapping code used to prepare the environment for the jobs in the VM has been re-factored so that common tasks for the five applications are abstracted to common functions. It has been tested in a VM but there may be issues relating to the diversity of our environment and error conditions.
22 Apr 2016, 14:44:49 UTC
· Discuss
New Applications
As many of you have already seen, there are now five applications at various levels of readiness hosted in this project. The challenge now is to bring them to the level of quality required for the production project. This will involve work both on the frontends that are visible and the backends that are not. With more applications, we have to be a little more focused and use our efforts effectively. Recently there has been some good communication from everyone on the message boards and I hope that this will continue. The initial focus will be to improve the Theory application as it should be the easiest and, as many components are now shared between the applications, the improvements should benefit all. We would like to thank everyone for their continued support; it really does help to make a big difference.
21 Apr 2016, 19:46:23 UTC
· Discuss
Project Configuration Update
As we now have multiple applications, some of you have requested that we remove the restriction that limits the number of running tasks per host to one. This restriction was put in place so as not to overload machines while developing. As everyone in this development project is (or should be) an advanced user, we assume that everyone knows how to adjust their preferences to limit tasks on the client if needed. This update will be done tomorrow morning at around 10am CEST (8am GMT, 1461052800 UTC).
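The Unix timestamp quoted above can be checked with a short Python snippet. The fixed UTC+2 offset used for Geneva local time is an assumption that holds for mid-April (Central European Summer Time); the variable names are purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# Epoch value quoted in the announcement.
UPDATE_EPOCH = 1461052800

# Interpret the epoch value as an aware UTC datetime.
utc_time = datetime.fromtimestamp(UPDATE_EPOCH, tz=timezone.utc)

# Assumption: Geneva is at UTC+2 (summer time) on this date.
geneva_offset = timezone(timedelta(hours=2))
local_time = utc_time.astimezone(geneva_offset)

print(utc_time.isoformat())    # 2016-04-19T08:00:00+00:00
print(local_time.isoformat())  # 2016-04-19T10:00:00+02:00
```

So the timestamp corresponds to 08:00 UTC on 19 April 2016, which is 10:00 local time at UTC+2.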
18 Apr 2016, 20:38:52 UTC
· Discuss
New Theory Application
A new Theory application has just been added. If you do not wish to receive these tasks, please update your vLHCathome-dev preferences.
11 Apr 2016, 14:45:06 UTC
· Discuss
Server code update
We're down for a short while for a server code update.
16 Mar 2016, 9:44:37 UTC
· Discuss
Change of project name
As mentioned earlier (under "Project Restructuring"), CMS-dev has evolved into a more general dev project for virtual machine applications running under LHC@home.
We will therefore rename the "CMS-dev" project to "vLHCathome-dev" as it is now a development project for early testing of applications that potentially could run in production under the Virtual LHC@home platform.
The change is planned for tomorrow 13 UTC, and you should later see this project as:
http://lhcathomedev.cern.ch/vLHCathome-dev/
Redirection will be put in place, so in principle BOINC clients should be able to follow. Otherwise please detach the project and re-attach to the new URL at your convenience.
Thanks to your contributions and feedback on tests of our applications, CMS-dev has been a success, and we would like to express our warm thanks for your contributions! :-)
Please note that this remains a development and test project, that might provide unstable applications and that any BOINC credit accumulated here might get lost.
If you prefer to just crunch and get credit, please give priority to our production LHC@home projects.
Many thanks for your collaboration!
... the team
8 Mar 2016, 14:17:35 UTC
· Discuss
LHCb Jobs
Tasks for the LHCb application will start to be submitted. By default this has been made opt-in so you should only get tasks if you specifically ask for them. The application is still in development so only give it a try if you're curious.
7 Mar 2016, 15:22:48 UTC
· Discuss
Updated Job Agent
The CMS job agent has been updated to add some additional protections. The VM will now shut down if there are no more jobs, no output has been produced or if too many jobs fail.
4 Mar 2016, 11:14:35 UTC
· Discuss
New CMS App v46.26
Version No. in title and post conflicting . . .
3 Mar 2016, 14:38:57 UTC
· Discuss
LHCb Application
An initial alpha version of the LHCb application has been added. There are no tasks for now. Two new topics have been created for the message boards so that discussions on the CMS application and LHCb application can be kept separate.
3 Mar 2016, 11:00:25 UTC
· Discuss
We may have made a mistake...
If you get error messages about site-local-config like the ones in this message, please abort your current task and start again. We made a change yesterday afternoon to speed up booting, but things aren't behaving as we'd expected. We've reverted to the previous version, but you need to reboot the VM to pick up the changes (...or let it spew out job failures for 24 hours...). :-(
Sorry, it was my idea but things obviously weren't working exactly as I'd thought.
3 Mar 2016, 1:02:05 UTC
· Discuss
Change Log
This thread will be used to provide information on all the changes that are made to help correlate issues with potential causes. It is needed as not all changes are tied to a new application release such as with the supporting infrastructure.
2 Mar 2016, 20:14:56 UTC
· Discuss
Project Restructuring
As discussed in other threads, we would now like CMS-dev to become our general testing project. In order for this to happen the project should support sub-projects similar to PrimeGrid and per app credit. We are aiming to add these features within the next few days.
1 Mar 2016, 9:56:45 UTC
· Discuss
Out Of Jobs
We are out of jobs and are fighting with a few other issues. I have stopped new tasks being sent for now. A feature to handle this situation more gracefully is on the work plan. I hope that we can be back running after the weekend.
26 Feb 2016, 10:07:42 UTC
· Discuss
Infrastructure Issues
There is an issue with one of our servers that will stop new glideins (runs) from working. In theory the VMs should just idle until this is fixed.
19 Feb 2016, 15:58:45 UTC
· Discuss
Suspend/Resume
The suspend/resume issue should now be resolved so it should be possible to pause/save a VM for up to 48 hours without losing the current job. This will only work with new tasks so please start a new one if you would like to test this. As usual, please post any message to this thread if you find any problems.
EDIT: Just as a reminder, the job used to be evicted after the VM had been suspended for about 20 minutes. If you do a test and it is fine, please also post and say for how long the VM was suspended.
16 Feb 2016, 19:07:08 UTC
· Discuss
New App Version For Linux and Windows
A new app version (CMS v46.23) for Linux and Windows has been provided. It contains vboxwrapper v26183, which provides a heartbeat mechanism that can detect if the VM fails to boot or freezes. It should prevent VMs just sitting idle if a scenario such as a kernel panic occurs at boot. A Mac version will be made available once a build is available. As usual, please let us know if there are any issues with this release.
10 Feb 2016, 14:25:29 UTC
· Discuss
Workplan
This thread will be used to provide information on the status of issues/improvements. Strikethrough will be used to show items done (check for issues), bold for in progress and italics for things on the to-do list.
9 Feb 2016, 21:45:59 UTC
· Discuss
Zombie Tasks
The issue where many tasks have been sent to the BOINC client but only one runs is being investigated. The number of tasks per day per user has been reduced from 500 to 20 in an attempt to reduce the impact. This is probably why only a few jobs are running as the clients are now blocked from getting a new task that may run. A project reset might fix this or maybe not. If anyone has any information that may help, please post.
9 Feb 2016, 20:21:33 UTC
· Discuss
Server Restart
The server has been restarted to hopefully address the issue of multiple tasks being sent to the BOINC client.
5 Feb 2016, 7:47:55 UTC
· Discuss
Aborted VMs
It has been noticed that some VMs are aborting. This may be due to them running out of memory as no swap space has been configured. The bootstrap script has been updated so that next time the VM is started some swap will be configured.
3 Feb 2016, 17:55:17 UTC
· Discuss
New plot for the jobs stats
A new plot has been added to the CMS job stats page. It shows the wallclock consumption for successful and failed jobs. This is a better indicator than using the number of jobs as it is not affected by job length.
3 Feb 2016, 15:30:56 UTC
· Discuss
Updated Agent
The job agent has been updated with the main aim to get logging messages back into stderr_txt so it can be seen for each task. The following changes were made:
This update provides a handle for shutting down the VM when problems are detected and providing the reason to the BOINC client. Details are also archived for the task which should help troubleshooting and improve overall support.
2 Feb 2016, 23:45:10 UTC
· Discuss
Graceful Shutdown Now Implemented
The graceful shutdown of VMs has now been implemented. When the VM is older than 24 hours, after the current run has finished the VM will shut itself down using the completion_trigger_file method. More precisely, a file is placed in a shared directory between the host and guest that signals to the BOINC client that the task has ended. To verify that the VM was gracefully shut down, the message VM Completion File Detected should be seen in the stderr_txt of the task. This required a new app version to be released (v46.22) that contains the following changes to the job description:
31 Jan 2016, 22:20:59 UTC
· Discuss
Poll
As discussed in a recent thread, there will potentially be 6 LHC related applications (Six Track, Test4Theory, ALICE, ATLAS, CMS and LHCb) and hence between 1 and 12 projects depending on how things are organised. The options are:
1. One project with beta apps
2. Two projects; prod and dev
3. One project and six dev projects
4. Six prod projects and six dev projects
What would you prefer?
http://doodle.com/poll/esktqvrikqmpmyp2
30 Jan 2016, 21:08:58 UTC
· Discuss
Constructive suggestions please
As mentioned elsethread, I have to prepare a summary of required/desired improvements to CMS@Home to take it up to production readiness. Please post suggestions and criticisms in this thread. Please keep it short and non-personal, as I'll have to de-serialise the thread to make my report.
29 Jan 2016, 13:20:06 UTC
· Discuss
Migrating to vLHC@home
As most of you already know, the aim of this project was to get the CMS application to a point where it was mature enough to be added to the vLHC@home project as a beta app. We believe that we have now reached that point and would like to migrate our activity to that project.
The CMS beta application which should be identical to this one is now available in vLHC@home. Out of the 190 volunteers that have credit, 89% already have a vLHC@home account. Please could everyone who is running here try out the CMS beta application from vLHC@home. To do this you will need to go to the vLHCathome preferences and enable CMS Simulation.
http://lhcathome2.cern.ch/vLHCathome/prefs.php?subset=project
If the beta app is working for you, please stop running here. Once most have migrated, no new tasks will be created and the accumulated credit can be migrated from here to vLHC@home.
Please post any comments or issues relating to the migration in this thread.
Thanks to everyone who has supported this project and enabled us to get to where we are today.
Laurence
27 Jan 2016, 14:59:44 UTC
· Discuss
Important information on upload bandwidth
We've been puzzling for a while as to why we were getting a lot of "stage-out" failures -- i.e. problems returning result files to data storage.
I'd pushed up lately to CMS jobs taking 2-5 hours, depending on processor speed, and returning ~150 MB result files. This means that on average each VM is returning ~50 MB/hr (or to put it another way, at an upload speed of 1 Mbps, returning a result file would take 1500 seconds, or 25 minutes).
It seems technology is roughly consistent across the world, and many consumers are still on ADSL broadband -- where the A means Asymmetric, that is upload speed is usually much slower than download speed. Upload speeds around, or even less than, 1 Mbps seem to be the norm for ADSL broadband.
So, the problems started occurring when enthusiastic volunteers started running several machines at once on their home networks. This meant that the total load on the upstream channel exceeded availability, uploads stalled and we started getting transfer time-outs.
So, the caution to take away from this is to make sure you know your upload speed, and make sure you don't run so many machines that they take your line into saturation.
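The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions: the function names are made up for illustration, MB and Mbps are taken as decimal units, and the default of 10 bits on the wire per payload byte is an assumed allowance for protocol overhead (it happens to reproduce the 1500-second figure quoted above; a raw 8 bits per byte would give 1200 seconds).

```python
def upload_seconds(file_mb, uplink_mbps, bits_per_byte=10):
    """Estimated time in seconds to upload one result file.

    bits_per_byte=10 is an assumed overhead allowance, not a measured value;
    use 8 for the raw payload size with no overhead.
    """
    return file_mb * bits_per_byte / uplink_mbps


def max_vms(uplink_mbps, mb_per_hour_per_vm=50, bits_per_byte=10):
    """Largest number of VMs whose *average* upload rate fits in the uplink.

    This ignores bursts (several VMs finishing at once) and any other traffic
    on the line, so the practical limit is likely lower.
    """
    per_vm_mbps = mb_per_hour_per_vm * bits_per_byte / 3600
    return int(uplink_mbps / per_vm_mbps)


if __name__ == "__main__":
    print(upload_seconds(150, 1))      # one 150 MB result on a 1 Mbps uplink
    print(upload_seconds(150, 1, 8))   # same, counting payload bits only
    print(max_vms(1.0))                # average-rate ceiling for a 1 Mbps uplink
```

Because uploads arrive in bursts rather than as a steady stream, the ceiling returned by `max_vms` should be read as an upper bound, and staying well below it is the safer choice.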
I believe there are some workloads we could commission with a somewhat smaller MB/hr result generation; I'll let you know if we can start running them.
3 Jan 2016, 13:21:57 UTC
· Discuss
Updated Agent
The CMS Job Agent used in the VM has been updated. It contains the following changes:
* Fixed 1 hour (not) sleep issue
* Support for non-BOINC instantiations
* Added support for running under vLHC@home
This update should be transparent but if not please let us know by posting a reply to this message.
29 Oct 2015, 20:19:20 UTC
· Discuss
VirtualBox wrappers upgraded to 26178
The VirtualBox wrappers for Windows, Linux and Mac have been upgraded to 26178.
It contains the following fixes:
* VBOX: Add code to handle search path modification for Linux and Mac.
* VBOX: On a hypervisor detection failure dump all the logs to stderr, it would have quickly exposed a search path change on Mac OS X.
* VBOX: Reduce the amount of disk I/O when parsing the VM log file
* VBOX: Fix a regression introduced in 26172 with starting up a VM
Let us know how it goes.
22 Oct 2015, 7:46:10 UTC
· Discuss
New jobs available
I've now submitted a larger batch of jobs since the failure rate seems manageable. There were a few host IP addresses recurring amongst the failures; I'll keep an eye out for them in future and contact the owners if they continue to misbehave. You can start running tasks again now if you wish.
20 Oct 2015, 12:20:57 UTC
· Discuss
New vboxwrapper
We've released new versions of CMS@Home with the latest vboxwrapper 26175. See the discussion in Number Crunching for the effects this has.
13 Oct 2015, 14:04:09 UTC
· Discuss
New developments
We're at the stage where we have to make disruptive changes to the workflow, in order to get the results onto the Grid from the data-bridge. At some point soon we'll start getting errors for jobs in the current batch, at which time I'll ditch the rest and submit a small test batch. If we're lucky that may be the end of it, we'll have to see.
Thanks in advance for your understanding.
13 Oct 2015, 13:03:44 UTC
· Discuss
CMS beta in Virtual LHC@Home
Some tasks are now being run through vLHC@Home -- see this thread for details.
12 Oct 2015, 14:17:20 UTC
· Discuss
Possible short outage...
I'm about to try manually to install a new certificate proxy, as the default 7-day initial proxy is about to expire. This is the first time we've done this, so it may not work -- if it fails, expect to see job failures. I'm not sure if the jobs will fail before they get to your tasks or after... If I see failures I'll submit a new batch immediately, so don't panic if you see failures, we should ride it out OK.
9 Oct 2015, 9:52:51 UTC
· Discuss
Logo
Shall we make a logo to sit up there in the top left-hand corner?
Here's my suggestion:
29 Sep 2015, 16:25:24 UTC
· Discuss
Jobs incoming!
Patches have been applied, jobs should be ready when you want them. Enjoy!
[Edit] Confirmed, jobs are available now. [/Edit]
31 Aug 2015, 16:24:10 UTC
· Discuss
Progress!
We are making great progress and are just chasing up the last remaining issues. One of the recent improvements was to create the link to the CMS monitoring infrastructure. This means that we can generate nice plots similar to what ATLAS@home have for their project.
28 Aug 2015, 10:58:33 UTC
· Discuss
Some jobs again
We can now submit jobs again. I'll submit a test batch overnight, and then try for a bigger test for the rest of the week. Feel free to start running tasks again, and report problems (and successes...) in the usual places.
Thanks.
25 Aug 2015, 16:37:29 UTC
· Discuss
No new jobs
We've run out of jobs on the Condor server. Until I can sort out the glitch that's preventing me submitting new jobs you can all take a rest for the weekend, or switch to backup projects.
Cheers, ivan
22 Aug 2015, 9:10:13 UTC
· Discuss
Agent Fixed
The agent is fixed. If you experience any problems, please abort the old task first to verify the issue in a new task and then post a reply in this thread.
20 Aug 2015, 11:29:17 UTC
· Discuss
Agent Broken
We have an issue with the agent so the VMs will not get new jobs until this has been resolved.
19 Aug 2015, 15:00:43 UTC
· Discuss
New CMS Agent
We are just about to push a new CMS Agent to CVMFS. The code has been re-factored to be much simpler, less code = less bugs :)
It should appear in a few hours, let us know if there are any problems.
19 Aug 2015, 13:05:07 UTC
· Discuss
Helpful tips for new users
This thread is to collect useful information in one place. Feel free to add your tips here.
================
A bug has been found in the Windows version of BOINC which means that files larger than 4 GiB (2^32 bytes) are being left behind in slot directories, affecting us and other BOINC projects. Unfortunately, we do produce VM files that large, so we are interfering with these other projects. If you are active on this project, using Windows, please update your BOINC version to a patched version (see this message and thread).
"The files that are needed to apply the hotfix are
For 64-bit BOINC
boinc.080515.x64.zip
For 32-bit BOINC
boinc.080515.x86.zip
Simply extract the two files for your version from the .zip archive, and copy them to your BOINC program folder - you'll need to stop the BOINC client while you do this, and restart it again afterwards."
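To check whether a machine is affected, one option is to look for leftover files above the 4 GiB limit under the BOINC slot directories. The following is a minimal sketch, not an official BOINC tool: the function name is hypothetical, and the location of the slots folder depends on your installation, so it is passed in as an argument.

```python
import os

FOUR_GIB = 2**32  # the 4 GiB (2^32 bytes) limit mentioned above


def find_large_files(root, threshold=FOUR_GIB):
    """Return sorted (path, size_in_bytes) pairs for files under root
    that are larger than threshold bytes."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size > threshold:
                hits.append((path, size))
    return sorted(hits)
```

For example, `find_large_files(r"C:\ProgramData\BOINC\slots")` would scan the slots folder, assuming BOINC's data directory is in that location on your machine. Stop the BOINC client before deleting anything the scan turns up.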
17 Aug 2015, 14:26:30 UTC
· Discuss
We have results!
First CMS@Home results returned to storage!
(Don't try to look at the page yourself, unless you have CMS credentials.)
15 Aug 2015, 0:17:16 UTC
· Discuss
Agent Update
We will shortly be updating the agent that is running in the virtual machine. This should start using the new infrastructure that we have been working on recently. It may not work first time so if you have any feedback, please respond to this post.
4 Aug 2015, 11:21:50 UTC
· Discuss
Real CMS Jobs
Over the past few months we have been re-engineering our internal infrastructure so that we can send real CMS simulation jobs to the CMS@home project and are nearly ready to try this out for real. From the VM side of things, we only need to update the CMSJobAgent.py script which will be done via the magic of CVMFS [1], so no new application release will be needed.
Although this should be transparent, there is a high chance we will temporarily break something, so please bear with us during this potential period of instability. I would estimate that we will be ready within the next two weeks and I will send an announcement before we make any changes.
Many thanks to all of you who have been supporting the testing of this project.
[1] http://iopscience.iop.org/1742-6596/219/4/042003
21 Jul 2015, 9:56:40 UTC
· Discuss
Urgent Update for Windows Users
A bug has been found in the Windows version of BOINC which means that files larger than 4 GiB (2^32 bytes) are being left behind in slot directories, affecting us and other BOINC projects. Unfortunately, we do produce VM files that large, so we are interfering with these other projects. If you are active on this project, using Windows, please update your BOINC version to a patched version (see this message and thread).
"The files that are needed to apply the hotfix are
For 64-bit BOINC
boinc.080515.x64.zip
For 32-bit BOINC
boinc.080515.x86.zip
Simply extract the two files for your version from the .zip archive, and copy them to your BOINC program folder - you'll need to stop the BOINC client while you do this, and restart it again afterwards."
Thanks to the community for helping us debug this, especially Richard, Crystal Pellet and Ray, and to the BOINC crew for coming up with the fix.
In the meantime, we will move on to debugging why our VMs are currently growing so large.
19 May 2015, 12:24:33 UTC
· Discuss
VBox Wrappers Updated to 26165
The VirtualBox wrappers for Windows, Linux and Mac have been upgraded to 26165.
It contains the following fixes:
* VBOX: Add VboxStartup.log to the list of partial log dumps to stderr when something goes wrong.
* VBOX: Remove unneeded files.
In addition the plan class vbox64 has been specified for the platform and the
Let us know how it goes.
11 Apr 2015, 3:20:38 UTC
· Discuss
VBox Wrappers Updated to 26164
The VirtualBox wrappers for Windows, Linux and Mac have been upgraded to 26164.
It contains the following fixes:
* VBOX: Check for a valid Console pointer before attempting to pause/resume the VM.
* VBOX: cut down on some of the noise with spurious 'Status Report' messages when we attempt to launch the VM.
* VBOX: After adding in a new VirtualBox COM interface, you must hook up the plumbing.
* VBOX: Only add the guest additions ISO to the VM if the file has actually been detected on the file system.
* VBOX: Add COM support for VirtualBox 5.0.
Let us know how it goes.
8 Apr 2015, 8:31:29 UTC
· Discuss
VBox Wrappers Updated to 26160
The VirtualBox wrappers for Windows, Linux and Mac have been upgraded to 26160.
It contains the following fix:
* VBOX: If polling for the current VM state fails for any reason, like vboxsvc crashing, do a temp exit for 24 hours.
Let us know how it goes.
28 Mar 2015, 22:50:21 UTC
· Discuss
Windows VBox Wrapper Updated to 26159
The VBox wrapper for Windows has been upgraded to version 26159. It contains these fixes:
* VBOX: Add better error checks when handling COM error conditions.
* VBOX: Add additional check for a valid pointer to prevent crash condition.
Please let us know if you have any problems.
27 Mar 2015, 10:58:48 UTC
· Discuss
VBox Wrappers Updated to 26158
Hi, thanks for waiting. Unfortunately I got side-tracked for a while due to CERN's changing the rules since the last time I renewed my contract...
I've changed the vboxwrapper to V.26158, so you can resume tasks if you want. It seems to run fine on Windows and Linux. Please run a test WU, esp. if you're on a Mac. (Maybe I can leverage this project to get a Mac myself!)
I've added the new feature
I haven't added another new feature
26 Mar 2015, 11:56:25 UTC
· Discuss
Please set No New Tasks for a short while
Gruezi Mittenand;
I'm about to make my first solo release (to update the vboxwrappers). It's inevitable that I'll make mistakes, so please set NNT until I give the word to go, to avoid picking up some intermediate flawed state.
Thanks.
26 Mar 2015, 10:11:55 UTC
· Discuss
VBox Wrappers Updated to 26157
The VirtualBox wrappers for Windows, Linux and Mac have been upgraded to 26157. Also the tag enable_cern_dataformat has been removed from the job XML file.
Let us know how it goes.
22 Mar 2015, 23:04:51 UTC
· Discuss
Mac VBox Wrapper Updated to 26156
I have just updated the VBox wrapper for Mac to version 26156. Please let us know if you have any problems.
21 Mar 2015, 23:17:54 UTC
· Discuss
A Message To All Our Volunteers
First of all I would like to take this opportunity to thank all our volunteers, especially those who are active on the message boards and are helping us to evolve this project. Without volunteers we can not do volunteer computing.
The goal of this project is to develop what is required so that the CMS collaboration can use this resource for computationally intensive tasks such as producing Monte Carlo events, i.e. simulated collisions within the detector. Due to the complex nature of the software and the difficulty of maintaining up-to-date ports on different platforms, a virtualized approach is being used. This means that the BOINC tasks are just virtual machines that run for 24 hours. When the virtual machines start, they download the real computational task from our own infrastructure. For now these tasks are just many copies of the same example, so please don't dedicate too many resources as the results will not be used. Your computing cycles are a valuable resource and there are many other projects that could benefit from them. We still have quite some work to do on our back-end infrastructure so that the collaboration can seamlessly direct tasks here and receive back the results.
Our vision is that in the near future, when the application and back-end infrastructure are mature, we can include it in the vLHC project as another application. When this happens it will not be possible to transfer the credit so please add your resources there now if that would annoy you.
Once again thank you for your participation and helping us to get this off the ground.
20 Mar 2015, 23:02:56 UTC
· Discuss
Vbox Wrapper Updates
The VirtualBox wrappers for Windows and Linux have been upgraded to 26156. The Mac wrapper has been downgraded to 26105.
Let us know if you have any problems. I will post another general news item soon providing more details about this project.
20 Mar 2015, 21:32:30 UTC
· Discuss
VBox wrapper problems
After upgrading to the version 26155 of the VBox wrappers, we have experienced some problems. Rather than reverting back to a working state we are going to push forwards and help debug them. We hope that this way our development project can then help those in production.
Cheers,
Laurence
20 Mar 2015, 9:19:24 UTC
· Discuss
New Release (v46)
The vboxwrappers have been upgraded to 26155 so that we are now in sync with vLHC@Home.
Please let us know how things are going by posting on the message boards.
Thanks,
Laurence
18 Mar 2015, 11:44:22 UTC
· Discuss
New Release (v45)
A new app version has been released (v45). This is a FAT image (548MB compressed) that contains many of the files that were previously downloaded via CVMFS. It should reduce the amount of network traffic and hence make everything a little bit more efficient.
Please let us know how things are going by posting on the message boards.
Thanks,
Laurence
13 Mar 2015, 14:47:51 UTC
· Discuss
New Release (v44)
A new app version has been released (v44). The main fix addresses an issue whereby jobs were failing on machines because configuration files in CVMFS were not being discovered automatically. Various other minor things have been cleaned up. The image is now small (15MB) but will download about 1GB of files via CVMFS. This may change in a future release, where we will increase the size of the download image by including many of the needed files in advance.
Please let us know how things are going by posting on the message boards.
Thanks,
Laurence
12 Mar 2015, 9:13:46 UTC
· Discuss
Another new image and access to the log files
Once again we have updated the VM images, so it would be nice if you could get a new VM.
This time we have done multiple things:
Some questions about the logs already arose internally (thanks to Ben) so some short comments on that:
25 Feb 2015, 16:50:59 UTC
· Discuss
Server failure over the night
Overnight we experienced a failure in our back-end server, which feeds jobs into the VMs.
So it might have been that your VM was sitting around doing nothing at that time...
We have now restarted the server and everything is back to normal.
Your VMs should be receiving jobs again :)
Unfortunately, at the moment we do not completely understand what caused the server to crash, but we hope to figure that out soon!
EDIT: In case your VM is still not running properly, getting a fresh VM by getting a new Boinc job/wu should solve this.
20 Feb 2015, 9:57:34 UTC
· Discuss
New VM image and new console feature!
We have just updated the VM image, so please abort your running jobs/wu's and get a new one.
The new image should have improved stability and use your cpu cycles properly.
We have also included a new feature, so that you can now see information about the job, etc. on the consoles. (Similar to test4theory/vLHC)
You can open the VM console by clicking on the CMS-dev job in your BOINC Manager and then on the "show VM Console"-button on the left. An RDP client should open automatically and connect to the VM.
Once there you can look through the different consoles. They are as follows:
1: Job output stdout [white]
2: JobAgent stdout [white]
3: top
4: Job output stderr [red]
5: JobAgent stderr [red]
On Windows you can use Ctrl-Alt-F[n] to jump to the consoles; on Linux, try Alt-F[n].
The output is still a bit messy, but please bear with us.
A graphic version as in t4t is coming soon -- to a VM near you :D
We also just succeeded in running the VM on a Microsoft Surface, which is probably the first time ever that a CMS job was run on a Surface :D
19 Feb 2015, 14:28:27 UTC
· Discuss
Restriction of new account creation
Unfortunately, there has been a spate of rather dodgy accounts being created in the last few days (try browsing profiles at random...), so we have had to limit new accounts for the time being.
The limiting mechanism is by Invitation Codes.
If you would still like to join this nascent project, then please send an e-mail to Ivan.Reid@CERN.ch with the Subject: "CMS@Home Request" and we will consider your request. Obviously we will consider such factors as whether you are already contributing to BOINC projects before sending an invitation code. Note that a decision may not be immediate.
18 Feb 2015, 15:02:19 UTC
· Discuss
Welcome to the CMS development project
Welcome. We've opened up the forums in case anyone wants to contribute.
We're still under development so don't waste too many cycles on it yet -- only run one or two jobs at a time. Let us know of any problems.
I believe that there are incompatibilities on Windows with VirtualBox versions beyond 4.3.12 (at least that's what vLHC@Home has found), but I seem to be able to run with 4.3.20 on Linux. I don't have a Mac box myself (yet...) so I have no experience there.
13 Feb 2015, 18:49:40 UTC
· Discuss
©2024 CERN