Message boards : News : Problem writing CMS job results; please avoid CMS tasks until we find the reason

Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 154
Message 6293 - Posted: 18 Apr 2019, 15:46:05 UTC

Since some time last night, CMS jobs appear to have been having problems writing results to CERN storage (DataBridge). As far as I can see, it's not affecting BOINC tasks: they keep running and credit is given. However, Dashboard does see the jobs as failing, hence the large red areas on the job plots.
Until we find out where the problem lies, it's best to set No New Tasks or otherwise avoid CMS jobs. I'll let you know when things are back to normal again.
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 750
Credit: 11,603,490
RAC: 1,713
Message 6294 - Posted: 18 Apr 2019, 19:16:37 UTC - in response to Message 6293.  

Thanks, Ivan. I will watch for the update on when we can run these again.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 154
Message 6297 - Posted: 20 Apr 2019, 10:35:52 UTC - in response to Message 6294.  

Given the Easter holidays, I'm not sure when someone at CERN will be able to look at it. We're getting "permission denied" errors when trying to write results and logs to the DataBridge, which suggests that either something has filled up or a certificate has expired. We are starting to get some hard failures now, so I guess these are jobs that exceeded the retry limit.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 154
Message 6300 - Posted: 25 Apr 2019, 6:32:58 UTC - in response to Message 6297.  

We seem to be getting successful jobs again. Unfortunately, I'm not able to access a PC until tonight to verify how well we are recovering. Resume tasks with care.


©2024 CERN