Message boards : Number crunching : 24 hours just isn't what it used to be

PDW

Joined: 20 May 15
Posts: 217
Credit: 5,010,018
RAC: 8,271
Message 2232 - Posted: 4 Mar 2016, 20:42:44 UTC

Have I missed something ?

Why are the 24 hour jobs no longer stopping after circa 24 hours ?
Three Boinc tasks that started about lunchtime today have all completed and validated in only 6 to 7 hours.

Was this to be expected ?
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2233 - Posted: 4 Mar 2016, 20:55:20 UTC - in response to Message 2232.  
Last modified: 4 Mar 2016, 20:55:48 UTC

When a task goes past the 24h limit, it will finish the "run" that it is on.
A "run" takes about 6h, so in the worst case a task can run for about 30h.

This was done so as not to lose the job it was working on when it hits 24h.
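
To make the arithmetic concrete, here is a minimal sketch of that rule. It assumes a task only starts a new run while it is still under the limit and always lets the current run finish; the function name and the fixed run length are hypothetical, not taken from the project's code.

```python
def total_runtime(run_hours, limit_hours=24.0):
    """Hours a task runs if it may only stop at the end of a run.

    Assumption: a new run starts only while the elapsed time is still
    below the limit, and a run in progress is never cut short.
    """
    elapsed = 0.0
    while elapsed < limit_hours:
        elapsed += run_hours  # the current run is allowed to finish
    return elapsed


if __name__ == "__main__":
    # Runs of exactly 6h land on the limit; slightly shorter runs overshoot it.
    for hours in (6.0, 5.9, 7.0):
        print(hours, "h runs ->", round(total_runtime(hours), 1), "h total")
```

Under these assumptions the total is always less than the limit plus one run length, which is where the 30h worst case for 6h runs comes from.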
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1126
Credit: 7,861,186
RAC: 5
Message 2234 - Posted: 4 Mar 2016, 21:02:29 UTC - in response to Message 2232.  

Have I missed something ?

Why are the 24 hour jobs no longer stopping after circa 24 hours ?
Three Boinc tasks that started about lunchtime today have all completed and validated in only 6 to 7 hours.

Was this to be expected ?

You may have run foul of the new rule stopping tasks if all the jobs in one "run" terminate with non-zero exit codes. I'll try to investigate later (just got in from work, haven't watched the news yet...).
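
For anyone wondering what that rule amounts to, a minimal sketch follows. The function name is hypothetical and the handling of an empty run is an assumption; the real check lives in the project's scripts and may differ.

```python
def should_stop_task(exit_codes):
    """Return True if every job in one run ended with a non-zero exit code.

    Assumption: a run with no recorded jobs does not trigger the stop.
    """
    return len(exit_codes) > 0 and all(code != 0 for code in exit_codes)


if __name__ == "__main__":
    print(should_stop_task([65, 65, 65]))  # every job failed
    print(should_stop_task([0, 65]))       # one job succeeded
    print(should_stop_task([]))            # nothing ran
```

A single successful job in the run is enough to keep the task going under this reading of the rule.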
PDW

Joined: 20 May 15
Posts: 217
Credit: 5,010,018
RAC: 8,271
Message 2235 - Posted: 4 Mar 2016, 21:09:27 UTC - in response to Message 2234.  

The stderr output for the tasks doesn't have the Guest Log details in it that the previous version did.

It does have the line...
Detected: Heatbeat check (file: '$s' every 0.000000 seconds)

Is it really checking every 0.000000 seconds ?

PS. Wouldn't bother with the news, if you saw yesterday's and you'll see tomorrow's you'll be fine :-)
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 319,556
RAC: 73
Message 2237 - Posted: 4 Mar 2016, 22:40:54 UTC - in response to Message 2232.  

I just see the VM Completion File Detected message but no reason. Looks like the logging has broken. Will investigate.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1126
Credit: 7,861,186
RAC: 5
Message 2239 - Posted: 4 Mar 2016, 23:27:20 UTC

My quick check only gives status 65 non-zero exit codes for you, from Wed & Thurs; those would be the site-local-config file errors we accidentally introduced. Nothing for today.
Which host(s)? You don't have much RAC on any of them so I'm not sure which one(s) you are using -- peering behind the curtains with admin rights doesn't give the convenient "last time host contacted server" column.
PDW

Joined: 20 May 15
Posts: 217
Credit: 5,010,018
RAC: 8,271
Message 2240 - Posted: 4 Mar 2016, 23:33:41 UTC - in response to Message 2239.  

Another one just did the same. The hosts are 471, 472, 485 & 761.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2241 - Posted: 4 Mar 2016, 23:54:17 UTC

touch: cannot touch `/home/boinc/shared/heartbeat': No such file or directory
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo
sudo: sorry, you must have a tty to run sudo

Just found this in cron-stderr.
It may be related.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 319,556
RAC: 73
Message 2243 - Posted: 5 Mar 2016, 6:50:21 UTC - in response to Message 2241.  

I think I know why this might be failing. Are the consoles working for you? Will try to look at it today if I can. The tasks and jobs seem to be running fine though.
PDW

Joined: 20 May 15
Posts: 217
Credit: 5,010,018
RAC: 8,271
Message 2245 - Posted: 5 Mar 2016, 8:22:36 UTC - in response to Message 2243.  

They were yesterday, haven't checked today (will be another 30 minutes before I can).
PDW

Joined: 20 May 15
Posts: 217
Credit: 5,010,018
RAC: 8,271
Message 2246 - Posted: 5 Mar 2016, 9:02:32 UTC - in response to Message 2245.  

Consoles are fine; they show the events being processed etc.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 319,556
RAC: 73
Message 2248 - Posted: 5 Mar 2016, 12:10:23 UTC - in response to Message 2241.  

I think that with the new image the /etc/sudoers file has changed and now contains requiretty, which would explain the "you must have a tty to run sudo" errors from cron. I have just pushed an update to the bootstrap script that removes this, and it should be there once CVMFS is updated and a new task is started. Will check the output of some tasks this evening to see if it worked.
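
As an illustration only, the kind of change involved can be sketched like this. The actual bootstrap script is not shown in this thread, so the function name and the regular expression are hypothetical, and the sketch only handles the plain "Defaults requiretty" form of the setting.

```python
import re

# Matches a plain "Defaults requiretty" line (any amount of whitespace).
# It deliberately does not match "Defaults !requiretty" or per-user forms.
REQUIRETTY = re.compile(r"^\s*Defaults\s+requiretty\s*$")


def strip_requiretty(sudoers_text):
    """Return the sudoers text with plain 'Defaults requiretty' lines removed."""
    lines = sudoers_text.splitlines(keepends=True)
    return "".join(line for line in lines if not REQUIRETTY.match(line))


if __name__ == "__main__":
    sample = "Defaults    requiretty\nDefaults    env_reset\nroot ALL=(ALL) ALL\n"
    print(strip_requiretty(sample), end="")
```

The sketch works on a string rather than on /etc/sudoers itself; on a real system the file should be edited and checked with visudo so that a syntax error cannot lock sudo out.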



©2024 CERN