Message boards : Number crunching : Another Problem

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 736
Credit: 11,558,539
RAC: 1,940
Message 5049 - Posted: 18 Jul 2017, 21:47:32 UTC

I got a message from another member today, so I checked it out to see what I could find. I then tried one of my computers that had not been running for a couple of weeks.

https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=1165

https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=384

Both are running tasks and getting Valids (one is mine and one belongs to maeax).

But both show no contact with the server since last month, and in our own accounts the computers are missing from the list of currently active computers, because the server says it has been over 30 days since the last contact.

You can see maeax is getting Valids every day, and I also got a Valid task (as a test). I then decided to see what would happen if I *reset* the project. As usual it aborted the tasks I had, BUT my account page still says I have 7 in progress (I don't), and that the host has not contacted the server since June 6th, which of course it has.

-Samson
Mad Scientist For Life
ID: 5049 · Rating: 0
nikogianna
Project administrator
Project developer
Project tester

Joined: 21 Feb 17
Posts: 21
Credit: 195,770
RAC: 0
Message 5050 - Posted: 20 Jul 2017, 12:46:14 UTC

Hello, I can confirm that the database record for the last time the host (384) contacted the server is 6 Jun. Could you please try updating the project from the BOINC Manager, so that we can test whether manually issuing an RPC (via the update) gets recorded in the database?
ID: 5050 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5051 - Posted: 20 Jul 2017, 18:08:52 UTC

I have updated the project from BOINC Manager (host 1165).

Under Statistics, best computer, the statistics for the computer (2nd place) are OK.

The computer's recorded contact was OK up to 15 July.

Thank you for your help.
ID: 5051 · Rating: 0
Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 1
Message 5052 - Posted: 20 Jul 2017, 19:07:42 UTC - in response to Message 5050.  
Last modified: 20 Jul 2017, 20:00:05 UTC

I've been concentrating on SixTrack on the production site recently, but I keep looking in here regularly.
Manual updates recorded today's date on 2 hosts that last returned work on 25th June (host 282) and 1st July (host 508). However, the update on the 3rd host (289), which last returned work on 15th June and had therefore passed the 30-day limit, didn't register. It updated the credit total in BOINC on that host to include that earned by the other hosts, but last contact on its host page remains at 15th June.

Could it be that hosts marked as inactive, more than 30 days since last contact, are being ignored completely?

Also, stats have not been updated on external sites since 6th June.

-----------

I let it fetch a Benchmark, which shows on its Tasks page, but on the Computers list page the last contact remains 15th June. Hmmmm?

[Update] That Benchmark, and a 2nd, completed and were credited in BOINC on the host, and in the host total and user total on these pages, but there was no change to the "last contact" time.
ID: 5052 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5053 - Posted: 21 Jul 2017, 4:48:28 UTC
Last modified: 21 Jul 2017, 4:48:55 UTC

Ray, thank you. I have refreshed three computers with BOINC Manager, without doing any work. Their contact date/time is now current.

So it seems that after more than 30 days a computer is treated as unused.
ID: 5053 · Rating: 0
nikogianna
Project administrator
Project developer
Project tester

Joined: 21 Feb 17
Posts: 21
Credit: 195,770
RAC: 0
Message 5054 - Posted: 21 Jul 2017, 7:36:17 UTC

Will have to investigate what the behaviour of the scheduler is when a host is inactive for more than 30 days (there seems to be a pattern here). Normally it should not fail to update the DB record for last contact time, no matter when last contact was made. There might even be a connection with the statistics export failing since the beginning of June.
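
As a rough illustration of the pattern reported in this thread (not the scheduler's actual code), here is a minimal Python sketch of a consistency check: it flags hosts whose recorded last contact is more than 30 days old even though a result from them arrived later. The function name, the two input mappings, and the sample values are all hypothetical stand-ins for whatever the project database really holds.

```python
from datetime import datetime, timedelta, timezone

INACTIVE_LIMIT = timedelta(days=30)  # the 30-day cutoff discussed in this thread


def find_inconsistent_hosts(last_contact, last_result, now):
    """Return sorted IDs of hosts whose recorded last contact is older than
    30 days although a result from them was received after that contact.

    last_contact: dict of host ID -> datetime of last recorded contact
    last_result:  dict of host ID -> datetime of most recently received result
    Both mappings are hypothetical stand-ins for database rows.
    """
    inconsistent = []
    for host_id, contact in last_contact.items():
        result_time = last_result.get(host_id)
        if result_time is None:
            continue  # no results on record, nothing to compare against
        marked_inactive = now - contact > INACTIVE_LIMIT
        if marked_inactive and result_time > contact:
            inconsistent.append(host_id)
    return sorted(inconsistent)


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2017, 7, 21, tzinfo=utc)
    # Illustrative values only, not real database contents.
    last_contact = {
        1: datetime(2017, 6, 6, tzinfo=utc),   # stale record
        2: datetime(2017, 7, 20, tzinfo=utc),  # recent record
    }
    last_result = {
        1: datetime(2017, 7, 18, tzinfo=utc),
        2: datetime(2017, 7, 20, tzinfo=utc),
    }
    print(find_inconsistent_hosts(last_contact, last_result, now))
```

A host that shows up in this list is returning work while its contact record stays frozen, which is the symptom described for the hosts past the 30-day mark.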
ID: 5054 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 736
Credit: 11,558,539
RAC: 1,940
Message 5055 - Posted: 22 Jul 2017, 1:45:05 UTC - in response to Message 5054.  

It isn't on our end for sure, nikogianna.

We can see the Valid tasks in our accounts, but that date does not update. By the way, I have been at the top of the stats page for a long time with RAC and Total, and you can see that the member at the top of the RAC list has not done one single task for the last 2 months, so that RAC of 15,975 is just a bizarre server problem
https://lhcathomedev.cern.ch/lhcathome-dev/top_users.php
(which has been mentioned here several times).
The site can't even change the *User of the Day* here (that one has been there for 2 months).

I am not new here; I have been running these tasks on these computers since the beginning, and of course I always have to download those vdi updates.

And my VB versions are also current.
Mad Scientist For Life
ID: 5055 · Rating: 0
Tern
Joined: 21 Sep 15
Posts: 89
Credit: 383,017
RAC: 0
Message 5057 - Posted: 30 Jul 2017, 13:43:57 UTC

Bump.
ID: 5057 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 736
Credit: 11,558,539
RAC: 1,940
Message 5061 - Posted: 4 Aug 2017, 7:05:18 UTC

A quick and easy way to see what we have been saying for the last 2+ months:

Go to my account and click on Computers on this account.

There it says there have been no computers for over 30 days. If you then check Show: All computers, you will see mine, and the Last contact for the main 6 I use here shows various dates back in JUNE.

NOW the problem with that is that they all run here 24/7.

Now go back to my account page and check the Tasks view, and you will see those computers are turning in around 20 Valids per day.

Which is why, when you go to the stats page
https://lhcathomedev.cern.ch/lhcathome-dev/top_users.php?sort_by=total_credit you will see I have almost 3 million Total Credits and a RAC of over 14,000.

BUT if you check the RAC on the Top Participants page you will see the #1 member is Paul, who has not run one single task here since June 6th.

So his RAC and Total have been the same since the first couple of days of JUNE.

And if you check several others who are listed on the RAC page as still running tasks here, you will see they are not running anything either (and I have had PMs from some who said that since nothing is being done, they will see if they have better luck over at LHC).


And also on the Server page you can see not many members are here.

I know Ivan runs most of the CMS and I run most of the Theory tasks.

And we have just a couple other members just running a single computer here and the rest have left.

I have been testing the different multi-core versions of the Theory tasks.

2-core
3-core
4-core
and now I have several 8-core tasks, and with the Theory tasks they all work.

Also as I mentioned before this site has not even been able to change that UOTD for over 2 months.

Over at LHC (where I also run lots of tasks) there are still problems with the Theory tasks, and of course we all know about the SixTrack problems.


I have seen Einstein mentioned over there as a comparison for how few problems they have. Well, I haven't run any CPU tasks there for years, but I do run thousands of GPU tasks there with no problems. They also handle Credit differently (a task earns the same amount no matter how long it takes and no matter what GPU card is being used), but that part doesn't apply here.
Mad Scientist For Life
ID: 5061 · Rating: 0



©2024 CERN