Message boards : Theory Application : Unable to ping HTCondor

m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 404
Message 4217 - Posted: 22 Oct 2016, 10:18:08 UTC

There have been a couple of failures recently, like this. Presumably there is a timeout on this, and maybe it's exacerbated by "other" recent net activity; but can the timeout be increased, please?... one fewer cause of 206 errors.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 750
Credit: 11,603,490
RAC: 1,713
Message 4218 - Posted: 22 Oct 2016, 20:20:32 UTC - in response to Message 4217.  

There have been a couple of failures recently, like this. Presumably there is a timeout on this, and maybe it's exacerbated by "other" recent net activity; but can the timeout be increased, please?... one fewer cause of 206 errors.


I don't know anything about using Linux here, but it says that PC has never contacted the server.

And just that one failed task.

https://lhcathome.cern.ch/vLHCathome-dev/show_host_detail.php?hostid=1497

Maybe try updating VB to a newer version (and reboot). I always do a clean update just to make sure it works.

And you could add another 4GB of RAM, since yours is really close to not having enough to even run these VB tasks (find a good deal and you will like the way it runs from that change alone).

Here is that one task: https://lhcathome.cern.ch/vLHCathome-dev/result.php?resultid=269684

You have a 2-core... Did you have any other type of BOINC task running, or has it ever worked with another type?

And what type of ISP do you have (and do you have some type of firewall or security program running)?
Mad Scientist For Life

m
Message 4219 - Posted: 22 Oct 2016, 23:22:21 UTC - in response to Message 4218.  

There have been a couple of failures recently, like this. Presumably there is a timeout on this, and maybe it's exacerbated by "other" recent net activity; but can the timeout be increased, please?... one fewer cause of 206 errors.

I don't know anything about using Linux here, but it says that PC has never contacted the server.

And just that one failed task.

Moving to the new server required detaching from and re-attaching to the project, and this host ended up with a different host ID. The original ID was 266, and it's back there now. This also meant it got an (unwanted) task, which I let run to see how it went. It doesn't usually run this project nowadays, but it should work.

Maybe try updating VB to a newer version (and reboot). I always do a clean update just to make sure it works.

It doesn't (or didn't) seem broken... but you could be right; it's on the TODO list.

And you could add another 4GB of RAM, since yours is really close to not having enough to even run these VB tasks (find a good deal and you will like the way it runs from that change alone).

I know, but that's all it will take. I originally hoped that BOINC would sort this out for itself ("waiting for memory"), but it doesn't. So, as projects tend to increase their RAM needs over time (the original T4T was 256MB, happy days), those hosts with limited RAM now get app_config files to only allow the task combinations that will fit. There's a bit of trial and error there and it's a work in progress. But that's not really relevant to this.
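As a rough illustration of the app_config approach mentioned above, here is a minimal Python sketch that builds a BOINC app_config.xml limiting how many tasks of an app run at once. The `<max_concurrent>` and `<project_max_concurrent>` elements are standard app_config options; the app name "Theory", the limits, and the helper name `build_app_config` are assumptions for illustration only, so check the real app short name in client_state.xml before using anything like this.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits, project_max=None):
    """Return app_config.xml text limiting concurrent tasks per app.

    limits: dict mapping an app's short name to its max concurrent tasks.
    project_max: optional cap on concurrent tasks for the whole project.
    """
    root = ET.Element("app_config")
    for app_name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    if project_max is not None:
        ET.SubElement(root, "project_max_concurrent").text = str(project_max)
    return ET.tostring(root, encoding="unicode")


# Hypothetical example: allow only one "Theory" task at a time on a low-RAM host.
xml_text = build_app_config({"Theory": 1}, project_max=1)
print(xml_text)
```

The resulting text would be saved as app_config.xml in the project's folder inside the BOINC data directory; the client picks it up when its config files are re-read or when it is restarted.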

You have a 2-core... Did you have any other type of BOINC task running, or has it ever worked with another type?

Shouldn't have, but maybe... and yes.

And what type of ISP do you have (and do you have some type of firewall or security program running)?

The firewall is common to all the hosts, so no problem there. The root cause is almost certainly slow network response, possibly due to ISP activity: they're always posting notices of overnight maintenance work of some sort, somewhere, plus there were all those botnet members busy DDoSing Dyn or whoever. OpenDNS is set as the third-choice DNS (1st and 2nd are my ISP's, which may well have used Dyn), so it could have taken several seconds to get through. A longer timeout for the ping would make things a bit more tolerant of network delays.

Magic Quantum Mechanic
Message 4220 - Posted: 23 Oct 2016, 10:28:26 UTC

OK, well that must mean it is a 32-bit x86 OS, and yeah, they will not *see* any more RAM than the original 4GB.

I have had a pair of 3-cores for a long time that I use for Einstein GPU and vLHC tasks, and both ran x86 versions of XP Pro. Just to find out, I updated one to Windows 7 x64, and now it does *see* the other 4GB I had plugged in before and can use all 8GB.

I left the other one running the old x86 XP Pro with just the 4GB of RAM, just because I want to see how long it will run (it is #1 on the vLHC stats page).

I never have problems with it, and the other one running x64 is also running fine.

BUT you are right about the ISPs we are forced to use with these VB tasks.

I have been running those tasks for almost 6 years now, and the main problem has been VB failing because of my overpriced and usually pitiful DSL.

In fact, I lost my connection for about 30 minutes tonight and lost a few VB tasks, since VB would not restart them back to where they were when the connection was lost.

No problems with any of the Einstein GPU tasks, since VirtualBox is not involved.

Sometimes it seems like my VB tasks are more a way of testing my DSL than of colliding particles.
Mad Scientist For Life

m
Message 4221 - Posted: 23 Oct 2016, 11:31:22 UTC - in response to Message 4220.  
Last modified: 23 Oct 2016, 11:49:03 UTC

OK, well that must mean it is a 32-bit x86 OS, and yeah, they will not *see* any more RAM than the original 4GB.

No, that's the capacity of the motherboard.

I left the other one running the old x86 XP Pro with just the 4GB of RAM, just because I want to see how long it will run (it is #1 on the vLHC stats page).

I can't get any of my 32bit XP hosts to run the Condor 32bit theory app. I've given up, at least for now; although I've looked enviously at yours...

BUT you are right about the ISPs we are forced to use with these VB tasks.

I have been running those tasks for almost 6 years now, and the main problem has been VB failing because of my overpriced and usually pitiful DSL.

Seen a couple of these now:-

2016-10-23 01:10:26 (2201): VM Completion Message: Could not connect to lhchomeproxy.cern.ch on port 3125

Never seen them before so something's going on...

In fact, I lost my connection for about 30 minutes tonight and lost a few VB tasks, since VB would not restart them back to where they were when the connection was lost.

That's a very good point. Hosts here shut down and restart every day, and I think this is at the root of a lot of odd failures with Condor-based projects.

Sometimes it seems like my VB tasks are more a way of testing my DSL than of colliding particles.

Completely agree, but I also feel that these projects could be better designed to allow for "consumer grade" setups (longer timeouts, compressed files, more allowance for network interruptions, low speeds, etc.). [rant] BOINC is, after all, designed to use "spare" capacity on existing consumer systems, not to require 20-core hosts with 20Mb/s connections running 24/7. [/rant] Even my systems run OK if I leave them running all the time, but that's too expensive.
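
To illustrate what a more tolerant connectivity check could look like, here is a minimal Python sketch of a TCP reachability test with a configurable timeout and a few retries. This is not the project's actual check (its implementation isn't shown in this thread); the function name, the default timeout, the retry count, and the back-off are all assumptions.

```python
import socket
import time


def can_connect(host, port, timeout=30.0, retries=3, backoff=2.0,
                connect=socket.create_connection):
    """Return True if a TCP connection to (host, port) succeeds.

    Each attempt waits up to `timeout` seconds; failed attempts are retried
    up to `retries` times in total, sleeping a little longer between each.
    `connect` is injectable so the logic can be tested without a network.
    """
    for attempt in range(retries):
        try:
            with connect((host, port), timeout=timeout):
                return True
        except OSError:
            # OSError covers DNS failures, refused connections and timeouts.
            if attempt < retries - 1:
                time.sleep(backoff * (attempt + 1))
    return False
```

For example, `can_connect("lhchomeproxy.cern.ch", 3125, timeout=30.0)` uses the host and port from the log line quoted above and allows each connection attempt up to 30 seconds. The timeout applies to the connection itself; how long a DNS lookup may take is governed by the system resolver.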

Magic Quantum Mechanic
Message 4222 - Posted: 24 Oct 2016, 11:33:27 UTC

Well, since it is 4:30am here and I am about to go fake sleep, I will just make a quick reply... I agree with you, M.

I just try to pretend I haven't had to pay thousands of dollars doing this 24/7 since 2000.
Goodnight
Mad Scientist For Life


©2024 CERN