Message boards : Sixtrack Application : Throughput Testing

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 42
Message 5405 - Posted: 18 Apr 2018, 13:50:25 UTC
Last modified: 18 Apr 2018, 13:50:40 UTC

We are scale testing the throughput of our servers. There will be a flood of jobs but they will quickly fail or be cancelled. This is expected. They will not use much CPU but could use the network. Please stop accepting Sixtrack tasks if you don't want the test jobs to hit your machine.

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 5406 - Posted: 18 Apr 2018, 18:27:36 UTC - in response to Message 5405.  
Last modified: 18 Apr 2018, 18:41:11 UTC

Every one of these tasks that I have tried to fetch has failed while downloading the "exe". Individual job zips download fine, although with a stutter:

18/04/2018 19:13:06 | lhcathome-dev | Started download of sixtrack_win32_466_sse2.exe
18/04/2018 19:13:06 | lhcathome-dev | Started download of d5833ac43aa4338f74b1f1c7a6f8a160.zip
18/04/2018 19:13:08 | | Project communication failed: attempting access to reference site
18/04/2018 19:13:08 | lhcathome-dev | Temporarily failed download of sixtrack_win32_466_sse2.exe: transient HTTP error
18/04/2018 19:13:08 | lhcathome-dev | Backing off 00:02:33 on download of sixtrack_win32_466_sse2.exe
18/04/2018 19:13:08 | lhcathome-dev | Finished download of d5833ac43aa4338f74b1f1c7a6f8a160.zip
18/04/2018 19:13:08 | lhcathome-dev | Started download of 76275e65c45d5f1b6751179b9f7e4ff8.zip
18/04/2018 19:13:09 | | Internet access OK - project servers may be temporarily down.
18/04/2018 19:13:10 | lhcathome-dev | Finished download of 76275e65c45d5f1b6751179b9f7e4ff8.zip
18/04/2018 19:14:16 | lhcathome-dev | [checkpoint] result Theory_558819_1523966612.069130_0 checkpointed
18/04/2018 19:15:42 | lhcathome-dev | File sixtrack_win32_466_sse2.exe exists already, skipping download
18/04/2018 19:15:42 | lhcathome-dev | [error] Signature verification failed for sixtrack_win32_466_sse2.exe
18/04/2018 19:15:42 | lhcathome-dev | [error] Checksum or signature error for sixtrack_win32_466_sse2.exe

It is similar for the 64-bit exe on the other host, although that one hasn't failed outright yet; it has just gone through several backoffs and retries.

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 5407 - Posted: 18 Apr 2018, 20:49:00 UTC
Last modified: 18 Apr 2018, 20:49:48 UTC

Interesting situation with work unit 684064 where both instances were assigned to one of my machines!
I didn't think this was possible and it's certainly not desirable from the "cross-checking of results" viewpoint.

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 5409 - Posted: 19 Apr 2018, 17:42:22 UTC

I'm still having difficulty downloading the .exe, and I'm once again my own wingman (twice), with 2 instances of 689732 on one machine and 2 instances of 689778 on another.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 42
Message 5410 - Posted: 20 Apr 2018, 14:14:16 UTC - in response to Message 5407.  

With many jobs and only a few machines, the probability of this happening is quite high.
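
As a rough illustration of why this is likely, the sketch below uses a deliberately simplified model that is an assumption and is not taken from the BOINC scheduler: each work unit has two instances, and each instance is handed independently and uniformly at random to one of `n_hosts` active machines. The function names and the example numbers are hypothetical.

```python
def p_same_host(n_hosts: int) -> float:
    """Chance that both instances of ONE work unit land on the same host.

    Assumes uniform, independent assignment (a toy model, not BOINC's logic).
    """
    return 1.0 / n_hosts


def expected_same_host_workunits(n_hosts: int, n_workunits: int) -> float:
    """Expected number of work units whose two instances share a host."""
    return n_workunits * p_same_host(n_hosts)


def p_at_least_one(n_hosts: int, n_workunits: int) -> float:
    """Chance that at least one work unit has both instances on one host."""
    return 1.0 - (1.0 - p_same_host(n_hosts)) ** n_workunits


if __name__ == "__main__":
    # Hypothetical figures: a small test pool and a large batch of jobs.
    hosts, workunits = 50, 1000
    print(p_same_host(hosts))
    print(expected_same_host_workunits(hosts, workunits))
    print(p_at_least_one(hosts, workunits))
```

Under this toy model, 1000 work units spread over 50 hosts would give about 20 work units whose two instances share a host, so seeing a few of them on a small test pool is unsurprising.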

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 42
Message 5411 - Posted: 20 Apr 2018, 14:17:41 UTC - in response to Message 5409.  

The scale tests were encouraging. We can inject 15K jobs into our backend server and move them to the BOINC server. We are now scaling back to about 100 jobs to investigate reliability. I am interested in following up on the download errors.

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 5412 - Posted: 20 Apr 2018, 17:00:00 UTC - in response to Message 5410.  
Last modified: 20 Apr 2018, 17:10:14 UTC

Thanks Laurence,
My concern was that a problem with a particular host might return an erroneous result, backed up by the return of a similarly erroneous result from the same host, resulting in a "wrong" answer being validated and treated as being "correct".
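
A minimal sketch of how such cases could be spotted, assuming the results are available as hypothetical `(workunit_id, host_id)` pairs. This is not the project's validator; the host IDs and the work unit 700000 below are made up for illustration.

```python
from collections import defaultdict


def workunits_with_single_host(results):
    """Return sorted work unit IDs whose results (2 or more) all came from one host.

    `results` is an iterable of (workunit_id, host_id) pairs (hypothetical format).
    """
    hosts_by_wu = defaultdict(list)
    for wu_id, host_id in results:
        hosts_by_wu[wu_id].append(host_id)
    return sorted(
        wu_id
        for wu_id, hosts in hosts_by_wu.items()
        if len(hosts) >= 2 and len(set(hosts)) == 1
    )


if __name__ == "__main__":
    # Work unit IDs 684064 and 689732 are from this thread; host IDs are invented.
    example = [
        (684064, 1), (684064, 1),   # both instances on the same host
        (689732, 2), (689732, 2),   # both instances on the same host
        (700000, 1), (700000, 3),   # properly cross-checked by two hosts
    ]
    print(workunits_with_single_host(example))
```

Work units flagged this way are the ones where a single faulty host could agree with itself.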

There is still a problem with the exe download. I aborted all the offending tasks and the stuck transfer, deleted the zero-size file remnant in the Project folder and restarted BOINC, but the new download is still stuck. No tasks available.

20/04/2018 17:57:19 | lhcathome-dev | Started download of sixtrack_win64_466_sse2.exe
20/04/2018 17:57:22 | | Project communication failed: attempting access to reference site
20/04/2018 17:57:22 | lhcathome-dev | Temporarily failed download of sixtrack_win64_466_sse2.exe: transient HTTP error
20/04/2018 17:57:22 | lhcathome-dev | Backing off 00:13:50 on download of sixtrack_win64_466_sse2.exe

although other errors have shown up as
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>sixtrack_win32_466_sse2.exe</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
<error_message>signature verification failed</error_message>
Some others reported something about the file being the "wrong size", but I can't find them now, so they may be among the ones that were cancelled by the server.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 42
Message 5413 - Posted: 25 Apr 2018, 7:17:04 UTC - in response to Message 5406.  

I have deprecated the versions that were failing. The jobs should all run now. We are continuing with the scale tests.

maeax
Joined: 22 Apr 16
Posts: 664
Credit: 1,791,095
RAC: 4,208
Message 5422 - Posted: 3 May 2018, 11:34:47 UTC

Hi Laurence,
Sixtrack work is shown on the server today, but my computer (ID=2247) says there is no work.