Throughput Testing
Author | Message |
---|---|
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
We are scale testing the throughput of our servers. There will be a flood of jobs but they will quickly fail or be cancelled. This is expected. They will not use much CPU but could use the network. Please stop accepting Sixtrack tasks if you don't want the test jobs to hit your machine. |
Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0 |
All of these that I have attempted to get have failed in download of the "exe". Individual job zips download fine, although with a stutter:

18/04/2018 19:13:06 | lhcathome-dev | Started download of sixtrack_win32_466_sse2.exe
18/04/2018 19:13:06 | lhcathome-dev | Started download of d5833ac43aa4338f74b1f1c7a6f8a160.zip
18/04/2018 19:13:08 | | Project communication failed: attempting access to reference site
18/04/2018 19:13:08 | lhcathome-dev | Temporarily failed download of sixtrack_win32_466_sse2.exe: transient HTTP error
18/04/2018 19:13:08 | lhcathome-dev | Backing off 00:02:33 on download of sixtrack_win32_466_sse2.exe
18/04/2018 19:13:08 | lhcathome-dev | Finished download of d5833ac43aa4338f74b1f1c7a6f8a160.zip
18/04/2018 19:13:08 | lhcathome-dev | Started download of 76275e65c45d5f1b6751179b9f7e4ff8.zip
18/04/2018 19:13:09 | | Internet access OK - project servers may be temporarily down.
18/04/2018 19:13:10 | lhcathome-dev | Finished download of 76275e65c45d5f1b6751179b9f7e4ff8.zip
18/04/2018 19:14:16 | lhcathome-dev | [checkpoint] result Theory_558819_1523966612.069130_0 checkpointed
18/04/2018 19:15:42 | lhcathome-dev | File sixtrack_win32_466_sse2.exe exists already, skipping download
18/04/2018 19:15:42 | lhcathome-dev | [error] Signature verification failed for sixtrack_win32_466_sse2.exe
18/04/2018 19:15:42 | lhcathome-dev | [error] Checksum or signature error for sixtrack_win32_466_sse2.exe

Similar for the 64bit exe on the other host, although it hasn't failed yet, just several backoffs and retries. |
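For anyone wanting to tally these failures across a longer event log, here is a minimal Python sketch. It assumes the log has one entry per line in the `timestamp | project | message` layout shown in the post above, and it only recognises the three message wordings that appear there. The function names `parse_entry` and `summarize` are made up for this example and are not part of BOINC.

```python
import re
from collections import defaultdict
from datetime import timedelta

# Message patterns copied from the wording of the log entries in this thread.
FAILED = re.compile(r"Temporarily failed download of (\S+): (.+)")
BACKOFF = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)")
SIG_ERR = re.compile(r"\[error\] Signature verification failed for (\S+)")


def parse_entry(line):
    """Split one 'timestamp | project | message' line into its three fields.

    Returns None if the line does not contain two '|' separators.
    The project field is an empty string for client-wide messages.
    """
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    return tuple(parts)


def summarize(lines):
    """Count failed downloads, total back-off time and signature errors per file."""
    summary = defaultdict(
        lambda: {"failed": 0, "backoff": timedelta(), "signature_errors": 0}
    )
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        message = entry[2]
        if m := FAILED.match(message):
            summary[m.group(1)]["failed"] += 1
        elif m := BACKOFF.match(message):
            hours, minutes, seconds = (int(g) for g in m.group(1, 2, 3))
            summary[m.group(4)]["backoff"] += timedelta(
                hours=hours, minutes=minutes, seconds=seconds
            )
        elif m := SIG_ERR.match(message):
            summary[m.group(1)]["signature_errors"] += 1
    return dict(summary)


# Four entries taken verbatim from the log in this thread.
SAMPLE = [
    "18/04/2018 19:13:08 | | Project communication failed: attempting access to reference site",
    "18/04/2018 19:13:08 | lhcathome-dev | Temporarily failed download of sixtrack_win32_466_sse2.exe: transient HTTP error",
    "18/04/2018 19:13:08 | lhcathome-dev | Backing off 00:02:33 on download of sixtrack_win32_466_sse2.exe",
    "18/04/2018 19:15:42 | lhcathome-dev | [error] Signature verification failed for sixtrack_win32_466_sse2.exe",
]

if __name__ == "__main__":
    for name, stats in summarize(SAMPLE).items():
        print(name, stats)
```

The per-file grouping is a design choice for this sketch: the thread's problem is tied to specific executables, so keying on the file name makes it easy to see which application version keeps failing.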
Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0 |
Interesting situation with work unit 684064 where both instances were assigned to one of my machines! I didn't think this was possible and it's certainly not desirable from the "cross-checking of results" viewpoint. |
Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0 |
|
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
With many jobs and only a few machines, the probability of this happening is quite high. |
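To put a rough number on "quite high", here is a small Python sketch. It rests on a simplifying assumption that is not taken from the project's actual scheduler: each work unit has two instances, and each instance goes to a host picked independently and uniformly at random from the active hosts. The host and work-unit counts in the usage lines are hypothetical.

```python
def p_same_host(n_hosts: int) -> float:
    """Probability that both instances of one work unit land on the same host.

    Assumes uniform, independent assignment: whichever host gets the first
    instance, the second instance hits that same host with probability 1/n_hosts.
    """
    return 1.0 / n_hosts


def p_at_least_one(n_hosts: int, n_work_units: int) -> float:
    """Probability that at least one of n_work_units has both instances on one host."""
    return 1.0 - (1.0 - p_same_host(n_hosts)) ** n_work_units


def expected_same_host(n_hosts: int, n_work_units: int) -> float:
    """Expected number of work units whose two instances share a host."""
    return n_work_units * p_same_host(n_hosts)


if __name__ == "__main__":
    # Hypothetical figures: 20 active hosts, 1000 work units.
    hosts, work_units = 20, 1000
    print(p_same_host(hosts))
    print(p_at_least_one(hosts, work_units))
    print(expected_same_host(hosts, work_units))
```

Under this model the chance for a single work unit is simply one over the number of hosts, so with a small test pool and a flood of jobs, a same-host pairing somewhere in the batch is close to certain.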
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
The scale tests were encouraging. We can inject 15K jobs into our backend server and move them to the BOINC server. We are scaling back to about 100 jobs now to investigate the reliability. I am interested in following up on the download errors. |
Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0 |
Thanks Laurence,

My concern was that a problem with a particular host might return an erroneous result, backed up by a similarly erroneous result from the same host, resulting in a "wrong" answer being validated and treated as "correct".

There is still a problem with the exe download. I aborted all the offending tasks and the stuck transfer, deleted the zero-size file remnant in the project folder and restarted BOINC, but the new download is still stuck. No tasks available.

20/04/2018 17:57:19 | lhcathome-dev | Started download of sixtrack_win64_466_sse2.exe
20/04/2018 17:57:22 | | Project communication failed: attempting access to reference site
20/04/2018 17:57:22 | lhcathome-dev | Temporarily failed download of sixtrack_win64_466_sse2.exe: transient HTTP error
20/04/2018 17:57:22 | lhcathome-dev | Backing off 00:13:50 on download of sixtrack_win64_466_sse2.exe

Other errors have shown up as:

app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>sixtrack_win32_466_sse2.exe</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
<error_message>signature verification failed</error_message>

Some others said something about the file being the "wrong size", but I can't find them, so maybe those were the ones cancelled by the server. |
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
I have deprecated the versions that were failing. The jobs should all run now. We are continuing with the scale tests. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
Hi Laurence, Sixtrack work is shown on the server today, but my computer (ID=2247) says there is no work. |
©2024 CERN