Message boards : Number crunching : Job-time reported incorrect after VM-sleep
Message board moderation

To post messages, you must log in.

AuthorMessage
Crystal Pellet
Volunteer tester

Send message
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 50
Message 402 - Posted: 25 May 2015, 7:40:06 UTC

One whole task with VM-jobs coming all the time returned.

In those 24 hours elapsed time the VM did 26 jobs.

Minimum job-time reported was 2240 seconds, the maximum was 4570 seconds which was the first.
The average of the 26 jobs was 3126 seconds.

Memory usage of the 26 jobs was rather equal: 882MB on average. Minimum 871MB and Maximum 905MB.

Now the title of this thread explained:

After a pause of the VM (VM in saved state) and a resume of the VM the runtime of the current VM-job is counted including the sleep time.
This could be several hours (me and BOINC sleeping) or days when BOINC should do other projects first.
ID: 402 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Send message
Joined: 20 Jan 15
Posts: 1139
Credit: 8,310,612
RAC: 245
Message 404 - Posted: 25 May 2015, 18:27:02 UTC - in response to Message 402.  

Do you have any sense of whether that is specific to this project, or more an underlying problem with BOINC/vmwrapper?
ID: 404 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Crystal Pellet
Volunteer tester

Send message
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 50
Message 405 - Posted: 25 May 2015, 20:08:48 UTC - in response to Message 404.  

Do you have any sense of whether that is specific to this project, or more an underlying problem with BOINC/vmwrapper?

It has nothing to do with BOINC or vboxwrapper.

23/05/15 23:24:08 Beginning CMSSW wrapper script
23/05/15 23:24:08 slc5_amd64_gcc462 scramv1 CMSSW
23/05/15 23:24:08 Performing SCRAM setup...
23/05/15 23:24:08 Completed SCRAM setup
23/05/15 23:24:08 Retrieving SCRAM project...
23/05/15 23:24:08 Untarring /home/boinc/CMSRun/sandbox.tar.gz
23/05/15 23:24:08 Completed SCRAM project
23/05/15 23:24:08 Executing CMSSW
23/05/15 23:24:08 cmsRun -j FrameworkJobReport.xml PSet.py
23/05/15 23:26:12 Begin processing the 1st record. Run 1, Event 101, LumiSection 1 at 23-May-2015 23:26:12.118 CEST
23/05/15 23:26:18 Begin processing the 2nd record. Run 1, Event 102, LumiSection 1 at 23-May-2015 23:26:18.581 CEST
23/05/15 23:26:18 %MSG-w MemoryCheck: FastjetJetProducer:kt6GenJets 23-May-2015 23:26:18 CEST Run: 1 Event: 102
23/05/15 23:26:18 MemoryCheck: module FastjetJetProducer:kt6GenJets VSIZE 822.074 -0.5 RSS 513.012 0.0664062
23/05/15 23:26:18
23/05/15 23:26:18 %MSG
23/05/15 23:26:34 %MSG-w MemoryCheck: OscarProducer:g4SimHits 23-May-2015 23:26:33 CEST Run: 1 Event: 102
23/05/15 23:26:34 MemoryCheck: module OscarProducer:g4SimHits VSIZE 826.574 4.5 RSS 517.402 4.39062
23/05/15 23:26:34
23/05/15 23:26:34 %MSG
23/05/15 23:26:34 %MSG-w MemoryCheck: PoolOutputModule:RAWSIMoutput 23-May-2015 23:26:33 CEST Run: 1 Event: 102
23/05/15 23:26:34 MemoryCheck: module PoolOutputModule:RAWSIMoutput VSIZE 827.824 1.25 RSS 518.43 1.02734
23/05/15 23:26:34
23/05/15 23:26:34 %MSG
23/05/15 23:26:34 Begin processing the 3rd record. Run 1, Event 103, LumiSection 1 at 23-May-2015 23:26:33.951 CEST
23/05/15 23:26:54 %MSG-w MemoryCheck: OscarProducer:g4SimHits 23-May-2015 23:26:54 CEST Run: 1 Event: 103
23/05/15 23:26:54 MemoryCheck: module OscarProducer:g4SimHits VSIZE 830.344 2.51953 RSS 520.863 2.43359
23/05/15 23:26:54
23/05/15 23:26:54 %MSG
23/05/15 23:26:54 %MSG-w MemoryCheck: PoolOutputModule:RAWSIMoutput 23-May-2015 23:26:54 CEST Run: 1 Event: 103
23/05/15 23:26:54 MemoryCheck: module PoolOutputModule:RAWSIMoutput VSIZE 831.094 0.75 RSS 521.598 0.734375
23/05/15 23:26:54
23/05/15 23:26:54 %MSG
23/05/15 23:26:54 Begin processing the 4th record. Run 1, Event 104, LumiSection 1 at 23-May-2015 23:26:54.363 CEST
24/05/15 07:13:32 %MSG-w MemoryCheck: OscarProducer:g4SimHits 24-May-2015 07:13:31 CEST Run: 1 Event: 104
24/05/15 07:13:32 MemoryCheck: module OscarProducer:g4SimHits VSIZE 832.594 1.5 RSS 522.777 1.17969
24/05/15 07:13:32
24/05/15 07:13:32 %MSG
24/05/15 07:13:32 Begin processing the 5th record. Run 1, Event 105, LumiSection 1 at 24-May-2015 07:13:31.763 CEST
24/05/15 07:14:18 %MSG-w MemoryCheck: OscarProducer:g4SimHits 24-May-2015 07:14:18 CEST Run: 1 Event: 105
24/05/15 07:14:18 MemoryCheck: module OscarProducer:g4SimHits VSIZE 836.094 3.5 RSS 526.773 3.99609
24/05/15 07:14:18
24/05/15 07:14:18 %MSG
24/05/15 07:14:19 %MSG-w MemoryCheck: PoolOutputModule:RAWSIMoutput 24-May-2015 07:14:18 CEST Run: 1 Event: 105
24/05/15 07:14:19 MemoryCheck: module PoolOutputModule:RAWSIMoutput VSIZE 838.094 2 RSS 528.062 1.28906
24/05/15 07:14:19
24/05/15 07:14:19 %MSG
24/05/15 07:14:19 Begin processing the 6th record. Run 1, Event 106, LumiSection 1 at 24-May-2015 07:14:18.818 CEST
.
.
.
24/05/15 07:53:37 Begin processing the 92nd record. Run 1, Event 192, LumiSection 1 at 24-May-2015 07:53:36.676 CEST
24/05/15 07:53:38 Begin processing the 93rd record. Run 1, Event 193, LumiSection 1 at 24-May-2015 07:53:37.366 CEST
24/05/15 07:54:24 Begin processing the 94th record. Run 1, Event 194, LumiSection 1 at 24-May-2015 07:54:23.293 CEST
24/05/15 07:54:42 Begin processing the 95th record. Run 1, Event 195, LumiSection 1 at 24-May-2015 07:54:42.022 CEST
24/05/15 07:54:47 Begin processing the 96th record. Run 1, Event 196, LumiSection 1 at 24-May-2015 07:54:46.129 CEST
24/05/15 07:55:08 Begin processing the 97th record. Run 1, Event 197, LumiSection 1 at 24-May-2015 07:55:07.190 CEST
24/05/15 07:55:09 Begin processing the 98th record. Run 1, Event 198, LumiSection 1 at 24-May-2015 07:55:08.561 CEST
24/05/15 07:55:14 Begin processing the 99th record. Run 1, Event 199, LumiSection 1 at 24-May-2015 07:55:14.016 CEST
24/05/15 07:56:14 Begin processing the 100th record. Run 1, Event 200, LumiSection 1 at 24-May-2015 07:56:13.133 CEST
24/05/15 07:56:41 MemoryReport> Peak virtual size 876.648 Mbytes
24/05/15 07:56:41 Key events increasing vsize:
24/05/15 07:56:41 [5 run: 1 lumi: 1 event: 105 vsize = 838.094 deltaVsize = 6 rss = 528.062 delta = 9.63281
24/05/15 07:56:41 [14 run: 1 lumi: 1 event: 114 vsize = 847.398 deltaVsize = 9.30469 rss = 538.367 delta = 10.3047
24/05/15 07:56:41 [23 run: 1 lumi: 1 event: 123 vsize = 862.898 deltaVsize = 10.75 rss = 552.316 delta = 13.9492
24/05/15 07:56:41 [46 run: 1 lumi: 1 event: 146 vsize = 872.898 deltaVsize = 7.25 rss = 500.926 delta = -51.3906
24/05/15 07:56:41 [70 run: 1 lumi: 1 event: 170 vsize = 875.648 deltaVsize = 2.75 rss = 504.266 delta = 3.33984
24/05/15 07:56:41 [99 run: 1 lumi: 1 event: 199 vsize = 875.648 deltaVsize = -1 rss = 484.457 delta = -0.792969
24/05/15 07:56:41 [70 run: 1 lumi: 1 event: 170 vsize = 875.648 deltaVsize = 2.75 rss = 504.266 delta = 3.33984
24/05/15 07:56:41 [91 run: 1 lumi: 1 event: 191 vsize = 876.648 deltaVsize = 1 rss = 485.25 delta = -19.0156
24/05/15 07:56:41 TimeReport> Time report complete in 30682.4 seconds
24/05/15 07:56:41 Time Summary:
24/05/15 07:56:41 - Min event: 0.00643206
24/05/15 07:56:41 - Max event: 27997.4
24/05/15 07:56:41 - Avg event: 306.824
24/05/15 07:56:41 - Total job: 30682.4
24/05/15 07:56:41 CPU Summary:
24/05/15 07:56:41 - Min event: 0.006
24/05/15 07:56:41 - Max event: 100.229
24/05/15 07:56:41 - Avg event: 23.1414
24/05/15 07:56:41 - Total job: 2362.17
24/05/15 07:56:41 - Total event: 2314.14


When I have a closer look now, I see that with "TimeReport> Time report complete in 30682.4 seconds" the total elapsed time is meant; the cpu time needed is noted correctly.
My machine was shutdown overnight. Before the shutdown, I suspended the task and stopped BOINC.
In those cases the TimeReport is not very useful.
Thanks Ivan for paying attention.
ID: 405 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote

Message boards : Number crunching : Job-time reported incorrect after VM-sleep


©2024 CERN