Message boards :
Number crunching :
Job-time reported incorrect after VM-sleep
Message board moderation
Author | Message |
---|---|
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 20 |
One whole task with VM-jobs coming all the time returned. In those 24 hours elapsed time the VM did 26 jobs. Minimum job-time reported was 2240 seconds, the maximum was 4570 seconds which was the first. The average of the 26 jobs was 3126 seconds. Memory usage of the 26 jobs was rather equal: 882MB on average. Minimum 871MB and Maximum 905MB. Now the title of this thread explained: After a pause of the VM (VM in saved state) and a resume of the VM the runtime of the current VM-job is counted including the sleep time. This could be several hours (me and BOINC sleeping) or days when BOINC should do other projects first. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,182,521 RAC: 2,043 |
Do you have any sense of whether that is specific to this project, or more an underlying problem with BOINC/vmwrapper? |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 20 |
Do you have any sense of whether that is specific to this project, or more an underlying problem with BOINC/vmwrapper? It has nothing to do with BOINC or vboxwrapper. 23/05/15 23:24:08 Beginning CMSSW wrapper script 23/05/15 23:24:08 slc5_amd64_gcc462 scramv1 CMSSW 23/05/15 23:24:08 Performing SCRAM setup... 23/05/15 23:24:08 Completed SCRAM setup 23/05/15 23:24:08 Retrieving SCRAM project... 23/05/15 23:24:08 Untarring /home/boinc/CMSRun/sandbox.tar.gz 23/05/15 23:24:08 Completed SCRAM project 23/05/15 23:24:08 Executing CMSSW 23/05/15 23:24:08 cmsRun -j FrameworkJobReport.xml PSet.py 23/05/15 23:26:12 Begin processing the 1st record. Run 1, Event 101, LumiSection 1 at 23-May-2015 23:26:12.118 CEST 23/05/15 23:26:18 Begin processing the 2nd record. Run 1, Event 102, LumiSection 1 at 23-May-2015 23:26:18.581 CEST 23/05/15 23:26:18 %MSG-w MemoryCheck: FastjetJetProducer:kt6GenJets 23-May-2015 23:26:18 CEST Run: 1 Event: 102 23/05/15 23:26:18 MemoryCheck: module FastjetJetProducer:kt6GenJets VSIZE 822.074 -0.5 RSS 513.012 0.0664062 23/05/15 23:26:18 23/05/15 23:26:18 %MSG 23/05/15 23:26:34 %MSG-w MemoryCheck: OscarProducer:g4SimHits 23-May-2015 23:26:33 CEST Run: 1 Event: 102 23/05/15 23:26:34 MemoryCheck: module OscarProducer:g4SimHits VSIZE 826.574 4.5 RSS 517.402 4.39062 23/05/15 23:26:34 23/05/15 23:26:34 %MSG 23/05/15 23:26:34 %MSG-w MemoryCheck: PoolOutputModule:RAWSIMoutput 23-May-2015 23:26:33 CEST Run: 1 Event: 102 23/05/15 23:26:34 MemoryCheck: module PoolOutputModule:RAWSIMoutput VSIZE 827.824 1.25 RSS 518.43 1.02734 23/05/15 23:26:34 23/05/15 23:26:34 %MSG 23/05/15 23:26:34 Begin processing the 3rd record. Run 1, Event 103, LumiSection 1 at 23-May-2015 23:26:33.951 CEST 23/05/15 23:26:54 %MSG-w MemoryCheck: OscarProducer:g4SimHits 23-May-2015 23:26:54 CEST Run: 1 Event: 103 23/05/15 23:26:54 MemoryCheck: module OscarProducer:g4SimHits VSIZE 830.344 2.51953 RSS 520.863 2.43359 23/05/15 23:26:54 23/05/15 23:26:54 %MSG 23/05/15 23:26:54 %MSG-w MemoryCheck: PoolOutputModule:RAWSIMoutput 23-May-2015 23:26:54 CEST Run: 1 Event: 103 23/05/15 23:26:54 MemoryCheck: module PoolOutputModule:RAWSIMoutput VSIZE 831.094 0.75 RSS 521.598 0.734375 23/05/15 23:26:54 23/05/15 23:26:54 %MSG 23/05/15 23:26:54 Begin processing the 4th record. Run 1, Event 104, LumiSection 1 at 23-May-2015 23:26:54.363 CEST 24/05/15 07:13:32 %MSG-w MemoryCheck: OscarProducer:g4SimHits 24-May-2015 07:13:31 CEST Run: 1 Event: 104 24/05/15 07:13:32 MemoryCheck: module OscarProducer:g4SimHits VSIZE 832.594 1.5 RSS 522.777 1.17969 24/05/15 07:13:32 24/05/15 07:13:32 %MSG 24/05/15 07:13:32 Begin processing the 5th record. Run 1, Event 105, LumiSection 1 at 24-May-2015 07:13:31.763 CEST 24/05/15 07:14:18 %MSG-w MemoryCheck: OscarProducer:g4SimHits 24-May-2015 07:14:18 CEST Run: 1 Event: 105 24/05/15 07:14:18 MemoryCheck: module OscarProducer:g4SimHits VSIZE 836.094 3.5 RSS 526.773 3.99609 24/05/15 07:14:18 24/05/15 07:14:18 %MSG 24/05/15 07:14:19 %MSG-w MemoryCheck: PoolOutputModule:RAWSIMoutput 24-May-2015 07:14:18 CEST Run: 1 Event: 105 24/05/15 07:14:19 MemoryCheck: module PoolOutputModule:RAWSIMoutput VSIZE 838.094 2 RSS 528.062 1.28906 24/05/15 07:14:19 24/05/15 07:14:19 %MSG 24/05/15 07:14:19 Begin processing the 6th record. Run 1, Event 106, LumiSection 1 at 24-May-2015 07:14:18.818 CEST . . . 24/05/15 07:53:37 Begin processing the 92nd record. Run 1, Event 192, LumiSection 1 at 24-May-2015 07:53:36.676 CEST 24/05/15 07:53:38 Begin processing the 93rd record. Run 1, Event 193, LumiSection 1 at 24-May-2015 07:53:37.366 CEST 24/05/15 07:54:24 Begin processing the 94th record. Run 1, Event 194, LumiSection 1 at 24-May-2015 07:54:23.293 CEST 24/05/15 07:54:42 Begin processing the 95th record. Run 1, Event 195, LumiSection 1 at 24-May-2015 07:54:42.022 CEST 24/05/15 07:54:47 Begin processing the 96th record. Run 1, Event 196, LumiSection 1 at 24-May-2015 07:54:46.129 CEST 24/05/15 07:55:08 Begin processing the 97th record. Run 1, Event 197, LumiSection 1 at 24-May-2015 07:55:07.190 CEST 24/05/15 07:55:09 Begin processing the 98th record. Run 1, Event 198, LumiSection 1 at 24-May-2015 07:55:08.561 CEST 24/05/15 07:55:14 Begin processing the 99th record. Run 1, Event 199, LumiSection 1 at 24-May-2015 07:55:14.016 CEST 24/05/15 07:56:14 Begin processing the 100th record. Run 1, Event 200, LumiSection 1 at 24-May-2015 07:56:13.133 CEST 24/05/15 07:56:41 MemoryReport> Peak virtual size 876.648 Mbytes 24/05/15 07:56:41 Key events increasing vsize: 24/05/15 07:56:41 [5 run: 1 lumi: 1 event: 105 vsize = 838.094 deltaVsize = 6 rss = 528.062 delta = 9.63281 24/05/15 07:56:41 [14 run: 1 lumi: 1 event: 114 vsize = 847.398 deltaVsize = 9.30469 rss = 538.367 delta = 10.3047 24/05/15 07:56:41 [23 run: 1 lumi: 1 event: 123 vsize = 862.898 deltaVsize = 10.75 rss = 552.316 delta = 13.9492 24/05/15 07:56:41 [46 run: 1 lumi: 1 event: 146 vsize = 872.898 deltaVsize = 7.25 rss = 500.926 delta = -51.3906 24/05/15 07:56:41 [70 run: 1 lumi: 1 event: 170 vsize = 875.648 deltaVsize = 2.75 rss = 504.266 delta = 3.33984 24/05/15 07:56:41 [99 run: 1 lumi: 1 event: 199 vsize = 875.648 deltaVsize = -1 rss = 484.457 delta = -0.792969 24/05/15 07:56:41 [70 run: 1 lumi: 1 event: 170 vsize = 875.648 deltaVsize = 2.75 rss = 504.266 delta = 3.33984 24/05/15 07:56:41 [91 run: 1 lumi: 1 event: 191 vsize = 876.648 deltaVsize = 1 rss = 485.25 delta = -19.0156 24/05/15 07:56:41 TimeReport> Time report complete in 30682.4 seconds 24/05/15 07:56:41 Time Summary: 24/05/15 07:56:41 - Min event: 0.00643206 24/05/15 07:56:41 - Max event: 27997.4 24/05/15 07:56:41 - Avg event: 306.824 24/05/15 07:56:41 - Total job: 30682.4 24/05/15 07:56:41 CPU Summary: 24/05/15 07:56:41 - Min event: 0.006 24/05/15 07:56:41 - Max event: 100.229 24/05/15 07:56:41 - Avg event: 23.1414 24/05/15 07:56:41 - Total job: 2362.17 24/05/15 07:56:41 - Total event: 2314.14 When I have a closer look now, I see that with "TimeReport> Time report complete in 30682.4 seconds" the total elapsed time is meant; the cpu time needed is noted correctly. My machine was shutdown overnight. Before the shutdown, I suspended the task and stopped BOINC. In those cases the TimeReport is not very useful. Thanks Ivan for paying attention. |
©2024 CERN