897 Hang

Case number: 845829-997770
Topic: Crash/Hang
Opened by: gitwut
Status: Closed
Type: Bug
Opened on: Wednesday, May 21, 2014 - 14:10
Last modified: Friday, June 6, 2014 - 22:54

Puzzle 897 hung yesterday, probably not long after the script started. The XML scriptlog is attached, though there isn't much in it. I suspect that the puzzle is just too big, but I don't see that anyone else has reported problems. I haven't had any similar problem with any CASP puzzles so far.

Attachment: scriptlog-default.txt (3.63 KB)
(Wed, 05/21/2014 - 14:10  |  20 comments)


gitwut
Joined: 05/18/2012
Groups: Contenders

Another crash, apparently not long after the script kicked off (Loop Rebuild 5.2):

http://fold.it/portal/recipe/48734

Same script as hang above.

<?xml version="1.0" encoding="UTF-8"?>

gitwut
Joined: 05/18/2012
Groups: Contenders

The forum didn't like the XML log inside "pre" tags. The XML log shows that the script only made it to the second set of rebuilds.

I've uploaded the puzzle to scientists as "gw9968.987 solo".

alcor29
Joined: 11/16/2012

Re puzzle 897.

On Win 7, main client, crashed yesterday on low, running Quaking rebuild.

Crashed today on auto running same script.

args passed:
'C:\Foldit\Foldit.exe'
core.init: Rosetta version 50ca27ec2a52ddf29597add0d94a1b07f15ecde4 from https://github.com/RosettaCommons/main.git
core.init: command: C:\Foldit\Foldit.exe -database cmp-database-720f8f3c6f1b08bdbad2a60dee8dfb2d/database -resources cmp-resources-91373e2e6560ddccd3827577bf0e0080/resources -interactive_game novice -boinc_url https://fold.it
core.init: 'RNG device' seed mode, using 'CryptGenRandom', seed=-1037886837 seed_offset=0 real_seed=-1037886837
core.init.random: RandomGenerator:init: Normal mode, seed=-1037886837 RG_type=mt19937
starting the init thread!..
boinc base url: https://fold.it
checking updates...
binary
local: '2bfe2e07977035159ee66e7f30d9175d'
remote: '2bfe2e07977035159ee66e7f30d9175d'
database
local: '720f8f3c6f1b08bdbad2a60dee8dfb2d'
remote: '720f8f3c6f1b08bdbad2a60dee8dfb2d'
resources
local: '91373e2e6560ddccd3827577bf0e0080'
remote: '91373e2e6560ddccd3827577bf0e0080'
cleaning up old components:
binary 00000000000000000000000000000000
binary 2bfe2e07977035159ee66e7f30d9175d
database 720f8f3c6f1b08bdbad2a60dee8dfb2d
resources 00000000000000000000000000000000
resources 91373e2e6560ddccd3827577bf0e0080
CRASH: 462374

gitwut
Joined: 05/18/2012
Groups: Contenders

Crashed again. This time I was trying Loop Rebuild 3.5, but it died just as quickly (after rebuild 4).

wisky
Joined: 07/13/2011

I'm getting frequent crashes with rebuilders. Some can go all day without crashing, some crash within 1-4 hours... And Tvdl DRW takes 15-30 minutes before crashing.

NickyCGS
Joined: 06/27/2013
Groups: Repro-men

Thanks for the feedback. Gitwut or Alcor, would either of you mind PMing me your options.txt file? I'm fairly certain I've found the cause of the crash, and we can hopefully have a fix out soon.

wisky
Joined: 07/13/2011

Awesome!

Just for additional information... I didn't have any crashes at all while running band-only recipes. Seems to be specific to rebuilds.

gitwut
Joined: 05/18/2012
Groups: Contenders

NickyCGS, I sent you a PM with the options.txt file.

alcor29
Joined: 11/16/2012

So did I, Nicky.

NickyCGS
Joined: 06/27/2013
Groups: Repro-men

Alright, I've nailed down the exact cause of the issue, and we should hopefully have a fix out soon. In the meantime, in your options.txt file, change
"graph_options/graph_length_value" : "100"
to
"graph_options/graph_length_value" : "25"
or another lower number. Unfortunately, it appears that the Undo Graph size is the cause of both these crashes and the memory usage detailed in http://fold.it/portal/node/997741

Added by Timo: Be aware that if you have multiple clients running, you should change the graph setting on all clients before closing that client.
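
For anyone who would rather script the options.txt change than edit by hand, here is a minimal sketch in Python. It assumes the setting appears in the file exactly in the form quoted in this post, and that the file is plain UTF-8 text; the function names and the example path are hypothetical, not part of Foldit. Back up options.txt first, and run it for each client's options.txt if you have several.

```python
import re
from pathlib import Path

# Assumed entry format, copied from the post above:
#   "graph_options/graph_length_value" : "100"
_PATTERN = re.compile(r'("graph_options/graph_length_value"\s*:\s*")(\d+)(")')


def lower_graph_length(text, new_value=25):
    """Return text with the graph length capped at new_value.

    A value that is already lower than new_value is left unchanged.
    """
    def _swap(match):
        current = int(match.group(2))
        return f"{match.group(1)}{min(current, new_value)}{match.group(3)}"

    return _PATTERN.sub(_swap, text)


def patch_options_file(path, new_value=25):
    """Rewrite one options.txt in place (hypothetical helper; UTF-8 assumed)."""
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    p.write_text(lower_graph_length(original, new_value), encoding="utf-8")


print(lower_graph_length('"graph_options/graph_length_value" : "100"'))
# prints "graph_options/graph_length_value" : "25"

# Example call (the location is an assumption; adjust to your install):
# patch_options_file(r"C:\Foldit\options.txt")
```

The sketch only ever lowers the value, so running it twice or on a client that is already set to a smaller number changes nothing.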

NickyCGS
Joined: 06/27/2013
Groups: Repro-men

You can also edit this in game by decreasing Max Graph Length in the Graph Properties option of the Undo menu.

frood66
Joined: 09/20/2011
Groups: Marvin's bunch

Should everyone using devprev restrict graph length until there is a fix?

wisky
Joined: 07/13/2011

Thanks, NickyCGS!!

gitwut
Joined: 05/18/2012
Groups: Contenders

I second that, thanks Nicky. I haven't had a crash on 897 since changing the option.

spmm
Joined: 08/05/2010
Groups: Void Crushers

A pic may also be of use (Win 8.1):

See memory column:
897: 4 - 1,268.0 MB - forgot to reduce the graph length; running a fuser with cuts. Almost double the memory, but it should not crash the machine: as you can see, it is only using 16% total and that is steady.
897: 3 - 748.8 MB - Void Crusher - graph memory and length reduced to 25
897: 2 - 888.8 MB - DRW - graph memory and length reduced to 25

Was previously crashing after an hour or so of DRW.

spmm
Joined: 08/05/2010
Groups: Void Crushers

'should not crash the machine' - should be 'should not crash the client' as there appears to be only 16% use of RAM.

spmm
Joined: 08/05/2010
Groups: Void Crushers

It appears that graph properties apply to ALL puzzles, so if you need to set it low (i.e. 25) for a big one with 200+ residues, then it is 25 for the 71-residue puzzles as well.
IMO we need more practice on these big ones :) to work in conjunction with the smaller ones, given the idea that we can step back through Undo.

gitwut
Joined: 05/18/2012
Groups: Contenders

Is there any word on when this will be fixed by an update?

I'm having difficulties recovering progress made on un-evolved Evos that have open cut-point scores higher than the credit-best scores. Clicking "restore credit best" restores the cut-point value and not the highest score without cut-points. In the past, it wasn't difficult to find a credit-best score in the undo buffer, but at 25, it is often impossible.

beta_helix
Joined: 05/09/2008
Groups: None

We are working hard on getting this resolved, but it's proved trickier than we had hoped.
We hope to have the fix for this quite soon and post it to devprev.

Thank you for your patience with this nasty one!

beta_helix
Joined: 05/09/2008
Groups: None
Status: Open » Closed

This has hopefully been addressed in the latest devprev update: http://fold.it/portal/node/997921
Please let us know if it still occurs!
