GNU/Linux S5R3 "power users" App 4.21 available

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4349
Credit: 253358595
RAC: 37836

RE: RE: Ubuntu 7.04

Message 76412 in response to message 76411

Quote:
Quote:

Ubuntu 7.04 Boinc 5.2.13

Wow!

But this is a segmentation fault inside the BOINC software itself, so someone should make a bug report to BOINC's TRAC system, I guess. This issue should not have any relation to Einstein@Home.

CU
Bikeman

Right, this is the BOINC Core Client. Given the pretty old version, though, I wonder if anyone @BOINC cares...

BM

josep
Joined: 9 Mar 05
Posts: 63
Credit: 1156542
RAC: 0

RE: Most, and perhaps

Message 76413 in response to message 76406

Quote:

Most, and perhaps all, of the signal 11 problems occurred when I had network problems. Also, the problem machines are all running the newer 5.10.x versions of BOINC.

Donald (and others), a possible solution would be to run a local DNS server, with BIND on one of your machines.

I have read that the signal 11 issues in newer versions of BOINC are related to the new synchronous DNS access. If DNS is unavailable for a while, the core client stays blocked while trying to reach it, and the running science task fails.

A local DNS server, which will always be available, should solve the problem.

In my particular case, I already had BIND running, because the box that crunches for E@H is also a mail server. So I simply pointed to it as the first DNS server to query (by putting the line "nameserver 127.0.0.1" first in /etc/resolv.conf).

If you don't have BIND running on any machine, a "caching-only" installation of BIND should be enough. That's always a nice addition to a local network anyway, since it makes repeated DNS lookups faster. Have a look at the following "HowTo":

http://www.langfeldt.net/DNS-HOWTO/BIND-9/

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: RE: Most, and

Message 76414 in response to message 76413

Quote:
Quote:

Most, and perhaps all, of the signal 11 problems occurred when I had network problems. Also, the problem machines are all running the newer 5.10.x versions of BOINC.

Donald (and others), a possible solution would be to run a local DNS server, with BIND on one of your machines.

I have read that the signal 11 issues in newer versions of BOINC are related to the new synchronous DNS access. If DNS is unavailable for a while, the core client stays blocked while trying to reach it, and the running science task fails.

A local DNS server, which will always be available, should solve the problem.

In my particular case, I already had BIND running, because the box that crunches for E@H is also a mail server. So I simply pointed to it as the first DNS server to query (by putting the line "nameserver 127.0.0.1" first in /etc/resolv.conf).

If you don't have BIND running on any machine, a "caching-only" installation of BIND should be enough. That's always a nice addition to a local network anyway, since it makes repeated DNS lookups faster. Have a look at the following "HowTo":

http://www.langfeldt.net/DNS-HOWTO/BIND-9/

Interesting idea. I need to set up a BIND server, anyway, so I may give this a try.

josep
Joined: 9 Mar 05
Posts: 63
Credit: 1156542
RAC: 0

Sorry, but I've done some

Sorry, but I've done some tests, and it seems that this "solution" does not solve anything...

I've now been experimenting with my DSL connection: I disconnected the router from the telephone line, which produces obvious "connection problems", and then forced an "update prefs" in BOINC's core client. The result: the running WU was immediately aborted with a signal 11 error... I'm using BOINC 5.8.16 and einstein 4.20.

I read the suggestion of using local DNS servers on BOINC's forums, but now I see that it does not work.

So it seems that the only possible fix for now is to use an older version of BOINC that still uses the old-style asynchronous DNS lookup. This means a 5.4.x client or one of the very early 5.8.x versions.

josep
Joined: 9 Mar 05
Posts: 63
Credit: 1156542
RAC: 0

Well, some more tests, and it

Well, some more tests, and it seems that perhaps a local DNS really helps...

Now I've left only the first "nameserver 127.0.0.1" line in /etc/resolv.conf and removed every other "nameserver" entry from the file.

The present WU has survived the loss of the network connection and is running OK now. I'll see whether it eventually validates.

Note that the machine running the local DNS server must not be switched off or restarted during a network failure. Doing so erases all data in BIND's cache (which is held in RAM), and the next attempt by a BOINC client to look up DNS data will fail if no Internet connection is available.
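
For anyone who wants to double-check their own setup, here is a minimal Python sketch (an illustration only, not part of BOINC) that lists the "nameserver" entries of a resolv.conf-style text in the order they appear and reports whether 127.0.0.1 is the only one left. The function names are made up for this example.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order they appear in the file."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Ignore anything after a '#' or ';' comment character.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_only_local_dns(resolv_conf_text):
    """True if at least one nameserver is listed and all of them are 127.0.0.1."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and all(s == "127.0.0.1" for s in servers)
```

To try it on a real machine, read the file first, e.g. `uses_only_local_dns(open("/etc/resolv.conf").read())`.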

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 819061042
RAC: 1275072

Hi! RE: So it seems

Hi!

Quote:


So it seems that the only possible fix for now is to use an older version of BOINC that still uses the old-style asynchronous DNS lookup. This means a 5.4.x client or one of the very early 5.8.x versions.

OTOH, if you downgrade BOINC, you'll get some more bugs that have long since been fixed in BOINC.

I guess that crunchers who know they have frequent connection problems should consider setting "Network activity" to "Never" in the BOINC GUI and periodically (e.g. once a day) pressing the Update button after checking that the Internet connection is working. This is especially true for users with slower PCs, because it's not nice to lose (say) a full day of crunching.
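
As a rough way to script the "check the connection first" step, something like the following Python sketch could be run before pressing Update. The helper name dns_is_working and the host example.org are placeholders chosen for illustration, not anything BOINC provides. Note that a successful lookup only shows that name resolution works (and it may be answered from a local cache); it does not prove the project servers are reachable.

```python
import socket


def dns_is_working(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address.

    The resolver argument exists only so the check can be tried out
    without a network; by default the system resolver is used.
    """
    try:
        return len(resolver(hostname, 80)) > 0
    except OSError:
        # socket.gaierror (name resolution failure) is a subclass of OSError.
        return False


if __name__ == "__main__":
    # example.org is a placeholder; substitute a host you actually care about.
    if dns_is_working("example.org"):
        print("DNS lookup succeeded, probably safe to press Update")
    else:
        print("DNS lookup failed, better leave network activity off")
```

Be aware that the lookup itself can take a while to time out when DNS is down, which is harmless in a separate script like this one.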

CU
Bikeman

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: Hi!RE: So it seems

Message 76418 in response to message 76417

Quote:
Hi!
Quote:


So it seems that the only possible fix for now is to use an older version of BOINC that still uses the old-style asynchronous DNS lookup. This means a 5.4.x client or one of the very early 5.8.x versions.

OTOH, if you downgrade BOINC, you'll get some more bugs that have long since been fixed in BOINC.

I guess that crunchers who know they have frequent connection problems should consider setting "Network activity" to "Never" in the BOINC GUI and periodically (e.g. once a day) pressing the Update button after checking that the Internet connection is working. This is especially true for users with slower PCs, because it's not nice to lose (say) a full day of crunching.

CU
Bikeman

Actually, I'm still running some old P-III machines, so it would be more like two to three days' worth of crunching.

Melvin Bobo Slacke
Joined: 22 Jan 05
Posts: 32
Credit: 1903599
RAC: 2913

RE: Right, this is the

Message 76419 in response to message 76412

Quote:

Right, this is the BOINC Core Client. Given the pretty old version, though, I wonder if anyone @BOINC cares...

BM

Thanks

Guess I will stick with this old Boinc 5.2.13 anyway, it was reliable for a long time.

Melvin Bobo Slacke
Joined: 22 Jan 05
Posts: 32
Credit: 1903599
RAC: 2913

Another 13 units lost when I

Another 13 units lost when I forgot to switch off the network in BOINC Manager. Yes, it was signal 11; hostid=1090631
Fedora C6 Boinc 5.10.21

Well, I am a rather normal person (I think ;)) and miss things now and then, but I don't have the time to babysit BOINC.
Restarted with 5.2.13.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4349
Credit: 253358595
RAC: 37836

The "signal 11" happens in

The "signal 11" happens in the BOINC library (the part of BOINC that gets linked into the application) whenever the Core Client becomes unresponsive. Newer Clients seem to become unresponsive more often than older ones (e.g. for DNS requests), but in principle it could happen with older Clients, too. We are working on fixing the problem.

BM
