My recent WUs have always died with signal 11 under BOINC 5.10.21/28.
Today I switched to BOINC 5.10.30, and this WU died with signal 34.
In the BOINC log I found this:
file projects/einstein.phys.uwm.edu/einstein_S5R3_4.02_i686-pc-linux-gnu not found
file projects/einstein.phys.uwm.edu/einstein_S5R3_4.02_i686-pc-linux-gnu.so not found
Then, within a few seconds of starting the Einstein WU:
Starting einstein
Starting einstein using einstein_S5R3 version 4.21
Computation for task einstein finished
Output file einstein for task einstein absent
I don't know why. I've also seen multiple signal 11 failures on Spinhenge WUs. My other projects (Simap, QMC, Seti, Rieselsieve, Chess960, LHC) are running without problems.
RE: My last WU's died ever with Signal 11 with Boinc-5.10.21/28.
Thanks for the report.
The "file ... 4.02 ... not found" message could safely be ignored if the 4.21 App were working on your machine. However, this doesn't seem to be the case: the App got a "signal 4", which is an "illegal instruction". There shouldn't be anything in the App that a Core2 CPU can't handle. I'd suggest downloading the archive again and checking the md5 checksum before unpacking it again (overwriting the old files). You may also want to let the client get the 4.02 App files, or download them manually, to get rid of the error messages.
The references to the other projects are helpful, thanks! Probably the Spinhenge Apps, like the Einstein one, are built with a 'bleeding edge' version of the BOINC library, while the other projects use older versions.
BM
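For anyone who wants to script the md5 check suggested above, here is a minimal sketch in Python. It assumes you already have the published checksum as a hex string. The function names are made up for illustration and are not part of BOINC or the Einstein package.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected_md5):
    """Compare the computed digest with the published one, ignoring case and whitespace."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Call `archive_is_intact()` with the path of the downloaded archive and the checksum from the download page. If it returns False, the download is damaged and should be fetched again before unpacking.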
I'm getting the same error about the 4.02 app not being found, but since some of my WUs get crunched successfully and the problems don't occur directly after that error message, I don't think that, in my case, it is related to the signal 11.
I'm running BOINC 5.3.31 now. Luckily, most of my projects seem to have switched to fixed credit, so it won't even cost me any credit. We'll see if the older client makes any difference.
RE: I'm getting the same error about the 4.02 app not being found....
It's not really an error message - rather an information message.
You are crunching with the 4.21 app using the app_info.xml file created by Bernd and distributed as part of the 4.21 package. Bernd has no way of knowing what "brands" of tasks may be in your cache of work at the time you decide to switch to the 4.21 app. Depending on what previous app (beta or otherwise) you may have been running at the time you decided to switch to 4.21, it is possible that you could have tasks "branded" with any one of quite a few different earlier versions.
Most of the earlier versions (all except 4.02 if I recall correctly) have compatible checkpoint file formats so 4.21 can handle partially completed tasks from all earlier versions except 4.02. The app_info.xml file therefore has to list the 4.02 version as being "required" in case you just happened to have 4.02 branded work in your cache (extremely unlikely, surely - probably impossible these days).
So, if you don't have any 4.02 tasks in your cache, you won't ever have a problem, but you will be informed that 4.02 is a specified app and you don't actually have it. That's why I called it an information message rather than an error message. It would become an error message if the project suddenly stopped issuing the "current" work and reverted to issuing only work that was branded as requiring the 4.02 app - something that's just not going to happen :).
EDIT:
Everybody running the 4.21 app is going to see this same information message every time they start BOINC. The way to avoid the message (mentioned by Bernd if you read carefully) is to place a copy of the 4.02 .exe in the project folder with the 4.21 app. You can only do that if you kept a copy of that file as there's no visible link to old executables AFAIK.
I actually have a copy but I regard the message as so unimportant that I haven't bothered to install it to make the message go away :).
EDIT2:
1. You can get 4.02 manually - see next post
2. Removed reference to .pdb file - my brain was stuck in Windows mode for some reason :).
Cheers,
Gary.
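To see where the "file ... not found" message comes from, here is a rough Python sketch that lists the files named in an app_info.xml and reports the ones missing from the project folder. The XML below is a simplified, assumed layout for illustration only. The real file distributed with the 4.21 package contains more entries. The function names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

# Simplified, assumed example of an app_info.xml (not the real 4.21 file).
SAMPLE_APP_INFO = """<app_info>
  <app><name>einstein_S5R3</name></app>
  <file_info><name>einstein_S5R3_4.02_i686-pc-linux-gnu</name><executable/></file_info>
  <file_info><name>einstein_S5R3_4.21_i686-pc-linux-gnu</name><executable/></file_info>
</app_info>"""


def listed_files(app_info_xml):
    """Return the file names declared in <file_info> elements."""
    root = ET.fromstring(app_info_xml)
    names = []
    for info in root.iter("file_info"):
        name = info.findtext("name")
        if name:
            names.append(name.strip())
    return names


def missing_files(app_info_xml, project_dir):
    """Return the declared files that are not present in project_dir."""
    return [
        name
        for name in listed_files(app_info_xml)
        if not os.path.isfile(os.path.join(project_dir, name))
    ]
```

With only the 4.21 executable in the folder, `missing_files()` returns just the 4.02 name, which matches the harmless message everybody running 4.21 sees.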
RE: You may also want to let the client get the 4.02 App files or manually download them to get rid of the error messages.
Whilst the app_info.xml mechanism is in place, the local client cannot get any app files - it's against the rules :). If you delete app_info.xml, you won't get 4.02 as the current default version is 4.20. The only way I could imagine the client getting 4.02 would be to do a dodgy edit to "rebrand" a task already in your cache and I'm certainly not suggesting that.
Of course, manual downloading is possible, but only if you know where to look, since there is no directly visible, easy-to-follow link on the website pointing you to the page where older versions of files reside. I didn't even know it was possible until just now, when I decided to look up the URLs for current files in client_state.xml and then check whether all the old stuff is there as well. I suppose I should have realised that it would all be there :).
Cheers,
Gary.
Had network problems and got a load of Signal 11 with 4.21 on this host:
http://einsteinathome.org/host/1085263
Other than that, the app is nice. I got some sub-23,000 results when overclocked to 2.8GHz (Intel E2140 dual core 1.6GHz 1MB cache). The spread in computing times is not as big as what I have seen on some AMD hosts in this thread, but that could still change, as I don't have that many successful results yet.
Team Philippines
It definitely doesn't like the loss of the network. My ISP cut me off (their own fault) at about 5:20 this morning (don't even ask why I was awake).
Here's the log from the daemon.
2008-01-09 05:17:38 [Hydrogen@Home] Scheduler RPC succeeded [server version 601]
2008-01-09 05:17:38 [Hydrogen@Home] Deferring communication for 2 hr 36 min 15 sec
2008-01-09 05:17:38 [Hydrogen@Home] Reason: no work from project
2008-01-09 05:20:30 [Einstein@Home] Deferring communication for 1 min 0 sec
2008-01-09 05:20:30 [Einstein@Home] Reason: Unrecoverable error for result h1_0724.60_S5R2__174_S5R3a_1 (process got signal 11)
2008-01-09 05:20:30 [Einstein@Home] Computation for task h1_0724.60_S5R2__174_S5R3a_1 finished
2008-01-09 05:20:30 [Einstein@Home] Output file h1_0724.60_S5R2__174_S5R3a_1_0 for task h1_0724.60_S5R2__174_S5R3a_1 absent
2008-01-09 05:21:10 [PrimeGrid] Sending scheduler request: To fetch work
2008-01-09 05:21:10 [PrimeGrid] Requesting 864 seconds of new work
2008-01-09 05:21:10 [Einstein@Home] Deferring communication for 1 min 0 sec
2008-01-09 05:21:10 [Einstein@Home] Reason: Unrecoverable error for result h1_0724.60_S5R2__173_S5R3a_0 (process got signal 11)
2008-01-09 05:21:10 [Einstein@Home] Computation for task h1_0724.60_S5R2__173_S5R3a_0 finished
2008-01-09 05:21:10 [Einstein@Home] Output file h1_0724.60_S5R2__173_S5R3a_0_0 for task h1_0724.60_S5R2__173_S5R3a_0 absent
2008-01-09 05:22:30 [---] Project communication failed: attempting access to reference site
2008-01-09 05:22:30 [PrimeGrid] Scheduler request failed: couldn't resolve host name
2008-01-09 05:22:30 [PrimeGrid] Deferring communication for 1 min 0 sec
2008-01-09 05:22:30 [PrimeGrid] Reason: scheduler request failed
2008-01-09 05:23:50 [---] Access to reference site failed - check network connection or proxy configuration.
http://einsteinathome.org/task/90742227
http://einsteinathome.org/task/90727597
http://einsteinathome.org/host/1086583
The stock Windows app survived the loss of network.
http://einsteinathome.org/host/1086583
Kathryn :o)
Einstein@Home Moderator
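If you have a longer daemon log to go through, something like the following Python sketch can pull out the tasks that died and translate the signal number into its name. The regular expression is written against the log lines quoted above and is only an assumption about the general format. The function names are made up for illustration.

```python
import re
import signal

# Matches lines like:
# ... [Einstein@Home] Reason: Unrecoverable error for result <task> (process got signal 11)
CRASH_RE = re.compile(
    r"\[(?P<project>[^\]]+)\] Reason: Unrecoverable error for result "
    r"(?P<task>\S+) \(process got signal (?P<signum>\d+)\)"
)


def signal_name(number):
    """Map a signal number to its symbolic name on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown"


def crashed_tasks(log_text):
    """Return (project, task, signal number, signal name) for each crash line."""
    crashes = []
    for line in log_text.splitlines():
        match = CRASH_RE.search(line)
        if match:
            number = int(match.group("signum"))
            crashes.append(
                (match.group("project"), match.group("task"), number, signal_name(number))
            )
    return crashes
```

On Linux, signal 11 comes out as SIGSEGV (segmentation fault) and signal 4 as SIGILL (illegal instruction).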
RE: It definitely doesn't like the loss of the network.
Kathryn,
You're definitely posting too early in the morning (or suffering from lack of sleep) - both of those are the same host ID!
But since you mention the stock Windows app - you must, by definition, be running a different build of BOINC too. Is it the app, or BOINC, that causes the unrecoverable error when the network goes down?
RE: RE: It definitely doesn't like the loss of the network.
A serious lack of sleep and a splitting headache from the kids screaming earlier today. I should go to bed soon.
http://einsteinathome.org/host/1086585 is the Windows host and http://einsteinathome.org/host/1086583 is the Linux host.
They are running different versions of the core client. Windows is 5.10.30 (I think) installed as a service and Linux is 5.10.21 installed via rpm as a system daemon.
As for whether it is the app or the client, I'm not sure. I saw similar reports of work crashing upon network loss at ABC. But I'm not sure if they ended with signal 11s.
The Linux host is wired, so I can easily pull the network cable for testing purposes.
Kathryn :o)
Einstein@Home Moderator
RE: They are running different versions of the core client.
Does this run BOINC and the App as root? (Try "ps -ef | grep einstein" or similar.)
BM
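If you'd rather check the owner from a script than read the ps output by eye, here is a small Python sketch. It assumes the usual eight-column "ps -ef" layout (UID PID PPID C STIME TTY TIME CMD). The function names are made up for illustration.

```python
def process_owners(ps_output, needle="einstein"):
    """Return (user, command) for each `ps -ef` line whose command contains needle."""
    owners = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 7)  # 8 columns; the command may contain spaces
        if len(fields) < 8:
            continue
        user, command = fields[0], fields[7]
        # Ignore the grep process itself if the output came from a grep pipeline
        if needle in command and "grep" not in command:
            owners.append((user, command))
    return owners


def runs_as_root(ps_output, needle="einstein"):
    """True if any matching process is owned by root."""
    return any(user == "root" for user, _ in process_owners(ps_output, needle))
```

Feed it the text printed by "ps -ef". If `runs_as_root()` returns True, the App is running with root privileges.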