1. Is it possible that the first version (with some SSE2 instructions) might be faster than the new one?
Yes, it is possible, but I simply don't know.
Hey Bernd,
How about you make that SSE2 version available as a "power user" app so that those of us who weren't quick enough off the mark can at least test it a little?
That'll save us having to bribe Michael or th3 who are the two who have so far admitted to having it :-).
I would be surprised to see a significant (if at all measurable) speedup for the initial app version that contains SSE2 instructions.
The app code contains parts that are really important to performance, and those have now been converted to hand-optimized assembly code (SSE).
The rest of the code is in C but is not that crucial for performance. Only in those parts will there be a difference between the two app versions, mostly because scalar double-precision code is compiled to x87 or SSE2 instructions, respectively. To make optimal use of SSE2, one would have to write SSE2 versions of the hand-coded sections.
So, I would not hold my breath wrt. the SSE2 app variant.
CU
Bikeman
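To put rough numbers on the argument above: if only the non-critical C code gets faster, the overall gain is capped by the share of runtime spent in that code (Amdahl's law). The sketch below is only an illustration. The runtime fractions and the 1.5x local speedup are made-up values, not measurements of the Einstein@Home app, and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(fraction_affected: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of the runtime gets faster.

    fraction_affected: share of total runtime spent in the code that changes (0..1)
    local_speedup:     how much faster that part becomes (e.g. 1.5 for 1.5x)
    """
    if not 0.0 <= fraction_affected <= 1.0:
        raise ValueError("fraction_affected must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / local_speedup)


# Hypothetical values only: we do not know how much time the app spends in C code.
for fraction in (0.05, 0.10, 0.20):
    gain = overall_speedup(fraction, 1.5)
    print(f"{fraction:.0%} of runtime in C code, 1.5x faster there: {gain:.3f}x overall")
```

Whatever the real fractions are, the point stays the same: as long as the hand-coded SSE sections dominate the runtime and are identical in both builds, the SSE2 build can only gain a few percent.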
In Bernd's initial posting I do not read anything about hand-coded SSE instructions, but about using a compiler switch.
I do not doubt what you are writing, but this app clearly is at least 10% faster, so there might be a chance that the SSE2 version is even a little faster.
Anyway, it will be fun to prove you are right. ;-)
Or in other words: Let's see if practice can prove theory. :-)
cu,
Michael
I added Donald A. Tevault's X2 6000 to the hosts that I feed into my DB, because I also have an X2 6000, and mine is running the SSE2 version. In a few days there will be results to compare.
A comparison probably cannot be based on a single result or just a few, because any speedup might show up only on some WUs, depending on whether they sit close to a trough or to the peak of the cycle.
RE: RE: RE: 1. Is it
For now I put a copy of the SSE2 App executable in http://einstein.phys.uwm.edu/power_apps/einstein_S5R3_4.49_1_i686-pc-linux-gnu.gz. This isn't a full-featured App package, you'll have to replace the file "einstein_S5R3_4.49_i686-pc-linux-gnu_1" in the 4.49 Beta Test App package with that (expanded) file.
BM
RE: RE: Hi all! I would
The overall speedup in the App compared to 4.38 mainly comes from prefetch compiler intrinsics placed in the Hough code that require these switches. Another bit of speedup arises from changes in the Assembler-coded "Kernel loop" (the "interleaving" of SSE and FPU commands we have in the 4.42 MacOS Intel App), but I think that this effect will be larger on modern Intel CPUs (Core2) than on AMDs.
My guess is that the SSE2 App will be slightly faster when measured against the SSE one, but you'll only notice the difference if you run the same workunit side by side, e.g. on a dual-core machine. It shouldn't be worth another case distinction in a "switching App".
BM
RE: I added Donald A.
Cool! It'll be interesting to see the results.
RE: For now I put a copy of
Thanks for doing that.
One other point I'd like to mention. The instructions on the beta test page talk about adding the "five files" from the package ... Now I imagine the first version of the archive probably did contain five files but the current "fixed" version now contains seven files. Are these seven files all necessary? If so, perhaps you should fix the instructions to reflect the correct number.
Cheers,
Gary.
I've installed the SSE version of 4.49 on a machine with an Athlon XP processor (SSE capability only). I chose this machine to test with because it had a cache of tasks with virtually consecutive seq#s which were in the trough section of the cycle. This means that there isn't a large variation in crunch times (on average) when a number of tasks is examined. The machine just happened to be finishing a task when I installed the new app so around 98% of the first task after changeover was crunched with the new app. There have been two tasks now crunched with the new app and there is a considerable speedup as shown in the list below:-
Task              App    Crunch time (s)
h1_1033.40 _346   4.49   27,748.25
h1_1033.40 _345   4.49   27,964.88
As you can see, this is a substantial increase in speed. All of the above four are waiting on wingmen so I can't comment on validation.
Great stuff, Bernd.
Cheers,
Gary.
My latest WU was crunched in 28k s using 4.49 by my AMD Opteron 1210. The same WU was crunched in 38k s by a wingman using 4.26 on a Core 2 Duo chip running Windows. In theory the Core should be superior. How come?
Tullio
RE: My latest WU was
The Windows 4.26 app is an older app that isn't optimized for SSE. So, even with a better processor, he's running with a huge disadvantage.
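For a sense of scale, here is the arithmetic on the two crunch times quoted above (38k s with 4.26 on the Core 2 Duo, 28k s with 4.49 on the Opteron 1210). This is just a sketch, and `compare_runtimes` is a made-up helper name. Keep in mind that the two results differ in CPU, OS and app version all at once, so the ratio says nothing about the app alone.

```python
def compare_runtimes(baseline_s: float, candidate_s: float) -> tuple[float, float]:
    """Return (speedup factor, percent of crunch time saved) for the same workunit."""
    if baseline_s <= 0 or candidate_s <= 0:
        raise ValueError("runtimes must be positive")
    speedup = baseline_s / candidate_s
    saved_pct = (1.0 - candidate_s / baseline_s) * 100.0
    return speedup, saved_pct


# Rounded figures from the post above: wingman at 38k s, Opteron at 28k s.
speedup, saved = compare_runtimes(38_000, 28_000)
print(f"{speedup:.2f}x faster, {saved:.1f}% less crunch time")
# prints: 1.36x faster, 26.3% less crunch time
```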
RE: RE: RE: RE: 1. Is
I tried this to no avail. I added ".old" to the "einstein_S5R3_4.49_i686-pc-linux-gnu_1" executable file to put it out of use and inserted the new "einstein_S5R3_4.49_1_i686-pc-linux-gnu" in the same folder. After restarting BOINC, it continues to try to download the original file. I'm making the assumption that some changes need to be implemented in the app_info.xml file. Will replacing references to the old app with the new one be the solution?
RE: RE: For now I put a
Looks like the file in the gz has the wrong name. Try renaming the extracted file to "einstein_S5R3_4.49_i686-pc-linux-gnu_1"
Team Philippines
RE: I tried this to no
Please slow down and tell us exactly what you did do. You describe it trying to download the original executable but that isn't possible when using an app_info.xml file. No executables will be downloaded - you have to provide everything yourself. Also the "_1" you have shown in the name of the file can't be correct as the file in the SSE archive has the "_1" at the end.
The following is what you should have done if you had previously been running the SSE version of the 4.49 app successfully :-
* Stop BOINC.
* Delete the SSE version of the file einstein_S5R3_4.49_i686-pc-linux-gnu_1 in your Einstein project directory.
* Replace it with the new SSE2 version from Bernd's latest archive, making sure it has exactly the same name as the one you have just deleted.
* Restart BOINC.
If you are trying to run the SSE2 version without having run the SSE version, this is the procedure :-
* Stop BOINC.
* Install the full SSE 4.49 package as usual.
* Before restarting BOINC, delete one file (einstein_S5R3_4.49_i686-pc-linux-gnu_1) and replace it with the SSE2 version as above (again making sure the name is correct).
* Restart BOINC.
If you didn't do one of these two procedures, can you tell us precisely what you did do?
I haven't yet downloaded the SSE2 app that Bernd has provided but if the archive contains a file with the name as you have written it, then the filename is wrong and should be fixed to agree with the name in the SSE archive. You can always cross check the name as it is given in app_info.xml. That is the way that the executable has got to be named.
Cheers,
Gary.
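For anyone who would rather script the swap than do it by hand, here is a rough sketch of the procedure above. It assumes BOINC is already stopped, that app_info.xml lists each file as a `<file_info>` element with a `<name>` child, and that the download is a plain gzip of the executable. The helper names (`expected_file_names`, `missing_files`, `install_executable`) are made up for this example and are not part of BOINC.

```python
import gzip
import shutil
import stat
import xml.etree.ElementTree as ET
from pathlib import Path


def expected_file_names(app_info_path):
    """Collect the file names that app_info.xml says must exist (assumed layout)."""
    root = ET.parse(app_info_path).getroot()
    names = []
    for file_info in root.iter("file_info"):
        name = file_info.findtext("name")
        if name and name.strip():
            names.append(name.strip())
    return names


def missing_files(project_dir):
    """List the files named in app_info.xml that are not in the project directory."""
    project_dir = Path(project_dir)
    names = expected_file_names(project_dir / "app_info.xml")
    return [n for n in names if not (project_dir / n).is_file()]


def install_executable(gz_path, project_dir, target_name):
    """Expand the gzipped executable into project_dir under the name app_info.xml expects."""
    project_dir = Path(project_dir)
    if target_name not in expected_file_names(project_dir / "app_info.xml"):
        raise ValueError(f"{target_name!r} is not listed in app_info.xml")
    target = project_dir / target_name
    # Overwrites the old executable, which covers the "delete and replace" step.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Make sure the new file is executable.
    target.chmod(target.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

With BOINC stopped, something like `install_executable("einstein_S5R3_4.49_1_i686-pc-linux-gnu.gz", project_dir, "einstein_S5R3_4.49_i686-pc-linux-gnu_1")` followed by a check that `missing_files(project_dir)` is empty would cover the renaming problem discussed in this thread; `project_dir` is wherever your Einstein project directory lives.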