RE: Rename it without the
Don't forget the "_1" at the end.
greetings from Ruthe/Germany
S5R3 4.49 client : 18321.86, 17474.21, and 17632.52 seconds for crunching units.
S5R3 4.38 client : 19565 to 22397 seconds for crunching units.
S5R3 4.49 SSE2 experimental client: my first unit took 15755.53 seconds to finish ( http://einsteinathome.org/task/97607592 )
Impressive performance.
Continue.
RE: Impressive
Please be aware of the importance of the cyclic nature of crunch times. You need to take into account the seq#s of each task.
The three 4.49 SSE results you quote come from a slower part of the cycle whilst the single SSE2 result is from a faster part of the cycle and this would account for some of the drop in time from around 17.5Ksecs to 15.7Ksecs. Your next result is from an even faster part of the cycle and so you see a further drop in time to around 14.3Ksecs. The next four results after that should be even slightly faster again - maybe some will even drop below 14Ksecs if you are lucky :-).
Cheers,
Gary.
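To put rough numbers on the point above, here is a minimal sketch that works out the relative drops between the approximate times quoted in this post. The function and variable names are made up for illustration, and the inputs are just the rounded figures from the post (17.5, 15.7 and 14.3 Ksecs), so treat the result as a ballpark only.

```python
def relative_drop(before_s: float, after_s: float) -> float:
    """Fractional reduction in run time going from before_s to after_s."""
    return (before_s - after_s) / before_s


# Approximate times quoted in the post above, in seconds.
sse_slow_part = 17_500.0  # 4.49 SSE results, slower part of the cycle
sse2_first = 15_700.0     # first SSE2 result, faster part of the cycle
sse2_second = 14_300.0    # next SSE2 result, even faster part of the cycle

# This drop mixes two effects: the app change and the cycle position.
mixed_drop = relative_drop(sse_slow_part, sse2_first)

# Same app for both results, so this drop is down to cycle position alone.
cycle_only_drop = relative_drop(sse2_first, sse2_second)

print(f"SSE -> first SSE2 result:     {mixed_drop:.1%}")
print(f"first -> second SSE2 result:  {cycle_only_drop:.1%}")
```

The second figure comes from two results of the same app, which suggests that cycle position on its own can shift run times by an amount similar in size to the apparent SSE-to-SSE2 gain.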
Hi!
I think I stand corrected: against my expectations, the original SSE2-only version seems to be considerably faster than the SSE version! Not sure why exactly, though.
CU
Bikeman
RE: I think I stand
I'm not sure about that, as I tried both the SSE and the SSE2 versions on two AMD64s of a similar vintage (both SSE2 capable). They've each done a few tasks now, and my initial impression is that the SSE2 version may be marginally faster, but there's not much in it. I can't really be sure yet, as the host running the SSE2 version didn't have a good set of seq#s, so I'll continue and hope to get a better comparison soon.
I've tried the SSE version on quite a number of architectures now (Athlon XP, AMD64, Coppermine PIII, Tualatin PIII) and (working near the trough of a cycle) there seems to be a reasonably constant speedup for all of them. I've reduced my "figure of merit" to a number which represents the new time as a fraction of the old time. On a range of hosts I've looked at so far, that number seems to be around 0.84 +/- 0.02. Please be aware that these were done quickly by choosing suitable hosts that had a series of tasks in the cache that had consecutive seq#s in the trough area of the cycle and thus do not represent what might be happening near a peak. Also, the "saw-tooth" variations close to a trough may be affecting the issue as well, so you can't put a high degree of reliance on this. I just wanted a ball park number and I'm happy with 0.84 for that purpose.
Cheers,
Gary.
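For anyone who wants to try the same "figure of merit" on their own results, a minimal sketch follows. The function names are made up for illustration. The inputs are the 4.49 and 4.38 times quoted at the start of this thread, which are not matched by seq#, so the output only shows how wide the ratio gets when cycle position is ignored. The half-range used in `summarize` is just one possible reading of a "+/-" spread, not necessarily how the 0.84 +/- 0.02 above was obtained.

```python
from statistics import mean


def figure_of_merit(new_time_s: float, old_time_s: float) -> float:
    """New run time as a fraction of the old run time (below 1.0 means faster)."""
    return new_time_s / old_time_s


def summarize(ratios: list[float]) -> tuple[float, float]:
    """Return (mean, half-range) of a set of ratios.

    Assumption: half-range as the spread; other definitions are possible.
    """
    centre = mean(ratios)
    half_range = (max(ratios) - min(ratios)) / 2
    return centre, half_range


# Times quoted earlier in the thread, in seconds (not matched by seq#).
new_times = [18321.86, 17474.21, 17632.52]  # S5R3 4.49 client
old_low, old_high = 19565.0, 22397.0        # S5R3 4.38 client, quoted range

new_mean = mean(new_times)
best_case = figure_of_merit(new_mean, old_high)   # against the slowest old time
worst_case = figure_of_merit(new_mean, old_low)   # against the fastest old time

centre, spread = summarize([best_case, worst_case])
print(f"ratio between {best_case:.3f} and {worst_case:.3f}")
print(f"centre {centre:.3f} +/- {spread:.3f}")
```

With unmatched tasks the ratio ranges much more widely than +/- 0.02, which is why comparing results with consecutive seq#s from the same part of the cycle matters.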
I've done some experiments on a Core 2 Duo which seem to indicate that the SSE2 version is even faster. I'll have to do some more profiling and tests, though. I think Bernd mentioned before that it might be worthwhile to have a "reference" stand-alone workunit that could be used to benchmark the app, and my tests go along this path.
CU
Bikeman
RE: I've done some
Hmmmm .... I remember that suggestion. What type(s) of platform(s) would be useful to devote to such 'reference' testing?
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
I just installed the beta app (not the SSE2 version yet) on my 32-bit Fedora 7 host.
The result that was in progress picked up fine and is crunching away.
The one that just downloaded, which is branded with 4.49, has an estimated time to completion of 40:26:20. It should take about 9 hours.
DCF (as grabbed from client_state.xml) is 0.922172
Oh wise people, what did I do wrong?
Kathryn :o)
Einstein@Home Moderator
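A quick arithmetic check on the numbers in this post, as a minimal sketch. It assumes that the estimate shown by the client is an uncorrected estimate multiplied by the DCF (my understanding of how BOINC applies the duration correction factor, not something confirmed in this thread) and that "about 9" means hours. The helper name is made up for illustration.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


estimate_s = hms_to_seconds("40:26:20")  # estimate shown for the 4.49 task
dcf = 0.922172                           # value read from client_state.xml
expected_s = 9 * 3600                    # "about 9", read as hours (assumption)

# Assumption: shown estimate = uncorrected estimate * DCF.
uncorrected_s = estimate_s / dcf

# How far the shown estimate is from the expected run time.
overestimate = estimate_s / expected_s

print(f"shown estimate:       {estimate_s} s")
print(f"uncorrected estimate: {uncorrected_s:.0f} s")
print(f"overestimate factor:  {overestimate:.2f}x")
```

Under those assumptions the estimate is off by a factor of roughly 4.5, while the DCF is within 8% of 1.0, so the DCF by itself would not account for the inflated estimate.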
RE: I just installed the
Switching apps in the middle of a task usually confuses the run-time estimate, but I don't know precisely why.
BM
Both of my quads are running the SSE2 version (the AMD 9600 Black @ 2.616 and the Intel Q6600 @ 2.9), if someone wants to snag the data from them, or if Bernd just wants an easier way to know how many are testing.