Validation Error

Juergen Kozok

Joined: 21 Feb 05

Posts: 7

Credit: 3075642

RAC: 0

23 Feb 2017 11:47:21 UTC

Topic 205679

I have been getting the above error for a few weeks now. Once in a while a calculation gets through without it, but the others report it after a lot of CPU time, just when they seem to be done. I have stopped execution as this is a waste of time. Any idea what's happening and how to correct it?

mac

Joined: 12 Aug 06

Posts: 6

Credit: 1431583

RAC: 0

23 Feb 2017 14:12:41 UTC

Message 155806

I have seen some reports of many fake GT 610 cards going around.

Run GPU-Z and verify all parameters against this table: https://en.wikipedia.org/wiki/GeForce_600_series

https://forums.geforce.com/default/topic/975537/?comment=5014554

https://forums.geforce.com/default/topic/892374/geforce-drivers/solved-by-sora-amp-mrinfinit3-thanks-a-lot-for-them-/1/

Juergen Kozok

Joined: 21 Feb 05

Posts: 7

Credit: 3075642

RAC: 0

23 Feb 2017 17:08:24 UTC

Message 155809

According to this program, the card looks like a regular low-end GT 610. Is there any reason it is not working? It certainly works with other projects, e.g. SETI.

Graphics Processor

GPU Name:	GF119
GPU Variant:	GF119-300-A1
Architecture:	Fermi
Process Size:	40 nm
Transistors:	292 million
Die Size:	79 mm²

Graphics Card

Released:	~~Apr 2nd, 2012~~ May 14th, 2012
Production Status:	Active
Bus Interface:	PCIe 2.0 x16
MSI Part #:	N610GT-MD2GD3/LP

Clock Speeds

GPU Clock:	810 MHz
Shader Clock:	1620 MHz
Memory Clock:	~~898 MHz~~ 500 MHz (-44%) ~~1796 MHz effective~~ 1000 MHz effective

Memory

Memory Size:	~~1024 MB~~ 2048 MB
Memory Type:	DDR3
Memory Bus:	64 bit
Bandwidth:	~~14.37 GB/s~~ 8.00 GB/s

Render Config

Shading Units:	48
TMUs:	8
ROPs:	4
SM Count:	1
Pixel Rate:	1.620 GPixel/s
Texture Rate:	6.48 GTexel/s
Floating-point performance:	155.52 GFLOPS

Board Design

Slot Width:	Single-slot
Length:	~~5.7 inches~~ 5.67 inches ~~145 mm~~ 144 mm
TDP:	29 W
Outputs:	1x DVI 1x HDMI 1x VGA
Power Connectors:	None
Board Number:	P1310

Graphics Features

DirectX:	11.0
OpenGL:	4.5
OpenCL:	1.1
CUDA:	2.1
Shader Model:	5.0
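
The derived figures in the listing above can be cross-checked from the raw clocks and unit counts, which is a quick sanity test when comparing a card against a reference table. The following is a minimal sketch, assuming the usual conventions (bandwidth = effective memory clock x bus width / 8, texture rate = TMUs x GPU clock, single-precision rate = shading units x shader clock x 2 for one multiply-add per clock). The function names are illustrative and not part of any tool mentioned in this thread.

```python
def bandwidth_gb_s(effective_mhz: float, bus_bits: int) -> float:
    """Memory bandwidth in GB/s from the effective memory clock (MHz) and bus width (bits)."""
    return effective_mhz * 1e6 * bus_bits / 8 / 1e9


def texture_rate_gtexel_s(tmus: int, gpu_mhz: float) -> float:
    """Texture fill rate in GTexel/s: one texel per TMU per GPU clock."""
    return tmus * gpu_mhz / 1000


def fp32_gflops(shading_units: int, shader_mhz: float) -> float:
    """Single-precision GFLOPS, counting a multiply-add as two operations."""
    return shading_units * shader_mhz * 2 / 1000


if __name__ == "__main__":
    # Values taken from the listing above (this card: DDR3 at 1000 MHz effective).
    print(f"Bandwidth (this card):  {bandwidth_gb_s(1000, 64):.2f} GB/s")
    print(f"Bandwidth (struck-out): {bandwidth_gb_s(1796, 64):.2f} GB/s")
    print(f"Texture rate:           {texture_rate_gtexel_s(8, 810):.2f} GTexel/s")
    print(f"FP32 performance:       {fp32_gflops(48, 1620):.2f} GFLOPS")
```

With the numbers posted, these formulas reproduce the listed 8.00 GB/s, 6.48 GTexel/s and 155.52 GFLOPS, so the listing is at least internally consistent.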

Holmis

Joined: 4 Jan 05

Posts: 1118

Credit: 1055935564

RAC: 0

23 Feb 2017 17:21:41 UTC

Message 155810

Validate errors are almost always caused by hardware running at or beyond the edge of what it's capable of.

Start by checking the temperatures of the card and CPU, and clean the heat sinks if necessary.
If the card is overclocked, reset it to stock clocks or underclock it.
If any other part of the computer is overclocked, do the same: reset to stock or downclock below stock specs.
Check the condition of the power supply; good, stable power is important for stable operation.

mac

Joined: 12 Aug 06

Posts: 6

Credit: 1431583

RAC: 0

23 Feb 2017 18:16:17 UTC

Message 155811

It could also be a driver installation problem.

Run DDU in safe mode, clean out the NVIDIA driver, and install the latest driver again.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119775823250

RAC: 25703511

24 Feb 2017 0:59:00 UTC

Message 155819

Juergen Kozok wrote:

I am facing now since a few weeks the above error.

I looked at your current tasks list and they don't show as validate errors or invalid results but rather as computation errors. It struck me as unusual that all the ones still showing in the database failed at pretty much the same elapsed time. So I decided to click a task ID link (I chose the oldest one showing) to see what was actually returned to the project. Here is a snippet of the very last bit of what was returned (with a few tweaks for readability). The point of error is the line beginning "ERROR:".

===== Start of log excerpt =====

% Binary point 1255/1255
% Starting semicoherent search over f0 and f1.
% nf1dots: 31 df1dot: 3.344368011e-015 f1dot_start: -1e-013 f1dot_band: 1e-013
% Filling array of photon pairs
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
% C 1 0
% Time spent on semicoherent stage: 39252.7849s
% Writing semicoherent output file.
% Following up candidate number: 1
% Refining in S
% Following-up in P
ERROR: /home/bema/fermilat/src/bridge_fft_clfft.c:1048: clFinish failed. status=-36
20:31:47 (10956): [CRITICAL]: ERROR: MAIN() returned with error '-36'
FPU status flags: PRECISION
20:31:53 (10956): [normal]: done. calling boinc_finish(28).
20:31:53 (10956): called boinc_finish

</stderr_txt>

===== End of log excerpt =====

The GPU has processed the very last 'binary point' (1255 out of 1255) and so the 'follow-up' stage has started. This is where the top 10 most likely candidates will be examined in detail. You can see there is an immediate error reported as status=-36. One of the Devs will have to comment on what that means and what (if anything) can be done about it.
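
For reference, the number in status=-36 is an OpenCL status code, and the Khronos CL/cl.h header defines -36 as CL_INVALID_COMMAND_QUEUE. That names the error but does not explain why the command queue became invalid on this card, so input from the Devs is still needed. Below is a minimal sketch for translating such codes when reading a task's stderr output. The table is only a small subset of the header, and the helper name and regular expression are illustrative assumptions, not part of the Einstein@Home application.

```python
import re

# Small subset of the status codes defined in the Khronos OpenCL header (CL/cl.h).
OPENCL_STATUS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

# Matches "status=-36" or "(error: -5)" style fragments in a log line.
STATUS_PATTERN = re.compile(r"(?:status=|error:\s*)(-?\d+)")


def describe_status(line: str):
    """Return the OpenCL status name found in a log line, or None if no code is present."""
    match = STATUS_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return OPENCL_STATUS.get(code, f"unknown OpenCL status {code}")


if __name__ == "__main__":
    sample = "ERROR: /home/bema/fermilat/src/bridge_fft_clfft.c:1048: clFinish failed. status=-36"
    print(describe_status(sample))
```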

I decided to look at a second task ID (I chose the 2nd oldest) and this time I saw something a bit different.

===== Start of log excerpt =====

% Binary point 1255/1255
% Starting semicoherent search over f0 and f1.
% nf1dots: 31 df1dot: 3.344368011e-015 f1dot_start: -1e-013 f1dot_band: 1e-013
% Filling array of photon pairs
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
% Time spent on semicoherent stage: 1372.6947s
% Writing semicoherent output file.
% Following up candidate number: 1
% Refining in S
% Following-up in P
% C 2 1256
% Following up candidate number: 2
% Refining in S
% Following-up in P
% C 3 1257
% Following up candidate number: 3
% Refining in S
% Following-up in P
09:40:40 (2192): [normal]: This Einstein@home App was built at: Feb 15 2017 09:23:49

09:40:40 (2192): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.20_windows_x86_64__FGRPopencl1K-nvidia.exe'.
09:40:40 (2192): [debug]: 1.1e+016 fp, 3.7e+009 fp/s, 2805161 s, 779h12m41s38
09:40:40 (2192): [normal]: % CPU usage: 1.000000, GPU usage: 1.000000
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.20_windows_x86_64__FGRPopencl1K-nvidia.exe
--inputfile ../../projects/einstein.phys.uwm.edu/LATeah0012L.dat --alpha 4.42281478648 --delta -0.0345027837249
--skyRadius 2.152570e-06 --ldiBins 15 --f0start 1116.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 3.344368011e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah0012L_1124_22467010.dat --debug 1 --device 0 -o LATeah0012L_1124.0_0_0.0_22467010_1_0.out
output files: 'LATeah0012L_1124.0_0_0.0_22467010_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah0012L_1124.0_0_0.0_22467010_1_0' 'LATeah0012L_1124.0_0_0.0_22467010_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0012L_1124.0_0_0.0_22467010_1_1'
09:40:40 (2192): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
09:40:40 (2192): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0000000000147900 , 00000000001473B0]
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "GeForce GT 610" by: NVIDIA Corporation
Max allocation limit: 536870912
Global mem size: 2147483648
OpenCL device has FP64 support
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah0012L.dat
% Total amount of photon times: 30007
% Preparing toplist of length: 10
% Read 1255 binary points
% checkpoint read: skypoint 3 binarypoint 1257
% fft_size: 16777216 (0x1000000); alloc: 67108872
% Time spent on semicoherent stage: 0.0000s
% Writing semicoherent output file.

% Following up candidate number: 3
% Refining in S
% Following-up in P
Error during OpenCL host->device transfer read coh_followup_list(error: -5)
09:40:54 (2192): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags: PRECISION
Error in OpenCL context: CL_OUT_OF_RESOURCES error executing CL_COMMAND_READ_BUFFER on GeForce GT 610 (Device 0).

09:40:59 (2192): [normal]: done. calling boinc_finish(65).
09:40:59 (2192): called boinc_finish

</stderr_txt>

===== End of log excerpt =====

In this second example, the follow-up processing had actually started and the 3rd candidate was being examined when BOINC appears to have been restarted (for whatever reason). The line that indicates the restart is the one beginning "09:40:40 (2192): [normal]: This Einstein@home App was built at". Among the restart messages there is also a line ("OpenCL device has FP64 support") that says the card is double-precision capable. I was surprised to see that, so I had a look at what was listed in Wikipedia. If you scroll down to the GT 610, you will see it listed as 'Unknown' for DP capability.

I suspect that perhaps it doesn't have DP and that may be why it crashes if the follow-up stage is attempted on the GPU. The Devs will really have to sort this one out, particularly as the processing of the candidates had started successfully before the restart and then immediately failed after the restart. Something is a bit weird with that. Also the error code is quite different compared to the first example.
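
Spotting a restart in the middle of a long stderr dump is easy to miss by eye. The sketch below scans a saved copy of the output for the restart banner and for error lines. It assumes the banner text "This Einstein@home App was built at" seen in the excerpt marks an application start, and it treats any line containing "error" as an error line; both are heuristics, and the function name is hypothetical.

```python
RESTART_MARKER = "This Einstein@home App was built at"
CANDIDATE_PREFIX = "% Following up candidate number:"


def summarize_log(text: str) -> dict:
    """Collect 1-based line numbers of restarts and errors, plus the last candidate seen."""
    restarts, errors, last_candidate = [], [], None
    for number, line in enumerate(text.splitlines(), start=1):
        if RESTART_MARKER in line:
            restarts.append(number)
        elif "error" in line.lower():
            errors.append(number)
        elif line.startswith(CANDIDATE_PREFIX):
            # The candidate number follows the final colon on the line.
            last_candidate = int(line.rsplit(":", 1)[1])
    return {"restarts": restarts, "errors": errors, "last_candidate": last_candidate}


if __name__ == "__main__":
    sample = "\n".join([
        "% Following up candidate number: 3",
        "% Refining in S",
        "09:40:40 (2192): [normal]: This Einstein@home App was built at: Feb 15 2017 09:23:49",
        "% checkpoint read: skypoint 3 binarypoint 1257",
        "% Following up candidate number: 3",
        "Error during OpenCL host->device transfer read coh_followup_list(error: -5)",
    ])
    print(summarize_log(sample))
```

A restart line that appears before the first error line, as in the second excerpt, suggests the failure happened after resuming from a checkpoint and not during the original run.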

Cheers,
Gary.

AgentB

Joined: 17 Mar 12

Posts: 915

Credit: 513211304

RAC: 0

24 Feb 2017 8:16:52 UTC

Message 155834

This is the oddly described "Printer out of paper" error (status=-36). I have seen a couple of folks with these now.
