I stand corrected. Somehow, I am not surprised that DP performance was drastically cut. After all, the big guys can't sell DP cards for top dollar if lower-end cards have similar performance.
It's more an architectural choice and trade-off. Double precision, and compute in general, require a lot of power and large register files to maximize the number of threads the GPU can keep in flight.
Maxwell was designed as a mobile-centric architecture - high performance per watt for the consumer gaming market. Compute is at odds with this (as AMD has found out as well of late), so the decision was taken to strip out most of the double-precision hardware to maximize gaming potential.
When Maxwell was being designed, Nvidia took the decision to rework the GK110 (which is much better suited to compute) into the GK210. That GPU will soldier on for GPGPU duties until Pascal arrives; the GK210 is intended solely for professional (Tesla) work.
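The scale of the trade-off is easy to see on the back of an envelope: GM200 (Titan X) carries one FP64 unit per 32 FP32 units, versus one per 3 on GK110. A rough Python sketch, assuming one FMA (two ops) per core per clock and the published boost clocks (real-world throughput will of course vary):

```python
def peak_tflops(cores, clock_ghz, fp64_ratio):
    """Theoretical peak TFLOPS: cores x clock x 2 ops (FMA) per cycle."""
    fp32 = cores * clock_ghz * 2 / 1000.0
    return fp32, fp32 * fp64_ratio

# GM200 (Titan X): 3072 cores, ~1.075 GHz boost, FP64 at 1/32 rate
titan_x_fp32, titan_x_fp64 = peak_tflops(3072, 1.075, 1 / 32)

# GK110 (Titan Black): 2880 cores, ~0.889 GHz boost, FP64 at 1/3 rate
titan_black_fp32, titan_black_fp64 = peak_tflops(2880, 0.889, 1 / 3)

print(f"Titan X     : {titan_x_fp32:.2f} FP32 / {titan_x_fp64:.2f} FP64 TFLOPS")
print(f"Titan Black : {titan_black_fp32:.2f} FP32 / {titan_black_fp64:.2f} FP64 TFLOPS")
```

So despite a healthy FP32 lead, the newer card delivers roughly an eighth of the older Kepler's double-precision throughput - which is exactly why GK110/GK210 stays in the Tesla lineup.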
The Titan X isn't a compute card, but those benchmarks are OpenCL-based (so a comparison can be made with AMD cards), whereas CUDA-coded apps are, as a general rule, more mature and much more effective. Compile a list of OpenCL and CUDA content-creation apps and see how many of the OpenCL versions outperform their CUDA counterparts - it will be a short list indeed.
Few sites bench OpenCL against CUDA in the same app; instead they prefer to "normalize" the results by testing OpenCL only - basically the same scenario as games being tested in DX11 mode only, rather than Mantle for AMD and DX11 for Nvidia, even though that is neither optimal for AMD nor what the user is likely to actually run.
Case in point:
So, you were saying? You can continue with the guerrilla marketing, and I could continue to show OpenCL vs CUDA - or, even worse, HPC workloads - but in the end this isn't a thread about the Titan X's shortcomings or compute tasks (most of those you've supplied are synthetics); it's about the GTX 980 Ti.