Fixed an error in the check of thread block sizes. We now check against the correct maxima (AFAIK :-), check against the maximum number of threads, and ensure that each grid/block dimension is greater than 0!
For convenience, I also added a new trace option: -trace g. It reports GPU-related issues; currently, it only reports kernel invocations and their configurations.
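
For illustration, the launch-configuration check described above could look roughly like the sketch below. This is not the actual code from this commit: the names launch_limits_t and launch_config_ok are hypothetical, the "maximum number of threads" is assumed to mean the per-block limit, and real limits should be queried from the device rather than hard-coded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the device limits we check against.
   In practice these would be filled from the device properties. */
typedef struct {
    long max_block_dim[3];      /* per-dimension block size maxima */
    long max_grid_dim[3];       /* per-dimension grid size maxima */
    long max_threads_per_block; /* assumed: limit on x*y*z of a block */
} launch_limits_t;

/* Returns true if the grid/block configuration passes all three checks:
   every dimension > 0, every dimension within its maximum, and the
   total number of threads per block within the limit. */
bool launch_config_ok(const long grid[3], const long block[3],
                      const launch_limits_t *lim)
{
    assert(grid != NULL && block != NULL && lim != NULL);

    long threads = 1;
    for (int i = 0; i < 3; i++) {
        /* each grid/block dimension must be greater than 0 */
        if (grid[i] <= 0 || block[i] <= 0)
            return false;
        /* each dimension must respect its own maximum */
        if (grid[i] > lim->max_grid_dim[i] || block[i] > lim->max_block_dim[i])
            return false;
        /* accumulate the block volume, bailing out early to avoid overflow */
        threads *= block[i];
        if (threads > lim->max_threads_per_block)
            return false;
    }
    return true;
}
```

With limits of 1024 x 1024 x 64 per block and 1024 threads per block (typical for recent NVIDIA devices, but an assumption here), a 32 x 32 x 1 block passes, while 32 x 32 x 2 is rejected by the thread-count check even though every single dimension is within its maximum.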