Summary: | Just some notes towards convincing gcc 4.0 to emit sse instructions | ||
---|---|---|---|
Product: | LinuxSampler | Reporter: | Mike Taht <mike.taht> |
Component: | other | Assignee: | Christian Schoenebeck <cuse> |
Status: | CLOSED FIXED | ||
Severity: | enhancement | CC: | hangup |
Priority: | P5 | ||
Version: | SVN Trunk | ||
Hardware: | PC | ||
OS: | Linux |
Description
Mike Taht
2005-09-22 02:33:59 CEST
I'm still looking for the best/standard way to declare vector-specific variables that you can manipulate, other than this union thing. The full coverage of how to use them is at http://gcc.gnu.org/onlinedocs/gcc-4.0.1/gcc/Vector-Extensions.html#Vector-Extensions, which doesn't go into that. The code generated IS 16-byte aligned in the test program, but perhaps it's better to make that explicit with an __attribute__ as well.

Vladimir and I also played around with GCC's vector extensions before we decided to go with hand-crafted assembly code. The problem was that the vector extension implementation was still quite incomplete (we used gcc 3.4 at that point, I think). Some important operations, like accessing a single cell of a vector, were missing (as you already pointed out). Of course for simple algorithms like interpolation / resampling this is not a problem, but for feedback control systems like the filter we are using in the gig engine, where every calculated sample point depends on the result of the previous one, accessing single vector cells is mandatory. Also, IIRC g++ 3.4 did not support vector extensions at all, only gcc 3.4 (that is, the C compiler part). Fortunately this seems to have changed with gcc/g++ 4.0.

Another problem (as already pointed out by Vladimir on the list) was that by ABI definition all floating point arguments of a function / method are transferred via the 387 FPU stack on x86 machines. For float->int conversions, though, we need to use MMX instructions, and you cannot mix 387 FPU and MMX instructions without exiting MMX mode (by using the EMMS instruction), which takes a lot of CPU cycles. This problem could be solved with the current CVS version of LS, since the main loop is now placed in just one method ATM (at least if the Filter::apply() method is compiled inline). But of course the vector extensions will be the way to go in the future, because the hand-crafted assembly is a lot of work to maintain and causes other problems, for example register shortage at the -O1 optimization level.

So IMO the main question currently is when the following operations will be implemented in gcc:

* accessing single cells of a vector
* rotating the cells of a vector / bitshifting

Maybe the last one is already present in gcc 4.0; not tested yet. For the first one we might ask on the GCC list?

Ah... and of course you are right about the "aligned" GCC extension attribute. It is mandatory to let GCC know that it's operating on SSE-safe data, since most SSE instructions require 16-byte aligned memory addresses. GCC would of course not be able to figure that out if you are using memalign(), for example, and would most probably use scalar (maybe even 387 FPU) instructions instead. So anyway, without that attribute it would be slower than it could be.

Closing this report now. GCC vector extensions have been added to the audio mix-down functions of AudioChannel.cpp; for interpolation and other more complex tasks the current GCC vector extensions do not seem to be sufficient yet. Feel free to reopen this report in case vector extension support improves in GCC.
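For reference, here is a minimal sketch of the two cases discussed above: the "union thing" for reaching a single cell of a vector, and a mix-down style loop that stays entirely in vector operations. It is not the code from AudioChannel.cpp. The names `v4sf`, `vec4`, `mix_add` and `one_pole` are made up for illustration, and the one-pole recurrence is only a stand-in for a feedback filter, not the filter used in the gig engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four packed floats. vector_size(16) asks GCC for a 128-bit vector type. */
typedef float v4sf __attribute__((vector_size(16)));

/* The "union thing": the float array gives scalar access to single cells. */
typedef union {
    v4sf  v;
    float f[4];
} vec4;

/* Hypothetical mix-down helper: out += in * gain, four samples per step.
   No cell depends on another, so whole-vector arithmetic is enough. */
static void mix_add(vec4 *out, const vec4 *in, float gain, size_t nvec)
{
    /* SSE loads and stores generally want 16-byte aligned addresses. */
    assert(((uintptr_t)out % 16) == 0 && ((uintptr_t)in % 16) == 0);

    vec4 g = { .f = { gain, gain, gain, gain } };
    for (size_t i = 0; i < nvec; ++i)
        out[i].v += in[i].v * g.v;
}

/* Hypothetical feedback example: y[n] = a * x[n] + b * y[n-1], in place.
   Each sample needs the previous result, so the loop has to walk the
   cells one by one through the union. Returns the last output sample. */
static float one_pole(vec4 *buf, size_t nvec, float a, float b, float y_prev)
{
    for (size_t i = 0; i < nvec; ++i) {
        for (int k = 0; k < 4; ++k) {
            y_prev = a * buf[i].f[k] + b * y_prev;
            buf[i].f[k] = y_prev;
        }
    }
    return y_prev;
}
```

Declaring the buffers as arrays of `vec4` lets the compiler see the 16-byte alignment from the type itself. Whether GCC actually emits SSE instructions for `mix_add` still depends on the target flags (for example `-msse` on 32-bit x86), so it is worth checking the generated assembly.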