Reference documentation for deal.II version 9.1.0-pre
VectorizedArray< float > Class Template Reference

#include <deal.II/base/vectorization.h>

Public Member Functions

VectorizedArray & operator= (const float x)
 
float & operator[] (const unsigned int comp)
 
const float & operator[] (const unsigned int comp) const
 
VectorizedArray & operator+= (const VectorizedArray &vec)
 
VectorizedArray & operator-= (const VectorizedArray &vec)
 
VectorizedArray & operator*= (const VectorizedArray &vec)
 
VectorizedArray & operator/= (const VectorizedArray &vec)
 
void load (const float *ptr)
 
void store (float *ptr) const
 
void streaming_store (float *ptr) const
 
void gather (const float *base_ptr, const unsigned int *offsets)
 
void scatter (const unsigned int *offsets, float *base_ptr) const
 

Public Attributes

__m128 data
 

Static Public Attributes

static const unsigned int n_array_elements = 4
 

Private Member Functions

VectorizedArray get_sqrt () const
 
VectorizedArray get_abs () const
 
VectorizedArray get_max (const VectorizedArray &other) const
 
VectorizedArray get_min (const VectorizedArray &other) const
 

Friends

template<typename Number2 >
VectorizedArray< Number2 > std::sqrt (const VectorizedArray< Number2 > &)
 

Detailed Description

template<>
class VectorizedArray< float >

Specialization for float and SSE2.

Definition at line 2529 of file vectorization.h.

Member Function Documentation

VectorizedArray& VectorizedArray< float >::operator= ( const float  x)
inline

This function can be used to set all data fields to a given scalar.

Definition at line 2543 of file vectorization.h.

float& VectorizedArray< float >::operator[] ( const unsigned int  comp)
inline

Access operator.

Definition at line 2553 of file vectorization.h.

const float& VectorizedArray< float >::operator[] ( const unsigned int  comp) const
inline

Constant access operator.

Definition at line 2563 of file vectorization.h.

VectorizedArray& VectorizedArray< float >::operator+= ( const VectorizedArray< float > &  vec)
inline

Addition.

Definition at line 2574 of file vectorization.h.

VectorizedArray& VectorizedArray< float >::operator-= ( const VectorizedArray< float > &  vec)
inline

Subtraction.

Definition at line 2589 of file vectorization.h.

VectorizedArray& VectorizedArray< float >::operator*= ( const VectorizedArray< float > &  vec)
inline

Multiplication.

Definition at line 2604 of file vectorization.h.

VectorizedArray& VectorizedArray< float >::operator/= ( const VectorizedArray< float > &  vec)
inline

Division.

Definition at line 2619 of file vectorization.h.
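The broadcast assignment and the compound arithmetic operators above all act on the four float lanes at once. The following is a minimal sketch of that behavior using plain SSE intrinsics. Float4, lane() and demo_arithmetic() are hypothetical names introduced here for illustration; this is a stand-in and not the deal.II implementation.

```cpp
#include <cassert>
#include <xmmintrin.h> // _mm_set1_ps, _mm_add_ps, _mm_sub_ps, _mm_mul_ps, _mm_div_ps

// Hypothetical stand-in, NOT the deal.II class: four floats in one SSE register.
struct Float4
{
  __m128 data;

  // Broadcast one scalar to all four lanes.
  Float4 &operator=(const float x)
  {
    data = _mm_set1_ps(x);
    return *this;
  }

  // Lane-wise compound arithmetic.
  Float4 &operator+=(const Float4 &v) { data = _mm_add_ps(data, v.data); return *this; }
  Float4 &operator-=(const Float4 &v) { data = _mm_sub_ps(data, v.data); return *this; }
  Float4 &operator*=(const Float4 &v) { data = _mm_mul_ps(data, v.data); return *this; }
  Float4 &operator/=(const Float4 &v) { data = _mm_div_ps(data, v.data); return *this; }

  // Read one lane by copying the register to a temporary array.
  float lane(const unsigned int comp) const
  {
    assert(comp < 4);
    float tmp[4];
    _mm_storeu_ps(tmp, data);
    return tmp[comp];
  }
};

// Evaluates ((a + b) * b - a) / b in all four lanes at once.
inline Float4 demo_arithmetic(const float a, const float b)
{
  Float4 x, y, z;
  x = a; // broadcast
  y = b;
  z = a;
  x += y;
  x *= y;
  x -= z;
  x /= y;
  return x;
}
```

With a = 2 and b = 4, every lane of the result holds 5.5, since each operator applies the same scalar operation independently in each lane.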

void VectorizedArray< float >::load ( const float *  ptr)
inline

Load n_array_elements from memory into the calling class, starting at the given address. The memory need not be aligned by 16 bytes, as opposed to casting a float address to VectorizedArray<float>*.

Definition at line 2636 of file vectorization.h.

void VectorizedArray< float >::store ( float *  ptr) const
inline

Write the content of the calling class into memory in form of n_array_elements to the given address. The memory need not be aligned by 16 bytes, as opposed to casting a float address to VectorizedArray<float>*.

Definition at line 2649 of file vectorization.h.
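As an illustration of the alignment remark for load() and store(), the sketch below copies floats in chunks of four through the unaligned SSE intrinsics _mm_loadu_ps and _mm_storeu_ps, starting at an address that is generally not a multiple of 16 bytes. The helper names are hypothetical, and whether the library uses exactly these intrinsics internally is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <xmmintrin.h> // _mm_loadu_ps, _mm_storeu_ps

// Hypothetical helper, not part of deal.II: copies n floats in chunks of four
// using unaligned SSE loads and stores.
inline void copy_in_chunks_of_four(const float *src, float *dst, const std::size_t n)
{
  assert(n % 4 == 0); // this sketch has no remainder loop
  for (std::size_t i = 0; i < n; i += 4)
    {
      const __m128 chunk = _mm_loadu_ps(src + i); // no 16-byte alignment required
      _mm_storeu_ps(dst + i, chunk);
    }
}

// Usage: read from one float past the start of a buffer, so the source
// address is offset by 4 bytes from wherever the allocation begins.
inline std::vector<float> copy_from_offset_one()
{
  std::vector<float> src(9), dst(8, 0.f);
  for (std::size_t i = 0; i < src.size(); ++i)
    src[i] = static_cast<float>(i);
  copy_in_chunks_of_four(src.data() + 1, dst.data(), 8);
  return dst;
}
```

By contrast, dereferencing a float address that has been cast to a vector type lets the compiler assume 16-byte alignment, which is the pitfall the descriptions of load() and store() warn about.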

void VectorizedArray< float >::streaming_store ( float *  ptr) const
inline

Write the content of the calling class into memory in form of n_array_elements to the given address using non-temporal stores that bypass the processor's caches, using _mm_stream_ps store intrinsics on supported CPUs. The destination of the store, ptr, must be aligned to the size of the vectorized array in bytes (16 bytes for this specialization).

This store operation can be faster than a regular store when the data is streamed out, because it avoids the read-for-ownership transfer that standard stores typically trigger. Roughly, this works as follows (see the literature on computer architecture for details): When an algorithm stores results to a memory address, the processor typically moves that address into one of its caches, as it expects the data to be re-used at some point. Since caches are organized in lines of either 64 or 128 bytes but writes are usually smaller, the processor must first load the destination cache line upon a write because only part of the line is overwritten initially. If a series of stores writes a chunk of data bigger than any of the caches can hold, the data eventually has to be moved from the caches to main memory. But since all addresses have first been read, this doubles the load on main memory, which can incur a performance penalty. Furthermore, the organization of caches in a multicore context also requires reading an address before anything can be written to that address in cache; see e.g. the Wikipedia article on the MESI protocol for details.

The instruction underlying this function call signals to the processor that these two prerequisites of a store are relaxed: Firstly, the whole cache line is expected to be overwritten (which the memory subsystem then handles appropriately), so there is no need to first read the "remainder" of the cache line. Secondly, the data at that memory location will not be subject to the cache coherency protocol, as it will reside in main memory both when the same processor accesses it again and when any other processor in a multicore chip does. Due to this setup, any subsequent access to the data written by this function needs to query main memory, which is slower than an access from a cache in terms of both latency and throughput. Thus, this command should only be used for large stores that collectively do not fit into caches, as performance will be degraded otherwise. For a typical use case, see also this blog article.

Note that streaming stores are only available in the specialized SSE/AVX classes of VectorizedArray of type double or float, not in the generic base class.

Note
Memory must be aligned by 16 bytes.

Definition at line 2659 of file vectorization.h.
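To make the alignment requirement and the "large stores only" advice concrete, here is a minimal sketch that fills a 16-byte aligned buffer with non-temporal stores. It assumes the single-precision intrinsic _mm_stream_ps is the appropriate one for float data; the function names and the 4 MiB buffer size are illustrative assumptions, not taken from deal.II.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h> // _mm_set1_ps, _mm_stream_ps, _mm_sfence

// Hypothetical helper: fill an aligned buffer with one value using
// non-temporal (cache-bypassing) stores.
inline void fill_streaming(float *dst, const std::size_t n, const float value)
{
  // _mm_stream_ps requires a 16-byte aligned destination.
  assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
  assert(n % 4 == 0); // this sketch has no remainder loop

  const __m128 v = _mm_set1_ps(value);
  for (std::size_t i = 0; i < n; i += 4)
    _mm_stream_ps(dst + i, v);

  // Non-temporal stores are weakly ordered; the fence orders them
  // with respect to later stores.
  _mm_sfence();
}

// Usage on a buffer of 1M floats (4 MiB), chosen here as an example of
// data that is not expected to stay in cache anyway.
inline bool fill_streaming_demo()
{
  static constexpr std::size_t n = std::size_t(1) << 20;
  alignas(16) static float buf[n];
  fill_streaming(buf, n, 2.5f);
  return buf[0] == 2.5f && buf[n / 2] == 2.5f && buf[n - 1] == 2.5f;
}
```

For small buffers that are read again soon after being written, regular stores are usually the better choice, for the reasons given in the description above.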

void VectorizedArray< float >::gather ( const float *  base_ptr,
const unsigned int *  offsets 
)
inline

Load n_array_elements from memory into the calling class, relative to the given base address: each entry of offsets gives the position, counted in floats from base_ptr, of the value loaded into the corresponding element of the vectorized array.

This operation corresponds to the following code (but uses a more efficient implementation in case the hardware allows for that):

  for (unsigned int v=0; v<VectorizedArray<Number>::n_array_elements; ++v)
    this->operator[](v) = base_ptr[offsets[v]];

Definition at line 2680 of file vectorization.h.
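The reference loop for gather can be tried out in isolation with a scalar stand-in. In the sketch below, gather4 is a hypothetical free function and std::array<float, 4> takes the place of the vectorized array; it only mirrors the loop semantics and makes no claim about how the library implements the operation.

```cpp
#include <array>
#include <cassert>

// Hypothetical scalar stand-in for the gather semantics:
// lane v receives base_ptr[offsets[v]].
inline std::array<float, 4> gather4(const float *base_ptr, const unsigned int *offsets)
{
  assert(base_ptr != nullptr && offsets != nullptr);
  std::array<float, 4> lanes{};
  for (unsigned int v = 0; v < 4; ++v)
    lanes[v] = base_ptr[offsets[v]];
  return lanes;
}
```

For example, with the data {10, 11, 12, 13, 14, 15, 16, 17} and the offsets {7, 0, 3, 3}, the lanes become {17, 10, 13, 13}. Offsets may repeat and need not be sorted.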

void VectorizedArray< float >::scatter ( const unsigned int *  offsets,
float *  base_ptr 
) const
inline

Write the content of the calling class into memory in form of n_array_elements, relative to the given base address: each element of the vectorized array is written to the position, counted in floats from base_ptr, given by the corresponding entry of offsets.

This operation corresponds to the following code (but uses a more efficient implementation in case the hardware allows for that):

  for (unsigned int v=0; v<VectorizedArray<Number>::n_array_elements; ++v)
    base_ptr[offsets[v]] = this->operator[](v);

Definition at line 2700 of file vectorization.h.
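The scatter reference loop can be sketched the same way with a scalar stand-in. scatter4 is a hypothetical free function and std::array<float, 4> replaces the vectorized array; the sketch reproduces only the loop shown above, not the library's implementation.

```cpp
#include <array>
#include <cassert>

// Hypothetical scalar stand-in for the scatter semantics:
// lane v is written to base_ptr[offsets[v]].
inline void scatter4(const std::array<float, 4> &lanes,
                     const unsigned int         *offsets,
                     float                      *base_ptr)
{
  assert(base_ptr != nullptr && offsets != nullptr);
  for (unsigned int v = 0; v < 4; ++v)
    base_ptr[offsets[v]] = lanes[v];
}
```

In this scalar loop, if two offsets coincide, the lane with the higher index is written last and its value is the one that remains. Entries of the destination that no offset points to are left untouched.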

VectorizedArray VectorizedArray< float >::get_sqrt ( ) const
inline private

Return the square root of this field. Not for use in user code. Use sqrt(x) instead.

Definition at line 2719 of file vectorization.h.

VectorizedArray VectorizedArray< float >::get_abs ( ) const
inline private

Return the absolute value of this field. Not for use in user code. Use abs(x) instead.

Definition at line 2732 of file vectorization.h.

VectorizedArray VectorizedArray< float >::get_max ( const VectorizedArray< float > &  other) const
inline private

Return the component-wise maximum of this field and another one. Not for use in user code. Use max(x,y) instead.

Definition at line 2749 of file vectorization.h.

VectorizedArray VectorizedArray< float >::get_min ( const VectorizedArray< float > &  other) const
inline private

Return the component-wise minimum of this field and another one. Not for use in user code. Use min(x,y) instead.

Definition at line 2762 of file vectorization.h.
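The four private helpers above all operate component-wise. As a rough sketch of what component-wise square root, absolute value, maximum and minimum look like at the SSE level, the code below uses the standard intrinsics _mm_sqrt_ps, _mm_andnot_ps, _mm_max_ps and _mm_min_ps. The names to_array, abs4, LanewiseResults and lanewise_demo are hypothetical, and the sign-bit mask used for the absolute value is one common technique, assumed here rather than taken from the library source.

```cpp
#include <array>
#include <cassert>
#include <xmmintrin.h> // _mm_sqrt_ps, _mm_max_ps, _mm_min_ps, _mm_andnot_ps

// Copy the four lanes of an SSE register into an array for inspection.
inline std::array<float, 4> to_array(const __m128 v)
{
  std::array<float, 4> out;
  _mm_storeu_ps(out.data(), v);
  return out;
}

// Lane-wise absolute value: -0.0f has only the sign bit set, and
// _mm_andnot_ps computes (~mask) & v, which clears that bit in each lane.
inline __m128 abs4(const __m128 v)
{
  const __m128 sign_mask = _mm_set1_ps(-0.0f);
  return _mm_andnot_ps(sign_mask, v);
}

struct LanewiseResults
{
  std::array<float, 4> sqrt_x, abs_y, max_xy, min_xy;
};

// Apply each operation to two sample registers.
inline LanewiseResults lanewise_demo()
{
  const __m128 x = _mm_setr_ps(1.f, 4.f, 9.f, 16.f);
  const __m128 y = _mm_setr_ps(-2.f, 5.f, -10.f, 16.f);
  return {to_array(_mm_sqrt_ps(x)),
          to_array(abs4(y)),
          to_array(_mm_max_ps(x, y)),
          to_array(_mm_min_ps(x, y))};
}
```

In user code, the free functions sqrt(x), abs(x), max(x,y) and min(x,y) named in the descriptions above are the intended entry points; the intrinsics here only illustrate the lane-wise semantics.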

Friends And Related Function Documentation

template<typename Number2 >
VectorizedArray<Number2> std::sqrt ( const VectorizedArray< Number2 > &  )
friend

Make a few functions friends.

Member Data Documentation

const unsigned int VectorizedArray< float >::n_array_elements = 4
static

This gives the number of float values (one per vector processed in parallel) collected in this class.

Definition at line 2535 of file vectorization.h.

__m128 VectorizedArray< float >::data

Actual data field. Since this class represents a POD data type, it remains public.

Definition at line 2710 of file vectorization.h.

