Work Log (Stream Abstraction for GPU)

Week 1 (10/11/2009 ~ 10/18/2009) Brainstorming
Week 2 (10/19/2009 ~ 10/25/2009) Goal: Make RandomSourceFilter<float> | DelayFilter<float, float> work

Abstract base class: FilterBase
Filter<inType, outType> derives from FilterBase
Other filters derive from Filter<>. Examples: RandomSourceFilter<outType>, FileSourceFilter<outType>, SocketSourceFilter<outType>, FileSinkFilter<inType>, SplitFilter<inType, outType, fanOutSize>, JoinFilter<inType, outType, fanInSize> ...
TODO: Are we gonna support multiple in/out streams and types? If so, the classes may need refactoring again. High risk...
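
A minimal sketch of the three-level hierarchy above, to make the layering concrete. Only the names FilterBase and Filter<inType, outType> come from this log; NoInput, CountingSourceFilter, the name()/run() methods and the runAll() helper are hypothetical stand-ins, and the real classes may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Non-templated root, so one container can hold filters of any stream type.
class FilterBase {
public:
    virtual ~FilterBase() {}
    virtual std::string name() const = 0;  // hypothetical method
    virtual void run() = 0;                // hypothetical method: one step
};

// Typed layer: fixes the element types of the input and output streams.
template <typename InType, typename OutType>
class Filter : public FilterBase {
public:
    typedef InType in_type;
    typedef OutType out_type;
};

// Hypothetical tag meaning "this filter has no input stream".
struct NoInput {};

// Deterministic stand-in for a source filter such as RandomSourceFilter.
template <typename OutType>
class CountingSourceFilter : public Filter<NoInput, OutType> {
public:
    CountingSourceFilter() : next_(0) {}
    std::string name() const { return "CountingSourceFilter"; }
    void run() { out.push_back(static_cast<OutType>(next_++)); }
    std::vector<OutType> out;  // stand-in for an output channel
private:
    int next_;
};

// Runs every registered filter once through the untyped base interface.
inline void runAll(const std::vector<FilterBase*>& filters) {
    for (std::size_t i = 0; i < filters.size(); ++i) {
        assert(filters[i] != 0);
        filters[i]->run();
    }
}
```

The untyped base is what lets a system-wide object keep a single list of filters, while the templated layer keeps stream element types checked at compile time.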

Need a global system object that holds all topological information? Currently I have a StreamSystem object that knows all filters and their relationships.
Designing data channel interface (10/22):

int ChannelBase::reserve(void **, int bSize) : used instead of push() to avoid the extra memory copy
void ChannelBase::reserve_done()
int ChannelBase::pop(void **buffer, int peekBSize, int popBSize, int parallel, bool consumeFlag = true)
Two kinds of concrete channels: inter-process channel (for the multi-node case) and intra-process channel.
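
A rough single-threaded sketch of how the reserve()/reserve_done()/pop() calls above could fit together. The semantics are assumed, not taken from the real ChannelBase: SimpleChannel and produceRamp are hypothetical names, the buffer is linear with no wrap-around, there is no locking, and the "parallel" argument of pop() is left out.

```cpp
#include <cassert>
#include <vector>

class SimpleChannel {
public:
    explicit SimpleChannel(int capacity)
        : buf_(capacity), head_(0), tail_(0), pending_(0) {
        assert(capacity > 0);
    }

    // Hands out a pointer into the channel's own storage so the producer
    // writes in place. Returns the bytes granted, or 0 if there is no room.
    int reserve(void** out, int bSize) {
        if (bSize <= 0 || pending_ != 0 ||
            tail_ + bSize > static_cast<int>(buf_.size())) {
            return 0;
        }
        *out = &buf_[tail_];
        pending_ = bSize;
        return bSize;
    }

    // Publishes the bytes written since the matching reserve().
    void reserve_done() {
        tail_ += pending_;
        pending_ = 0;
    }

    // Exposes peekBSize readable bytes without copying and, if consumeFlag
    // is set, consumes the first popBSize of them. Returns bytes exposed.
    int pop(void** out, int peekBSize, int popBSize, bool consumeFlag = true) {
        if (peekBSize <= 0 || popBSize < 0 || popBSize > peekBSize ||
            tail_ - head_ < peekBSize) {
            return 0;
        }
        *out = &buf_[head_];
        if (consumeFlag) head_ += popBSize;
        return peekBSize;
    }

private:
    std::vector<char> buf_;
    int head_;     // first unread byte
    int tail_;     // first unwritten byte
    int pending_;  // bytes reserved but not yet published
};

// Example producer: writes count consecutive floats directly into the channel.
inline bool produceRamp(SimpleChannel& ch, float start, int count) {
    void* p = 0;
    int bytes = count * static_cast<int>(sizeof(float));
    if (ch.reserve(&p, bytes) != bytes) return false;
    float* dst = static_cast<float*>(p);
    for (int i = 0; i < count; ++i) dst[i] = start + static_cast<float>(i);
    ch.reserve_done();
    return true;
}
```

With push() the producer would fill its own buffer and the channel would copy it in; here the producer fills channel memory directly. Keeping peekBSize and popBSize separate lets a consumer look at more data than it consumes.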

Currently, each filter is a CPU thread.
A filter runs a kernel once its input is ready and meets the parallelism requirement.
Channel buffer management? Currently using produce/consume style.

Week 3 (10/26/2009 ~ 11/1/2009) Goal: simple example working; integrate CUDA (deferred)

Adding TermOutFilter causes a crash. Debugging. (10/26)
Lessons learned: (a) add the "volatile" keyword to class members if the class is used from multiple threads. (b) g++ on Mac seems to have bugs handling volatile variables; not sure whether the cause is the compiler or pthread. (10/27)
Working on a way to gracefully terminate the program by flushing signals from source to sink filters. (10/28)
RandomSourceFilter<float> | DelayFilter<float, float> | outputFilter<float> works. (10/29)
Typelist is a potential solution for representing filters with multiple input and output ports. Filter is now defined as Filter<inTypeList, outTypeList>, where each *TypeList can hold an arbitrary number of types. (See Loki::Typelist and "Modern C++ Design: Generic Programming and Design Patterns Applied".) Refactoring code... (10/30)
Got the FIR filter working. Need to change the channel buffer management algorithm: copy the tail buffer to the front of the head in case the tail is large enough.
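
A small sketch of the typelist idea from the 10/29 to 10/30 entry, in the style described in Modern C++ Design. It does not use Loki itself; NullType, Typelist, Length and TypeAt are re-declared locally, and MultiPortFilter is a hypothetical stand-in for the real Filter<inTypeList, outTypeList>.

```cpp
#include <cassert>

// Terminator for a typelist.
struct NullType {};

// A typelist is a compile-time linked list: a head type plus a tail list.
template <class H, class T>
struct Typelist {
    typedef H Head;
    typedef T Tail;
};

// Length<TList>::value is the number of types in the list.
template <class TList> struct Length;
template <> struct Length<NullType> { enum { value = 0 }; };
template <class H, class T>
struct Length<Typelist<H, T> > { enum { value = 1 + Length<T>::value }; };

// TypeAt<TList, i>::Result is the i-th type in the list (0-based).
template <class TList, unsigned int i> struct TypeAt;
template <class H, class T>
struct TypeAt<Typelist<H, T>, 0> { typedef H Result; };
template <class H, class T, unsigned int i>
struct TypeAt<Typelist<H, T>, i> {
    typedef typename TypeAt<T, i - 1>::Result Result;
};

// Hypothetical filter parameterised by input and output port type lists.
template <class InTypeList, class OutTypeList>
struct MultiPortFilter {
    enum {
        numInputs = Length<InTypeList>::value,
        numOutputs = Length<OutTypeList>::value
    };
};

// Small self-check: a filter with two input ports and one output port.
inline void checkPortCounts() {
    typedef Typelist<float, Typelist<int, NullType> > TwoInputs;
    typedef Typelist<double, NullType> OneOutput;
    typedef MultiPortFilter<TwoInputs, OneOutput> F;
    assert(F::numInputs == 2);
    assert(F::numOutputs == 1);
}
```

The port count and the element type of each port are both available at compile time, so a filter with any number of ports needs no extra template parameters.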

Simple multi-node case working: identity filter (11/17)
Refactor Makefile: add nvcc (11/17)
Add a GPU object per process. It encapsulates all CUDA calls and provides DMA functions. (11/18)
Revise Channel classes to handle different end node configurations (GPU/CPU/Network -> GPU/CPU/Network, 6 combinations) (11/19)
Add GPU kernel calls; got several filter examples working. (11/20)

Add random double generator filter (11/31/09)
Rewrite Reduce filter (12/1/09)
Add bucket sort filter (12/2/09)
Bucket sort CPU version working (12/3/09)
Add GPU part to bucket sort (12/4/09)
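
For reference, a generic CPU bucket sort sketch. This is the textbook algorithm, not the filter's actual code: the function name bucketSort, the assumption that inputs lie in [0, 1) (as a random double generator might produce) and the use of std::sort inside each bucket are all illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts values assumed to lie in [0, 1) by scattering them into numBuckets
// equal-width buckets, sorting each bucket, and concatenating the buckets.
std::vector<double> bucketSort(const std::vector<double>& input,
                               std::size_t numBuckets) {
    assert(numBuckets > 0);
    std::vector<std::vector<double> > buckets(numBuckets);

    // Scatter phase: bucket index is proportional to the value.
    for (std::size_t i = 0; i < input.size(); ++i) {
        assert(input[i] >= 0.0 && input[i] < 1.0);
        std::size_t b = static_cast<std::size_t>(input[i] * numBuckets);
        if (b >= numBuckets) b = numBuckets - 1;  // guard against rounding
        buckets[b].push_back(input[i]);
    }

    // Sort each bucket independently, then gather in bucket order.
    std::vector<double> out;
    out.reserve(input.size());
    for (std::size_t b = 0; b < numBuckets; ++b) {
        std::sort(buckets[b].begin(), buckets[b].end());
        out.insert(out.end(), buckets[b].begin(), buckets[b].end());
    }
    return out;
}
```

Each bucket is sorted on its own, with no dependence on the others.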