GPUs can be used to quickly solve certain parallel problems. However, not all problems are good candidates for GPU computation, and we want to keep both the CPU and the GPU as busy as possible at all times. For this assignment, you will use both of your Bag of Tasks implementations, from HW3 and HW4, so that the CPU and GPU work concurrently. The I/O format remains the same as before, but this time you should grab a task and dynamically decide which device to run it on.

For this assignment, you CANNOT assume files will fit into memory. Define a maximum chunk size and, if a file is larger than that, process it in parts.

We are going to assume the CPU can handle fewer tasks at one time than the GPU can. This means you should pull a relatively small, constant number of tasks, start up your HW3 implementation, and run it. At the same time, grab a larger number of tasks and send them to the GPU to be processed as in Problem 4. Finally, once all tasks are complete, merge the two hash tables you have produced (the GPU table as well as the CPU table) and output the final counts. The objective is to finish as fast as possible.
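As a rough illustration only (not part of the handout), the host-side dispatch-and-merge structure might look like the C++ sketch below. The names CPU_BATCH, GPU_BATCH, MAX_CHUNK_BYTES, cpu_process_batch, and gpu_process_batch are hypothetical placeholders for whatever your HW3 (CPU) and HW4/Problem 4 (GPU) code provides.

```cpp
// Minimal host-side sketch: two workers pull different batch sizes from a
// shared bag of tasks, then the two count tables are merged at the end.
// All constants and process functions are illustrative placeholders.
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

using CountTable = std::unordered_map<std::string, long>;

constexpr std::size_t CPU_BATCH = 4;             // small constant batch for the CPU
constexpr std::size_t GPU_BATCH = 64;            // larger batch for the GPU
constexpr std::size_t MAX_CHUNK_BYTES = 1 << 20; // split files bigger than this

// One task is a chunk of a file (whole file if it fits in one chunk).
struct Task { std::string path; std::size_t offset; std::size_t length; };

// Shared bag of tasks: workers pull up to `n` tasks at a time.
class TaskBag {
public:
    explicit TaskBag(std::deque<Task> tasks) : tasks_(std::move(tasks)) {}
    std::vector<Task> grab(std::size_t n) {
        std::lock_guard<std::mutex> lock(mu_);
        std::vector<Task> batch;
        while (n-- && !tasks_.empty()) {
            batch.push_back(std::move(tasks_.front()));
            tasks_.pop_front();
        }
        return batch;
    }
private:
    std::mutex mu_;
    std::deque<Task> tasks_;
};

// Placeholders: these would invoke the HW3 multithreaded counter and the
// HW4/Problem 4 CUDA kernel launch, respectively.
void cpu_process_batch(const std::vector<Task>&, CountTable&) {}
void gpu_process_batch(const std::vector<Task>&, CountTable&) {}

// Each worker keeps pulling its own batch size until the bag is empty,
// so the CPU and GPU stay busy at the same time.
void worker(TaskBag& bag, std::size_t batch_size, CountTable& table,
            void (*process)(const std::vector<Task>&, CountTable&)) {
    for (auto batch = bag.grab(batch_size); !batch.empty();
         batch = bag.grab(batch_size)) {
        process(batch, table);
    }
}

int main() {
    // Tasks would normally be built by scanning the input files and splitting
    // anything larger than MAX_CHUNK_BYTES into (offset, length) chunks.
    TaskBag bag({});

    CountTable cpu_counts, gpu_counts;
    std::thread cpu(worker, std::ref(bag), CPU_BATCH, std::ref(cpu_counts),
                    cpu_process_batch);
    std::thread gpu(worker, std::ref(bag), GPU_BATCH, std::ref(gpu_counts),
                    gpu_process_batch);
    cpu.join();
    gpu.join();

    // Merge the GPU table into the CPU table before printing the final counts.
    for (const auto& [word, count] : gpu_counts) cpu_counts[word] += count;
    return 0;
}
```

Running the two dispatch loops in separate host threads lets each device pull new work as soon as it finishes a batch, instead of relying on a fixed static split of the task list.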
The following files are provided to help you solve this exercise:
Getting started with CUDA and references: