Hello!
Only after seeing the slides you created did I realise that the latency of DDR3 is similar to that of GDDR5, and thus that this is actually feasible!
In the past I worked on an ethminer port to the Cell Broadband Engine (CBE), i.e. the PS3. The project didn't go very far, but it involved managing the data bus between the main processor (PPU) and the "synergistic" processors (SPUs), cache management for the SPUs, and many optimisations to keep the code running on the PPU and SPUs as fast as possible.
So I believe I know the Ethereum source quite well.
Currently I am working with a hobby group on a neural network library that utilises an FPGA. I haven't written mining code for an FPGA before, though.
I also think that the Zcash algorithm is quite conducive to FPGA mining. It requires 50 MB of memory, and the Arria 10 GT 1150, a rather new FPGA, has 53 MB of on-chip memory! Compare that to the 2 MB of L3 cache (on-chip SRAM) that most processors have.