The chasm between Software Engineers and FPGAs for ML applications

AI and Machine Learning (ML) are penetrating all industries. As AI algorithms mature, the infrastructure supporting them is maturing as well. Right now, ASIC, CPU, GPU and FPGA solutions are all available as hardware platforms to accelerate these algorithms. At Plunify, we are more familiar with FPGAs, but many of us are software people at heart. When I say "software", I don't mean embedded engineers, firmware engineers or people who develop drivers; I am talking about developers working with languages and platforms like .NET, Java, Python, R, SQL, C++ and JavaScript.

One of the newer projects we are working on is ML-driven placement for chip design. Training on hundreds of thousands of different placements, we use ML techniques to predict the final timing performance of a design before routing is even done. (That is a topic for another post.)
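To make the idea of "predicting timing from placements" a little more concrete, here is a toy sketch in plain Python. It is not our actual model or data: the single feature (an estimated wirelength), the label (worst slack after routing) and every number in it are made up for illustration, and ordinary least squares stands in for whatever ML technique a real project would use.

```python
# Toy illustration only: synthetic numbers and hypothetical feature names.
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical training set: one entry per placement run.
# Feature: estimated wirelength (arbitrary units).
# Label: worst slack measured after routing, in ns.
wirelength = [100.0, 120.0, 140.0, 160.0, 180.0]
slack_ns = [0.50, 0.30, 0.10, -0.10, -0.30]

slope, intercept = fit_line(wirelength, slack_ns)


def predict_slack(wl):
    """Predict post-routing slack for a new placement, without routing it."""
    return slope * wl + intercept


print(predict_slack(150.0))
```

The point is only the shape of the workflow: collect (placement, final timing) pairs, fit a model, then ask the model about a new placement instead of waiting for routing to finish.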

What framework should one use to start an ML project?

After considering a number of available options, we eventually chose TensorFlow. This may seem incredible: shouldn't we use something that can easily target FPGAs as the hardware? After all, we have developed a Xilinx Vivado plugin called Plunify Cloud and a design optimization tool called InTime, so we should know a thing or two about FPGAs, and TensorFlow is not remotely close to FPGAs.

But don't get us wrong. We are firm believers in the acceleration beast that exists in FPGAs. However, at the start of every ML project, the main considerations for picking a suitable development environment are skill set and the availability of learning resources. For skill set, we know all the languages I listed above and more. For resources, TensorFlow + Keras tutorials and documentation seem to beat all others hands down. With so many options, you want to get up and running quickly and see if your methods work.

Which acceleration platform is the one to use?

How do we speed up ML training and inference as we generate hundreds of thousands of designs with different placements and resource requirements? The acceleration question is coming eventually - should we use GPU/TPU/FPGAs? It is pretty evident that the user-friendliness odds are stacked against FPGAs. The logical option is still the Google Cloud Platform or even GPUs.

For us, the task is to figure out how to convert what we are doing so that it runs in an acceleration environment for FPGAs. And when I say "convert", I do not mean a complete rewrite. Maybe we should have started with Caffe, but again, software developers don't think about the acceleration platform in the starting phase.

I know nuts about engines. I just want a faster car.

There is an alternative: High-Level Synthesis (HLS). Write in C/C++ and convert to Verilog or VHDL. (This approach has been around for a long time and has been heavily critiqued.) Recently we spoke with the good folks at Hastlayer, who provide a .NET SDK capable of converting a .NET program to VHDL. Sure, there are some limitations, but for a software developer it is a great leap forward from trying to understand clocks, frequencies, device families and boards. All we self-centered software developers care about is how fast our ML training can be done. If my program runs for 1 day on a CPU, I want to know how fast it runs on an FPGA. If it takes 2 hours there, can I use an optimized version (e.g. the same .NET program but optimized by InTime) that runs for only 30 minutes?
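For the record, this is the only arithmetic a software developer wants to do here. The sketch below just plugs in the hypothetical figures from the paragraph above (1 day, 2 hours, 30 minutes); they are illustrative, not measurements, and the `speedup` helper is a name I made up.

```python
def speedup(baseline_hours, accelerated_hours):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_hours / accelerated_hours


# Hypothetical run times taken from the example in the text.
cpu_hours = 24.0       # "1 day" on a CPU
fpga_hours = 2.0       # "2 hours" on an FPGA
optimized_hours = 0.5  # "30 minutes" for the optimized FPGA version

print(speedup(cpu_hours, fpga_hours))        # 12.0
print(speedup(cpu_hours, optimized_hours))   # 48.0
print(speedup(fpga_hours, optimized_hours))  # 4.0
```

In other words, the question is whether a 12x gain from moving to an FPGA can become 48x overall, with the optimization step alone contributing another 4x, without the developer touching a clock constraint.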

Engage in their natural habitats or convert later

Engaging software developers in their natural habitats early definitely helps. Seeing how fast the field is expanding, the majority of the programmers writing machine learning algorithms are not going to be data scientists with a PhD in mathematics or FPGA/ASIC design engineers. They will be regular guys with a typical CS degree, like you and me, using ready-made libraries. It may be a bit late to come up with your own ML framework without the resources of a huge corporation or team (well, anything is possible). Failing that, having an easy, automated conversion path seems like the next best option.

So instead of saying "you need to learn how to target FPGAs from day one", we should be saying "do this to convert your TensorFlow / PyTorch program to target an FPGA". Or "let us run this on FPGAs for you".

With both options, there are chasms to cross. If you are a software developer hitting these issues, it would be great to hear your thoughts.
