### Story

A few weeks ago we needed to convert one of our own TensorFlow graphs into a TensorRT network. As many of you probably know, there are a few ways to do this, such as going from TensorFlow to UFF and then through the UFF to TensorRT parser, or going from TensorFlow to ONNX and then through the ONNX to TensorRT parser. When trying the first approach, the following error message was one of many we encountered: `UffParser: Validator error: slice_9-26_9-26: Unsupported operation Slice`. Some of the problems can be worked around, but in the end we had to abandon the *UFF to TensorRT* parser, since it is full of bugs and closed source. The ONNX route seemed more promising since its intermediate format can be visualised and edited. Unfortunately, the packages provided by *Anaconda* and *PyPI* were flawed, and fixing the C++ source code felt like a lot of work, especially since the Python API of TensorRT for constructing networks looked clean and had all the operations we needed.

### Goal

The goal now was to create a converter written in *pure Python* that parses a TensorFlow graph and creates a TensorRT network *without any intermediate format*. The C++ code of the *ONNX to TensorRT* parser could serve as a good reference. Easy extensibility and fast test cycles were our other priorities for the new library.

### Process

In general, the conversion process can be divided into five steps:

1. Preparing the TensorFlow graph
2. Parsing the graph definition
3. Constructing the TensorRT network
4. Optimizing the network into an engine
5. Testing the inference result

In step 1, the TensorFlow graph is converted into a frozen graph, which merges the graph structure and the weights into a single entity. It may also be worth stripping unused nodes and attributes at this point. In step 2, a parser verifies the syntax of the frozen graph and identifies any unknown operations. Step 3 verifies the tensor shapes and whether TensorRT supports them, since there are quite a few restrictions; some attributes, e.g. `keep_dims`, might not be available for specific layers. In step 4, the optimization process creates a serialized engine, which can be used in a TensorRT execution context to run an inference step. Finally, in step 5, comparing its results with the output of the TensorFlow graph is crucial for spotting low-level implementation differences.
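The comparison in the last step can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code from our library: the function names `max_abs_diff` and `outputs_match`, the tolerance values and the sample numbers are all assumptions made for this example. With NumPy available, `numpy.allclose` would be the more usual choice.

```python
import math
from typing import Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Return the largest element-wise absolute difference of two flat outputs."""
    if len(reference) != len(candidate):
        raise ValueError(f"length mismatch: {len(reference)} vs {len(candidate)}")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def outputs_match(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 1e-3,  # assumed tolerance, tune per model and precision
    abs_tol: float = 1e-5,  # assumed tolerance, tune per model and precision
) -> bool:
    """Return True if every element pair agrees within the given tolerances."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Hypothetical flattened outputs. In practice these would come from running
# the TensorFlow graph and the TensorRT engine on the same input.
tf_output = [0.1021, 0.7312, 0.1667]
trt_output = [0.10212, 0.73118, 0.16671]

print("max abs diff:", max_abs_diff(tf_output, trt_output))
print("outputs match:", outputs_match(tf_output, trt_output))
```

Exact equality is usually too strict for such a check, because the two runtimes may order floating-point operations differently. Reduced-precision engines will likely need looser tolerances than the ones assumed here.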

### Outcome

Four of the five steps listed above are covered by our converter. The first step was left out since its implementation depends on the input graph. A possible implementation is shown in our example, where a ResNet50 is converted. The library itself consists of four files: `tf_parser.py` for step 2, `trt_builder.py` for step 3, `trt_inference.py` for the inference in step 5, and `trt_importer.py`, which handles the optimization in step 4 and also connects the other files in a simple-to-use API:

- `from_tensorflow_graph_def(…)`
- `optimize_network(…)`
- `store_engine(…)`
- `load_engine(…)`
- `inference_engine(…)`

### Contributions

Right now the library only supports operations with static shapes, so all shapes need to be known at construction time. Furthermore, several operations are not yet implemented. Some of them are easy to add; others have no TensorRT equivalent and require additional source code to work. We hope to fill in the missing layers together with the machine learning community, and we happily accept pull requests that help improve the project.
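To give an idea of what these two limitations mean in practice, here is a small sketch of a pre-flight check that reports operations without a handler and tensors with dynamic shapes. It is not taken from our library: the registry `OP_HANDLERS`, the decorator `register_op`, the dictionary-based node format and the op name `MyCustomOp` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical registry that maps a TensorFlow op type to a handler function.
OP_HANDLERS: Dict[str, Callable[[dict], str]] = {}


def register_op(op_type: str):
    """Decorator that registers a handler for one op type."""
    def decorator(func: Callable[[dict], str]) -> Callable[[dict], str]:
        OP_HANDLERS[op_type] = func
        return func
    return decorator


@register_op("Conv2D")
def convert_conv2d(node: dict) -> str:
    # A real handler would add a layer to the TensorRT network;
    # this placeholder only describes what it would do.
    return f"convolution layer for {node['name']}"


@register_op("Relu")
def convert_relu(node: dict) -> str:
    return f"activation layer for {node['name']}"


def is_static(shape: Sequence[Optional[int]]) -> bool:
    """A shape is static if every dimension is a known non-negative integer."""
    return all(isinstance(dim, int) and dim >= 0 for dim in shape)


def find_problems(nodes: List[dict]) -> List[str]:
    """Collect one message per unsupported op and per dynamic shape."""
    problems = []
    for node in nodes:
        if node["op"] not in OP_HANDLERS:
            problems.append(f"{node['name']}: unsupported operation {node['op']}")
        if not is_static(node["shape"]):
            problems.append(f"{node['name']}: dynamic shape {node['shape']}")
    return problems


# Hypothetical graph description; None marks an unknown dimension.
graph_nodes = [
    {"name": "conv1", "op": "Conv2D", "shape": [1, 64, 112, 112]},
    {"name": "relu1", "op": "Relu", "shape": [1, 64, 112, 112]},
    {"name": "custom1", "op": "MyCustomOp", "shape": [None, 64, 56, 56]},
]

for problem in find_problems(graph_nodes):
    print(problem)
```

With a registry like this, adding an operation that has a direct TensorRT counterpart would come down to writing one more decorated handler, while operations without a counterpart need extra code behind the handler.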

Link to the TF2TRT converter:

https://github.com/Visual-Computing/TF2TRT/

I’m curious why you guys did not use the tensorflow-trt (https://github.com/tensorflow/tensorrt)? If you did, were the results different from your approach?

TF-TRT tries to convert supported TensorFlow operations into TensorRT operations and merges several of them into a TensorRT engine, which then replaces these TensorFlow operations in the graph. The resulting graph might still contain TensorFlow operations and can therefore only be executed in the TensorFlow runtime, not in the standalone TensorRT library.
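As a toy illustration of that behaviour, the sketch below splits a linear chain of operations into runs that would become TensorRT engines and runs that stay in TensorFlow. It is a strong simplification and not TF-TRT's actual algorithm, which works on a full graph rather than a chain. The function `segment_ops`, the set of supported ops and the op name `CustomOp` are assumptions for this example; the `min_segment_size` parameter loosely mirrors the minimum segment size setting that TF-TRT exposes.

```python
from itertools import groupby
from typing import List, Set, Tuple


def segment_ops(
    ops: List[str],
    trt_supported: Set[str],
    min_segment_size: int = 2,
) -> List[Tuple[str, List[str]]]:
    """Group consecutive ops by the runtime that would execute them."""
    segments: List[Tuple[str, List[str]]] = []
    for supported, run in groupby(ops, key=lambda op: op in trt_supported):
        run_ops = list(run)
        if supported and len(run_ops) >= min_segment_size:
            # Long enough run of supported ops: would become one engine.
            segments.append(("TensorRT", run_ops))
        elif segments and segments[-1][0] == "TensorFlow":
            # Merge with the preceding TensorFlow segment.
            segments[-1][1].extend(run_ops)
        else:
            # Unsupported ops, or a run too short to be worth an engine.
            segments.append(("TensorFlow", run_ops))
    return segments


# Hypothetical op chain and hypothetical set of supported ops.
ops = ["Conv2D", "Relu", "MaxPool", "CustomOp", "Conv2D", "Relu"]
supported = {"Conv2D", "Relu", "MaxPool"}

for runtime, segment in segment_ops(ops, supported):
    print(runtime, segment)
```

As long as at least one segment stays with TensorFlow, the whole graph still needs the TensorFlow runtime, which is the difference to a converter that builds a complete standalone TensorRT network.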