Overlay Tutorial
This notebook gives an overview of how the Overlay class has changed in PYNQ 2.0 and how to use it efficiently.
The redesigned Overlay class has three main design goals:
* Allow overlay users to find out what is inside an overlay in a consistent manner
* Provide a simple way for developers of new hardware designs to test new IP
* Facilitate reuse of IP between Overlays
This tutorial is primarily designed to demonstrate the final two points, walking through the process of interacting with a new IP, developing a driver, and finally building a more complex system from multiple IP blocks. All of the code and block diagrams can be found at [https://github.com/PeterOgden/overlay_tutorial]. For these examples to work, copy the contents of the overlays directory into the home directory on the PYNQ-Z1 board.
Developing a Single IP
For this first example we are going to use a simple design with a single IP contained in it. This IP was developed using HLS and adds two 32-bit integers together. The full code for the accelerator is:
void add(int a, int b, int& c) {
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE s_axilite port=a
#pragma HLS INTERFACE s_axilite port=b
#pragma HLS INTERFACE s_axilite port=c
    c = a + b;
}
The block diagram consists solely of the HLS IP and the glue logic required to connect it to the ZYNQ7 IP.

Simple Block Diagram
To interact with the IP, we first need to load the overlay containing it.
In [1]:
from pynq import Overlay
overlay = Overlay('/home/xilinx/tutorial_1.bit')
Creating the overlay will automatically download it. We can now use a question mark to find out what is in the overlay.
In [2]:
overlay?
All of the entries are accessible via attributes on the overlay class with the specified driver. Accessing the scalar_add attribute of the overlay will create a driver for the IP. As there is no driver currently known for the Add IP core, the DefaultIP driver will be used so that we can interact with the IP core.
In [3]:
add_ip = overlay.scalar_add
add_ip?
Reading the documentation generated by HLS tells us that to use the core we need to write the two arguments to offsets 0x10 and 0x18 and read the result back from 0x20.
In [4]:
add_ip.write(0x10, 4)
add_ip.write(0x18, 5)
add_ip.read(0x20)
Out[4]:
9
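The register protocol above can be tried out without a board. The following is a minimal software sketch, not part of PYNQ: `FakeAddIP` is a hypothetical stand-in that mimics only the `read(offset)` and `write(offset, value)` calling convention used in the cell above, and it assumes the registers behave as unsigned 32-bit words.

```python
class FakeAddIP:
    """Hypothetical software model of the adder's register map (not a PYNQ class)."""

    A_OFFSET = 0x10  # first argument
    B_OFFSET = 0x18  # second argument
    C_OFFSET = 0x20  # result

    def __init__(self):
        # Only the two argument registers hold state; the result is computed on read.
        self._regs = {self.A_OFFSET: 0, self.B_OFFSET: 0}

    def write(self, offset, value):
        if offset not in self._regs:
            raise ValueError("no writable register at offset 0x%x" % offset)
        # Assumption: registers are treated as unsigned 32-bit words.
        self._regs[offset] = value & 0xFFFFFFFF

    def read(self, offset):
        if offset == self.C_OFFSET:
            total = self._regs[self.A_OFFSET] + self._regs[self.B_OFFSET]
            return total & 0xFFFFFFFF
        if offset in self._regs:
            return self._regs[offset]
        raise ValueError("no readable register at offset 0x%x" % offset)


fake_ip = FakeAddIP()
fake_ip.write(0x10, 4)
fake_ip.write(0x18, 5)
result = fake_ip.read(0x20)
print(result)
```

A model like this is only useful for exercising driver code on a development machine; the real offsets should always be taken from the HLS-generated documentation for the IP.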
Creating a Driver
While the DefaultIP driver is useful for determining that the IP is working, it is not the most user-friendly API to expose to the eventual end-users of the overlay. Ideally we want to create an IP-specific driver exposing a single add function to call the accelerator.
Custom drivers are created by inheriting from DefaultIP and adding a bindto class attribute consisting of the IP types the driver should bind to. The constructor of the class should take a single description parameter and pass it through to the super class __init__. The description is a dictionary containing the address map and any interrupts and GPIO pins connected to the IP.
In [5]:
from pynq import DefaultIP

class AddDriver(DefaultIP):
    def __init__(self, description):
        super().__init__(description=description)

    bindto = ['xilinx.com:hls:add:1.0']

    def add(self, a, b):
        self.write(0x10, a)
        self.write(0x18, b)
        return self.read(0x20)
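One way to picture how a bindto attribute can lead to a driver being picked up automatically is a registry keyed by IP type. The sketch below is an illustration of that general pattern only and is not taken from the PYNQ source: `DriverBase`, `driver_for` and `AddSketch` are hypothetical names, and PYNQ's actual registration mechanism may differ.

```python
class DriverBase:
    """Hypothetical generic driver that also keeps a registry of subclasses."""

    bindto = []      # IP type strings a subclass claims
    _registry = {}   # maps IP type string -> driver class

    def __init_subclass__(cls, **kwargs):
        # Runs once for every subclass definition and records its bindto entries.
        super().__init_subclass__(**kwargs)
        for ip_type in cls.bindto:
            DriverBase._registry[ip_type] = cls

    def __init__(self, description):
        self.description = description


def driver_for(ip_type):
    """Return the registered driver class, or the generic one if none is bound."""
    return DriverBase._registry.get(ip_type, DriverBase)


class AddSketch(DriverBase):
    bindto = ['xilinx.com:hls:add:1.0']


chosen = driver_for('xilinx.com:hls:add:1.0')
fallback = driver_for('example.com:hls:unknown:1.0')  # made-up IP type
print(chosen.__name__, fallback.__name__)
```

In a scheme like this the lookup happens when an object is constructed, so a driver defined afterwards only takes effect for objects created later.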
Now if we reload the overlay and query the help again we can see that our new driver is bound to the IP.
In [6]:
overlay = Overlay('/home/xilinx/tutorial_1.bit')
overlay?
We can access the IP the same way as before, except that now our custom driver with an add function is created instead of DefaultIP.
In [7]:
overlay.scalar_add.add(15,20)
Out[7]:
35
Reusing IP
Suppose we or someone else develops a new overlay and wants to reuse the existing IP. As long as they import the Python file containing the driver class, the drivers will be automatically created. As an example, consider the next design which, among other things, includes a renamed version of the scalar_add IP.

Second Block Diagram
Using the question mark on the new overlay shows that the driver is still bound.
In [8]:
overlay = Overlay('/home/xilinx/tutorial_2.bit')
overlay?
IP Hierarchies
The block diagram above also contains a hierarchy looking like this:

Hierarchy
The hierarchy contains a custom IP for multiplying a stream of numbers by a constant and a DMA engine for transferring the data. As streams are involved and we need to correctly handle TLAST for the DMA engine, the HLS code is a little more complex, with additional pragmas and types, but the complete code is still relatively short.
typedef ap_axiu<32,1,1,1> stream_type;

void mult_constant(stream_type* in_data, stream_type* out_data, ap_int<32> constant) {
#pragma HLS INTERFACE s_axilite register port=constant
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE axis port=in_data
#pragma HLS INTERFACE axis port=out_data
    out_data->data = in_data->data * constant;
    out_data->dest = in_data->dest;
    out_data->id = in_data->id;
    out_data->keep = in_data->keep;
    out_data->last = in_data->last;
    out_data->strb = in_data->strb;
    out_data->user = in_data->user;
}
Looking at the HLS generated documentation we again discover that to set the constant we need to set the register at offset 0x10, so we can write a simple driver for this purpose.
In [9]:
class ConstantMultiplyDriver(DefaultIP):
    def __init__(self, description):
        super().__init__(description=description)

    bindto = ['Xilinx:hls:mult_constant:1.0']

    @property
    def constant(self):
        return self.read(0x10)

    @constant.setter
    def constant(self, value):
        self.write(0x10, value)
The DMA engine driver is already included inside the PYNQ driver so nothing special is needed for that other than ensuring the module is imported. Reloading the overlay will make sure that our newly written driver is available for use.
In [10]:
import pynq.lib.dma
overlay = Overlay('/home/xilinx/tutorial_2.bit')
dma = overlay.const_multiply.multiply_dma
multiply = overlay.const_multiply.multiply
The DMA driver transfers numpy arrays allocated using the xlnk driver. Let's test the system by multiplying 5 numbers by 3.
In [11]:
from pynq import Xlnk
import numpy as np
xlnk = Xlnk()
in_buffer = xlnk.cma_array(shape=(5,), dtype=np.uint32)
out_buffer = xlnk.cma_array(shape=(5,), dtype=np.uint32)
for i in range(5):
    in_buffer[i] = i
multiply.constant = 3
dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer)
dma.sendchannel.wait()
dma.recvchannel.wait()
out_buffer
Out[11]:
ContiguousArray([ 0, 3, 6, 9, 12], dtype=uint32)
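When testing a hardware path like this it can help to have a pure-software reference to compare against. The sketch below is one possible reference model: `reference_multiply` is a hypothetical helper name, and the truncation to 32 bits is an assumption based on the 32-bit data field in the HLS code and the uint32 buffers used above.

```python
def reference_multiply(stream, constant):
    """Software model of the multiplier: one output word per input word."""
    # Assumption: each product is truncated to an unsigned 32-bit word.
    return [(value * constant) & 0xFFFFFFFF for value in stream]


expected = reference_multiply(range(5), 3)
print(expected)  # [0, 3, 6, 9, 12]
```

The values read back from the output buffer can then be compared element by element against `expected`.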
While this is one way to use the IP, it still isn’t exactly user-friendly. It would be preferable to treat the entire hierarchy as a single entity and write a driver that hides the implementation details. The overlay class allows for drivers to be written against hierarchies as well as IP but the details are slightly different.
Hierarchy drivers are subclasses of pynq.DefaultHierarchy and, similar to DefaultIP, have a constructor that takes a description of the hierarchy. To determine whether the driver should bind to a particular hierarchy, the class should also contain a static checkhierarchy method which takes the description of a hierarchy and returns True if the driver should be bound or False if not. Similar to DefaultIP, any classes that subclass DefaultHierarchy and have a checkhierarchy method will automatically be registered.
For our constant multiply hierarchy this would look something like:
In [12]:
from pynq import DefaultHierarchy

class StreamMultiplyDriver(DefaultHierarchy):
    def __init__(self, description):
        super().__init__(description)

    def stream_multiply(self, stream, constant):
        self.multiply.constant = constant
        with xlnk.cma_array(shape=(len(stream),), \
                            dtype=np.uint32) as in_buffer,\
             xlnk.cma_array(shape=(len(stream),), \
                            dtype=np.uint32) as out_buffer:
            for i, v in enumerate(stream):
                in_buffer[i] = v
            self.multiply_dma.sendchannel.transfer(in_buffer)
            self.multiply_dma.recvchannel.transfer(out_buffer)
            self.multiply_dma.sendchannel.wait()
            self.multiply_dma.recvchannel.wait()
            result = out_buffer.copy()
        return result

    @staticmethod
    def checkhierarchy(description):
        if 'multiply_dma' in description['ip'] \
           and 'multiply' in description['ip']:
            return True
        return False
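Because checkhierarchy only inspects a dictionary, its logic can be exercised on its own before touching hardware. The sketch below repeats the same membership test as a plain function and runs it on two hand-written descriptions. Both dictionaries are hypothetical and model only the 'ip' key used above; a real hierarchy description is expected to carry more information.

```python
def checkhierarchy(description):
    """Return True when both IP blocks the driver relies on are present."""
    ip = description['ip']
    return 'multiply_dma' in ip and 'multiply' in ip


# Hypothetical descriptions: only the 'ip' key is modelled here.
matching = {'ip': {'multiply_dma': {}, 'multiply': {}}}
dma_only = {'ip': {'multiply_dma': {}}}

print(checkhierarchy(matching), checkhierarchy(dma_only))
```

Checking for the specific IP names the driver uses keeps it from binding to unrelated hierarchies that merely happen to contain a DMA engine.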
We can now reload the overlay and ensure the higher-level driver is loaded
In [13]:
overlay = Overlay('/home/xilinx/tutorial_2.bit')
overlay?
and use it
In [14]:
overlay.const_multiply.stream_multiply([1,2,3,4,5], 5)
Out[14]:
ContiguousArray([ 5, 10, 15, 20, 25], dtype=uint32)
Overlay Customisation
While the default overlay is sufficient for many use cases, some overlays will require more customisation to provide a user-friendly API. As an example, the default AXI GPIO drivers expose channels 1 and 2 as separate attributes, meaning that accessing the LEDs in the base overlay requires the following contortion:
In [15]:
base = Overlay('base.bit')
base.leds_gpio.channel1[0].on()
To mitigate this, the overlay developer can provide a custom class for their overlay to expose the subsystems in a more user-friendly way. The base overlay includes a custom overlay class which performs the following functions:
* Make the AXI GPIO devices better named and range/direction restricted
* Make the IOPs accessible through the pmoda, pmodb and arduino names
* Create a special class to interact with RGB LEDs
The result is that the LEDs can be accessed like:
In [16]:
from pynq.overlays.base import BaseOverlay
base = BaseOverlay('base.bit')
base.leds[0].on()
Using a well-defined class also allows custom docstrings to be provided, which further helps end users.
In [17]:
base?
Creating a custom overlay
Custom overlay classes should inherit from pynq.Overlay, taking the full path of the bitstream file and possible additional keyword arguments. These parameters should be passed to super().__init__() at the start of __init__ to initialise the attributes of the Overlay. This example is designed to go with our tutorial_2 overlay and adds a function to more easily call the multiplication function.
In [18]:
class TestOverlay(Overlay):
    def __init__(self, bitfile, **kwargs):
        super().__init__(bitfile, **kwargs)

    def multiply(self, stream, constant):
        return self.const_multiply.stream_multiply(stream, constant)
To test our new overlay class we can construct it as before.
In [19]:
overlay = TestOverlay('/home/xilinx/tutorial_2.bit')
overlay.multiply([2,3,4,5,6], 4)
Out[19]:
ContiguousArray([ 8, 12, 16, 20, 24], dtype=uint32)
Included Drivers
The pynq library includes a number of drivers as part of the pynq.lib package. These include:
* AXI GPIO
* AXI DMA (simple mode only)
* AXI VDMA
* AXI Interrupt Controller (internal use)
* Pynq-Z1 Audio IP
* Pynq-Z1 HDMI IP
* Color convert IP
* Pixel format conversion
* HDMI input and output frontends
* Pynq Microblaze program loading