Today we’re starting an occasional series of articles by our engineers where they’ll share their particular insights and experience of what it takes to build our open software for mobile wireless networks.
This first piece is by Oriol Font-Bach, PhD, our FPGA Team Lead.
In spite of the COVID-19 pandemic, the truth is that we SDR engineers, hackers and enthusiasts are living through quite an exciting period. Not only are we participating in shaping the future of communications with our 4G and 5G implementations, but we are also being equipped with processing solutions which were unthinkable just a few years ago. And being the kind of SDR engineers that like to fiddle with FPGA acceleration in our embedded implementations, we are delighted to take advantage of high-performance, heterogeneous and highly configurable architectures such as the Radio Frequency System-on-Chip (RFSoC) from Xilinx. Very conveniently, from an embedded 4G and 5G SDR development point of view, these chips provide us with multi-core processors, FPGA logic, multi-gigasample RF data converters and dedicated high-speed memory and interfacing elements, all in a single silicon die. Of course, some engineering effort is required before we can put all those processing elements to good use. But where would the fun lie for us, otherwise?
Target SDR setup of our 4G/5G embedded UE solution (or another introduction, but this time with technical content)
In our very first Engineering Insights post, we would like to give you our two cents on how to start implementing a 4G/5G FPGA-accelerated (hurray!) SDR UE on an RFSoC device. Since we love to get our hands dirty and fiddle with all sorts of SDR platforms, it's only natural that we try to keep the focus of this article as practical as possible. Hence, we'll start by selecting the ZCU111 prototyping platform as an implementation target, given that it constitutes a ready-to-go COTS RFSoC-based candidate to host our embedded SDR UE implementations.
Once we have selected the board and target RFSoC device, the most important question for us is: how can we squeeze the most out of this complex processing solution, so that it can accommodate our flexible SDR implementation needs? Let's start from the most elemental step (one that we often take for granted in many SDR developments): how do we get a stream of received I/Q samples to our SDR stack implementation? To start with, we need to configure the central receive frequency and sampling rate according to our specific UE requirements and target RAN technology, in order for the analog-to-digital converters (ADCs) to start generating that precious I/Q data for us. Of course, we'll also want to provide the I/Q samples generated by our SDR stack to the digital-to-analog converters (DACs) so they can be transmitted, but considering that the required steps will be analogous (in an inverted order), we'll keep the focus of this article on the ADC chain.

Let's start by inspecting what is offered by our selected prototyping platform and RFSoC device. As we can see in the figure above, the RF data converters are directly accessible from the FPGA, which in turn will implement the required interfacing to exchange I/Q data with the CPU. Hence, the first thing we'll want to add to our FPGA design is the RF data converter (RFdc) IP core provided by Xilinx (you're just a few clicks away from getting AXI-based access to the RF data converters!). This will grant access to the samples generated by the ADC chain(s). Then, for instance, a DMA-based solution can be used to pass those samples to the CPU. For our embedded UE, we very much prefer to accelerate as much heavy DSP as we can in the FPGA, hence in the following we'll cover how to get the baseband I/Qs so they can be further processed in the FPGA (so whether to keep on reading the article or not is entirely up to you).
The RF data converters in the ADC chain of the selected RFSoC device offer decimation of up to x8 per channel and a mixer with a 48-bit configurable Numerically-Controlled Oscillator (NCO). The supported sampling rates range from 1.0 to 4.096 Gsps. Very conveniently, Xilinx also offers a related software library to help us configure the RFdc from the CPU.
Our proposed path: the journey starts at the CPU side
Now, how do we configure the RFdc to get the I/Q samples, all while fulfilling our requirements? Given that our goal is to implement a 4G/5G UE, targeting sampling frequencies that are multiples of 30.72 MHz (and a 15 kHz subcarrier spacing) seems a good starting point. In fact, it will enable support for different BW configurations for both LTE and NR (at least in numerology 0). Settling for 1.96608 Gsps will do the trick, as the RFdc can be easily configured to provide this LTE/NR-friendly sampling frequency from the 245.76 MHz reference clock provided by the on-board RF PLL. Let us show you our way forward towards the conquest of the RFdc.
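To put some numbers on that, here is a tiny sanity check of the clock and rate relations we'll be assuming throughout this article (purely illustrative; this is not part of our driver code):
#include <assert.h>
#include <stdio.h>

// Clock/rate relations assumed in this article (all values in Hz).
#define RFDC_REF_CLK_HZ    245760000ULL  // reference clock from the on-board RF PLL
#define ADC_SAMPLE_RATE_HZ 1966080000ULL // 8 x 245.76 MHz = 1.96608 Gsps at the ADC
#define RFDC_MAX_DECIM     8             // maximum on-chip decimation in the ADC tile
#define LTE_BASE_RATE_HZ   30720000ULL   // 2048 subcarriers x 15 kHz (LTE / NR numerology 0)

int main(void)
{
  // The ADC rate is an integer multiple of both the reference clock and the LTE/NR base rate...
  assert(ADC_SAMPLE_RATE_HZ == 8ULL * RFDC_REF_CLK_HZ);
  assert(ADC_SAMPLE_RATE_HZ % LTE_BASE_RATE_HZ == 0);
  // ...and after the maximum on-chip decimation the FPGA receives samples at 245.76 Msps.
  printf("FPGA input rate: %llu sps\n", ADC_SAMPLE_RATE_HZ / RFDC_MAX_DECIM);
  return 0;
}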
At SRS, we have built our software suite upon a layered structure, which provides a neat solution to organize the code and delimit functionalities (and, why hide it, it's also handy to be able to “hand over” bugs to the guys in other layers – does “this is not a hardware issue” sound familiar?). Our bottom layer is responsible for interfacing with the underlying hardware in our SDR platform, including the RF devices and FPGA; hence, here's where you'll find the piece of code configuring our proposed ADC solution. Ok, now the technically accurate description: we have extended our HW layer to provide support for FPGA-based embedded solutions, including a custom driver built around the XRFdc library provided by Xilinx (how cool is that? Our list of supported SDR boards is growing nicely! Although we all know that USRPs will always have a special place in our hearts).
The XRFdc library comes with an API that abstracts away the low-level operations by conveniently equipping the user with cleanly defined functions and data structures. The basic workflow described by Xilinx is based around PetaLinux and the creation of dedicated RFdc applications (we won't dig into such details here, but we can recommend reading this blog instead). Nevertheless, we are interested in a more flexible solution (i.e., we want to cleanly integrate our custom RFdc driver into larger CMake-based projects); consequently, we find it better to make sure that our built embedded Linux image and rootfs end up containing all the necessary libraries to create an SDK around it, so that it can be easily used later in our cross-compilation environment (yes, in order to be able to use a regular ‘cmake && make’ workflow we'll need to create a specialized CMake toolchain file… but this only needs to be done once, so it's not that bad at the end of the day).
Wow, that was a lot of text! So, please, let us show you a harmless snippet of code below showing how we deal with the initialization of the RFdc driver (it feels like we’ve waited long enough to have an excuse to do so!).
XRFdc *RFdcInstPtr = &handler->RFdcInst;
XRFdc_Config *ConfigPtr = NULL;
XRFdc_IPStatus IPStatusPtr = {};

// Initialize libmetal layer
struct metal_init_params init_param = METAL_INIT_DEFAULTS;
if (metal_init(&init_param)) {
  ERROR("Failed to run libmetal initialization");
  return -1;
}

// Initialize the RFdc driver
ConfigPtr = XRFdc_LookupConfig(RFDC_DEVICE_ID);
if (ConfigPtr == NULL) {
  ERROR("Couldn't look up RFdc configuration");
  return -1;
}
…
Status = XRFdc_RegisterMetal(RFdcInstPtr, RFDC_DEVICE_ID, &handler->phy_deviceptr);
…
INFO("RF_RFdc: RFdc driver successfully registered and mapped to Libmetal");

/* Initializes the controller */
Status = XRFdc_CfgInitialize(RFdcInstPtr, ConfigPtr);
if (Status != XRFDC_SUCCESS) {
  ERROR("ERROR: Failed to initialize RFdc controller");
  return -1;
}
INFO("RF_RFdc: RFdc controller successfully initialized");
Great! Now we can actually deal with configuring the RFdc. The first thing you'll want to do is program the reference clock used by the ADC tiles. On the ZCU111 this means programming the LMK04208 and LMX2594 chips so that the ADC tiles receive the 245.76 MHz reference clock. Yes! Another snippet of code follows (it's just a quick snapshot really, but you'll find plenty of examples in this Xilinx repo – we'd recommend taking a look at those making use of XRFdc_Mixer_Settings and XRFdc_SetMixerSettings, as you'll want to configure the NCO and define your central RF frequency; we sketch a possible mixer configuration right after the clock snippet below).
// Configure the clocks on the ZCU111
LMK04208ClockConfig(I2CBUS, LMK04208_CKin);
// The RFdc IP expects a 245.76 MHz reference clock (as set in Vivado)
LMX2594ClockConfig(I2CBUS, RFDC_REF_SAMPLE_FREQ_KHZ);
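And since we just mentioned the mixer: as a rough illustration (not our actual driver code), configuring the fine-mixer NCO through the XRFdc API could look something like the sketch below. The tile/block indices, the mixer mode and the frequency value are placeholders that depend on your particular design and RF planning.
// Sketch only: tune the fine-mixer NCO of ADC tile 1, block 0.
XRFdc_Mixer_Settings MixerSettings = {};
Status = XRFdc_GetMixerSettings(RFdcInstPtr, XRFDC_ADC_TILE, 1, 0, &MixerSettings);
if (Status != XRFDC_SUCCESS) {
  ERROR("Failed to read current mixer settings");
  return -1;
}
MixerSettings.MixerType      = XRFDC_MIXER_TYPE_FINE;
MixerSettings.MixerMode      = XRFDC_MIXER_MODE_R2C;   // real ADC input, complex (I/Q) output
MixerSettings.CoarseMixFreq  = XRFDC_COARSE_MIX_OFF;
MixerSettings.Freq           = -1842.5;                // NCO frequency in MHz (example value; sign and aliasing depend on your Nyquist zone)
MixerSettings.PhaseOffset    = 0.0;
MixerSettings.FineMixerScale = XRFDC_MIXER_SCALE_AUTO;
MixerSettings.EventSource    = XRFDC_EVNT_SRC_TILE;
Status = XRFdc_SetMixerSettings(RFdcInstPtr, XRFDC_ADC_TILE, 1, 0, &MixerSettings);
if (Status != XRFDC_SUCCESS) {
  ERROR("Failed to apply mixer/NCO settings");
  return -1;
}
// Trigger the update event so that the new NCO settings take effect
XRFdc_UpdateEvent(RFdcInstPtr, XRFDC_ADC_TILE, 1, 0, XRFDC_EVENT_MIXER);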
We recommend explicitly waking up the ADC tiles you want to use after configuring the clocks (we found that they do love being idle and can use a little push). So that'll be our last RFdc driver code snippet before we jump into the details on the FPGA side.
Status = XRFdc_StartUp(RFdcInstPtr, XRFDC_ADC_TILE, 1);
The expedition arrives at the vast realms of DSP acceleration in an FPGA (just an excessively grandiose title to cover the RTL details we want to show you)
Yes, we know that you, dear colleague, know how to configure a chip, how to twiddle an API and write some great code to conveniently deal with it (sorry if the article has seemed quite general so far!). But our objective here is to show how to continue after you find out that configuring the RFdc still leaves you a few steps away from getting those desired baseband I/Q samples, as could be expected from a highly configurable (and generic or RAN-agnostic, if you prefer) device. As it happens, if you use the maximum on-chip decimation, you'll notice that the resulting signal is sampled at 245.76 Msps, which puts us on the right track, but will still require some extra (in-FPGA) decimation that depends on the desired UE DL BW. To overcome this, we have implemented a configurable decimation solution (yes, the counterpart interpolation solution is also there, but let's keep the focus on the ADC chain, shall we?).
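To give an idea of the numbers involved, a hypothetical helper mapping the LTE channel BW to the extra decimation required from 245.76 Msps could look like this (purely illustrative; it is not our actual configuration logic and assumes power-of-two decimation stages):
// Hypothetical helper (illustration only): extra in-FPGA decimation needed to go
// from the 245.76 Msps RFdc output down to the LTE baseband rate of a given BW.
static int extra_decimation_for_lte_bw_khz(unsigned bw_khz)
{
  switch (bw_khz) {
    case 1400:  return 128; // 1.92  Msps
    case 3000:  return 64;  // 3.84  Msps
    case 5000:  return 32;  // 7.68  Msps
    case 10000: return 16;  // 15.36 Msps
    case 20000: return 8;   // 30.72 Msps
    default:    return -1;  // not reachable with power-of-two stages alone
  }
}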
Our configurable decimation stage will thus receive an AXI stream of parallel I/Q data sampled at 245.76 Msps. Yet, the AXI interface uses a 122.88 MHz clock (for each ADC channel we'll have two 32-bit AXI ports, where each word provides two consecutive I/Q samples). Thus, in order to support BWs from 1.4 to 20 MHz (i.e., sampling rates from 1.92 to 30.72 Msps), this block will implement a variable decimation of up to x128. Of course, it will also unpack the AXI data packets and, more importantly, it will procure an output clock matching the desired baseband sampling rate as well (yes, we also find that having a clock at hand that allows processing the I/Q samples one at a time often comes in quite handy). Below we do our best to explain it clearly and without entering into too much detail, but knowing that an image always does a better job, please first take a look at the diagram below.

First, in the most typical SDR fashion, the FPGA will receive the desired configuration from the CPU. This will include our target baseband sampling rate, the amount of decimation to be applied and the expected frequency for the output baseband sampling clock.
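As an illustration of what gets handed over (the actual register map is design-specific and not shown here), that configuration could be captured in something as simple as the following record:
#include <stdint.h>

// Hypothetical configuration record written by the CPU to the FPGA over AXI.
// Field names and widths are placeholders, not our actual register map.
typedef struct {
  uint32_t baseband_srate_hz;  // target baseband sampling rate, e.g. 1920000
  uint32_t decimation_factor;  // extra in-FPGA decimation to apply, e.g. 128
  uint32_t baseband_clk_hz;    // expected frequency of the output baseband clock
} decim_config_t;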
A series of cascaded FIR filters will be used to implement the extra decimation for our UE. A control finite state machine (FSM) will take care of enabling only those FIR instances required to reach the target rate (effectively idling the others – yes, while we do use FPGAs, we are also aware that it is nice to keep power consumption as low as possible). We know that ‘cascaded’ can sound scary. “What about latency?”, you might ask. Well, our proposed solution will run the FIRs at 245.76 MHz, which is at least 8 times faster than our desired baseband sampling frequency, and facilitates a (nearly) direct interfacing with the AXI input bus. Before that, though, we do need to unpack the AXI data packets, which is really the easiest part of it all (see the small sketch right after this paragraph). Ok, so we'll be ready to forward I/Qs to our first FIR in the cascade as soon as our clock-generation circuitry reaches a stable state. The reason is that it is desirable to use a buffered version of the input RFdc AXI clock (245.76 MHz) to drive both the FIRs and the related control logic. The outputs of each FIR are connected to a FIFO that finally enables translating them to the output baseband clock domain and forwarding them to the other DSP stages implemented in the FPGA (again, those FIFOs that are not required will be kept in an idle state).
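And just to make the packing format concrete, here is a small host-side model of that unpacking step (an assumption-laden illustration: the ordering of the two samples within each word should be double-checked against your RFdc configuration):
#include <stdint.h>

// Host-side model (illustration only) of unpacking one 32-bit AXI word carrying
// two consecutive 16-bit samples; the same applies to both the I and Q ports.
// We assume the earlier sample sits in the lower half-word.
static inline void unpack_axi_word(uint32_t word, int16_t *first, int16_t *second)
{
  *first  = (int16_t)(word & 0xFFFFu);          // sample n
  *second = (int16_t)((word >> 16) & 0xFFFFu);  // sample n+1
}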
Very conveniently for us, the RFSoC device is also equipped with several Mixed-Mode Clock Manager (MMCM) primitives to synthesize the clocks we need from the input RFdc AXI clock. Wrapping these primitives we have the Clocking Wizard IP cores, which ease the task of dynamically reconfiguring the MMCM at run time from the FPGA itself. To that end, we have a simple FSM that writes the required values to the internal registers of the MMCM so that it generates our desired baseband clock (as well as the buffered version of the RFdc AXI clock, that is). Finally, we don't want to output the baseband clock until it has been configured according to the CPU request and it is stable (yes, we want to ignore the first default clock and the non-stable portion of the desired one too). Good for us that we also have some BUFG primitives with clock-enable and clear input ports that we can drive from the MMCM ‘locked’ and CPU AXI reset signals (don't forget to configure the MMCM properly, so as not to cascade BUFGs unless you really need to do so).
Below we have included a block diagram of the proposed solution. It will probably help you better understand the text above (we tried to keep it clear and without too much detail, but we're engineers at the end of the day!).

And now, allow us to introduce you to a new section in this Engineering Insights post: “things that are much more fun to tell than to debug”. So you are testing the dynamic reconfiguration of the MMCM. Nice, the first configuration works (end-to-end) and you get a baseband signal that actually works with your stack (of course, we first captured some data and tested it offline, as we'll explain later). So you feel bold and jump into testing another configuration (without resetting the board, that is, otherwise it doesn't count as a valid test!). Wait. What is happening? Why doesn't it work now? We have correctly registered the new configuration received from the CPU… Long story short, the design has portions working in different clock domains, each driven by its own control FSM(s) but, and this is a big but, there are dependencies amongst the FSMs. And all FSMs depend on the ‘locked’ status of the MMCM (i.e., whether the output clocks are stable or not). Hence, on top of all the carefully placed cross-clock-domain control logic, the FSM driving the MMCM needs to be sure that ‘locked’ has been deasserted in all clock domains before actually starting its reconfiguration. In some future post we'll tell you about the joys of hardware debugging (we can't wait for that one!).
Let’s evaluate the solution in the lab (just some quick results)
As part of our lab setup, we have a(n always evolving) continuous integration (CI) framework that eases the task of testing our implementations, while also ensuring that our code passes a thorough and reliable validation process. In this case, the CI setup allows automatically compiling our code, building our bitstreams and downloading it all to the ZCU111. It will also take care of executing our transmitter application (e.g., the pdsch_enodeb application if we want to generate an LTE DL signal). More importantly, it will also configure the RFdc. Then, it can capture data at the output of our configurable decimation solution or it can configure other FPGA-accelerated blocks, such as the PSS detection in the LTE case, so that we can validate the resulting signal either offline (e.g., using it as a test vector with SRS stack code) or in-FPGA (e.g., using ILAs).
The figure below shows a screenshot of the offline validation approach for an LTE signal with 1.4 MHz BW (i.e., maximum supported decimation), which is captured at the output of our configurable decimation solution and passed through our software PSS detection.

And we close the post with one more screenshot of the actual FPGA-accelerated PSS detection of the same signal, as seen through an ILA core. Beautiful, isn’t it?
This is the end

It's great that you have read this far! But we feel it's time to wrap up. Yes, it has been a short ride, but we hope you enjoyed it and that our description has been clear enough (you see, us FPGA guys, we always have trouble synthesizing). The article has just covered the very first step towards implementing a 4G/5G FPGA-accelerated SDR UE on an RFSoC device. We firmly believe that the beauty of an embedded implementation mostly springs from the combination of heterogeneous design and programming techniques, engineering skill sets and… well, all of it actually working on the hardware. Hence, our intention here has been to show you a few stills quickly capturing some (hopefully) relevant aspects of our hands-on experience of getting the RFSoC to produce baseband I/Q samples in a flexible manner. In future posts we will continue this journey and cover in detail some aspects for which we have just scratched the surface here. In any case, if you don't want to wait until the next post comes up, you can always throw us a few lines at the email address listed below and we'll be glad to provide you with more information.
A PDF of this article is available to download here.