Architecture of an Ultrasound System for Continuous Real-time High Frame Rate Imaging

Enrico Boni, Member, IEEE, Luca Bassi, Alessandro Dallai, Valentino Meacci, Alessandro Ramalli, Member, IEEE, Monica Scaringella, Francesco Guidi, Stefano Ricci, Senior Member, IEEE, and Piero Tortoli, Senior Member, IEEE

Abstract—High frame rate (HFR) imaging methods based on the transmission of defocused or plane waves rather than focused beams are increasingly popular. However, the production of HFR images poses severe requirements both in the transmission and the reception sections of ultrasound scanners. In particular, major technical difficulties arise if the images must be continuously produced in real-time, i.e. without any acquisition interruption or loss of data. This paper presents the implementation of the real-time HFR compounded imaging application in the ULA-OP 256 research platform. The beamformer sustains an average output sample rate of 470 MSPS. This allows continuously producing coherently compounded images, each of 64 lines by 1280 depths (here corresponding to 15.7 mm width and 45 mm depth, respectively), at frame rates up to 5.3 kHz. Imaging tests aimed at evaluating the achievable speed and quality performance were conducted on a phantom. Results obtained by real-time compounding of frames acquired with different numbers of steering angles between +7.5° and -7.5° are presented.

Index Terms—High frame rate, plane wave imaging, real-time, ULA-OP.

I. INTRODUCTION

Conventional ultrasound (US) imaging methods based on the transmission of focused beams are known to suffer from some inherent limitations. First, the quality of images is not uniform, since, even though dynamic focusing is applied in reception (RX), the resolution is always better in the area around the focus set in transmission (TX). Second, the formation of each single frame requires multiple TX-RX events, which lengthens the acquisition time and reduces the frame rate. Finally, as different image lines are taken at different time instants, artefacts can appear during the observation of fast morphological/hemodynamic events.

High frame rate (HFR) imaging methods [1], based on the transmission of multiple simultaneous focused beams (Multi Line Transmission - MLT) or of defocused beams such as those associated with diverging waves and plane-waves (PWs), can overcome the above limitations. However, undesired effects are unavoidably generated. In all of these cases, in fact, artefacts may be produced by the simultaneous reception of echoes from different points of the region of interest. Furthermore, when a given area is covered by a single defocused beam, the resolution and the signal-to-noise ratio (SNR) are necessarily worse than those obtained from the transmission of focused beams. These effects are typically mitigated by compounding [2]–[5], i.e. the combination of echoes received from the same area insonified at different steering angles, which unavoidably reduces the frame rate. In the MLT case, the major artefacts are due to cross-talk and may be reduced by proper TX-RX apodization [6], [7] or by novel beamforming schemes [8], [9].

The production of HFR images poses severe requirements both in the TX and the RX sections of a US scanner. In TX, the major constraint for plane wave imaging is related to the need to simultaneously excite multiple transducer elements with the same signal: the power supply must be designed to sustain the corresponding high peak currents [10]. For MLT imaging, non-identical electric excitation pulses must be applied to the different transducer elements. These pulses are given by the sum of the pulses that would be applied on the individual elements if distinct focused transmit beams were separately emitted. Even though the problem may be counteracted by properly exploiting pulse wave modulation approaches [11], [12], the best and most flexible solution remains the use of arbitrary waveform generators coupled with high power linear amplifiers [13]. Finally, another requisite that must be satisfied by a scanner to produce high quality HFR images is full flexibility in setting the transmission sequence, which may alternate beams or plane waves steered at several angles [3], [4], [7].

In RX, the minimum scanner requirement is having generous memory to store the so-called raw channel data, i.e. the radiofrequency (RF) echo-data received by each active transducer element, over a suitable time interval. For example, the acquisition of raw channel data obtained at an 80 MHz analog-to-digital conversion rate and 12-bit resolution, determines the storage of about 120 MB/s/channel. By forwarding the data stored in the memory toward a host PC, the RX beamforming and image formation are demanded to

This work was supported by the national government and the European Union through the ENIAC JU project Devices for Neural Control and Neural Rehabilitation (DeNeCoR) under grant agreement number 324257, and by the Italian Ministry of Education, University and Research (PRIN 2010-2011).

The authors are with the Department of Information Engineering, University of Florence, Florence 50139, Italy.
II. METHODS

A. High frame rate imaging approach implementation

1) ULA-OP 256: reception architecture

ULA-OP 256 is a 256-channel research scanner built around multiple Front-End (FE) boards, each controlling 32 elements. Fig. 1 describes the ULA-OP 256 RX processing chain. In each FE board, after the transmission, the backscattered echoes are filtered, amplified and digitized by four 8-channel Analog Front End (AFE) devices (AFE5807, Texas Instruments, Austin, TX, USA). The digital samples are stored inside the FPGA and used to produce the “partially” beamformed lines (i.e. obtained from the contributions of only 32 elements, as detailed in the next Section). These RF data are sent to two FE DSPs where they are compounded, when requested, and demodulated. Base-band data are then sent to the DSP contained in the Master Control (MC) board, where the final beamforming sum is performed. On the same DSP, programmable elaboration modules may perform further data processing to produce the final frames. Such frames are sent through a USB 3.0 link to the PC, where they are presented on the monitor by the ULA-OP 256 software.

The raw and beamformed RF samples and the baseband data can always be stored into the available DDR memories, for subsequent download.

2) ULA-OP 256: implementation of HFR beamforming

The received echoes are digitized with 78.125 MHz sampling rate and 12-bit resolution. Each AFE transfers the samples to the FPGA through eight bit-streams (each at 937.5 Mbit/s rate) according to the low-voltage differential signaling (LVDS) format.

The 32-channel beamformer is implemented in the FPGA (ARRIA V GX Family, Altera, San Jose, CA, USA), which processes the data as shown in Fig. 2.

The DESER block de-serializes the bit streams gathered from 32 LVDS channels, and produces a single vector of 384 bits composed of the 32 samples (each with 12-bit resolution) acquired by the AFEs at a given clock cycle. The vector data rate $F_v$ typically coincides with the ADC sampling rate (78.125 MSPS). However, when a specific application does not need the highest sampling rate, an optional decimator block (DECIM) is enabled to reduce the data rate with a downsampling factor programmable between 2 and 4.
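The figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. The function and constant names are illustrative assumptions, not identifiers from the ULA-OP 256 firmware.

```python
# Figures quoted in the text; all names here are illustrative only.
ADC_RATE_HZ = 78.125e6      # sampling rate per channel
BITS_PER_SAMPLE = 12        # ADC resolution
CHANNELS_PER_FE = 32        # elements handled by one FE board


def lvds_bit_rate(adc_rate_hz, bits):
    """Serial bit rate (bit/s) of one LVDS stream carrying one channel."""
    return adc_rate_hz * bits


def vector_width_bits(channels, bits):
    """Width of the de-serialized vector holding one sample per channel."""
    return channels * bits


def vector_rate(adc_rate_hz, decimation=1):
    """Vector data rate F_v after the optional decimator (1 = bypassed)."""
    if decimation not in (1, 2, 3, 4):
        raise ValueError("decimation must be 1 (off) or between 2 and 4")
    return adc_rate_hz / decimation


if __name__ == "__main__":
    print(lvds_bit_rate(ADC_RATE_HZ, BITS_PER_SAMPLE) / 1e6, "Mbit/s per stream")
    print(vector_width_bits(CHANNELS_PER_FE, BITS_PER_SAMPLE), "bits per vector")
    for d in (1, 2, 3, 4):
        print("decimation", d, "->", vector_rate(ADC_RATE_HZ, d) / 1e6, "MSPS")
```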

The 32-sample vector is stored (at $F_v$ rate) in a Dual Port Memory (DPM), which features a depth of 8192 words. The data is read back from the DPM at $F_m = 234.375$ MHz (i.e., the DPM is also used for crossing the clock domain) to four processing units (DAS) that, in parallel, implement the delay and sum beamformer.

The HFR beamformer can work in any of the following modalities:

- Single Buffer (SB). The data stored in the DPM at $F_v$ rate are DAS processed at $F_m$ rate to contribute to the simultaneous formation of 4 image lines. After the first elaboration is completed, the same 8192 stored vectors are re-elaborated with different focusing coefficients, to contribute to the formation of a new group of 4 lines. This

Fig. 1. ULA-OP 256 reception processing chain.

- Double Buffer (DB). The DPM is split in two sections, of programmable dimensions, to be used as ping-pong buffers. While one buffer is used to write the new echo-data at the $F_v$ rate, the second buffer is used to process multiple times, at the higher $F_m$ rate, the data acquired in the previous pulse repetition interval (PRI).
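The double-buffer modality can be illustrated with a minimal software sketch. It is only a behavioural model under simplifying assumptions: the class and function names are hypothetical, each "vector" is reduced to a scalar, and the beamforming pass is a toy weighted sum rather than the FPGA logic. Note that $F_m$ is three times the 78.125 MHz sampling rate, so several read passes fit in the time needed to write one new dataset.

```python
class PingPongBuffer:
    """Two-section buffer: one section is written while the other is read."""

    def __init__(self):
        self._sections = [[], []]
        self._write_idx = 0  # section currently receiving new echo data

    def write_pri(self, samples):
        """Store the samples acquired during the current PRI."""
        self._sections[self._write_idx] = list(samples)

    def swap(self):
        """Call at the PRI boundary: the section just written becomes readable."""
        self._write_idx ^= 1

    def read_previous_pri(self):
        """Data of the previous PRI; it can be read as many times as needed."""
        return self._sections[self._write_idx ^ 1]


def beamform_passes(samples, coefficient_sets):
    """Re-process the same data once per coefficient set (toy weighted sum)."""
    return [sum(c * x for c, x in zip(coeffs, samples))
            for coeffs in coefficient_sets]


if __name__ == "__main__":
    buf = PingPongBuffer()
    buf.write_pri([1, 2, 3])      # PRI k
    buf.swap()
    buf.write_pri([4, 5, 6])      # PRI k+1 is written ...
    # ... while the data of PRI k are processed with two coefficient sets
    print(beamform_passes(buf.read_previous_pri(), [[1, 1, 1], [1, 0, -1]]))
```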

Each DAS beamformer, for each FE board, includes 32 Delay and Apodization cells, as shown in Fig. 3. The channel data are gathered from the DPM and inserted in a Circular Buffer to perform the Delay operation. As soon as the channel data is available and the Beamformer Coefficients are ready, the Delay and Apodization calculations take place. The temporal resolution of the beamformer can be improved up to 1/(16*Fs) by using three subsequent samples around the desired delay location to perform quadratic interpolation of the acquired data (Interp block). Each interpolated sample is then multiplied by the apodization coefficient to produce the output of the cell.

All cells apply the same procedure over the data from different channels: the results are finally summed to produce the partially beamformed value related to 32 channels.
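The operation of the Delay and Apodization cells can be sketched in Python as follows. This is a floating-point illustration, not the fixed-point FPGA implementation: the function names are hypothetical, the delays are expressed in units of samples, and the parabola through three samples is assumed to be the standard Lagrange form.

```python
def quantize_delay(delay, steps_per_sample=16):
    """Round a delay (in samples) to the 1/16-sample grid quoted in the text."""
    return round(delay * steps_per_sample) / steps_per_sample


def quadratic_interp(samples, delay):
    """Value at fractional index `delay` from the 3 samples around it."""
    n = int(round(delay))
    n = min(max(n, 1), len(samples) - 2)   # keep n-1 and n+1 inside the record
    t = delay - n                          # fractional offset from sample n
    ym1, y0, yp1 = samples[n - 1], samples[n], samples[n + 1]
    # Parabola through the three points, evaluated at offset t
    return y0 + 0.5 * t * (yp1 - ym1) + 0.5 * t * t * (yp1 - 2.0 * y0 + ym1)


def das_cell(samples, delay, apodization):
    """One cell: delay (with interpolation), then apodization weighting."""
    return apodization * quadratic_interp(samples, quantize_delay(delay))


def partial_beamform(channels, delays, apodizations):
    """Sum of all cell outputs: one partially beamformed value."""
    return sum(das_cell(s, d, a)
               for s, d, a in zip(channels, delays, apodizations))


if __name__ == "__main__":
    ch = [float(k * k) for k in range(10)]   # test signal: a parabola
    print(partial_beamform([ch, ch], [3.25, 2.0], [1.0, 0.5]))
```

Because the test signal is itself a parabola, the quadratic interpolation reproduces it exactly, which makes the sketch easy to check by hand.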

A DDR Memory Controller (DMC) and a Memory Manager (MM) (Fig. 3) manage the DDR banks dedicated to the FPGA. The DMC initializes the memory devices, produces the refresh commands at appropriate intervals, and translates the read and write requests from the local interface to DDR commands. The MM schedules the requests and provides the data to the FIFO Coeff to guarantee continuous beamformer operation.

The beamformed data are buffered in two FIFOs (FIFO Res in Fig. 4) and channeled into two 4-lane SerialRapidIO (SRIO) links, which feature a total throughput of 40 Gb/s. These links, through an SRIO Switch (80HCPS1432, Integrated Device Technology, San Jose, CA), deliver the data to the two onboard DSPs (320C6678 family from Texas Instruments, Austin, TX, USA). The data are moved into the DDR connected to the DSPs through dedicated DMA channels. Once the beamforming process ends, the DSPs have the data ready in their DDRs.

The two DSPs of the FE board are dedicated to specific processing tasks, such as quadrature demodulation and compounding. The multicore architecture of the processors is exploited to maximize their total throughput, which exceeds 1.4 billion samples per second. Each DSP embeds 8 cores: the primary DSP is subdivided into one master core and 7 slave (processing) cores, while the secondary DSP contains only slave cores. The DSPs are connected together by means of an exclusive 25Gbps Hyperlink bus that allows the master core in the primary DSP to communicate with the processing cores in the secondary DSP.

The master core is mainly used to schedule the processing jobs to the other cores. For each cluster of samples representing one (partially beamformed) RF line, the master core prepares a dataset of processing parameters, and assigns it to a slave core. The dataset is based on parameter sequences arranged by the MC board, and includes the compounding factor, the demodulation frequency, the filter length and the downsampling factors. The sequences are transmitted from the MC board to the FE DSP.

Once a slave core finds a new job in its queue, it processes the samples according to the parameters contained in the dataset. Using Direct Memory Access (DMA), the slave core retrieves a block of samples from the DDR memory and places it in a buffer inside its internal memory. When compounding is enabled, the core iteratively programs the DMA peripheral to transfer another block of samples into a different buffer, then sums the content of the two buffers together and leaves the result in the first buffer. Transfers and sums are repeated a number of times proportional to the compounding factor.
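The iterative transfer-and-sum procedure can be sketched as follows. The DMA transfers are modelled by plain list copies and the function name is hypothetical; the sketch only illustrates the accumulation pattern, not the DSP code.

```python
def compound_blocks(blocks):
    """Sum equally sized sample blocks into the first buffer, one at a time."""
    if not blocks:
        raise ValueError("at least one block is required")
    first = list(blocks[0])               # first buffer: holds the running sum
    for block in blocks[1:]:
        second = list(block)              # models the DMA transfer of a new block
        if len(second) != len(first):
            raise ValueError("all blocks must have the same length")
        for i, value in enumerate(second):
            first[i] += value             # result is left in the first buffer
    return first


if __name__ == "__main__":
    # Three acquisitions of the same line, e.g. at three steering angles
    print(compound_blocks([[1, 2], [3, 4], [5, 6]]))
```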

Demodulation is divided in two tasks: digital quadrature mixing with a fixed frequency and low-pass filtering along with downsampling. Both tasks are performed inside optimized functions making extensive use of intrinsic DSP opcodes which execute multiple operations with a single instruction.

The mixer multiplies the incoming RF samples with suitable sine/cosine coefficients to produce quadrature samples. The coefficients are calculated within the function itself, by means of iterative multiplications in the complex plane. All operations employ 32-bit values to guarantee cumulative amplitude errors lower than 1 ppm, and a frequency resolution of 1 Hz in the mixing process. The function outputs one quadrature sample per core clock cycle (1 ns), implying 6 multiplications performed through 1.5 instructions.
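The idea of generating the sine/cosine coefficients by iterative multiplications in the complex plane can be sketched as below. This is a double-precision illustration with hypothetical names; the DSP function described above works on 32-bit values and vectorized opcodes, which the sketch does not attempt to model.

```python
import cmath
import math


def quadrature_mix(rf, f_demod_hz, f_sample_hz):
    """Mix real RF samples down to baseband with a recursively rotated phasor."""
    # Constant rotation applied at every sample: exp(-j*2*pi*f_demod/f_sample)
    step = cmath.exp(-2j * math.pi * f_demod_hz / f_sample_hz)
    phasor = 1.0 + 0.0j                   # cos/sin coefficients at sample 0
    out = []
    for x in rf:
        out.append(x * phasor)            # I + jQ for this sample
        phasor *= step                    # next coefficient, no sin/cos call
    return out


if __name__ == "__main__":
    fs, f0 = 78.125e6, 8e6
    rf = [math.cos(2 * math.pi * f0 * n / fs) for n in range(625)]
    iq = quadrature_mix(rf, f0, fs)
    print(sum(iq) / len(iq))              # close to 0.5 for a unit cosine at f0
```

With these values, 625 samples span exactly 64 cycles of the 8 MHz tone, so the double-frequency term averages out and the mean of the mixed signal is half the input amplitude.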

A cascade of four moving average filters, with two embedded downsamplers, operates low-pass filtering of complex samples. The length of each filter is adjustable up to a cumulative length of 256 samples for the cascade. Averaging filters employ fixed-point operations and 40 bit accumulators to prevent saturation, along with bit-shifts at the end of each stage. The complete cascade is set in a single loop, which takes 8 core-clock cycles to output one complex sample.
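A cascade of this kind can be sketched in a few lines of Python. The position of the two downsamplers (here after the second and the fourth stage), the filter lengths and the downsampling factors are assumptions made for illustration, since the text does not specify them; floating-point arithmetic replaces the fixed-point accumulators.

```python
def moving_average(x, length):
    """Moving average computed with a running sum (one add, one subtract per output)."""
    if length < 1 or length > len(x):
        raise ValueError("invalid filter length")
    acc = sum(x[:length])
    out = [acc / length]
    for i in range(length, len(x)):
        acc += x[i] - x[i - length]       # slide the window by one sample
        out.append(acc / length)
    return out


def lowpass_cascade(x, lengths=(4, 4, 4, 4), factors=(2, 2)):
    """Four moving averages with two embedded downsamplers (assumed placement)."""
    y = moving_average(x, lengths[0])
    y = moving_average(y, lengths[1])
    y = y[::factors[0]]                   # first downsampler
    y = moving_average(y, lengths[2])
    y = moving_average(y, lengths[3])
    return y[::factors[1]]                # second downsampler


if __name__ == "__main__":
    iq = [1 + 1j] * 64                    # constant complex input
    out = lowpass_cascade(iq)
    print(len(out), out[0])               # unity DC gain is expected
```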

The slave cores generate up to 512 quadrature samples per line, and store them in a queue of shared memory inside the primary DSP. Whenever the queue is not empty and the SRIO channel is ready to transfer data, the master core sends the samples to the MC board, one line at a time. Since up to 8 FE boards concurrently transfer their samples toward a single device, the communication implements acknowledge messages to prevent data traffic congestion.

The last beamforming stage takes place inside one core of the DSP mounted on the MC board, which can sum more than 600 million complex samples per second. The result is transferred to the external DDR memory through DMA, and at the same time a new set of samples is processed, thus implementing a double buffering technique.

The external memory operates both as a temporary queue and as a large buffer for long acquisitions of pre- and post-beamforming data and subsequent download of data to PC. When real-time processing is in progress, another DSP core retrieves the quadrature demodulated samples from the external memory and processes them according to the system configuration set by the user. For standard B-Mode processing, the signal power is calculated, and thresholding and logarithmic compression are applied to reject noise and to map the signal amplitude to a 256 gray-scale image, respectively.
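The standard B-Mode chain (power, thresholding, logarithmic compression) can be sketched as follows. Normalizing to the frame maximum and using a single dynamic-range parameter as threshold are assumptions made for this illustration; the actual system settings are user-configurable and are not described in the text.

```python
import math


def bmode_gray(iq_samples, dynamic_range_db=60.0):
    """Map complex baseband samples to 0..255 gray levels (log compression)."""
    powers = [abs(s) ** 2 for s in iq_samples]        # signal power
    p_ref = max(powers, default=0.0)                  # assumed reference: frame max
    gray = []
    for p in powers:
        if p_ref <= 0.0 or p <= 0.0:
            gray.append(0)
            continue
        level_db = 10.0 * math.log10(p / p_ref)       # 0 dB at the brightest sample
        if level_db < -dynamic_range_db:              # thresholding rejects noise
            gray.append(0)
        else:
            gray.append(round(255 * (level_db + dynamic_range_db) / dynamic_range_db))
    return gray


if __name__ == "__main__":
    print(bmode_gray([1 + 0j, 0.1 + 0j, 0.001 + 0j, 0j]))
```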

Once a frame is complete, it is transferred from the MC board to the host PC by means of a USB 3.0 link. The PC collects the streamed data and displays the frames on the monitor screen. It is worth highlighting that, if the monitor cannot show all the received frames because of its limited refresh rate (typically 60 Hz), frames are discarded only just before their presentation, i.e. in the PC, and neither in the ULA-OP 256 system nor in the communication bus.

B. Experiments

1) Experimental setup

ULA-OP 256 was connected to the 192-element linear array probe LA533 (Esaote S.p.A., Florence, Italy), having a 110% bandwidth centered at 8 MHz and 245 µm pitch.

The TX signal was a 3-cycle sine burst at 8 MHz with Hamming tapering and peak amplitude of 80 Vpp. Both in TX and RX, 128 elements were activated: the TX apodization was a Tukey window, while the RX apodization was a dynamic sinc with F# = 2. ULA-OP 256 was programmed to transmit 1, 3, 7, 11, 15, 21 or 31 steered plane waves covering a 15°-wide sector.
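The transmission sequence can be illustrated with a short Python sketch that computes the steering angles and the per-element TX delays of a steered plane wave. The even angular spacing, the inclusion of both sector extremes and the 1540 m/s speed of sound are assumptions; the element count and pitch are those of the setup described above, and the function names are hypothetical.

```python
import math


def steering_angles_deg(n_angles, sector_deg=15.0):
    """Evenly spaced angles spanning the sector (assumed to include both ends)."""
    if n_angles == 1:
        return [0.0]
    step = sector_deg / (n_angles - 1)
    return [-sector_deg / 2.0 + k * step for k in range(n_angles)]


def plane_wave_delays(angle_deg, n_elements=128, pitch_m=245e-6, c_m_s=1540.0):
    """Per-element TX delays (s) of a plane wave steered by `angle_deg`."""
    s = math.sin(math.radians(angle_deg))
    raw = [k * pitch_m * s / c_m_s for k in range(n_elements)]
    t0 = min(raw)                         # shift so that the earliest delay is 0
    return [t - t0 for t in raw]


if __name__ == "__main__":
    print(steering_angles_deg(7))
    delays = plane_wave_delays(7.5)
    print("max delay: %.3f us" % (max(delays) * 1e6))
```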

In order to estimate the speed performance of the whole system, we evaluated: the maximum achievable pulse repetition frequency (PRF$_{\text{Max}}$); the maximum continuous real-time B-mode imaging frame rate (FR$_{\text{Max}}$); the average beamformer output bandwidth ($B_{\text{BF}}$); the SRIO data transfer bandwidths on the FE ($B_{\text{FE}}$) and MC ($B_{\text{MC}}$) boards; and the data transfer bandwidth to the PC on the USB link ($B_{\text{US}}$).

The experimental evaluation of the quality of real-time images was based on the commercial tissue mimicking phantom 404 GSLE (Gammex, Middleton, WI, USA), which includes wires as well as hyperechoic and anechoic cysts. For all of the modes, the quadrature demodulated baseband data (IQ) corresponding to 85 B-mode frames were acquired and post-processed to estimate the related quality performance metrics. The image quality obtained with the compounding method was compared with that obtained with 96-line standard linear scans (LS) focused at 10, 20, and 30 mm, with TX F# = 2 and RX dynamic sinc apodization with F# = 2, which achieved a 40 Hz frame rate.

2) Image quality metrics

Acquisitions were performed by investigating three different regions of the 404 GSLE phantom (see Fig. 5):

• Two anechoic cystic regions placed at depths of 1 cm (ROI1) and 3.5 cm (ROI2), respectively. Each region consists of three cysts with diameters of 1, 2, 4 mm, respectively;
• Seven nylon wires, which are spaced 5 mm apart (ROI3).

ROI1 and ROI2 were exploited to evaluate the contrast ratio (CR) at different depths, which is defined as:

$$\text{CR}_i = \frac{\int_{C_i} ds \cdot \int_{B_i} |IQ(s)|^2 \, ds}{\int_{B_i} ds \cdot \int_{C_i} |IQ(s)|^2 \, ds}, \quad i = 1, 2 \quad (1)$$

where $C_i$ are square areas (1 mm$^2$) selected inside the anechoic cysts (see the yellow squares in Fig. 5), while $B_i$ are the background regions (1 mm$^2$) selected at the same depth as the cysts but outside them (see the blue squares in Fig. 5). The extent to which the signal and the noise are individually affected by each TX-RX strategy was also evaluated. The IQ data from the $C_i$ regions were used to estimate the mean noise power. Those signals were first high-pass filtered ($IQ_f$) along the slow time to remove the contribution of possible steady artifacts due to the echoes from the surrounding tissue. The signal-to-noise ratio was defined as:

$$\text{SNR}_i = \frac{\int_{C_i} ds \cdot \int_{B_i} |IQ(s)|^2 \, ds}{\int_{B_i} ds \cdot \int_{C_i} |IQ_f(s)|^2 \, ds}, \quad i = 1, 2 \quad (2)$$
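For discrete IQ samples, Eqs. (1) and (2) reduce to ratios of mean powers, as in the Python sketch below. Two points are assumptions of this illustration: the results are expressed in dB, and the slow-time high-pass filter is approximated by subtracting the per-pixel mean over the frames, since the filter actually used is not specified here. Function names are hypothetical.

```python
import math


def mean_power(samples):
    """Mean of |IQ|^2 over a region (discrete counterpart of the integrals)."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def contrast_ratio_db(cyst, background):
    """Eq. (1): mean background power over mean cyst power, in dB."""
    return 10.0 * math.log10(mean_power(background) / mean_power(cyst))


def remove_slow_time_mean(frames):
    """Assumed high-pass filter: subtract each pixel's mean over the frames."""
    n_frames, n_pixels = len(frames), len(frames[0])
    means = [sum(f[p] for f in frames) / n_frames for p in range(n_pixels)]
    return [[f[p] - means[p] for p in range(n_pixels)] for f in frames]


def snr_db(cyst_frames, background_frames):
    """Eq. (2): mean background power over filtered cyst (noise) power, in dB."""
    filtered = remove_slow_time_mean(cyst_frames)
    noise = mean_power([s for frame in filtered for s in frame])
    signal = mean_power([s for frame in background_frames for s in frame])
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    print(contrast_ratio_db([0.1, 0.1], [1.0, 1.0]))
    print(snr_db([[1.1], [0.9]], [[1.0], [1.0]]))
```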

The point spread functions of the wires intercepted in ROI3 were exploited to assess the lateral resolution at different depths ($R_d$), which was here evaluated as the full width at half maximum along the lateral direction.

CR, SNR and $R_d$ were computed for all the acquired frames and finally averaged.
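A full width at half maximum estimate along a lateral profile can be sketched as follows. The sketch assumes a single-peaked envelope amplitude profile sampled at a uniform lateral spacing and locates the half-maximum crossings by linear interpolation; these choices, like the function name, are illustrative and not taken from the processing actually used.

```python
def fwhm(profile, spacing=1.0):
    """Full width at half maximum of a single-peaked profile."""
    peak = max(profile)
    if peak <= 0:
        raise ValueError("profile must have a positive peak")
    half = peak / 2.0
    i_pk = profile.index(peak)

    i = i_pk                               # walk left until at or below half
    while i > 0 and profile[i] > half:
        i -= 1
    j = i_pk                               # walk right until at or below half
    while j < len(profile) - 1 and profile[j] > half:
        j += 1
    if profile[i] > half or profile[j] > half:
        raise ValueError("profile does not drop to half maximum on both sides")

    # Linear interpolation of the two half-maximum crossings
    x_left = i + (half - profile[i]) / (profile[i + 1] - profile[i])
    x_right = j - (half - profile[j]) / (profile[j - 1] - profile[j])
    return (x_right - x_left) * spacing


if __name__ == "__main__":
    tri = [0, 1, 2, 3, 4, 3, 2, 1, 0]      # triangular test profile
    print(fwhm(tri, spacing=0.245))        # spacing in mm (probe pitch)
```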

III. RESULTS

A. Speed performance

TABLE I shows the measured overall speed performance of the whole system for the PW 1, 3, 7, 11 imaging modes. It is worth highlighting that, depending on the specific TX/RX settings (number of elements, TX delays, TX signal, number of lines per frame), the beamformer can produce up to about 470 MSPS. Taking into account protocol and physical encoding overheads, the maximum data transfer bandwidth of each SRIO channel is 1.75 GB/s. Thanks to the dual-channel SRIO transfer, the data transfer from the beamformer to the two FE DSPs is 3.5 GB/s, which represents the upper limit for $B_{\text{FE}}$.

The maximum data transfer from the 8-board ring to the single MC DSP is 1.75 GB/s, which correspondingly limits $B_{\text{MC}}$. As a result, in PW 1 and PW 3 the bottleneck is $B_{\text{MC}}=1.73$ GB/s, which enables a frame rate of 1100 Hz. On the contrary, the bottleneck for PW 7 and PW 11 is $B_{\text{BF}}=467$ MSPS, which limits the maximum PRF to 3800 Hz.

B. Quality performance

TABLE II shows the quality performance metrics for each tested mode; blue/yellow cells highlight the plane wave imaging modes whose performance is better/worse than that of at least two of the reference linear scan modes. As expected, the higher the number of transmitted plane waves, the higher the values of CR and SNR. This holds for both ROIs, except for the CR value obtained for PW 3 in ROI\(_1\), which is lower than expected, likely because the actual propagation speed differs from that specified in the datasheet of the phantom, which affects the computation of delays both in TX and in RX. Moreover, the results suggest that seven PWs are sufficient to achieve better SNR values (>34 dB for ROI\(_1\) and >13 dB for ROI\(_2\)) than those obtained with any of the reference linear scans. On the other hand, linear scans present the highest CR values in ROI\(_1\) (>20.6 dB), while in ROI\(_2\) 7 PWs are sufficient to achieve better CR values (>11 dB) compared to linear scans. PW imaging also performs better in terms of lateral resolution. Indeed, it presents the best average value (\(E[R_d]\)) and the narrowest range (\(R_{d,\max} - R_{d,\min}\)), which are better than 548 and 122 μm, respectively. Essentially, as expected, PW imaging presents roughly constant resolution with depth.

Fig. 6 shows, as an example, interleaved B-mode images obtained for LS 30 (left) and PW 7 (right), when simultaneously investigating ROI\(_1\) and ROI\(_2\). The qualitative comparison confirms that plane wave imaging presents better image quality and uniformity, with reduced CR for hypoechoic cysts at shallower depths.

IV. DISCUSSION AND CONCLUSION

This paper has presented the implementation of continuous, real-time, coherently compounded HFR imaging, in the ULA-OP 256 research platform. This result was achieved thanks to a two-stage beamformer that was fully implemented in hardware (FPGAs and DSPs), by dedicating special attention to the transfer and output bandwidth optimization.

The first stage, implemented on each FE board, beamforms the RF echoes of 32 channels by means of 4 beamforming instances working in parallel. Each instance can be reprogrammed multiple times within each PRI, so as to multiply the number of lines beamformed in each PRI. The use of ping-pong double buffers allows processing the echo data at a rate (234.375 MHz) much higher than the sampling frequency, thus raising the beamformer output rate up to 470 MSPS, on average. The first stage beamformer was implemented in the FE FPGAs (ARRIA V 5AGXF3H4F40C4, Altera, San Jose, CA, USA). The current utilization of FPGA resources is 72400 Adaptive Logic Modules (53%), 11 Mbit of memory (63%), and 322 DSP blocks (31%).

The massive use of multi-core programming on the two 8-core FE DSPs allowed implementing fast digital quadrature demodulation, low-pass filtering and downsampling. One complex-sample per core clock cycle (1 ns) was thus obtained. The partially beamformed baseband data are transferred to the MC DSP by SRIO switches for the second stage beamformer, which can sum more than 600 million complex samples per second.

It is worth highlighting that the modular architecture of the system allows increasing the processing power with the active channel count. For example, according to TABLE I, PW7 and PW11 modes can run with 256 channels as well, without reducing the frame rate or the base PRF. Having said that, other architectures could have been developed, e.g. the software-based one implemented in other research systems. In principle, in a software-based architecture, a 16-lane PCI-express link can transfer data from 128 channels in real-time, and such data can be beamformed through efficient code inside the GPU. However, when more than 128 channels are needed, the PCI-express link should be proportionally expanded and additional GPU processing power should be installed.

The ULA-OP 256 system is equipped with 80 GB (expandable to 144 GB) of DDR3 memory, which can be used to store significant amount of RF data, both pre and post beamforming, as well as quadrature demodulated data. Depending on the pulse repetition interval and on the extension of the region of interest, up to 30 s of raw data can be saved. It is worth highlighting that all this processing power is packaged in a 34×30×26 cm rack. This makes the system mobile, which is important to facilitate the transportation to the laboratories of scientific partners.

The effectiveness of the whole system was evaluated in terms of speed performance (PRF$_{\text{Max}}$ and FR$_{\text{Max}}$) and image quality performance (CR, SNR and $R_d$). The tested modes, exploiting 128 active elements, highlight that the maximum frame rate (1100 Hz) is limited by $B_{\text{MC}}$, while the achievable PRF (3800 Hz) is limited by $B_{\text{BF}}$. Accordingly, in order to increase FR$_{\text{Max}}$, the transfer load on the SRIO switch present on the MC board must be lightened. This result can be achieved by reducing the frame size in terms of number of points or by reducing the number of active FE boards (which, in turn, reduces the number of active elements in RX). Similarly, in order to achieve higher PRFs, the load on the beamformers must be lightened by reducing the number of frame lines or by reducing the number of beamformed RF samples. For example, by reducing the number of active elements to 64 and the frame size to 64 lines × 256 gates, the system continuously sustains FR$_{\text{Max}}$ = 5300 Hz (PRF$_{\text{Max}}$ = 5300 Hz) when imaging with a single plane wave transmission, while it sustains FR$_{\text{Max}}$ = 2000 Hz (PRF$_{\text{Max}}$ = 6000 Hz) when compounding 3 plane waves.
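The relation between PRF, number of compounded plane waves and frame rate in the example above can be checked with a few lines of Python. The second function estimates the beamformer output rate under the assumption that every TX event yields one full set of beamformed lines and that the 1280 depths quoted in the abstract are RF samples per line; both the assumption and the function names are illustrative.

```python
def frame_rate(prf_hz, n_plane_waves):
    """Compounded frame rate: one frame every `n_plane_waves` TX events."""
    return prf_hz / n_plane_waves


def beamformer_output_rate(n_lines, n_depths, prf_hz):
    """Beamformed samples per second if each TX event yields n_lines x n_depths."""
    return n_lines * n_depths * prf_hz


if __name__ == "__main__":
    print(frame_rate(5300, 1))                       # single plane wave
    print(frame_rate(6000, 3))                       # 3 compounded plane waves
    rate = beamformer_output_rate(64, 1280, 5300)
    print(rate / 1e6, "MSPS (below the 470 MSPS average limit)")
```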

In terms of image quality, as expected, a higher number of compounded PWs allows achieving higher SNR and CR at the expense of FR$_{\text{Max}}$. In addition, PW imaging presents almost uniform resolution at different depths, better than the resolution of standard focused scans outside the focal region. In general, 7 PWs are sufficient to achieve better image quality and higher frame rate with respect to linear scans.

Since continuous HFR real-time imaging is finally feasible, algorithms that were so far tested only in post-processing, or in near real-time, can move toward more intensive clinical practice. Possible applications that could benefit from HFR imaging in real time span from breast cancer diagnosis [23]–[25] to cardiac tissue Doppler imaging [26] and intrinsic waves imaging [27]. In addition, blood flow imaging can benefit from real-time operation for: the reconstruction of 2D vector Doppler maps [28]–[31]; the simultaneous assessment of color flow and pulsed wave Doppler imaging [32], [33]; the reconstruction of complex-flow dynamics [34], and also for functional analysis of the brain [35], [36]. Finally, HFR is potentially of great importance for 3D imaging as it allows reducing the time required for volume acquisition [37]–[41].

The described parallel beamformer architecture is currently used in our laboratory for the real-time test of novel imaging/Doppler methods. One is the multi-line transmit (MLT) technique, which has already been shown, so far only off-line, to be capable of significantly increasing the frame rate of cardiac images without significantly compromising their quality [7]. With ULA-OP 256, frame rates up to 212 Hz are currently obtained in real-time [42], but work is in progress to produce at least 600 frames/s, which might push fast MLT cardiac imaging one step closer to clinical routine. The described beamformer architecture will also enable a real-time wide-angle field of view for tissue Doppler imaging, thus facilitating the assessment of myocardial deformation or of left ventricular dyssynchrony [26]. Finally, the ULA-OP 256 was tested in preliminary multi-line vector Doppler experiments [43], showing in real-time the distribution of velocity vectors over 8 parallel lines covering extended regions of interest.

REFERENCES

Luca Bassi was born in Borgo San Lorenzo (Florence), Italy, in 1978. He received the degree in Electronic Engineering in 2004, and the Ph.D. degree in Electronic Systems Engineering in 2008, both from the University of Florence. He currently holds a post-doc position at the Microelectronic Systems Design Laboratory of the University of Florence. His research activities concern the development of hardware and software systems for ultrasound imaging applications, with emphasis on the design of programmable architecture systems and the development of FPGA-optimized code.


Alessandro Dallai was born in Florence, Italy, in 1979. His lively interest in electronics has prompted him to pursue the Laurea degree and the Ph.D. in electronic engineering, respectively received in 2004 and 2009 from the University of Florence. In 2005 he spent one year working with software-defined-radios for avionics. Currently, he is involved in multiple research programs in the department of Information Engineering at the University of Florence, mainly regarding the improvement of novel ultrasound systems, which in the past years contributed to the development of a worldwide appreciated platform for research on ultrasound. His academic interests focus on electronic system design and highly-optimized DSP firmware development.

Valentino Meacci received the master’s degree in electronic engineering in 2013 and the Ph.D. degree in information technology in 2016 from the University of Florence, Italy, where he is currently a Postdoctoral Researcher.

He is involved in the development of novel ultrasound systems, with a focus on electronic system design and highly optimized FPGA firmware.

Alessandro Ramalli (M’10) was born in Prato, Italy, in 1983. In 2008, he graduated in Electronics Engineering from the University of Florence. In 2012 he earned the PhD degree in Electronics System Engineering from the University of Florence and in Automatics, Systems and Images from the University of Lyon, by defending a thesis on the development of novel ultrasound techniques for imaging and elastography. Alessandro currently holds a postdoctoral position at the MSD Laboratory of the University of Florence, where he is involved in the development of the imaging section of a programmable open ultrasound system. His research interests include medical imaging, ultrasound simulation and elastography.

Enrico Boni (M’12) was born in 1977 in Florence, Italy. He graduated in electronic engineering in 2001 at the University of Florence, Italy and received the PhD degree in Electronic System Engineering in 2005 from the University of Florence, Italy. He currently holds a Research position at the Microelectronic System Design Laboratory, Department of Information Engineering, University of Florence, Italy. His research interests include analog and digital systems design, digital signal processing algorithms, digital control systems, Doppler ultrasound signal processing, microemboli detection and classification.

**Monica Scaringella** was born in Italy in 1976. She graduated in Electronic Engineering in 2002 at the University of Florence where she also received her Ph. D. in Materials Engineering in 2006 working on instrumentation for high energy physics experiments. She later focused on the development of novel radiation detectors for medical physics within the University of Florence and the Italian National Institute of Nuclear Physics. Since 2013 Monica is a research fellow at the Information Engineering Department of the University of Florence working on the development of high performance ultrasound imaging systems.

**Francesco Guidi** was born in Portoferraio (LI), Italy, in 1964. He graduated from the University of Florence, Italy, with the M.Sc. degree in electronics engineering and subsequently he received his Ph.D. degree in electronic systems engineering. After working in a national company on the design of a real time radiologic image processing system, he joined the National Institute of Nuclear Physics (INFN) where he was involved in the design of real time software for solid state particle detectors. Since 1992, Francesco has held a position at the Electronics and Telecommunications Department of the University of Florence. His research interests include the development of real-time methods for ultrasound blood flow estimation and the investigation of acoustic properties of ultrasound contrast agents.

**Stefano Ricci** (M’07-SM’16) received the degree in Electronic Engineering in 1997, and the Ph.D. degree in Electronic Systems Engineering in 2001, both from the University of Florence. Since 2006 he has worked as a researcher at the Electronics and Telecommunications Department (recently renamed Information Engineering Department) of the University of Florence. His research activities are focused on the development of high performance ultrasonic systems and the development and test of new ultrasound methods for medical and industrial applications. Stefano Ricci is author of more than 60 publications in international conferences and journals.

**Piero Tortoli** (M’91-SM’96) received the Laurea degree in electronics engineering from the University of Florence, Italy, in 1978. Since then, he has been on the faculty of the Electronics and Telecommunications (now Information Engineering) Department of the University of Florence, where he is currently full Professor of Electronics, leading a group of over 10 researchers in the Microelectronics Systems Design Laboratory.

Professor Tortoli has served on the IEEE International Ultrasonics Symposium Technical Program Committee since 1999 and is currently Associate Editor for the UFFC Transactions. He chaired the 22nd International Symposium on Acoustical Imaging (1995), the 12th New England Doppler Conference (2003) and established the Artimino Conference on Medical Ultrasound in 2011. In 2000, he was named an Honorary Member of the Polish Academy of Sciences. Professor Tortoli’s research activity is centered on the development of ultrasound research systems and novel imaging/Doppler methods, on which he has published more than 200 papers.