White Paper

# Advanced Error Analysis Offers New Troubleshooting Methods for High-Speed Data Communications

### Introduction

#### 25 G - The new standard for I/O

Over the past decade, 10 G has become the de facto standard for both long- and short-reach high-speed (premium) data communications links. Significant resources have gone into optimizing integrated circuits (ICs) with 10 G input/output (I/O) since the late 1990s, establishing a healthy ecosystem for deploying 10 G links cost-effectively. The technology scales well from short inter-chip interfaces through to long-reach (LR, 10 km) optical modules and beyond. 10 G technology displaced the expensive first-generation 40 G technology in 40 GE because of its much improved cost scaling. However, newer standards such as 100 GE required establishing a new de facto rate.

The ideal technology for high-speed communications interfaces is a mainstream option with the fewest parallel channels, because fewer channels make for a more cost-effective implementation. Transport choices for 100 GE might include 10 x 10 G (used as the initial host interface), 4 x 25 G (standard), 2 x 50 G, or 1 x 100 G. Clearly, 50 G and 100 G I/O are extremely challenging and likely to carry a significant price premium for several years, so the choice came down to 10 x 10 G or 4 x 25 G. The 10 x 10 G option could leverage existing 10 G I/O technology and the body of knowledge built up over three generations of 10 G ICs, while 4 x 25 G uses 40 percent of the components (four lanes instead of ten), which reduces volume, cost, connector size, and PCB trace area.

Trends indicated a move toward 100 GE based on 4 x 25 G, although a 10 x 10 G host electrical interface was used on the first-generation module (CFP) because 25 G technology was too novel for use in a widely deployed pluggable interface. Soon 25 G-based I/O will become the de facto I/O speed for many future technologies, including 100 G Ethernet, OTU4, and InfiniBand.
Also, 25 G I/O is available today on application-specific integrated circuits (ASICs), clock and data recovery (CDR) devices, and field-programmable gate arrays (FPGAs).

# **Challenges**

#### Signal integrity, crosstalk, CDR and FIFO, real data signals, and jitter

Today's third-generation 10 G I/O, which is used for most high-speed data links, offers a price/performance and power profile that allows cost-effective deployment. Even with this established technology, first-generation 100 G based on 10 x 10 G presents many signal integrity and performance issues, such as jitter tolerance and dynamic skew. The move to 25 G will require resolving many more issues before 100 G (using 4 x 25 G) can become a true mainstream technology. The major issues, especially with first-generation 25 G I/O ICs, include:

- signal integrity
- CDR performance
- jitter tolerance
- dynamic skew tolerance
- pattern sensitivity

These problems are difficult enough to cope with at lower bit rates; at 10 G and above, conventional tools offer little help. A new approach is needed to accelerate troubleshooting and fault-finding, particularly for vendors trying to be first to market. Many modern parts also perform differently under real traffic, such as Ethernet or OTN, which compounds the issue. Classic pseudorandom binary sequence (PRBS) test signals neither reveal all of the issues nor represent real use cases. Manufacturers must be able to test and validate with real traffic to ensure a reliable product.

## **Conventional Approach**

#### Legacy BERTs miss the mark

Conventional bit error rate testers (BERTs) may offer control of pulse parameters such as voltage swing and transition time. However, their diagnostics are limited to basic measures such as error count and error sense, and they cannot support the framed Ethernet or OTN signals required for real-world validation and test (see Figure 1).
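
To illustrate why a bare error count is a limited diagnostic, the following minimal sketch compares two hypothetical logs of errored bit positions. Both logs, the function names, and the 1,000-bit spacing are invented for illustration and are not taken from any instrument. The two logs give an identical bit error ratio, yet the gaps between consecutive errors show that one is scattered and the other is strictly periodic.

```python
from collections import Counter

def ber(error_positions, bits_tested):
    """Bit error ratio: errored bits divided by bits compared."""
    return len(error_positions) / bits_tested

def error_distances(error_positions):
    """Gaps, in bits, between consecutive errored bit positions."""
    ordered = sorted(error_positions)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

BITS_TESTED = 1_000_000

# Hypothetical error logs (illustrative data, not measurements).
scattered = [12_345, 160_002, 298_771, 455_010,
             603_338, 741_129, 880_456, 990_001]
# One error every 1,000 bits; the spacing is an arbitrary choice.
periodic = [100_000 + k * 1_000 for k in range(8)]

for name, log in (("scattered", scattered), ("periodic", periodic)):
    gaps = error_distances(log)
    gap, hits = Counter(gaps).most_common(1)[0]
    print(f"{name}: BER={ber(log, BITS_TESTED):.1e}, "
          f"most common gap={gap} bits ({hits} of {len(gaps)} gaps)")
```

A counter that reports only the total cannot separate these two cases, although they would normally point to different root causes.
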
With a conventional BERT, troubleshooting and validation depend on the end user's experience and intuition, so locating a root cause can take a great deal of time. This is especially true for CDR and FIFO slippage and for pattern sensitivity.

Dynamic skew variation, which is critical for the multi-lane buses used in 100 GE and OTU4, presents further difficulties for conventional BERTs. It is often extremely challenging for users to set the relative inter-lane skew accurately to the fractions of a unit interval (UI) needed.

Traditional multi-box BERTs are extremely expensive and have limited connectivity because they lack native support for real-world form factors like CFP2. End users must therefore connect the device under test (DUT) with expensive phase-matched microwave cable pairs. Separate power supplies and laptops are also often required to power and control the DUT.

Figure 1. Conventional BERTs often need multiple boxes and expensive interconnects to address real-world form factors like CFP2/4.

## Using the VIAVI Solutions ONT 100 G CFP2-based Module

#### Troubleshooting CFP2, CFP4, and other technologies based on 25 G I/O using the VIAVI ONT 100 G CFP2-based module

The ONT-600 100 G CFP2-based module, shown in Figure 2, accelerates 25 G I/O technology development and troubleshooting by combining all the features of a conventional four-channel BERT with new features for error analysis. The ONT CFP2 offers significant enhancements over legacy products because it combines native CFP2 form-factor support, dynamic skew, jitter injection, and real traffic capability (100 G Ethernet and OTU4). It is a one-box solution for next-generation 100 G and 25 G I/O, with integrated applications for anything from chip to system testing. Native CFP2 support (including applications for MDIO and PSU margin testing) eliminates signal integrity as an issue, because test signals are delivered to the DUT exactly as needed.

Figure 2.
VIAVI ONT 100 G CFP2-based module

The bit capture application, shown in Figure 3, captures the logical view of each 25 G lane to a depth of 512 kbits. Operators can highlight bit errors and use a wide variety of trigger options to focus quickly on the issues. The ONT can also drive an external trigger so that instruments such as fast oscilloscopes can capture the 'physical' signals, which makes it quick to reconcile the physical and logical views.

Figure 3. Bit capture application capture

Dynamic skew lets operators move individual 25 G Tx lanes ±512 bits in 10 mUI steps relative to the reference (see Figure 4). Users can vary the rate of change from 10 mUI/s to as much as 10 UI/s to validate the de-skew functional block of the receiver. This process is incredibly difficult and time-consuming using traditional test methods. The test is typically performed with a gentle (~20 mUI/s) change rate; however, faster rates (up to 10 UI/s) can be used to validate operational margin and expose potential failure modes.

Figure 4. Dynamic skew application capture

The advanced error analysis option, shown in Figure 5, lets operators set up the key parameters before the ONT starts deep error analysis. The ONT captures the errors in a special 'error vector' format: each error vector is 128 bits wide, and up to 256k vectors can be captured and analyzed per lane. The application analyzes the error profile and distribution and produces results that can be used to quickly identify the root cause.

Figure 5. Advanced error analysis option

Once the application has processed these error vectors, it immediately displays results, similar to those shown in Figure 6, that go well beyond a normal BER measurement. These results quickly reveal CDR issues and FIFO slips that conventional tools miss. The application also groups errors according to their characteristics (burst length, distance, slip) to help operators see the underlying causes immediately.

Figure 6.
Error analysis – results summary

Figure 7 shows the error distribution on all four lanes, each in a unique color; individual lanes can be turned on or off as required. Here, lane 1 has the highest error count; however, all lanes have a 'Poisson'-like distribution in error distance. This is expected for classic random errors with little pattern sensitivity. Periodic errors would show distinct 'spikes' in the distribution.

Figure 7. Error distribution profile

The screen in Figure 8 shows the top 10 patterns that lead to a bit error. In this case, most of the errors occur after a run of ones (1) followed by an isolated zero (0). Such a result can be caused by an incorrectly set threshold (slicer) level. Errors that are unrelated to pattern sensitivity typically have 'flatter' distribution patterns. (We cheated a little here: we set the lane 0 slicer level to an unrealistic +100 mV to produce the pattern sensitivity shown in Figure 8.)

Figure 8. Top 10 error patterns overview

The front panel of the ONT CFP2-based module, shown in Figure 9, gives the end user a wide array of interconnect options.

Figure 9. Front panel of CFP2

Jitter at frequencies from 10 kHz to 1 GHz can be injected on one lane, which can be carrying PRBS or even real framed traffic, while all other features such as dynamic skew remain available simultaneously. As Figure 10 shows, jitter of up to 1 UI at 10 MHz was injected with different amplitude profiles (sine, square, and Gaussian noise), and the corresponding jitter profile was observed on an oscilloscope using jitter decomposition software.

Figure 10. ONT CFP2-based module being used together with its active electrical adapter to test the jitter injection feature

This technique can be used to validate 25 G I/O jitter tolerance and to measure jitter transfer and the performance of components such as CDRs, using real traffic rather than classic PRBS.
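
The UI-based figures quoted for jitter injection and dynamic skew can be converted to absolute time with a little arithmetic. The sketch below is an illustration under stated assumptions, not a description of how the ONT works: it assumes a lane rate of 25.78125 Gb/s (the per-lane rate of a four-lane 100 GE electrical interface; the paper itself only says "25 G"), it treats the 1 UI jitter amplitude as peak-to-peak, and all function and variable names are hypothetical.

```python
import math

# Assumption: 25.78125 Gb/s per lane (four-lane 100 GE electrical interface).
# OTU4 lanes run somewhat faster, so these figures shrink slightly for OTU4.
LANE_RATE_BPS = 25.78125e9
UI_PS = 1e12 / LANE_RATE_BPS  # one unit interval in picoseconds

def ui_to_ps(ui):
    """Convert a duration in unit intervals to picoseconds."""
    return ui * UI_PS

def sinusoidal_jitter_ui(t_s, amplitude_ui_pp, freq_hz):
    """Timing offset in UI at time t_s for sinusoidal jitter.

    amplitude_ui_pp is taken as peak-to-peak (an assumption), so the
    offset swings between -amplitude_ui_pp/2 and +amplitude_ui_pp/2.
    """
    return 0.5 * amplitude_ui_pp * math.sin(2.0 * math.pi * freq_hz * t_s)

def sweep_seconds(range_ui, rate_ui_per_s):
    """Time needed to move a lane by range_ui at a constant skew rate."""
    return range_ui / rate_ui_per_s

JITTER_FREQ_HZ = 10e6         # injection frequency quoted for Figure 10
JITTER_AMPLITUDE_UI_PP = 1.0  # 1 UI, assumed peak-to-peak

# The sine reaches its positive peak a quarter of a period after t = 0.
quarter_period_s = 1.0 / (4.0 * JITTER_FREQ_HZ)
peak_offset_ps = ui_to_ps(
    sinusoidal_jitter_ui(quarter_period_s, JITTER_AMPLITUDE_UI_PP, JITTER_FREQ_HZ))
bits_per_jitter_cycle = LANE_RATE_BPS / JITTER_FREQ_HZ

skew_step_ps = ui_to_ps(0.010)  # one 10 mUI dynamic skew step

print(f"1 UI = {UI_PS:.2f} ps")
print(f"peak jitter offset = {peak_offset_ps:.2f} ps")
print(f"bits per jitter cycle = {bits_per_jitter_cycle:.1f}")
print(f"10 mUI skew step = {skew_step_ps:.3f} ps")
print(f"512 UI at 20 mUI/s takes {sweep_seconds(512, 0.020):.0f} s")
print(f"512 UI at 10 UI/s takes {sweep_seconds(512, 10.0):.1f} s")
```

Under these assumptions a 10 mUI skew step is well under a picosecond, which gives a sense of why setting fractional-UI skew by hand with external delay lines is so difficult.
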
Framed signals such as OTU4 have subtly different spectral properties from PRBS, so the ability to validate dynamic clock behavior (clock offset, dynamic skew, and jitter) with real signals is vital for product reliability in the field. Clock recovery performance and PLL bandwidth must be validated with real data. The base unit also allows jitter injection via the clock input, to validate low-frequency (DC to 10 MHz) jitter performance at very high deviation (hundreds of UI).

See Table 1 for a complete list of comparisons between a conventional BERT and the ONT CFP2-based module for 25 G and CFP2 test validation.

Table 1. Comparing conventional BERTs to an ONT CFP2-based module for 25 G and CFP2 test validation

| Feature | Conventional BERT | ONT CFP2-based Module | Advantage |
|---|---|---|---|
| Pattern generation and checking | PRBS only and basic error counts | Framed and PRBS traffic; detailed error analysis | Real framed traffic must be used to ensure real-world performance |
| Interconnect | Expensive, phase-matched cables | Native support for CFP2 modules | Minimize issues with signal integrity |
| Bit capture and analysis | Limited, if any | Deep capture with sophisticated graphical analysis | Can quickly identify error root causes |
| Bit slip analysis | No | Comprehensive | Quickly solve issues with CDRs and FIFO slip |
| Dynamic skew | May be possible | Fully supported with integrated application | Dynamic skew tolerance is critical for reliable operation |
| CFP2 module test | No | Integrated 'one-button' application which fully covers data path, MDIO, and PSU + control | Anyone can quickly and efficiently perform reliable, comprehensive module tests |
| Jitter | Limited | Jitter inject to 1 UI, 1 GHz | Validate 25 G interface jitter tolerance |

# **Summary**

25 G I/O is set to become the next de facto standard for high-speed interfaces. Conventional techniques based on legacy BERT technology are ineffective for troubleshooting this technology. Issues like skew tolerance and CDR slippage remain elusive and could add significant delays to product delivery. The ONT CFP2 offers significant improvements over legacy products, with insightful applications such as dynamic skew and advanced error analysis that provide complete coverage of error root causes. Support for true 100 GE and OTU4 signals also assures real-world performance.

By Paul Brooks and Juan Masmela