ICE Help GENERAL
Summary: General (collection of concepts & features)
A brief description of the general features of the ICE family of DSP cards.
Contents
- 1 PERFORMANCE - Cost/Performance benefits
- 2 SCALABILITY - Cards, Chassis, and Interconnects
- 3 FLEXIBILITY - Programmable Hardware Concepts
- 4 TIMECODE - Handling Embedded TimeCode
- 5 OVERSAMPLING - Upsampling Techniques for Digital Tuners
- 6 RESAMPLING - Resampling Techniques for Digital Tuners
- 7 BOT - Bank Of Tuners on Processor Modules
- 8 FTT - Fast Tuner Transform Concept of Operation
- 9 DMA - DMA Concepts and Channel Allocation
- 10 CHAINING - DMA Chaining Concepts
- 11 SHARCMEM - SHARC/PPC Memory Allocation
- 12 MIDASDSM - Connecting to an STL Digital Switch Matrix
- 13 CLOCKING - Clock sources and selection
- 14 PLATFORMS - Notes on specific platforms
PERFORMANCE - Cost/Performance benefits
The ICE family of Digital Signal Processing boards is designed to deliver the highest performance at a cost in line with personal computer budgets. For more information about what each card can do, see the HELP CARDS entry. For current pricing, visit www.ice-online.com.
SCALABILITY - Cards, Chassis, and Interconnects
The ICE family of DSP cards is sized for PC-based systems. The ICE-PIC and ICE-MBT series are PCI devices, the ICE-SLIC series are CardBus/PC Card devices, and the ICE-NIC series are external devices connected to a host via Gigabit Ethernet.
FLEXIBILITY - Programmable Hardware Concepts
The 16-bit digital inputs are fed directly into a Field Programmable Gate Array (FPGA). The FPGA can be reprogrammed to perform application-specific front-end bit processing. Its output is then fed into a SHARC or PowerPC DSP for further processing before it is DMA'd into the host computer. The DSP is programmed in C or assembly.
Standard configurations supported by the default boot code include 1-, 4-, 8-, and 16-bit data packing, various acquisition triggers, and data gates.
Non-standard configurations might include feeding 8 pairs of clock and data into a 16-bit input module, or demultiplexing a serial bit stream for follow-on controller processing.
The module sites include a set of master/slave pins which can be used to strap two modules to begin acquisition/playback on the same clock. The Series-3 and later cards have external access to these signals to synchronize multiple cards.
TIMECODE - Handling Embedded TimeCode
Digital time code embedded in the input stream is processed by library routines that run on the host computer. Double clutch timecode is handled automatically.
When acquiring data, the timecode bit from the raw input is matched in the FPGA against a defined Barker code. Each timecode is tagged with its sample number, and the last two are stored in the FPGA's memory. Host code queries the FPGA for this information and maps the timecode to a specified index in the host acquisition buffer. This allows time tagging of 8- or 16-bit packed data, as well as of the on-board tuner output. The delay through the tuner chips is compensated for in the host software. See HELP PIC_TC.
Digital IRIG-B input on the external trigger port is processed by the IOC FPGA into a Barker code and 32 bits of data, much like the other digital timecode standards. The accuracy is about 100 us with most GPS receivers. The A2Dr7 modules have an optional 1PPS port that can be used to refine the measurement to the accuracy of the 1PPS signal, +/-10 ns.
If the computer has NTP (Network Time Protocol) enabled, only the 1PPS signal is needed to provide an accurate time stamp.
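The mapping step can be pictured with a short C sketch that interpolates between the two most recent tags. It assumes that both tags and the host buffer index count samples of the same stream, and that the processing delay (for example through a tuner) is a fixed number of seconds. The tc_tag type and tc_time_of_sample() are hypothetical names, not part of the ICE library; the real work is done by the host library routines described above.

```c
#include <stdint.h>

/* Hypothetical record of one decoded timecode: the time it carries
 * (seconds from some epoch) and the sample number it was tagged with. */
typedef struct {
    double  seconds;
    int64_t sample;
} tc_tag;

/* Estimate the time of sample 'index' from two tags a and b (which must
 * have different sample numbers), then remove a fixed delay in seconds. */
double tc_time_of_sample(tc_tag a, tc_tag b, int64_t index, double delay)
{
    /* Measured seconds per sample between the two tags. */
    double sec_per_sample = (b.seconds - a.seconds) /
                            (double)(b.sample - a.sample);
    /* Extrapolate from tag a, then subtract the processing delay. */
    return a.seconds + (double)(index - a.sample) * sec_per_sample - delay;
}
```

Using two tags rather than a nominal sample rate lets the estimate follow the actual input clock.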
SDDS embeds the timecode in a packet header. This is read by the IOC and handled downstream in the same way embedded serialized timecode is handled.
The PIC5 series can also process serialized SDN timecode embedded in the SDDS payload section. To enable this, simply specify TC=SDN0 (or TC=SDN3 for some tape playback scenarios).
The PIC4 series handles this case with special setup steps. The I/O Module must use the RXSDDSDATA flag to strip the SDDS packet headers and present the PIC4 with normal 16-bit data. This is then processed by the normal IOC=II or IOC=IO FPGA load, which handles the SDN timecode. Since the default download for an SDDS module is IIS or IOS, the IOC code must be specified in the card reset.
OVERSAMPLING - Upsampling Techniques for Digital Tuners
Digital tuner chips typically have a fixed lower limit on the decimation they support. This limit is usually set by the number of filter taps the chip can compute per output sample. At low input clock rates, the chip's multipliers are not used efficiently, which unnecessarily limits the output bandwidth. One technique to use more of the chip is to resample the input at a higher rate so that more clock cycles are available per output sample. The simplest form is to insert a fixed number of zeros between input samples. This has the effect of replicating the input spectrum N times across the new, wider band, where N is the oversampling factor (one more than the number of zeros inserted per sample).
To keep software generic, oversampling is applied to the tuner ports by setting the oversampling rate the tuner inputs will see before the tuner port is set up. The only effect on the tuner port is to relax the minimum decimation. The gain loss from the zero insertion is compensated for in the pic_tuner library.
The oversampling circuit can also be used to shield the tuner chips from clock irregularities. When digital inputs are switched or tape playback machines lose signal, the clock presented to the ICE-PIC may contain glitches that the tuner chips cannot recover from. With an oversampling factor of 1, the input clock is conditioned by the IOC gate array to keep glitches from affecting the tuners.
An oversampling factor of 2 inserts 1 zero between input samples.
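As an illustration only (the ICE hardware does this in the IOC gate array, not in host code), zero insertion can be written in C as below. It follows the OVSR convention in which a factor of N places N-1 zeros after each input sample; zero_stuff() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Zero-stuffing upsampler: each input sample is followed by (ovsr - 1)
 * zeros. 'out' must hold n_in * ovsr samples. Returns samples written. */
size_t zero_stuff(const short *in, size_t n_in, unsigned ovsr, short *out)
{
    assert(ovsr >= 1);
    size_t k = 0;
    for (size_t i = 0; i < n_in; i++) {
        out[k++] = in[i];            /* keep the real sample */
        for (unsigned z = 1; z < ovsr; z++)
            out[k++] = 0;            /* then OVSR-1 zeros */
    }
    return k;
}
```

Because only one sample in every OVSR is non-zero, the filtered signal level drops by roughly the oversampling factor; that is the zero-insertion gain loss the pic_tuner library compensates for.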
The input clock must be < 20MHz to apply the OVSR=1 conditioning. The oversampled rate (inputrate*OVSR) must be < 40MHz on series 3 cards. The oversampled rate (inputrate*OVSR) must be <= 100MHz on series 4 cards.
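The rate limits above can be captured in a small validation helper, sketched here in C. ovsr_rate_ok() is a hypothetical name; it encodes only the three limits stated in this section and rejects card series other than 3 and 4, for which no limits are given here.

```c
#include <stdbool.h>

/* Check an oversampling setup against the stated limits (rates in Hz):
 *  - OVSR=1 conditioning needs an input clock below 20 MHz,
 *  - inputrate*OVSR must be below 40 MHz on series 3 cards,
 *  - inputrate*OVSR must be at most 100 MHz on series 4 cards. */
bool ovsr_rate_ok(int series, double input_rate, unsigned ovsr)
{
    if (ovsr < 1)
        return false;
    if (ovsr == 1 && input_rate >= 20e6)
        return false;
    double rate = input_rate * ovsr;     /* oversampled rate */
    if (series == 3)
        return rate < 40e6;
    if (series == 4)
        return rate <= 100e6;
    return false;                        /* no limits known for this series */
}
```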
RESAMPLING - Resampling Techniques for Digital Tuners
The GrayChip GC4016 tuner chips have an optional digital resampler that can be applied after the tune-filter-decimate stages. It can be used to create baud-synchronous sample rates for demodulators, or 8000 Hz for VGC extraction. The filters from the GrayChip web site are available as Midas files in the DAT directory of the ICE tree. The GC4016 user guide has a detailed discussion of the resampler algorithm.
In short, the tuner output is oversampled by inserting NDELAY-1 zeros between tuner output samples. The resampling ratio determines which of the NDELAY fractional sample points to use for each resampler output, and an NTAP filter is run at that point. The filters in the ICE DAT directory are actually NDELAY*NTAP-point symmetric filters. We call each one an NTAP filter because only NTAP of the points need to be computed: only one in every NDELAY taps sees a non-zero data value. The phase jitter introduced by this technique reduces the SNR to about 40 dB.
The pic_loadfile() function, the NDELAY=n flag, the RESAMP flag, and the pic_setKey(KEY_RATIO) function are used to set up the resampler. The resampler ratio is defined as the desired output sample rate divided by the tuner output sample rate.
NDELAY=n defaults to 32. If you are not using a filter built for NDELAY=32, the flag must be added to the config string when loading the filter. The GrayChip filter file names use the naming convention res_<NTAP>x<NDELAY>_<WIDTH>. For example, the file res_15x32_80 is a 15-tap, 80% filter with 32x oversampling.
The PIC5 tuner has a 10-tap resampler inserted between the CFIR and PFIR filters, with NDELAY=2048. The CFIR and PFIR filters should be chosen to match the resampler for best results. The CFIR is a decimate-by-2 filter running at 4 times the output rate, so a 25% filter should be selected (the default is dfir_25). This presents a twice-oversampled complex waveform to the resampler section. The resampler increment is a 28-bit counter with an automatic M over N circuit that preserves exact timing for most ratios. The accumulator register is reset every M samples to remove binary rounding errors. If the VERBOSE=2 flag is present, the values actually used (M output samples for every N input samples) are displayed. M and N are 16-bit integers. The output is then sent through the decimate-by-2 PFIR filter for final output conditioning.
Note that if the real output mode is used with decimation=1, the output will be frequency-shifted up by (Fso - Fsi)/4, where Fsi is the sample rate at the resampler input and Fso is the sample rate at its output. This offset can be removed by tuning away from Fsi/4 by the same amount.
The maximum output bandwidth of the PIC5 tuner is 64 MHz. To preserve the whole band, use decimation=1 with the AOVSR (auto-oversampling) flag. The tuner allows the center frequency to be adjusted for the new output rate. To disable the M over N circuit, use the NORESMON flag.
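The following C sketch is a floating-point model of the nearest-phase polyphase interpolation described above: the NDELAY*NTAP filter is split into NDELAY phases of NTAP taps, and each output uses only the phase closest to its fractional input position. It is not bit-exact with the GC4016; resample() and its arguments are hypothetical, and gain and group-delay correction are omitted.

```c
#include <stddef.h>

/* h: ndelay*ntap prototype taps, designed at ndelay times the input rate.
 * x: n_in input samples. ratio: output rate / input rate (must be > 0).
 * y: receives up to n_out outputs. Inputs before x[0] are treated as 0.
 * Returns the number of outputs produced. */
size_t resample(const double *h, size_t ndelay, size_t ntap,
                const double *x, size_t n_in,
                double ratio, double *y, size_t n_out)
{
    size_t k;
    for (k = 0; k < n_out; k++) {
        /* Output k on the ndelay-times oversampled grid, rounded to the
         * nearest phase (this rounding is the source of phase jitter). */
        double pos = (double)k / ratio * (double)ndelay;
        size_t m = (size_t)(pos + 0.5);
        size_t n = m / ndelay;           /* newest input sample used */
        size_t p = m % ndelay;           /* selected phase, 0..ndelay-1 */
        if (n >= n_in)
            break;                       /* ran past the end of the input */
        double acc = 0.0;
        /* Only every ndelay-th tap meets a non-zero (real) sample. */
        for (size_t j = 0; j < ntap && j <= n; j++)
            acc += h[p + j * ndelay] * x[n - j];
        y[k] = acc;
    }
    return k;
}
```

Snapping each output to the nearest of the NDELAY phases is what produces the phase jitter that limits the SNR; a larger NDELAY reduces that jitter at the cost of a longer stored filter.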
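A filter name can be checked against this convention before loading. The C sketch below parses names of the form res_<NTAP>x<NDELAY>_<WIDTH> and reports whether an NDELAY=n flag is needed because NDELAY differs from the default of 32. Both helpers are hypothetical; file extensions, directory prefixes, and how the flag is combined with the rest of the config string are not handled.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse "res_<NTAP>x<NDELAY>_<WIDTH>", e.g. "res_15x32_80".
 * Returns true and fills the outputs only if the whole name matches. */
bool parse_res_name(const char *name, unsigned *ntap, unsigned *ndelay,
                    unsigned *width)
{
    int used = 0;
    if (sscanf(name, "res_%ux%u_%u%n", ntap, ndelay, width, &used) != 3)
        return false;
    return name[used] == '\0' && *ntap > 0 && *ndelay > 0;
}

/* If ndelay differs from the default of 32, write "NDELAY=<n>" into buf
 * and return true; otherwise no flag is needed. */
bool ndelay_flag(unsigned ndelay, char *buf, size_t len)
{
    if (ndelay == 32)
        return false;
    snprintf(buf, len, "NDELAY=%u", ndelay);
    return true;
}
```

For res_15x32_80 no flag is needed; a hypothetical res_10x16_60 would need NDELAY=16.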
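The hardware's method for choosing M and N is not described here. One common way to express a ratio as M/N with both values limited to 16 bits is a continued-fraction expansion, sketched below in C. ratio_to_mn() is a hypothetical helper that keeps the last convergent fitting in 16 bits; the 28-bit increment itself is not modelled.

```c
#include <stdint.h>

/* Approximate ratio (output rate / input rate) by M/N, M and N <= 65535,
 * using continued-fraction convergents. Returns 0 on success, -1 if the
 * ratio is not positive or cannot be represented. */
int ratio_to_mn(double ratio, uint32_t *m_out, uint32_t *n_out)
{
    const uint64_t limit = 65535;      /* largest 16-bit M or N */
    uint64_t h1 = 1, h2 = 0;           /* last two convergent numerators */
    uint64_t k1 = 0, k2 = 1;           /* last two convergent denominators */
    double x = ratio;

    if (!(ratio > 0.0))
        return -1;
    for (int i = 0; i < 64; i++) {
        if (x >= 65536.0)
            break;                     /* next convergent would exceed 16 bits */
        uint64_t a = (uint64_t)x;      /* floor, since x > 0 */
        uint64_t h = a * h1 + h2;
        uint64_t k = a * k1 + k2;
        if (h > limit || k > limit)
            break;                     /* keep the last convergent in range */
        h2 = h1; h1 = h;
        k2 = k1; k1 = k;
        double frac = x - (double)a;
        if (frac < 1e-12)
            break;                     /* expansion ended: ratio is h1/k1 */
        x = 1.0 / frac;
    }
    if (h1 == 0 || k1 == 0)
        return -1;                     /* ratio too large or too small */
    *m_out = (uint32_t)h1;
    *n_out = (uint32_t)k1;
    return 0;
}
```

For example, a ratio of 0.75 gives M=3 and N=4: three output samples for every four input samples.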
BOT - Bank Of Tuners on Processor Modules
There are 32 individual tuner channels on the DTDM/V6M/K8M processor modules. In normal mode, each channel has independent decimation, frequency, start/stop control, and DMA buffers.
A Tuner Bank is a block of tuners that share a DMA channel, for efficient handling of a number of similarly configured channels. The tuners in a bank must have the same decimation and must start and stop together. TunerBank=1 uses the 16 tuners on the Module=1 side, and TunerBank=2 uses the 16 on the Module=2 side. TunerBank=3 uses all 32 tuners from both sides, all fed from Module=1 and returned in a single DMA buffer.
Tuner Banks are selected by specifying PORT=TBANKn instead of PORT=TUNERn. The DTDM/DTDMX Modules support up to 32 channels tunable anywhere in the spectrum. By default, the pic_ioport() call implements as many channels as are available on the named port. To use fewer channels, set KEY_CHNS=n before the call to pic_ioport(), add the CHNS=n flag to the config string, or use the /NCHN=n switch on SOURCEPIC.
To control individual channels from SOURCEPIC, set the CHAN key before setting FREQ or GAIN. If CHAN is set to zero, the setting applies to all channels in the bank.
If the channels are contiguous in the spectrum, using /DFREQ=dfreq with SOURCEPIC sets up the tuners equally spaced by dfreq Hz, starting at the <freq> parameter. Setting FREQ in this mode moves the whole block; individual channels cannot be tuned. In this mode, the Fast Tuner Transform algorithm can be applied to increase the number of usable channels to 256 with FTT=2, or 4096 with FTT=3. See the FTT discussion for more details.
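The /DFREQ layout is simple arithmetic: channel k sits at freq + (k-1)*dfreq. A minimal C sketch that computes the whole block (bank_freqs() is a hypothetical helper, not a SOURCEPIC or library call):

```c
#include <stddef.h>

/* Fill freqs[0..nchan-1] with center frequencies in Hz: channel 1 at
 * freq, each following channel dfreq higher. */
void bank_freqs(double freq, double dfreq, size_t nchan, double *freqs)
{
    for (size_t i = 0; i < nchan; i++)
        freqs[i] = freq + (double)i * dfreq;
}
```

Retuning the block amounts to recomputing every channel frequency from a new freq with the same dfreq, which is why individual channels cannot be moved in this mode.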
The frame or packet size for the output DMA buffer, KEY_PKTLEN, and the channel spacing, KEY_DFREQ, must be set ahead of the pic_ioport() call.
The output DMA buffer will contain KEY_PKTLEN bytes of data from channel 1, followed by channel 2, ... up to channel N, then start over at channel 1.
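A host program that processes one channel at a time has to undo this interleaving. The C sketch below copies one channel's packets out of a buffer holding whole frames (one KEY_PKTLEN packet per channel); tbank_extract() is hypothetical, and wrap-around of the circular host buffer is not handled.

```c
#include <stddef.h>
#include <string.h>

/* Copy channel 'chan' (1-based) out of a buffer laid out as pktlen bytes
 * of channel 1, channel 2, ..., channel nchan, repeating. Only whole
 * frames are used. 'out' needs room for buflen / nchan bytes.
 * Returns the number of bytes copied. */
size_t tbank_extract(const unsigned char *buf, size_t buflen,
                     size_t pktlen, size_t nchan, size_t chan,
                     unsigned char *out)
{
    size_t frame = pktlen * nchan;       /* bytes in one round of channels */
    size_t written = 0;

    if (frame == 0 || chan < 1 || chan > nchan)
        return 0;
    for (size_t off = 0; off + frame <= buflen; off += frame) {
        /* channel 'chan' is packet number chan-1 within each frame */
        memcpy(out + written, buf + off + (chan - 1) * pktlen, pktlen);
        written += pktlen;
    }
    return written;
}
```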
For more details, see the help on the FTT and FTTM flags.
FTT - Fast Tuner Transform Concept of Operation
The DTDM/DTDMX Modules have 64 Mby of DRAM and a fast memory crossbar that allows the 8 GrayChips to be reused multiple times. The FTT is a multipass algorithm similar to a radix-16 FFT pass. A first bank of 16 tuners selects 1-16 blocks of the input spectrum and streams them to circular buffers in memory. A second bank of 16 tuners then selects 1-16 blocks from each of these streams (much faster than real time) and streams them back to memory. This is practical for 2 to 3 passes.
The FTT algorithm is enabled by adding the FTTM=2 or FTTM=3 flag to the device configuration string and accessing a Tuner Bank. By default, the pic_ioport() call implements as many channels as possible given the port, decimation, channel spacing, and number of FTT passes (specified by FTTM=N). To use fewer channels, set KEY_CHNS=n before the call to pic_ioport().
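One way to picture the passes is as base-16 digits of the final channel number, as in a radix-16 FFT. The C sketch below assumes (this ordering is not documented here) that the first pass selects the most significant digit; ftt_pass_blocks() is a hypothetical name.

```c
/* Split a 0-based FTT output channel index into the block (0..15) chosen
 * in each pass, first pass first. digits must hold 'passes' entries.
 * Returns 0 on success, -1 if index is out of range. */
int ftt_pass_blocks(unsigned index, unsigned passes, unsigned *digits)
{
    if (passes > 7)
        return -1;                       /* 16^8 would overflow 32 bits */
    unsigned total = 1;
    for (unsigned p = 0; p < passes; p++)
        total *= 16;                     /* 16 blocks per pass */
    if (index >= total)
        return -1;
    for (unsigned p = passes; p-- > 0; ) {
        digits[p] = index % 16;          /* fill least significant last */
        index /= 16;
    }
    return 0;
}
```

With 2 passes there are 16*16 = 256 possible channels and with 3 passes 4096, matching the FTT=2 and FTT=3 counts given in the Tuner Bank discussion.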
For more details, see the help on the FTTM flag.
DMA - DMA Concepts and Channel Allocation
High-speed data transfer uses the PCI controller's DMA engine, which is given maximum hardware-level priority since the card has minimal buffer memory. The host computer typically allocates a circular buffer in memory to hold 1-2 seconds of data (to cover host application software latencies). The SHARC/PPC then processes DMA requests from 1 to 80 of its input/output ports. All 80 DMA channels can be owned and controlled by different processes.
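Sizing the host circular buffer is a matter of multiplying the data rate by the latency to be covered. A small C sketch; host_buffer_bytes() is hypothetical, and rounding up to an alignment (for example a 4096-byte page) is an assumption, not an ICE requirement stated here.

```c
#include <stdint.h>

/* Bytes for 'seconds' of data at 'rate' samples/s, 'bytes_per_sample'
 * bytes each, rounded up to a multiple of 'align' (align > 0). */
uint64_t host_buffer_bytes(double rate, unsigned bytes_per_sample,
                           double seconds, uint64_t align)
{
    double raw = rate * bytes_per_sample * seconds;
    uint64_t n = (uint64_t)raw;
    if ((double)n < raw)
        n++;                                   /* round up to whole bytes */
    return (n + align - 1) / align * align;    /* round up to alignment */
}
```

For example, 10 Msamples/s of 4-byte samples held for 1.5 seconds needs 60,000,000 bytes, or 60,002,304 bytes after rounding up to a 4096-byte multiple.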
Acquisition/Playback can occur through the following device ports:
- SERIAL1-2   : serial ports (PIC2 only)
- LINK1-6     : link ports (PIC2 only)
- TUNER1-32   : tuner channels
- MODULE1-2   : I/O Modules
- INTERNAL1-8 : internal algorithms
- EXTERNAL1-8 : internal algorithms (extended memory on PIC4/MBT4)
The port is usually specified in the hardware-configured device alias. See HELP PIC_OPEN for details, and HWCONFIG.KEY in the DAT area for examples.
There are 8 hardware DMA channels on the SHARC that are shared between the ports. This means that up to 8 hardware acquisitions/playbacks can occur simultaneously on a single ICE card. Internal algorithms executing on the SHARC may also produce or consume DMA data buffers.
The FPGA on the series 5 cards allows each DMA channel to have its own port, so there are no resource conflicts. The PPC is a controller only; its DMA resources are not used to handle data.
A serial port is tied to its DMA channel. A link port can be associated with any DMA channel supporting a link buffer. The DMA channel will be determined automatically from the port name.
The user is responsible for managing any sharing of the serial port, tuner/serial port, link port, and module/link port DMA resources.
The DMA Channel mappings for ICE-PIC2 are:
Chan 1  Serial Port 1 Receive                 SERIAL1/TUNER1
Chan 2  Serial Port 2 Receive or Link Buf 1   SERIAL2/TUNER2/LINK1
Chan 3  Serial Port 1 Transmit                SERIAL1
Chan 4  Serial Port 2 Transmit or Link Buf 2  SERIAL2/LINK2
Chan 5  Link Buffer 3                         LINK3/MODULE1
Chan 6  Link Buffer 4                         LINK4/MODULE2
Chan 7  Link Buffer 5                         LINK5/MODULE1HS
Chan 8  Link Buffer 6                         LINK6/MODULE2HS
The DMA Channel mappings for ICE-PIC3 are:
Chan 2  Link Buffer 1  TUNER-A
Chan 4  Link Buffer 2  TUNER-B
Chan 5  Link Buffer 3  MODULE1HS
Chan 6  Link Buffer 4  MODULE2HS
Chan 7  Link Buffer 5  MODULE1
Chan 8  Link Buffer 6  MODULE2
The DMA Channel mappings for ICE-MBT2 and ICE-MBT3 are:
Chan 2  Link Buffer 1  TUNER-A
Chan 4  Link Buffer 2  TUNER-B
Chan 5  Link Buffer 3  TUNER-C/MODULE1HS
Chan 6  Link Buffer 4  TUNER-D/MODULE2HS
Chan 7  Link Buffer 5  TUNER-E/MODULE1
Chan 8  Link Buffer 6  TUNER-F/MODULE2
Each tuner chip uses one of the SHARC link ports for acquiring the tuner outputs. The four channels in each tuner chip must have the same decimation. Tuner channels are allocated such that odd and even channel numbers are fed by modules 1 and 2 respectively. See the allocation chart below:
TUNER-A  Channels 1,3,5,7      Link Port 1
TUNER-B  Channels 2,4,6,8      Link Port 2
TUNER-C  Channels 9,11,13,15   Link Port 3
TUNER-D  Channels 10,12,14,16  Link Port 4
TUNER-E  Channels 17,19,21,23  Link Port 5
TUNER-F  Channels 18,20,22,24  Link Port 6
The ICE-MBT3 can also collect wideband signals, bypassing the tuner chips. Since the wideband paths and the tuners share the link ports, resource contention occurs. If the wideband transfer rate is below 38 Mby/sec, only link port 5 or 6 is used. If it is 38 Mby/sec or more, Module 1 takes link ports 5 and 3, and Module 2 takes link ports 6 and 4. This means that tuners C through F may be unusable while wideband data is being processed simultaneously.
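The chart follows a regular pattern: channels are taken in groups of eight, with odd channels going to the first chip of each pair and even channels to the second. A C sketch of that mapping (tuner_link_port() is a hypothetical helper that simply restates the chart):

```c
/* Map tuner channel 1..24 to its chip letter ('A'..'F') and SHARC link
 * port (1..6). Returns 0 for an out-of-range channel. */
int tuner_link_port(int chan, char *chip)
{
    if (chan < 1 || chan > 24)
        return 0;
    /* Pair index from groups of 8, plus 0 for odd or 1 for even channels. */
    int idx = 2 * ((chan - 1) / 8) + (chan - 1) % 2;
    *chip = (char)('A' + idx);
    return idx + 1;
}
```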
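The link port rule for MBT3 wideband transfers can be written down directly in C. mbt3_wideband_ports() is a hypothetical helper; the 38 Mby/sec threshold and the port numbers come from the paragraph above.

```c
/* Link ports taken by an MBT3 wideband transfer. module is 1 or 2,
 * rate_mbys in Mby/sec. Writes up to two ports; returns how many. */
int mbt3_wideband_ports(int module, double rate_mbys, int ports[2])
{
    if (module != 1 && module != 2)
        return 0;
    ports[0] = (module == 1) ? 5 : 6;    /* always used */
    if (rate_mbys < 38.0)
        return 1;
    ports[1] = (module == 1) ? 3 : 4;    /* extra port at high rates */
    return 2;
}
```

Comparing the returned ports with the tuner chart shows which tuners are lost: port 3 carries TUNER-C, port 4 TUNER-D, port 5 TUNER-E, and port 6 TUNER-F.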
The DMA Channel mappings for ICE-PIC4T and ICE-MBT4 are:
Chan 5   Link Buffer 1  MODULE1
Chan 6   Link Buffer 2  MODULE2
Chan 7   Link Buffer 3  MODULE1HS
Chan 8   Link Buffer 4  MODULE2HS
Chan 9   Link Buffer 5  TUNER-N odd
Chan 10  Link Buffer 6  TUNER-N even
There is no link port sharing between tuners and modules on the series 4 cards. All odd tuners are multiplexed through DMA channel 9 and all even channels through DMA channel 10. The data is demultiplexed by the SHARC into separate host buffers.
The DMA Channel mappings for ICE-PIC5+ Input/Output are:
Chan 1   MODULE1
Chan 2   MODULE2
Chan 3   CORE1 / TUNER1
Chan 4   CORE2 / TUNER2
Chan 5   CORE11
Chan 6   CORE12
Chan 7   CORE21
Chan 8   CORE22
Chan 9   MCORE11 / TBANK11 / TUNER1-31
Chan 10  MCORE12 / TBANK12 / TUNER2-32
Chan 11  MCORE21 / TBANK21 / TUNER33-63
Chan 12  MCORE22 / TBANK22 / TUNER34-64
Access to the ICEMBT ports is made transparent via software such that the PICDRIVER and SOURCEPIC primitives may access a port on an ICE-MBT just as they would a port on an ICE-PIC.
CHAINING - DMA Chaining Concepts
When a DMA completes (dma->todo goes to 0), the controller checks the dma->chain field. If non-zero, the DMA structure's chain related fields are replaced by the values in the DMACHAIN structure pointed to by dma->chain. The new DMA will then be processed without interrupting the input/output stream.
The DMACHAIN structure has the following fields:
haddr - the HOST buffer physical address in words
hsize - the HOST buffer physical size in words
todo  - the number of buffers to process, or DMA_ONESHOT, DMA_CONTINUOUS, DMA_SPIN
chain - pointer to the next DMACHAIN structure
The chain field for the last element in the chain must be zero. Users should use the pic_dmachain() routine to populate the chaining registers. Note that dmafunc(p,dmac,DMA_STATUS) offset values are referenced to the initial buffer start.
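To make the chaining rules concrete, the C sketch below models the DMACHAIN fields listed above as a host-side linked list. The struct layout and field types are assumptions (the real definition comes from the ICE headers, and on the card the chain field refers to controller memory); real code should use pic_dmachain(), whose signature is not shown here. Only positive todo counts are summed, since the values of DMA_ONESHOT, DMA_CONTINUOUS, and DMA_SPIN are not given.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of one DMACHAIN element. */
typedef struct dmachain {
    uint32_t haddr;              /* host buffer physical address, words */
    uint32_t hsize;              /* host buffer size, words */
    int32_t  todo;               /* buffers to process (or a DMA_* mode) */
    struct dmachain *chain;      /* next element; zero ends the chain */
} dmachain;

/* Link n elements in order and terminate the last with zero. */
dmachain *link_chain(dmachain *elems, size_t n)
{
    if (n == 0)
        return NULL;
    for (size_t i = 0; i + 1 < n; i++)
        elems[i].chain = &elems[i + 1];
    elems[n - 1].chain = NULL;   /* required: last chain field is zero */
    return &elems[0];
}

/* Follow the chain the way the controller does when todo runs out,
 * summing the positive buffer counts. */
long chain_total_buffers(const dmachain *d)
{
    long total = 0;
    for (; d != NULL; d = d->chain)
        if (d->todo > 0)
            total += d->todo;
    return total;
}
```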
SHARCMEM - SHARC/PPC Memory Allocation
The controller chip on series 2 and 3 cards has two 128 kBy blocks of internal memory. The lower block is used for the sequencer logic and user programs. The upper block contains the circular buffers for the DMA channels. The DMA block is divided as follows:
Word address range  Usage
0x28000-29FFF       Module-1
0x2A000-2BFFF       Module-2
0x28000-2BFFF       Module-1 VHS
0x2C000-2FFFF       Module-2 VHS
0x2C000-2CFFF       Tuner-A (MBT2/MBT3)
0x2D000-2DFFF       Tuner-B (MBT2/MBT3)
0x2E000-2EFFF       Tuner-C (MBT2/MBT3)
0x2F000-2FFFF       Tuner-D (MBT2/MBT3)
0x28000-28FFF       Tuner-E (MBT2/MBT3)
0x2A000-2AFFF       Tuner-F (MBT2/MBT3)
0x2E000-2EFFF       Tuner-1 (PIC2/PIC3)
0x2F000-2FFFF       Tuner-2 (PIC2/PIC3)
0x28000-28FFF       Internal-1
0x29000-29FFF       Internal-2
0x2A000-2AFFF       Internal-3
0x2B000-2BFFF       Internal-4
0x2C000-2CFFF       Internal-5
0x2D000-2DFFF       Internal-6
0x2E000-2EFFF       Internal-7
0x2F000-2FFFF       Internal-8
The SHARC controller chip on series 4 cards has two 256 kBy blocks of internal memory. The lower block is used for the sequencer logic and user programs. The upper block contains the circular buffers for the DMA channels. The DMA block is divided as follows:
Word address range  Usage
0x48000-49FFF       Module-1
0x4A000-4BFFF       Module-2
0x48000-4BFFF       Module-1 VHS
0x4C000-4FFFF       Module-2 VHS
0x4C000-4DFFF       Tuner-A (PIC4/MBT4)
0x4E000-4FFFF       Tuner-B (PIC4/MBT4)
0x48000-48FFF       Internal-1
0x49000-49FFF       Internal-2
0x4A000-4AFFF       Internal-3
0x4B000-4BFFF       Internal-4
0x4C000-4CFFF       Internal-5
0x4D000-4DFFF       Internal-6
0x4E000-4EFFF       Internal-7
0x4F000-4FFFF       Internal-8
0x50000-51FFF       External-1 or ITDEC Channel-1
0x52000-53FFF       External-2 or ITDEC Channel-2
0x54000-55FFF       External-3 ...
0x56000-57FFF       External-4
0x58000-59FFF       External-5
0x5A000-5BFFF       External-6
0x5C000-5DFFF       External-7
0x5E000-5FFFF       External-8
Note that some of the memory buffers overlap and cannot be used simultaneously. Currently no internal checks are made to notify users of overlap.
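Because the driver does not check for overlap, an application can check its own buffer plan. The C sketch below treats each buffer as an inclusive word-address range taken from the tables above; mem_region and count_overlaps() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One DMA buffer region (inclusive word addresses). */
typedef struct {
    const char *name;
    uint32_t first, last;
} mem_region;

/* Inclusive ranges overlap unless one ends before the other starts. */
static bool regions_overlap(const mem_region *a, const mem_region *b)
{
    return a->first <= b->last && b->first <= a->last;
}

/* Count overlapping pairs among n planned regions; the first pair found
 * is reported through *i_out and *j_out. */
size_t count_overlaps(const mem_region *r, size_t n,
                      size_t *i_out, size_t *j_out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (regions_overlap(&r[i], &r[j])) {
                if (count == 0) {
                    *i_out = i;
                    *j_out = j;
                }
                count++;
            }
    return count;
}
```

For example, Tuner-A (0x4C000-4DFFF) and Internal-5 (0x4C000-4CFFF) on a series 4 card overlap, while Module-1 (0x48000-49FFF) overlaps neither.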
MIDASDSM - Connecting to an STL Digital Switch Matrix
The MIDAS suite of hardware usually includes a Digital Switch Matrix from Signal Technologies Laboratories (STL). The digital signals (16 data bits plus clock) are brought in and out of the switch matrix on a 36-strand twisted-pair ribbon cable. These cables connect to a switch "transition panel", usually at the back of the equipment rack. The SMS or SDN cables attach to the opposite side of the transition panel.
A transition panel will also exist near the ICEPIC's computer. This panel has a 40 pin interface and is available through ICE or STL. High density ribbon cables that attach the ICEPIC to the panel are available through ICE.
A diagram of the connectors is posted on the www.ice-online.com website.
CLOCKING - Clock sources and selection
Most IO Modules provide their own clock, either derived from the data or from an external source.
The two IO Module sites on ICE cards can operate independently or from a global muxed clock. The global clock is necessary when:
- Multiplexing data from the A and B ports
- VeryHighSpeed mode when the resources from both ports are bridged
- Synchronizing sampling clocks to both modules
- Driving a module without its own clock source (i.e. D2E, D2T)
The IOC code _II is for 2 independent input modules, each with its own clock. The IOC code _IIX is for 2 inputs with a global muxed clock. The IOC code _IO or _OI is for 1 input and 1 output: the input gets its clock from the module, and the output gets its clock from the global muxed clock.
The IOC code _OO is for 2 outputs with a global muxed clock.
To set the source for the global muxed clock, add the MUXCLK=s flag to the card configuration string handed to the pic_open() call.
The possible sources for the muxed clock signal are:
s=N   No MUXCLK
s=I   Internal clock = 40MHz/N where (N=1,1024)
s=X   External clock (SMB connector on the card edge, series 3/4)
s=A   Module A input clock (or s=1)
s=B   Module B input clock (or s=2)
s=C   Alternate Crystal CCLK on series 3/4 cards
s=D   Alternate Crystal CCLK/N where (N=1,16)
s=P   Programmable Clock on series 4 cards (.1 to 105 MHz)
s=PX  Programmable Clock using the external reference (PREFX)
When using the global clock, the CLKI flag can be used to invert the clock. The DEGLITCH flag will run the A and B sources through a deglitching circuit.
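For s=I the achievable rates are 40 MHz divided by an integer. The C sketch below picks the divider closest to a requested rate, assuming N may be any integer from 1 to 1024 (reading "(N=1,1024)" as a range); muxclk_internal_divider() is hypothetical and does not set MUXCLK itself.

```c
/* Choose N (1..1024, assumed) so that 40 MHz / N is closest to want_hz.
 * Stores the resulting rate in *got_hz and returns N. */
unsigned muxclk_internal_divider(double want_hz, double *got_hz)
{
    const double base = 40e6;            /* internal reference, Hz */
    unsigned best = 1;
    double best_err = -1.0;
    for (unsigned n = 1; n <= 1024; n++) {
        double f = base / n;
        double err = f > want_hz ? f - want_hz : want_hz - f;
        if (best_err < 0.0 || err < best_err) {
            best = n;                    /* keep the closest divider */
            best_err = err;
        }
    }
    *got_hz = base / best;
    return best;
}
```

A request for 10 MHz gives N=4 exactly; a request that falls between two divider steps gets the nearer one.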
PLATFORMS - Notes on specific platforms
Compaq ES40 Server PCI slot configuration
TOP  BUS0 SLOT7    5V/64b/33MHz
     BUS0 SLOT8    5V/64b/33MHz
     BUS0 SLOT9    5V/64b/33MHz
     BUS0 SLOT10   5V/64b/33MHz
     BUS1 SLOT1    5V/64b/33MHz
     BUS1 SLOT2    5V/64b/33MHz
     BUS1 SLOT3    5V/64b/33MHz
     BUS1 SLOT4    5V/64b/33MHz
     BUS1 SLOT5    5V/64b/33MHz
BOT  BUS1 SLOT6    5V/64b/33MHz
Compaq ES45 Server PCI slot configuration
TOP  HOSE2 SLOT7   3V/64b/66MHz
     HOSE2 SLOT8   3V/64b/66MHz
     HOSE0 SLOT4   5V/64b/33MHz
     HOSE3 SLOT10  3V/64b/66MHz
     HOSE3 SLOT9   3V/64b/66MHz
     HOSE0 SLOT3   5V/64b/33MHz
     HOSE1 SLOT6   3V/64b/66MHz
     HOSE1 SLOT5   3V/64b/66MHz
     HOSE0 SLOT2   5V/64b/33MHz
BOT  HOSE0 SLOT1   5V/64b/33MHz