



#### Trigger and Data Acquisition at colliders

#### G. De Lentdecker Université Libre de Bruxelles

# **DAQ ARCHITECTURE**



- Sow we will make a break in the data flow:
  - We have to introduce a couple of concepts about DAQ architecture and trigger before resuming with the data flow

#### **Basic DAQ – periodic trigger**



#### **Basic DAQ: real trigger**

A "trigger" is a system that rapidly decides, based on simple criteria, if an interesting event took place and to initiate the data acquisition. Events are asynchronous and unpredictable



#### **Basic DAQ: real trigger**

- Solution  $\mathbf{S} \in \mathbf{E}$  and  $\mathbf{E}$  is the second se
  - Stochastic process





To avoid this unpredictable state, a busy logic is introduced:



#### **Dead-time**

- f = average frequency of occurrence of the physics process
- $\mathbf{P} \mathbf{v}$  = average frequency at which the system can acquire events
- τ = time to process one event, without being able to handle
   other triggers
- $\rightarrow v\tau =$  fraction of time the DAQ is busy; (1- $v\tau$ )=DAQ is free



#### **Reducing the deadtime**



#### **Simple DAQ in collider mode**



#### **Simple DAQ at LHC**





## Why a trigger ?

- Today a typical PC runs at several GHz, so couldn't we just use the LHC Clock (40 MHz) to trigger the DAQ and use a large number of (relatively) cheap PC for data processing ?
- The answer is no, of course:
  - No (affordable) DAQ system could read out O(10<sup>7</sup>) channels at 40 MHz → 600 TBit/s to read out – even assuming binary channels!
  - In addition most of these millions of events per second are totally uninteresting: one Higgs event every O(100) seconds
- $\mathbf{P} \rightarrow \mathbf{W}$  we need event selection, that is a "trigger"
- Hence new questions arise:
  - What trigger rate can we afford ?
  - How do we select the data ?
  - How do we make sure that the values from the many different channels refer to the same original event (collision)?



Total collision rate ~ 1 GHz
 @ L = 10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup> and 14 TeV

#### "Interesting" physics is about 6-8 orders of magnitude smaller

New physics is 9 orders of magnitude smaller

We "just" need to identify and select these rare processes from the overwhelming background before

reading out and storing the whole event.



- Usually the reduction of the data rate to a reasonable rate of O(100 Hz) that can be archived cannot be achieved in one step
- Therefore the trigger is split in several levels (usually called level-1, 2, 3,...). Each level performs a reconstruction of the data before applying a selection.
- Subsequent trigger level reduces the data rate → subsequent trigger levels have more time to perform the data processing.
- Usually the first level (LvI-1) is hardwired
  - Short latency: a few µs.
  - In HEP LvI-1 usually uses data from calorimeters and muon detectors
- The second level (Lvl-2)
  - Processor based (standard CPU's or dedicated custom/FPGA processing)
  - Latency : a few ms.
- - The reconstruction program is run in parallel on a large number of CPUs O(100-1000).
  - Having more time (~1s) the algorithms can be more complex and can use all the detector data, including the tracker data.

### **Multilevel trigger**



#### 

- latency ~ 3µs
- O(10<sup>7</sup>) channels

Front end pipelines > Work with local information to avoid many interconnections

#### Must be fast and robust

- Look for simple signatures
- implemented in hardware
- Use Calo and muon data

### **Fortunately Mother Nature is kind**

- p-p collisions mostly produce low p<sub>t</sub> particles ~1 GeV/c
- Physicists are interested in heavy particles (W, Z, Top, Higgs, Z', W', sparticles,...) often decaying in high pt leptons or jets (p<sub>t</sub> > 25 GeV/c)
- Therefore Level1 triggers will often look for such high p<sub>t</sub> leptons and jets (as well as large MET):



## **ATLAS Level1 Calo Trigger**

- 7000 trigger towers (EM+HA) 0.1x0.1 in eta, phi space
- On FEE, analogue signals of individual cells are summed to form tower signal
- Signal transmitted to Pre-processor (PPM): digitization and E<sub>t</sub> estimate at 40 MHz
- Tower data are transmitted to cluster processor (CPM); there are 4 crates
- e/γ processing on large FPGA
- e/γ candidates sent to L1 central trigger processor 1.5 µs after bunch crossing
- Note the remaining L1 latency is due to signal transmission !
- After Level1 has made his decision, this information has to be sent back to all the FEE



#### **Level1 Trigger distribution**



Now that the magic box tells for each Bunch Crossing if the event has to be kept or rejected, this information has to be sent back to the FEE so that they can send or discard their data.

Synchronisation at the output of the pipeline is not trivial => needs timing calibration

### **Level1 Timing Issues**

- Need to synchronize all detectors components to better than 25ns
  - The system need a reference clock
- Sut for some detectors, signal development/time of flight > 25 ns
  - Integrates more than one bunch crossing information
  - Particles travel 7.5 m in 25 ns
  - Need to identify each probably crossing



### Signal path during L1 Trigger



### LHC Clock

- - The Timing, Trigger and Control (TTC)
- TTC is based on the LHC Clock
  - The clock is extracted by monitoring the passage of the proton bunches
- It is sent to millions of components in ALICE, ATLAS, CMS & LHCb (by optical fibers)
- General Sector Sect
- Before starting to data taking, all the components had to be aligned in time
- System has to be robust :
  - A loss of synchronization may be very difficult to be detected...

#### Where are we so far ?

- Event-data are now digitized, pre-processed and tagged with a unique, monotonically increasing number
- The event data are distributed over many electronics boards ("sources")
- For the next stage of selection, or even simply to write it to tape we have to get the pieces together
  - $\rightarrow$  Need and Event Builder (EVB)
- But don't forget:
  - At LHC the Level1 will typically accept 1/10000 events
    - Output L1 rate is still very high ~100kHz
  - Amount of data is still very large:
    - Typical event size ~1Mbyte (after zero suppression or data compression)
- → need a bandwidth of 100 Gbytes/s (=800 Gbits/s) to extract the data from the Front End.

#### **Event Builder Basic's**





## **Timing synchronization**

#### Example : CMS CSC chambers

- Perform Timing scan:
  - Scan a range of timing delays
  - Find best alignment:
    - highest % of Anode Trigger
       Primitives





- Let's do a parenthesis here on common buses used in HEP, because even old ones are still very popular in our labs...
- - A bus is a common electrical connection between several electrical devices
  - A bus can either allow signals to be transferred between devices, the summing (mixing) of output signals from the devices or the distribution of input signals or power amongst the devices.
  - A bus often takes the form of a wire or printed circuit conductor that terminate at multiple connectors which allows devices to be plugged into the bus
  - A common example is the PCI bus in PCs:



### **Buses in HEP (example VME)**



- Buy or build readout boards with some inputs
- Put many of these multiport modules together in a common chassis or crate
- The modules need
  - Mechanical support
  - Power
  - A standardized way to access their data (our measurement values)
- All this is provided by standards for (readout) electronics such as VME (IEEE 1014)

#### **VME Example:**

- - mechanical standard
  - electrical standard for power on the backplane
  - signal and protocol standard for communication on the bus



#### **Some famous buses in HEP**



NIM, 1964



CAMAC, 1969



VME 6U, 1981



FASTBUS, 1986



VME 9U, 1994



CompactPCI, 1995

### **Bus communication in a VME crate**

- A bus connects two or more devices and allows them to communicate
- The bus is shared between all devices on the bus → arbitration
   is required
- Devices can be masters or slaves (some can be both)
- Devices can be uniquely identified ("addressed") on the bus



#### The VME bus



- In a VME crate we can find three main types of modules
  - The controller which monitors and arbitrates the bus
  - Masters read data from and write data to slaves
  - Slaves send data to and receive data from masters

#### Addressing of modules

- In VME each module occupies a part of a (flat) range of addresses (24 bit to 32 bit)
- Address range of modules is hardwired (conflicts!)

#### **VME** specifications

Mandatory bus

Extension bus



- <23:0> 24-bit Address Bus
- 9 lines for Arbitration Bus
- Clock, Control, Status signals
- Power: +5V, +12 V, -12V
- GND: 5 lines



- <31:16> 16-bit Data Bus Extension –
   <31:24> 8-bit Address Bus Extension
- Additional power lines and GND lines

#### **VME Bus at Work**



- Arbitration: Master asserts<sup>(\*)</sup> BR# (Bus Request), Controller answers by asserting BG# (Bus Grant)
- If there are several masters requesting at the same time the one physically closest to the controller wins
- The winning master drives BBSY\* high to indicate that the bus is now in use

(\*) assert means driving the line to logical level 0 (VME control lines are inverted or active-low)

#### VME data transfer

- The Master writes data and address to the data / address bus respectively
- It asserts the lines DS\* (DATA Strobe) and AS\* (ADDRESS Strobe –not shown) to signal that the data and address are valid
- The slave reads and acknowledges by asserting DTACK\* (DATA Transfer Acknowledge)
- The master releases DS\*, AS\* and BSBSY\*, the cycle is complete
- Note: there is no clock! The slave can respond whenever it wants.
  VME is an asynchronous bus



#### Data Transfer



- Theoretically ~ 16 MB/s can be achieved
  - assuming the databus to be full 32-bit wide
- Better performance by using block-transfer
  - After an address cycle several (up to 256) data cycles are performed
  - Performance goes up to 40 MB/s
- Since 1994 VME64 (64bits)
  - use the address bus also for data
  - Twice the bandwidth of VME (up to 80 MB/s)
- More recently VME64x
  - New 160-pin 5-row connector
  - Twice the bandwidth of VME64 (up to 160 MB/s)

#### **Disadvantages of parallel buses**

- A bus is shared between all devices (each new active device slows everybody down)
  - Bus-width can only be increased up to a certain point (128 bit for PCsystem bus)
  - Bus-frequency (number of elementary operations per second) can be increased, but decreases the physical bus-length
- Nbr of devices and physical bus-length is limited (scalability!)
  - For synchronous high-speed buses, physical length is correlated with the number of devices (e.g. PCI)
  - Typical buses have a lot of control, data and address lines (look at a SCSI or ATA cable)
- Buses are typically useful for systems < 1 GB/s</p>

#### **Network based DAQ**

- Network technology solves the scalability issues of buses
  - In a network all devices are equal ("peers")
  - In a network all devices communicate directly with each other
    - no arbitration necessary
    - bandwidth guaranteed
  - data and control use the same path
    - much fewer lines (e.g. in traditional Ethernet only two)
  - At the signaling level buses tend to use parallel copper lines. Network technologies can be also optical, wire-less and are typically (differential) serial

#### **LHC Front-End examples:**

CMS FE model:

