From: hadaq
Date: Sun, 26 Jun 2011 15:12:26 +0000 (+0000)
Subject: EB docu extended. Sergey
X-Git-Url: https://jspc29.x-matter.uni-frankfurt.de/git/?a=commitdiff_plain;h=9b4e1cbc8c793e4dae2e4b7b854a8f410e682bf5;p=daqdocu.git

EB docu extended. Sergey
---

diff --git a/evtbuild.tex b/evtbuild.tex
index c9917fe..96e02bd 100644
--- a/evtbuild.tex
+++ b/evtbuild.tex
@@ -1,6 +1,6 @@
 \section{Event Building}
-The four servers lxhadeb01, lxhadeb02, lxhadeb03, lxhadeb04 are the HADES Event Builders.
+The four servers lxhadeb01, lxhadeb02, lxhadeb03 and lxhadeb04 are the HADES Event Builders; lxhadeb05 is a spare server.
 The Event Builder (EB) receives subevents from the subsystems and builds complete events out of them. The EB consists of two parts: daq\_netmem (the receiving part) and daq\_evtbuild (the building part). The communication between daq\_netmem and daq\_evtbuild is done via shared memory. The number of buffers (queues) opened in shared memory equals the number of subsystems. A completed event can be written to different mass storage systems.
@@ -137,6 +137,8 @@ The script reads several configuration files.
 
 \subsection{Monitoring Event Builders}
 
+There is a special script which can monitor the activity on the open EB ports. Usually the event builder writes a message to the log file when a given port did not receive any data: \textbf{Jun 13 15:27:38 lxhadeb02p DAQ: NETMEM-2 daq\_netmem: source 4, port 50006: no data received}. In addition, you can check all the ports on (for example) lxhadeb02 yourself by executing \textbf{/home/hadaq/bin/scan\_active\_ports.pl -e 2}. The script reads /tmp/eb2\_192.168.100.12.txt, which lists all the ports, and reports the actual port number, the IP of the sender as well as the sender's port number. The file /tmp/eb2\_192.168.100.12.txt is copied to the EB server by /home/hadaq/trbsoft/daq/evtbuild/start\_eb\_gbe.pl during the last EB startup.
+
 As has already been said, the monitoring of the EBs is based on the IOC processes running on each EB server.
 \\Usually the monitoring is already running at vncserver lxhadesdaq:3. Before starting the monitoring one should set two environment variables:
 \\\verb!export EPICS_CA_ADDR_LIST=192.168.101.255!
@@ -144,25 +146,601 @@ As has been already said the monitoring of EBs is based on the IOC processes run
 \\The monitoring can be started by executing:
 \\\verb!lxhadesdaq:/home/scs/Desktop/DAQ/EB_Monitor.desktop!
-
 \subsection{Monitoring Event Builder's Logs}
-To learn more details about the running Event Builders on can look at
-the output of the monitoring stript which scans the log files of all the
-EBs.
-\\For example the log file of EB 1 running on the lxhadeb01 can be found on
-\\\verb!lxhadesdaq:/home/hadaq/oper/oper_1/eb1_log.txt!.
-\\The script can be started by executing:
-\\\verb!lxhadesdaq:/home/hadaq/Desktop/DAQ/EBLog_Watch.desktop!
-\\Usually you can find the output of this script at vncserver lxhadesdaq:3.
+To centralize and simplify DAQ log monitoring, we have configured the syslog system to monitor many DAQ processes, TRBs and EBs. There is a central syslog server on lxhadesdaq. There are two different (in terms of configuration) syslog daemons on our machines: the 'old' syslogd and the new, extended syslog-ng. All log messages arrive in the file lxhadesdaq:/home/hadeslog/messages. There are also two parsers which read the log file, filter the messages related to the DAQ system and to the EB system, and provide formatted, colored output to the user. See fig.~\ref{fig:ebsyslog}.
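+
+The DAQ processes feed this logging chain through the standard syslog(3) interface.
+The following is a minimal sketch of the mechanism; the facility LOG\_LOCAL1 and the
+exact message text are illustrative assumptions, not the actual settings of our
+syslog configuration or of the EB code:
+
+\begin{verbatim}
+/* Sketch: emit a message that the local syslog daemon can forward
+ * to the central server (lxhadesdaq:/home/hadeslog/messages).
+ * Facility and message text are assumptions for illustration. */
+#include <syslog.h>
+
+int main(void)
+{
+    openlog("daq_netmem", LOG_PID, LOG_LOCAL1);
+    syslog(LOG_WARNING, "source %d, port %d: no data received", 4, 50006);
+    closelog();
+    return 0;
+}
+\end{verbatim}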
+
+\begin{figure}
+  \centering
+  \includegraphics[width=0.7\textwidth]{eb_syslog_2.png}
+  \caption[EB system]{Central logging}
+  \label{fig:ebsyslog}
+\end{figure}
+
+Here is the sequence of the calls:
+\begin{itemize}
+\item trbsoft/daq/control/monitor/LoggerWatch -> /home/hadaq/bin/log\_watch\_eb
+\begin{itemize}
+\item tail -n 300 --follow=name /home/hadeslog/messages | /home/hadaq/tools/colorizelog.pl --or NETMEM --or EVTBLD
+\end{itemize}
+\item trbsoft/daq/control/monitor/EBLogWatch -> /home/hadaq/bin/log\_watch
+\begin{itemize}
+\item tail -n 1000 --follow=name /home/hadeslog/messages | /home/hadaq/tools/colorizelog.pl --exclude NETMEM --exclude EVTBLD
+\end{itemize}
+\end{itemize}
+
+\subsection{Event Building compiling guide}
+
+The \textbf{hadaq} module needs the \textbf{allParam} and \textbf{compat} modules.
+
+Check out all the modules from CVS:
+\begin{itemize}
+\item cvs -d :ext:hadaq@lxi027.gsi.de:/misc/hadesprojects/daq/cvsroot checkout hadaq
+\item cvs -d :ext:hadaq@lxi027.gsi.de:/misc/hadesprojects/daq/cvsroot checkout allParam
+\item cvs -d :ext:hadaq@lxi027.gsi.de:/misc/hadesprojects/daq/cvsroot checkout compat
+\end{itemize}
+
+Let's assume that the base directory where all the modules are located is /home/hadaq/daqsoftware/.
+
+RFIO support is not in the automake setup since this feature is rarely needed (only during beam times), thus configure will create a Makefile without the RFIO libs. To compile with RFIO, do the following after running configure:
+\begin{itemize}
+\item In evtbuild.c uncomment: \#define RFIO
+\item Check that rawapin.h, rawcommn.h, rawclin.h are in the "include" dir
+\item Add two libs to the Makefile: LIBS = ... -lrawapiclin -lrawservn (or -lrawapiclin64 -lrawservn64)
+\end{itemize}
+
+Compile for a 32-bit platform:
+\begin{itemize}
+\item \textbf{Configure allParam:} CPPFLAGS="-I/home/hadaq/daqsoftware/include" LDFLAGS="-L/home/hadaq/daqsoftware/i686-pc-linux-gnu/lib" ./configure --prefix=/home/hadaq/daqsoftware
+\item \textbf{Build allParam and install:} make; make install
+\item \textbf{Configure compat:} CPPFLAGS="-I/home/hadaq/daqsoftware/include" LDFLAGS="-L/home/hadaq/daqsoftware/i686-pc-linux-gnu/lib" ./configure --prefix=/home/hadaq/daqsoftware
+\item \textbf{Build compat and install:} make; make install
+\item \textbf{Configure hadaq:} CPPFLAGS="-I/home/hadaq/daqsoftware/include" LDFLAGS="-L/home/hadaq/daqsoftware/i686-pc-linux-gnu/lib" ./configure --prefix=/home/hadaq/daqsoftware
+\item \textbf{Build hadaq and install:} make; make install
+\end{itemize}
+
+Compile for a 64-bit platform:
+\begin{itemize}
+\item \textbf{Configure allParam:} CPPFLAGS="-I/home/hadaq/daqsoftware/include" LDFLAGS="-L/home/hadaq/daqsoftware/lib" ./configure --prefix=/home/hadaq/daqsoftware
+\item \textbf{Build allParam and install:} make; make install
+\item \textbf{Configure compat:} CPPFLAGS="-I/home/hadaq/daqsoftware/include" LDFLAGS="-L/home/hadaq/daqsoftware/lib" ./configure --prefix=/home/hadaq/daqsoftware
+\item \textbf{Build compat and install:} make; make install
+\item \textbf{Configure hadaq:} CPPFLAGS="-I/home/hadaq/daqsoftware/include" LDFLAGS="-L/home/hadaq/daqsoftware/lib" ./configure --prefix=/home/hadaq/daqsoftware
+\item \textbf{Build hadaq and install:} make; make install
+\end{itemize}
+
+Possible problems during configuration:
+\begin{itemize}
+\item Error message: \textbf{checking for library containing conParam... no configure: error: Parameter library not found}
+\begin{itemize}
+\item None of the tcl tcl8.3 tcl8.2 tcl8.0 tcl7.4 libs specified in configure.in was found.
+Quick fix: ln -s /usr/lib/libtcl8.4.so /usr/lib/libtcl.so (put whatever version you have instead of libtcl8.4.so).
+\item Watch out where your tcl includes are. If necessary add: -I/usr/include/tcl8.4
+\end{itemize}
+\end{itemize}
+
+\subsection{Event Building troubleshooting}
+
+\begin{itemize}
+\item Error message: \textbf{lxhadeb04 EB 4 daq\_evtbuild: evtbuild.c, 558: fopen: failed to open file /data04/data: Input/output error}
+\begin{itemize}
+\item Reason: most likely the /data04 hard disk is broken
+\item Solution: restart ./daq\_disks --exclude 4 on lxhadeb04 to exclude /data04, and set MULTIDISK: 5 in lxhadesdaq:/home/hadaq/trbsoft/daq/evtbuild/eb.conf for the corresponding Event Builder to make sure that the first hld file will be written to /data05 and not to /data04 (instead of 5 it can be any other disk number but 4)
+\end{itemize}
+\item Error message: \textbf{netmem.c, 645: NetTrans\_create: failed to create UDP:0.0.0.0:50534: Address already in use}
+\begin{itemize}
+\item Reason: port 50534 is already used by another application (most likely by an EPICS IOC)
+\item Debug: lsof -i | grep 50534 => ebctrl 25724 scs 3u IPv4 8662658 UDP *:50534
+\item Solution: Close all EBs, close all IOCs, start EBs, start IOCs, close EBs, start EBs. With this sequence the IOCs are started after the EBs because the IOCs can dynamically pick up unused UDP ports. The EBs are then restarted because they need running IOCs for a proper start.
+\end{itemize}
+\item Error message: \textbf{No space left on device}
+\begin{itemize}
+\item Reason: This error occurs when the event builder application tries to open more than 128 sets of semaphores (with the standard setting kernel.sem="250 32000 32 128"):
+\begin{itemize}
+\item 250 - SEMMSL - The maximum number of semaphores per semaphore set
+\item 32000 - SEMMNS - The maximum number of semaphores in the system
+\item 32 - SEMOPM - The maximum number of operations in a single semop call
+\item 128 - SEMMNI - The maximum number of semaphore sets (128 sets mean 64 shared memory segments, since two semaphore sets are required per memory segment. In this case, daq\_evtbuild -m 65 will lead to an error)
+\end{itemize}
+\item Solution: sysctl -w kernel.sem="250 128000 32 512"
+\end{itemize}
+\item Error message: \textbf{File exists}
+\begin{itemize}
+\item Reason: This error occurs when semaphores remaining from a previous execution of daq\_evtbuild were not properly cleaned up.
+\item Solution: Use ipcrm -s semid (or /home/hadaq/bin/ipcrm.pl).
+\end{itemize}
+\item Warning message: \textbf{UDP receive buffer length smaller than requested buffer length}
+\begin{itemize}
+\item Reason: the requested UDP socket buffer length is larger than allowed by the default kernel settings.
+\item Solution (as 'root'): sysctl -w net.core.rmem\_max=10485760
+\end{itemize}
+\end{itemize}
+
+\subsection{Event Building software}
+
+In the current HADES setup the Event Building system is a set of 16 processes distributed over 4 servers.
+As shown in fig.~\ref{fig:ebproc}, the data is received by the Receiver (daq\_netmem) and placed in a double buffer.
+There is a separate shared memory segment (double buffer) for each incoming data stream.
+The shared memory segments are created by daq\_evtbuild, therefore daq\_evtbuild has to be started first. Each shared memory segment is controlled by a ShmTrans structure which contains pointers to HadTuQueue structures. Each HadTuQueue structure controls a part of the double buffer for writing/reading.
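+
+Before going into the data format and the source files, here is a minimal sketch of
+the double-buffer idea with simplified stand-in types. The real ShmTrans and
+HadTuQueue structures live in shared memory, carry more state and are protected by
+semaphores; this is an illustration, not the actual shmtrans.c code:
+
+\begin{verbatim}
+/* Double-buffer sketch: the writer (daq_netmem) fills one half while
+ * the reader (daq_evtbuild) drains the other; when the reader has
+ * drained its half it requests a swap. Simplified stand-in types. */
+#include <stdio.h>
+#include <stddef.h>
+
+typedef struct { char *mem; size_t size, cap; } Queue;
+typedef struct { Queue *wrQueue, *rdQueue; int switchRequested; } DoubleBuf;
+
+/* Writer side: honour a pending swap request from the reader. */
+static void switchStorage(DoubleBuf *db)
+{
+    if (db->switchRequested) {
+        Queue *t    = db->wrQueue;
+        db->wrQueue = db->rdQueue;   /* drained half -> writer  */
+        db->rdQueue = t;             /* filled half  -> reader  */
+        db->wrQueue->size = 0;       /* start filling it afresh */
+        db->switchRequested = 0;
+    }
+}
+
+/* Writer side: reserve n bytes in the write half, NULL if full. */
+static void *tryAlloc(DoubleBuf *db, size_t n)
+{
+    Queue *q = db->wrQueue;
+    if (q->size + n > q->cap)
+        return NULL;
+    void *p = q->mem + q->size;
+    q->size += n;
+    return p;
+}
+
+int main(void)
+{
+    char a[16], b[16];
+    Queue qa = { a, 0, sizeof(a) }, qb = { b, 0, sizeof(b) };
+    DoubleBuf db = { &qa, &qb, 0 };
+    while (tryAlloc(&db, 8) != NULL)  /* fill the write half */
+        ;
+    db.switchRequested = 1;           /* reader asks for a swap */
+    switchStorage(&db);
+    printf("after swap: %zu bytes readable\n", db.rdQueue->size);
+    return 0;
+}
+\end{verbatim}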
+All the data coming to daq\_netmem are packed according to the HadTuQueue format, with a header consisting of a size and a decoding field.
+Therefore the buffer consists of a HadTuQueue which contains the HadTuQueues carrying the data (subevents), see fig.~\ref{fig:ebstruct} and fig.~\ref{fig:ebqueue}. At a request from the Builder (daq\_evtbuild) the pointers to the double buffer are swapped and the Builder can read the new data from the buffer, build the events and write the events to the hld file or send the events to the Data Movers via the RFIO mechanism. In short, this is how it works.
+
+
+\begin{figure}
+  \centering
+  \includegraphics[width=0.7\textwidth]{eb_proc.png}
+  \caption[EB system]{EB process}
+  \label{fig:ebproc}
+\end{figure}
+
+
+Before we move to the details, let's take a look at the structure of the most important source code:
+\begin{itemize}
+\item evtbuild.c - main body of daq\_evtbuild. All the main high-level function calls and the logic of the event building are done here.
+\item netmem.c - main body of daq\_netmem. All the main high-level function calls for the data receiving are done here.
+\item args.h/c - structure and functions for parsing the arguments of daq\_evtbuild.
+\item hadtuqueue.h/c - structure and functions to manipulate a HadTuQueue (Hades Transport Unit Queue). This queue is used to transport subevents.
+  \begin{itemize}
+  \item conHadTuQueue(HadTuQueue *my, void *mem, size\_t size) - Construct a hadTuQueue to control the buffer which begins at the memory address the mem pointer points to.
+  \item conHadTuQueue\_voidP(HadTuQueue *my, void *mem) - Construct a hadTuQueue for reading the buffer (called by daq\_evtbuild). As the queue itself was already created before, the header of the queue is read and the corresponding data of the hadTuQueue structure are set.
+  \item HadTuQueue\_push(HadTuQueue *my) - Used for writing to the buffer. Move the run pointer to the free memory after the last element of the queue and update the size of the queue.
+  \item HadTuQueue\_pop(HadTuQueue *my) - Used for reading the buffer. Move the run pointer to the next element of the queue.
+  \item HadTuQueue\_empty(HadTuQueue *my) - Check if the run pointer has reached the end of the queue.
+  \item void *HadTuQueue\_front(HadTuQueue *my) - Get a pointer to the subevent inside the queue. In fact, it returns the run pointer if the queue is not empty and NULL otherwise.
+  \end{itemize}
+\item hadtu.h/c - low-level functions to manipulate events and queues.
+\item evt.h/c - functions to manipulate events.
+\item subevt.h/c - functions to manipulate subevents.
+\item nettrans.h/c - structure and functions for the data transfer over the network.
+  \begin{itemize}
+  \item NetTrans *NetTrans\_create(char *name, size\_t bandwidth, Worker *worker) - Open the UDP ports and prepare for data receiving.
+  \item NetTrans\_multiReceive(NetTrans *my[], fd\_set * fdSet, int nrOfMsgs) - Identify which descriptors are ready for reading.
+  \item assembleMsg(NetTrans *my, void *hadTu, size\_t size) - Copy the data to the buffers. Perform subevent reassembly if the subevent was fragmented by the sender.
+  \item int openUdp(NetTrans *my, unsigned long addr, int port, int fl) - this is an internal function which has an important setting of the MTU (Maximum Transmission Unit) size: my->mtuSize = 63 * 1024
+  \end{itemize}
+\item shmtrans.h/c - structure and functions for the data transfer over shared memory.
+  \begin{itemize}
+  \item ShmTrans* ShmTrans\_create(char *name, size\_t size) - Create a shared memory segment with a name and a size and get a pointer to it.
+  \item ShmTrans* ShmTrans\_open(char *name, size\_t size) - Get a pointer to an existing shared memory with the given name and size.
+  \item ShmTrans\_recv(ShmTrans *shmem) - Get a pointer to the first element of the hadTuQueue in the buffer. If we have already run through the whole buffer, a switch of the buffers is requested.
+  \item ShmTrans\_send(ShmTrans *shmem) - Increment the run pointer to point at the free memory in the buffer after the last copied message.
+  \item ShmTrans\_requestSpace(ShmTrans *shmem) - Here we switch the buffers (the wrQueue and rdQueue pointers to the buffers, see fig.~\ref{fig:ebstruct}) if it was requested.
+  \item ShmTrans\_tryAlloc(ShmTrans *shmem, size\_t size) - Get a pointer to the free memory after the last inserted message. If the space left is less than size, return NULL.
+  \item ShmTrans\_free(ShmTrans *shmem) - Move the run pointer of the hadTuQueue structure (pointed to by rdQueue) to the beginning of the next internal hadTuQueue.
+  \end{itemize}
+\item psxshm.h/c - structure and low-level functions to manipulate shared memory.
+\item worker.h/c - structure and functions for writing/reading statistics to/from shared memory.
+\item stats.h/c - structure and functions for additional statistics gathering.
+\item debug.h/c - structure and functions for debugging.
+\item readout.c - main body of the readout (used on the VME CPUs, now obsolete for the data taking but still usable for readout emulation).
+\item libhadaq.a - library which includes readout.c, worker.c, evt.c, subevt.c, shmtrans.c, hadtuqueue.c, psxshm.c, hadtu.c
+\end{itemize}
+
+\begin{figure}
+  \centering
+  \includegraphics[width=1\textwidth]{ebqueue_crop.png}
+  \caption[EB system]{Shared memory and Hades Transport Unit Queue}
+  \label{fig:ebqueue}
+\end{figure}
+
+\begin{figure}
+  \centering
+  \includegraphics[width=1\textwidth]{eb_shmtrans_struct.png}
+  \caption[EB system]{ShmTrans and HadTuQueue structures}
+  \label{fig:ebstruct}
+\end{figure}
+
+\textbf{Let's now go through the main steps of daq\_netmem in more detail:}
+\begin{itemize}
+\item First we open the UDP ports and do all the necessary preparations for receiving the data from the network (for each incoming data stream):
+  \begin{itemize}
+  \item NetTrans\_create(); - Here we also initialize the packet and message statistics of daq\_netmem
+  \begin{itemize}
+  \item openGeneric() -> openUdp();
+  \begin{itemize}
+  \item rcvBufLenReq = 1 * (1 $<<$ 20); - Requested UDP socket buffer length, 1~MB is quite enough.
+  \item setsockopt(... \&rcvBufLenReq ...)
+  \item getsockopt(... \&rcvBufLenRet ...); - In case rcvBufLenRet is less than rcvBufLenReq you will get the warning: \textbf{UDP receive buffer length smaller than requested buffer length}. To fix it you have to execute as 'root': \textbf{sysctl -w net.core.rmem\_max=10485760}
+  \item bind()
+  \item my->mtuSize = 63 * 1024; - This is an important number which defines the Maximum Transfer Unit size for incoming UDP packets.
+  \end{itemize}
+  \end{itemize}
+  \end{itemize}
+\item Then we open the shared memory segments, one for each incoming data stream:
+  \begin{itemize}
+  \item ShmTrans\_open();
+  \begin{itemize}
+  \item PsxShm\_open(... O\_RDWR ...); - Get the pointer to the shared memory already created by daq\_evtbuild and allocate the necessary memory for the structures.
+  \item conHadTuQueue(my->wrQueue ...); - Construct the hadTuQueue structure that controls writing to the buffer
+  \item conHadTuQueue(my->rdQueue ...); - Construct the hadTuQueue structure that controls reading from the buffer
+  \item sem\_open(); - Get the semaphores created by daq\_evtbuild.
+  \end{itemize}
+  \end{itemize}
+\item hadTuSize[i] = 204800; - Since we do not know in advance how large the incoming data is, we have to allocate a big enough portion of memory. Here we assume that the incoming 'hadTuQueue' is smaller than 200~kB.
+\item while(1) - now we start the endless while loop
+\item ShmTrans\_requestSpace() -> switchStorage() - Swap the pointers to the read/write parts of the double buffer if there was a request from daq\_evtbuild.
+\item ShmTrans\_tryAlloc() -> HadTuQueue\_alloc(my->wrQueue ...); - Try to allocate memory in the write part of the double buffer for the next incoming 'hadTuQueue'. Return a NULL pointer if there is not enough space left.
+\item NetTrans\_multiReceive() -> select(); - Find which descriptors are ready for reading and return an fd\_set*.
+\item assembleMsg()
+  \begin{itemize}
+  \item recvGeneric() -> recvfrom(); - Receive the data from the ready descriptor and place the data in shared memory.
+  \item Reassemble the subevents which were fragmented by the sender (if the subevent was too big (above the water mark) and did not fit into one UDP packet).
+  \end{itemize}
+\item ShmTrans\_send() -> HadTuQueue\_push(my->wrQueue); - Increment the run pointer to point at the free space after the recently copied data.
+\item End of while(1)
+\end{itemize}
+
+\textbf{Let's now go through the main steps of daq\_evtbuild in more detail:}
+\begin{itemize}
+\item ShmTrans\_create(); - Create the shared memory segments, one for each incoming data stream.
+  \begin{itemize}
+  \item PsxShm\_unlink()
+  \item PsxShm\_open(... O\_CREAT | O\_RDWR ...)
+  \begin{itemize}
+  \item shm\_open(); - Establish a connection between a shared memory object and a file descriptor.
+  \item mmap(); - Map it into memory.
+  \end{itemize}
+  \item conHadTuQueue(my->wrQueue ...); - Construct the hadTuQueue structure that controls writing to the buffer
+  \item conHadTuQueue(my->rdQueue ...); - Construct the hadTuQueue structure that controls reading from the buffer
+  \item sem\_open(... O\_CREAT | O\_EXCL ...); - Create the semaphores which control the access to the shared memory.
+  \end{itemize}
+\item Worker\_addStatistic(); - Add statistics for monitoring the EBs.
+\item rfio\_openConnection(); - Open the connection to the Data Mover.
+\item while(); - Start the endless while loop.
+\item if(theArgs->epicsCtrl){ runNr = getRunId(theArgs); } - In the case of parallel event building, where the RUN ID is generated and distributed by the EPICS IOC, the RUN ID (runNr) is read from shared memory (daq\_evtbuild1.shm) by getRunId()->Worker\_getStatistic().
+\item runNr = tv.tv\_sec - TIMEOFFSET; - Otherwise the runNr is calculated as seconds since the Epoch minus 1200000000.
+\item openFile(); - Open a file on a local disk.
+  \begin{itemize}
+  \item changeDisk()->Worker\_getStatistic(); - If the theArgs->multiDisks argument is given, the disk number is read from shared memory (daq\_evtbuild1.shm). There is a special daemon running on each server (daq\_disks -s 10). This daemon scans all the data disks, identifies the least filled disks and writes their numbers to the shared memory.
+  \end{itemize}
+\item storeRunInfoStart(); - Store the RUN start time in a file. This information is passed to the Oracle DB.
+\item evt = newEvt(); - Allocate memory and set up the header for a new event.
+\item Loop over all buffers:
+\item ShmTrans\_recv(); - Get a pointer to the internal hadTuQueue to be read.
+  \begin{itemize}
+  \item HadTuQueue\_front(my->rdQueue); - Get a pointer to the next internal hadTuQueue to be read, or return NULL if the end of the queue/buffer is reached.
+  \item switchStorage(); - If we have reached the end of the buffer for reading (the end of the external hadTuQueue) we request a double buffer switch (a swap of the pointers to the read/write parts).
+  \end{itemize}
+\item conHadTuQueue\_voidP(); - Construct the hadTuQueue structure to manipulate the internal hadTuQueue/buffer.
+\item subEvt = HadTuQueue\_front(hadTuQueue[i]); - Get a pointer to a subevent from the internal hadTuQueue/buffer.
+\item currId = SubEvt\_trigType(); - Get the trigger type from the subevent of the first data source. The event builder startup script ensures that the first data source is always the CTS.
+\item currId = currId | (DAQVERSION $<<$ 12); - Add the DAQ version number, needed by the unpackers, to the event ID.
+\item if(trigNr == currTrigNr) {evt = Evt\_appendSubEvt(evt, subEvt)} - Append the subevent to the event.
+\item writeFile(theArgs, evt); - Write the event to a file.
+\item newRunId = getRunId(); - Get the RUN ID from shared memory (daq\_evtbuild1.shm). The new RUN ID is generated by the master IOC.
+\item writeFile(); closeFile(); - Close the file in the following cases:
+  \begin{itemize}
+  \item if(!(theArgs->epicsCtrl) \&\& (*theStats->bytesWritten) >= theArgs->maxFileSz) - Single event builder without IOC control AND the maximum file size is reached.
+  \item if(theArgs->epicsCtrl \&\& runNr < newRunId) - IOC-controlled event builder AND a new RUN ID was generated by the master IOC.
+  \item if(theArgs->epicsCtrl \&\& (*theStats->bytesWritten) >= 1900000000) - IOC-controlled event builder AND the maximum possible file size was reached. This should not happen as the master IOC generates a new RUN ID when the file size reaches 1.5~GB.
+  \item if(theArgs->epicsCtrl \&\& newRunId == 0) - IOC-controlled event builder AND the RUN ID became zero for whatever reason. This is another special case which should not happen.
+  \end{itemize}
+\item End of the while loop.
+\end{itemize}
+
+\subsection{Event Building control by EPICS IOC}
+
+The offline analysis makes the following demands:
+\begin{itemize}
+\item Synchronization of the hld files: all EB processes should open and close their hld files at the same time (a jitter of a couple of seconds is allowed).
+\item All the hld files collected in parallel must have the same RUN ID.
+\end{itemize}
+
+These two requirements are realized by means of the EPICS system. In addition to the 16 EB processes there are 16 IOCs running on the servers. The IOCs communicate with daq\_evtbuild and daq\_netmem via the /dev/shm/daq\_evtbuild1.shm and /dev/shm/daq\_netmem1.shm shared memory segments respectively. A master/slave model is used: the first IOC is the master IOC. The master IOC generates a new RUN ID when the hld file size reaches 1.5~GB. This new RUN ID is distributed to all Event Builders via the corresponding shared memory segments. When all event builders have received the new RUN ID they close their hld files and open new ones. See fig.~\ref{fig:ebsys}.
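+
+On the daq\_evtbuild side this boils down to the file-close conditions listed at the
+end of the walk-through above. The following sketch summarizes them in one predicate;
+the names epicsCtrl, runNr, newRunId, bytesWritten and maxFileSz stand in for the
+fields of theArgs and theStats, so this is not the literal evtbuild.c source:
+
+\begin{verbatim}
+#include <stdio.h>
+#include <stdint.h>
+
+/* Returns 1 if the current hld file has to be closed. */
+static int shouldCloseFile(int epicsCtrl, uint32_t runNr, uint32_t newRunId,
+                           uint64_t bytesWritten, uint64_t maxFileSz)
+{
+    if (!epicsCtrl && bytesWritten >= maxFileSz)
+        return 1;  /* stand-alone EB: maximum file size reached      */
+    if (epicsCtrl && runNr < newRunId)
+        return 1;  /* master IOC distributed a new RUN ID            */
+    if (epicsCtrl && bytesWritten >= 1900000000ULL)
+        return 1;  /* safety net: hard cap, should never be reached  */
+    if (epicsCtrl && newRunId == 0)
+        return 1;  /* RUN ID became zero: close the file defensively */
+    return 0;
+}
+
+int main(void)
+{
+    /* stand-alone EB, 2.0 GB written with a 1.5 GB limit -> close */
+    printf("%d\n", shouldCloseFile(0, 0, 0, 2000000000ULL, 1500000000ULL));
+    return 0;
+}
+\end{verbatim}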
+
+\begin{figure}
+  \centering
+  \includegraphics[width=1\textwidth]{eb_ioc.png}
+  \caption[EB system]{EB system}
+  \label{fig:ebsys}
+\end{figure}
+
+The IOC source code and the adl files for MEDM monitoring can be checked out from CVS:
+\begin{itemize}
+\item cvs -d :ext:hadaq@lxi027.gsi.de:/misc/hadesprojects/daq/cvsroot checkout ebctrl
+\end{itemize}
+
+The structure of the module is as follows:
+\begin{itemize}
+\item ebctrl/ioc/ - The IOC code.
+\item ebctrl/mon/ - The adl files for MEDM.
+\end{itemize}
+
+\begin{itemize}
+\item \textbf{Databases of records: ebctrl/ioc/ebctrlApp/Db/}
+  \begin{itemize}
+  \item evtbuild.db - Records for daq\_evtbuild monitoring as well as for the RUN ID.
+  \item genrunid.db - Record to generate the RUN ID.
+  \item netmem.db - Records for daq\_netmem monitoring.
+  \item totalevtstat.db - Records to monitor the total statistics.
+  \item portnr1.db, portnr2.db - Records to monitor the port numbers opened by daq\_netmem.
+  \item trignr1.db, trignr2.db - Records to monitor the trigger numbers from each data source.
+  \item trigtype.db - Records to monitor the trigger types.
+  \item errbit1.db, errbit2.db - Records to monitor the error bits from each data source.
+  \item errbitstat.db - Records to monitor the error bit patterns (up to 5 patterns).
+  \item cpu.db - Records to monitor the CPU usage on the event builder servers.
+  \end{itemize}
+\item \textbf{Source code of the EPICS IOC: ebctrl/ioc/ebctrlApp/src/}
+  \begin{itemize}
+  \item evtbuild.c -> evtbuild\_proc(); - Read the shared memory (/dev/shm/daq\_evtbuild1.shm) and fill the corresponding records with monitoring information. See the comments in evtbuild\_proc() for details.
+  \item genrunid.c -> genRunId\_proc(); - Generate the RUN ID if the IOC is a master.
+  \item writerunid.c -> writeRunId\_proc(); - Write the RUN ID to the shared memory (/dev/shm/daq\_evtbuild1.shm).
+  \item netmem.c -> netmem\_proc(); - Read the shared memory (/dev/shm/daq\_netmem1.shm) and fill the corresponding records with monitoring information. See the comments in netmem\_proc() for details.
+  \item totalerrbitstat.c, totalbytewrate.c, totalbytewrit.c, totalevtcrate.c, totalevtdataerr.c, totalevtdisc.c, totalevtdrate.c, totalevtscomp.c, totalevttagerr.c - Total statistics calculated from the individual EB statistics.
+  \item portnr1.c, portnr2.c - Read the shared memory (/dev/shm/daq\_netmem1.shm) and fill the records with the port numbers opened by daq\_netmem.
+  \item trignr1.c, trignr2.c - Read the shared memory (/dev/shm/daq\_evtbuild1.shm) and fill the records with the trigger numbers from each data source.
+  \item trigtype.c -> trigtype\_proc(); - Read the shared memory (/dev/shm/daq\_evtbuild1.shm) and fill the records with the trigger types.
+  \item errbit1.c, errbit2.c - Read the shared memory (/dev/shm/daq\_evtbuild1.shm) and fill the records with the error bits from each data source.
+  \item errbitstat.c -> errbitstat\_proc(); - Read the shared memory and fill the records with the error bit patterns. In addition, convert the error bits to a meaningful error message and execute errbit2logger() to send this message via the specially configured \textbf{syslog} to the central server lxhadesdaq.
+  \item cpu.c -> cpu\_proc(); - Read the PID and core number from the shared memory (/dev/shm/daq\_evtbuild1.shm), calculate the total core CPU usage (getCoreCpuUsage()) and the core CPU usage of the process (getProcCpuUsage()), and fill the records with the results. Do the same for the daq\_netmem process.
+  \end{itemize}
+\item \textbf{Two up-to-date libraries must be placed in ebctrl/ioc/lib/linux-x86\_64/ (pay attention to the architecture):}
+  \begin{itemize}
+  \item libhadaq.a
+  \item libcompat.a
+  \end{itemize}
+\item \textbf{The corresponding startup file st.cmd should be placed in ebctrl/ioc/iocBoot/iocebctrl/}
+\end{itemize}
+
+All the IOCs can be started/stopped via the trbsoft/daq/evtbuild/start\_eb\_gbe.pl script:
+\begin{itemize}
+\item start\_eb\_gbe.pl -i start -n 1-16
+\item start\_eb\_gbe.pl -i stop -n 1-16
+\end{itemize}
+
+start\_eb\_gbe.pl reads the following configuration files:
+\begin{itemize}
+\item trbsoft/daq/evtbuild/eb.conf - Different settings for the EBs
+\item trbsoft/daq/main/data\_sources.db - Subevent IDs and buffer sizes for all active data sources.
+\item trbsoft/daq/hub/register\_configgbe\_ip.db - IP addresses of the EBs and port numbers of the active data sources.
+\item trbsoft/daq/cts/register\_cts.db - Obsolete configuration file for the EB.
+\end{itemize}
+
+The paths to data\_sources.db, register\_configgbe\_ip.db and register\_cts.db are taken from trbsoft/daq/evtbuild/eb.conf:
+\begin{itemize}
+\item DATA\_SOURCES: ../main/data\_sources.db
+\item GBE\_CONF: ../hub/register\_configgbe\_ip.db
+\item CTS\_CONF: ../cts/register\_cts.db
+\end{itemize}
+
+When starting the IOCs the script executes the writeIOC\_stcmd() subroutine which writes a command file for each IOC: /tmp/st\_eb01.cmd, /tmp/st\_eb02.cmd, etc. The following is a part of the writeIOC\_stcmd() subroutine:
+
+\begin{verbatim}
+#!../../bin/linux-x86_64/ebctrl
+
+## You may have to change ebctrl to something else
+## everywhere it appears in this file
+## Set EPICS environment
+
+< envPaths
+
+epicsEnvSet(FILESIZE,"$maxFileSize")
+epicsEnvSet(EBTYPE,"$ebtype")
+epicsEnvSet(EBNUM,"$ebNr")
+epicsEnvSet(ERRBITLOG, "1")
+epicsEnvSet(ERRBITWAIT, "30")
+epicsEnvSet(EPICS_CA_ADDR_LIST,"192.168.103.255")
+epicsEnvSet(EPICS_CA_AUTO_ADDR_LIST,"NO")
+epicsEnvSet(PATH,"/home/scs/base-3-14-11/bin/linux-x86_64:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:.")
+
+cd \${TOP}
+
+## Register all support components
+dbLoadDatabase("dbd/ebctrl.dbd")
+ebctrl_registerRecordDeviceDriver(pdbbase)
+
+## Load record instances
+dbLoadTemplate "db/userHost.substitutions"
+dbLoadRecords("db/evtbuild.db","eb=$ebnum")
+dbLoadRecords("db/netmem.db","eb=$ebnum")
+dbLoadRecords("db/errbit1.db","eb=$ebnum")
+dbLoadRecords("db/errbit2.db","eb=$ebnum")
+dbLoadRecords("db/trignr1.db","eb=$ebnum")
+dbLoadRecords("db/trignr2.db","eb=$ebnum")
+dbLoadRecords("db/portnr1.db","eb=$ebnum")
+dbLoadRecords("db/portnr2.db","eb=$ebnum")
+dbLoadRecords("db/trigtype.db","eb=$ebnum")
+dbLoadRecords("db/cpu.db","eb=$ebnum")
+dbLoadRecords("db/errbitstat.db","eb=$ebnum")
+$comment_totalevt dbLoadRecords("db/totalevtstat.db")
+$comment_genrunid dbLoadRecords("db/genrunid.db","eb=$ebnum")
+
+## Set this to see messages from mySub
+var evtbuildDebug 0
+var netmemDebug 0
+var genrunidDebug 0
+var writerunidDebug 0
+var errbit1Debug 0
+var errbit2Debug 0
+var trigtypeDebug 0
+var cpuDebug 0
+var errbitstatDebug 0
+$comment_totalevt var totalevtscompDebug 0
+cd \${TOP}/iocBoot/\${IOC}
+iocInit()
+
+## Start any sequence programs
+#seq sncExample,"user=scsHost"
+\end{verbatim}
+
+where:
+\begin{itemize}
+\item \$maxFileSize - The master IOC generates a new RUN ID when this maximum file size is reached. The setting comes from eb.conf (EB\_FSIZE: 1500).
+\item \$ebtype - Type of the IOC: master/slave. There is only one master IOC, which corresponds to the EB process with the smallest number.
+\item \$ebnum - Simply the number of the EB process.
+\item dbLoadRecords("db/totalevtstat.db") - This record is loaded only for the master IOC.
+\item dbLoadRecords("db/genrunid.db","eb=\$ebnum") - This record is loaded only for the master IOC.
+\end{itemize}
+
+During the startup of the IOCs the start\_eb\_gbe.pl script copies the st\_eb01.cmd files to the corresponding Event Builder servers, e.g. scs@lxhadeb01.gsi.de:/home/scs/ebctrl/ioc/iocBoot/iocebctrl/.
+Then the startIOC() subroutine remotely executes the following command on the EB server via ssh: \textbf{bash; . /home/scs/.bashrc; cd \$ioc\_dir; screen -dmS \$screen\_name ../../bin/linux-x86\_64/ebctrl \$stcmd}, where \$ioc\_dir = /home/scs/ebctrl/ioc/iocBoot/iocebctrl/, \$screen\_name = ioc\_eb02 and \$stcmd = st\_eb02.cmd.
+
+The actual IOC startup output, copied from the screen session:
+
+\begin{verbatim}
+#!../../bin/linux-x86_64/ebctrl
+## You may have to change ebctrl to something else
+## everywhere it appears in this file
+## Set EPICS environment
+< envPaths
+epicsEnvSet(ARCH,"linux-x86_64")
+epicsEnvSet(IOC,"iocebctrl")
+epicsEnvSet(TOP,"/home/scs/ebctrl/ioc")
+epicsEnvSet(EPICS_BASE,"/home/scs/base-3-14-11")
+epicsEnvSet(FILESIZE,"1500")
+epicsEnvSet(EBTYPE,"slave")
+epicsEnvSet(EBNUM,"2")
+epicsEnvSet(ERRBITLOG, "1")
+epicsEnvSet(ERRBITWAIT, "30")
+epicsEnvSet(EPICS_CA_ADDR_LIST,"192.168.103.255")
+epicsEnvSet(EPICS_CA_AUTO_ADDR_LIST,"NO")
+epicsEnvSet(PATH,"/home/scs/base-3-14-11/bin/linux-x86_64:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:.")
+cd /home/scs/ebctrl/ioc
+## Register all support components
+dbLoadDatabase("dbd/ebctrl.dbd")
+ebctrl_registerRecordDeviceDriver(pdbbase)
+## Load record instances
+dbLoadTemplate "db/userHost.substitutions"
+dbLoadRecords("db/evtbuild.db","eb=eb02")
+dbLoadRecords("db/netmem.db","eb=eb02")
+dbLoadRecords("db/errbit1.db","eb=eb02")
+dbLoadRecords("db/errbit2.db","eb=eb02")
+dbLoadRecords("db/trignr1.db","eb=eb02")
+dbLoadRecords("db/trignr2.db","eb=eb02")
+dbLoadRecords("db/portnr1.db","eb=eb02")
+dbLoadRecords("db/portnr2.db","eb=eb02")
+dbLoadRecords("db/trigtype.db","eb=eb02")
+dbLoadRecords("db/cpu.db","eb=eb02")
+dbLoadRecords("db/errbitstat.db","eb=eb02")
+# dbLoadRecords("db/totalevtstat.db")
+# dbLoadRecords("db/genrunid.db","eb=eb02")
+## Set this to see messages from mySub
+var evtbuildDebug 0
+var netmemDebug 0
+var genrunidDebug 0
+var writerunidDebug 0
+var errbit1Debug 0
+var errbit2Debug 0
+var trigtypeDebug 0
+var cpuDebug 0
+var errbitstatDebug 0
+# var totalevtscompDebug 0
+cd /home/scs/ebctrl/ioc/iocBoot/iocebctrl
+iocInit()
+Starting iocInit
+############################################################################
+## EPICS R3.14.11 $R3-14-11$ $2009/08/28 18:47:36$
+## EPICS Base built Mar 26 2010
+############################################################################
+cas warning: Configured TCP port was unavailable.
+cas warning: Using dynamically assigned TCP port 60481,
+cas warning: but now two or more servers share the same UDP port.
+cas warning: Depending on your IP kernel this server may not be
+cas warning: reachable with UDP unicast (a host's IP in EPICS_CA_ADDR_LIST)
+iocRun: All initialization complete
+## Start any sequence programs
+#seq sncExample,"user=scsHost"
+epics>
+\end{verbatim}
+
+To stop IOCs, the '\textbf{start\_eb\_gbe.pl -i stop -n 3-6}' call makes the script find all running IOCs (subroutine findRunningIOC()), create the expect script /tmp/ioc\_exit.exp and execute this script with two arguments (the IP of the EB server where the IOC runs and the IOC screen name) for each IOC process indicated by the \textbf{-n 3-6} argument (from 3 to 6). The /tmp/ioc\_exit.exp script then logs on to the EB server, connects to the corresponding screen session and exits the IOC.
+
+\subsection{Event Building interface to RFIO server}
+
+The HADES Event Builders write the full data stream to tape via the Data Movers. Some fraction of the data should also be written to Lustre. The latter feature can be integrated into the RFIO server on the Data Movers. The following three parameters should be passed to gstore:
+
+\begin{itemize}
+\item The Lustre path, for example "/lustre\_alpha/hades/beam/sep08/d", where "/lustre\_alpha/hades/beam/sep08" is an existing path and "d" is the prefix of a non-existing directory. The RFIO server should append to this prefix a time stamp of the form yydddhhmm (yy - year, ddd - day of the year, hh - hour, mm - minutes). A complete path then looks like "/lustre\_alpha/hades/beam/sep08/d090231634". If the directory "d090231634" does not exist, the RFIO server should create it.
+\item The number of files per directory. This is needed to avoid a huge number of files in one directory.
+\item The fraction of files in the main stream to be written to Lustre:
+\begin{itemize}
+\item 0 - nothing
+\item 1 - each file
+\item 2 - every second file
+\item 3 - every third file and so on
+\end{itemize}
+\end{itemize}
+
+Additionally, if the connection to the tape breaks, the RFIO server will automatically start writing every file to Lustre, independently of the third parameter setting.
+
+This functionality is implemented by the extended RFIO interface: \\
+FILE* rfio\_fopen\_gsidaq( char *pcFile, char *pcOptions, int iCopyMode, char *pcCopyPath, int iCopyFraction, int iMaxFile, int iPathConvention)
+\begin{itemize}
+\item \textbf{pcFile} : base name ("rfiodaq:gstore:")
+\item \textbf{pcOptions} : options ("wb")
+\item \textbf{iCopyMode} :
+\begin{itemize}
+\item \textbf{0} : standard RFIO, ignore the following arguments.
+\item \textbf{1} : copy the data to pcCopyPath after the file has been written to the write cache (this is for high data rates).
+\item \textbf{2} : for Lustre only
+\end{itemize}
+\item \textbf{pcCopyPath} :
+\begin{itemize}
+\item the Lustre path ("/lustre/hades/daq/test"). If the path does not exist it will be created according to the parameter iPathConvention.
+\item \textbf{"RC"} : read cache
+\end{itemize}
+\item \textbf{iCopyFraction} :
+\begin{itemize}
+\item \textbf{0} : write only to tape.
+\item \textbf{i} (>0) : copy each i-th file to Lustre (pcCopyPath). If the migration to tape fails, ignore iCopyFraction and copy every file to Lustre.
+\end{itemize}
+\item \textbf{iMaxFile} :
+\begin{itemize}
+\item \textbf{0} : no limit on the number of files.
+\item \textbf{i} (>0) : maximum number of files to be written to a directory (files already sitting in the directory are ignored). When iMaxFile is reached, a new directory at the same level is created according to the parameter iPathConvention.
+\end{itemize}
+\item \textbf{iPathConvention} :
+\begin{itemize}
+\item \textbf{0} : default convention: "/hadaqtest/test" => "/hadaqtest/test", next "/hadaqtest/test" => "/hadaqtest/test1", next "/hadaqtest/testi" => "/hadaqtest/test(i+1)"
+\item \textbf{1} : HADES convention: "/hadaqtest/test" => "/hadaqtest/testyydddhhmm"
+\end{itemize}
+\end{itemize}
 
 \section{Storing data to Oracle database}
 
+\begin{itemize}
+\item All the scripts can be checked out from CVS:
+\begin{itemize}
+\item CVS/Root: :ext:hadaq@lxi001.gsi.de:/misc/hadesprojects/daq/cvsroot/
+\item CVS/Repo: trbsoft/daq/oracle
+\end{itemize}
+\item Or they can be found on lxhadesdaq:/home/hadaq/trbsoft/daq/oracle/
+\end{itemize}
+
+\subsection{Before running scripts}
+
+Before we start inserting the current boards into the Oracle database we have to update the database with all the existing information (in case there are new boards or different subevents).
+
+\begin{itemize}
+\item First we collect the information about all existing subevents and boards:
+\begin{itemize}
+\item daq2stdout.pl -p db -o subevtid > subevtids.txt
+\item daq2stdout.pl -p db -o board > boards.txt
+\end{itemize}
+\item Second, we insert this information into the Oracle database:
+\begin{itemize}
+\item board2ora.pl -t subevtid -f subevtids.txt
+\item board2ora.pl -t board -f boards.txt
+\end{itemize}
+\end{itemize}
+
+Important: at the beginning of the scripts daq2ora\_client.pl and board2ora.pl you have to comment/uncomment the following lines:
+\begin{itemize}
+\item \#my \$database = 'db-hades';
+\item my \$database = 'db-hades-test';
+\end{itemize}
+where 'db-hades' is the production database and 'db-hades-test' is the test database.
+
 \subsection{Time stamp and current info on boards}
 
-The DAQ startup script writes the ascii file with all the active boards in the system and the time stamp.
+The DAQ startup script (startup.pl) writes an ASCII file with all the active boards in the system and a time stamp.
 The daq2ora\_client.pl script reads this ASCII file on lxhadesdaq, e.g. \\\verb!~/oper/daq2ora/daq2ora_2010-08-30_12.49.50.txt!
-\\and stores the info in the Oracle database. The info consists of board unique Id, TRB-Net address, subEvent Id, TDC RC Mode, threshold version for RICH and MDC as well as MDC TDC mask version.
+\\and stores the info in the Oracle database. The info consists of the board unique ID, the TRB-Net address, the subevent ID, the TDC RC mode, the threshold versions for RICH and MDC as well as the MDC TDC mask version. See fig.~\ref{fig:daq2ora1}.
+
+\begin{figure}
+  \centering
+  \includegraphics[width=1\textwidth]{daq2ora_2.png}
+  \caption[EB system]{DAQ and TRB information to the Oracle database}
+  \label{fig:daq2ora1}
+\end{figure}
 
 \begin{description}
 \item[Script options] :
@@ -184,8 +762,18 @@ The daq2ora\_client.pl script reads this ascii file on lxhadesdaq e.g. \\\verb!~
 
 \subsection{RUN Start/Stop info}
 
-The event builder writes the data at RUN start and RUN stop to the ascii file (for example /home/hadaq/\-oper/oper\_1/\-eb\_runinfo2ora\_1.txt is the file on lxhadesdaq writen by event builder 1 running on lxhadeb01. lxhadesdaq mounts /home/hadaq/oper/oper\_1/ from lxhadeb01). The runinfo2ora.pl script reads the files and writes the data in the Oracle database.
+\begin{figure}
+  \centering
+  \includegraphics[width=1\textwidth]{daq2ora_1.png}
+  \caption[EB system]{RUN ID and Start/Stop time information to the Oracle database}
+  \label{fig:daq2ora2}
+\end{figure}
+
+The event builder writes the data at RUN start and RUN stop to an ASCII file (for example, /home/hadaq/\-oper/oper\_1/\-eb\_runinfo2ora\_1.txt is the file on lxhadesdaq written by event builder 1 running on lxhadeb01; lxhadesdaq mounts /home/hadaq/oper/oper\_1/ from lxhadeb01). The runinfo2ora.pl script reads the files and writes the data into the Oracle database. See fig.~\ref{fig:daq2ora2}.
 \newline \newline
-The script can be executed:
-lxhadesdaq: /home/hadaq/oper/runinfo2ora.pl -f /home/hadaq/oper/oper\_1/eb\_runinfo2ora\_1.txt -f /home/hadaq/oper/oper\_1/eb\_runinfo2ora\_5.txt -f ...
\ No newline at end of file
+
+The script can be executed on lxhadesdaq with all the files from the 16 EBs:
+/home/hadaq/oper/runinfo2ora.pl -f /home/hadaq/oper/oper\_1/eb\_runinfo2ora\_1.txt \\ -f /home/hadaq/oper/oper\_2/eb\_runinfo2ora\_2.txt -f /home/hadaq/oper/oper\_3/eb\_runinfo2ora\_3.txt -f /home/hadaq/oper/oper\_4/eb\_runinfo2ora\_4.txt -f /home/hadaq/oper/oper\_1/eb\_runinfo2ora\_5.txt -f /home/hadaq/oper/oper\_2/eb\_runinfo2ora\_6.txt -f /home/hadaq/oper/oper\_3/eb\_runinfo2ora\_7.txt -f /home/hadaq/oper/oper\_4/eb\_runinfo2ora\_8.txt -f /home/hadaq/oper/oper\_1/eb\_runinfo2ora\_9.txt -f /home/hadaq/oper/oper\_2/eb\_runinfo2ora\_10.txt -f /home/hadaq/oper/oper\_3/eb\_runinfo2ora\_11.txt -f /home/hadaq/oper/oper\_4/eb\_runinfo2ora\_12.txt -f /home/hadaq/oper/oper\_4/eb\_runinfo2ora\_13.txt -f /home/hadaq/oper/oper\_2/eb\_runinfo2ora\_14.txt -f /home/hadaq/oper/oper\_3/eb\_runinfo2ora\_15.txt -f /home/hadaq/oper/oper\_1/eb\_runinfo2ora\_16.txt
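+
+For reference, the RUN IDs appearing in these files are derived as described in the
+daq\_evtbuild walk-through: runNr = tv.tv\_sec - TIMEOFFSET, with TIMEOFFSET =
+1200000000. A minimal stand-alone sketch of that calculation:
+
+\begin{verbatim}
+/* Reproduce the RUN ID calculation used by daq_evtbuild. */
+#include <stdio.h>
+#include <sys/time.h>
+
+#define TIMEOFFSET 1200000000UL
+
+int main(void)
+{
+    struct timeval tv;
+    gettimeofday(&tv, NULL);
+    printf("runNr = %lu\n", (unsigned long)tv.tv_sec - TIMEOFFSET);
+    return 0;
+}
+\end{verbatim}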