CIDF Specification: Version 0.5

THE COMMON INTRUSION DETECTION FRAMEWORK

CIDF working group
http://seclab.cs.ucdavis.edu/cidf/members.html

Contents

0: Preamble
   0.1 Introduction
   0.2 Organization of this document
1 Architecture
   1.1 Introduction
   1.2 Functional decomposition (E-boxes, A-boxes etc).
   1.3 Layering scheme
   1.4 Naming and locating components
2 Gidos and S-expressions
   2.1 Introduction
   2.2 Different types of gidos
   2.3 Gido rationale and requirements
   2.4 S-expression gidos
   2.5 Recommended standard gidos
   2.6 S-expression guidelines
3 Encoding Gidos in Bytes
   3.1 Introduction
   3.2 Gido header
   3.3 S-expression encoding
4 CIDF Communication
   4.1 CIDF message layer formats
   4.2 CIDF Message Processing
   4.3 CIDF Directory Services
5 APIs
   5.1 Introduction
   5.2 General APIs
   5.3 Crypto APIs
   5.4 Event generator API
   5.5 Event analyzer API
   5.6 GIDO database API
   5.7 Response unit API
A. Primitive Type Definitions
B. SIDS List
C. LDAP Background

========================================================================
= 0.1: Introduction to CIDF
========================================================================

The goal of the Common Intrusion Detection Framework is a set of specifications which allow

   * different intrusion detection systems to inter-operate and share information as richly as possible,

   * components of intrusion detection systems to be easily re-used in contexts different from those they were designed for.

The CIDF working group came together in January 1997 at the behest of Teresa Lunt at DARPA in order to develop standards to accomplish the goals outlined above. She was particularly concerned that the various intrusion detection efforts she was funding be usable and reusable together and have lasting value to customers of intrusion detection systems.
During the life of the effort, it became clear that this was of wider value than just to DARPA contractors, and the group was broadened to include representatives from a number of government, commercial, and academic organizations.

After the first few months, membership in the CIDF working group was open to any individuals or organizations that wished to contribute. No cost was involved (except to defray meeting expenses).

Major decisions were made at regular (every few months) meetings of the working group. Those decisions were made by rough consensus of all attendees. That is, the meeting facilitator attempted to reach consensus, but in situations where only one or two individuals were protesting a decision, they were overruled in the interest of efficiency. No decisions were taken in the face of opposition from a sizeable minority; rather, the issue was tabled for further consideration. Meetings were fun and the working group had a good time doing this (well, most of them, anyway).

In between meetings, most of the writing was done by small subgroups or individuals. Their text was brought back for approval/changes at meetings. Discussions were also carried on in the working group mailing list, but few decisions were made that way.

========================================================================
= 0.2: Organization of the CIDF Spec
========================================================================

This section describes the organization of the CIDF specification as it appears in the rest of this document.

CIDF basically consists of the following things:

1) A set of architectural conventions for how different parts of intrusion detection systems can be modeled as CIDF components.

2) A way to represent gidos (generalized intrusion detection objects). Gidos can

   * describe events that have happened in the systems monitored by an IDS,
   * instruct an IDS to carry out some action,
   * query an IDS as to what has happened,
   * describe an IDS component.

3) A way to encode gidos into streams of bytes suitable for transmission over a network or storage in a file.

4) Protocols for CIDF components to find each other over a network and exchange gidos.

5) Application Programming Interfaces to re-use CIDF components.

Each of these major areas thus forms one section (numbered as shown above) of this document. The organization of the individual sections is described at the front of each section.

0.2.1: Format

This document complies with the requirements of RFC 1543, the format for ASCII Internet RFCs. In summary, this means that lines are at most 72 characters long and that they are terminated with a carriage-return, line-feed pair. Pages are at most 58 lines long and are terminated with a form-feed character. Paragraphs are single spaced and are separated by blank lines.

Lines in the text beginning with "#" denote editorial comments which should be removed before the final version.

The document is also divided into sections which are further divided into subsections, subsubsections, and so on. The numbering convention takes the form "3.4.1", which denotes the first subsubsection of the fourth subsection of the third section. Appendices are lettered, and so an Appendix subsection might be B.4.2.

========================================================================
========================================================================
=
= 1: CIDF Architecture
=
========================================================================
========================================================================

========================================================================
= 1.1: Introduction
========================================================================

This section introduces the architectural framework that CIDF assumes will structure an intrusion detection system.
This scheme is basically a framework around which the APIs and the communication protocols are organized. It is not mandated that CIDF-conformant intrusion detection systems be organized in exactly this way, but they must support interfaces that are so organized.

Section 1.2 introduces the different kinds of components that CIDF assumes are needed in intrusion detection systems. Section 1.3 covers the communication layering scheme, and section 1.4 discusses how components are named and located.

========================================================================
= 1.2: CIDF Functional Decomposition
========================================================================

CIDF defines interfaces to several different types of CIDF components. The four recognized kinds of components are

   * Event generators
   * Analyzers
   * Databases
   * Response units

These components are pieces of an intrusion detection system. All these kinds of components deal in *gidos* (generalized intrusion detection objects) which are represented via a standard common format. Gidos are the data that is moved around in the intrusion detection system. Gidos can represent events that occurred in the system, analysis of those events, instructions about responses to be carried out, or queries about events.

Figure 1.1 presents a schematic view of these components in a hypothetical intrusion detection system. The solid boxes labeled E_1, E_2, A_1, A_2, D_1, etc. represent the various components of that system. It is convenient to think of these as objects in the object-oriented programming sense (this does not dictate an implementation in an object-oriented language or framework).

** Figure 1.1 goes here!!!
** See http://seclab.cs.ucdavis.edu/cidf/proposal/pictures/architecture.1.eps

Whether the individual components are separate processes or images, or merely conceptually separate parts of the code in a single image, is not specified - both possibilities are covered by the CIDF specification.

Note that CIDF also allows components to be aggregated together to masquerade as a single component. In other words, a large number of (possibly distributed) components can be tied together and present themselves to the outside world as a single component, accessed via the same CIDF interfaces.

#####################################################################
#
# Stuart comment:
# It is not clear at present how this last requirement is to
# be achieved.
#
#####################################################################

1.2.1 Configuration and Directory Service

The gray box (labelled C in Figure 1.1) represents the configuration and directory services that tie components together via their standard CIDF interfaces. A component initiating communication may avoid using these services if it knows how to address its target directly, or uses non-CIDF means to do so. Otherwise, these services allow a component either to look up its target explicitly or to derive its communication "partners" by looking up "gido classes".

Gido classes specify types of data that may be exchanged between components. Components that wish to receive certain kinds of gidos describe what they want; components producing event records describe what it is they produce. The directory service, augmented by intelligence local to each component, then takes care of associating gido producers with appropriate gido consumers. In this mode of use, components are thus relieved of the burden of identifying or locating their partners in the intrusion-detection system.

1.2.2 Event Generators

The boxes labelled E_i in Figure 1.1 are event generators.
Their role is to obtain events from the larger computational environment outside the intrusion detection system (symbolized by the fat arrows coming from outside the dashed box), and provide them in the CIDF standard gido format to the rest of the system.

For example, event generators might be simple filters that take C2 audit trails and convert them into the standard format. Another event generator may passively monitor a network and generate events based on the traffic thereon. A third might be application code in an SQL database program which generates events describing database transactions.

It seems that these components are likely to be re-usable in that CIDF has a standard data format, and so converting features of typical computational environments into that format will be a task that many groups will need to perform. Hence, it is useful to specify an interface for how event generators are configured and used.

Event generators provide events as soon as they occur (with the possible exception of transport queuing). Storage of events is handled in event databases.

1.2.2 Event Analyzers

Analyzers are labeled by A_i in Figure 1.1. They are the components we typically think of in the intrusion detection context. They obtain gidos from other components, analyze them, and return new events (which hopefully represent some kind of synthesis or summary of the input).

Thus for example, an analyzer might be a statistical profiling tool which examines whether events being supplied to it now are statistically unlikely to be from the same time series as events supplied to it in the past. Another example is a signature tool which examines sequences of events looking for particular patterns which represent known misuse of the system.
Another example would be a correlator which simply examines events and attempts to determine whether they are causally related to one another, and then puts them together into composite events which can be further analyzed.

Simple analyzers might be just filters that throw away events that match certain patterns, or caches that only forward events dissimilar from recently seen events.

Again, event analyzers are assumed to immediately pass through gidos (with the exception of some processing delay). No provision is made for storage of gidos by analyzers.

1.2.3 Event Databases

Databases are labeled by D_i in Figure 1.1. These components simply exist to give persistence to CIDF gidos where that is necessary. The interfaces allow other components to pass gidos to the database, and to query the database for gidos that it is holding. Databases are not expected to change or process the gidos in any way (or at least to maintain the illusion that they don't).

It is not assumed that the database is a complex application (such as a relational database). It may simply be a file.

1.2.4 Response Units

Response units are the soldier ants of the CIDF ant-heap. They carry out response instructions - gidos which instruct them to act on behalf of other CIDF components. This is where functionality such as killing processes, resetting connections, etc. would reside. Response units are not expected to produce output except as acknowledgements.

========================================================================
= 1.3: Communication Layers
========================================================================

1.3.1: Background

CIDF supports both interoperability and reusability of components. As such, a component may be communicating with another across the network, or as part of the same executable. In addition, to the extent feasible, CIDF avoids specifying a particular language or choice of network protocols.
To support this flexibility, the design is structured in layers. Figure 1.2 shows the layers.

########################################################################
#
# Stuart notes:
# it's presently an open question whether a given CIDF module must allow
# *both* re-use and interoperability, or only one of them.
#
########################################################################

                     ------------------
                     |      APIs      |
                     |----------------|
                     |   Gido layer   |
                     |----------------|
                     |    message     |
                     |     layer      |
                     |----------------|
                     |  (negotiated)  |
                     |   transport    |
                     |     layer      |
                     ------------------

                         Figure 1.2

1.3.2: API Layer

At the top of Figure 1.2 is an API layer indicating code-based interfaces to the layers below. Application programmers require a clean and uniform way to call upon functions that are either local or remote and do not wish to bother with the details of exactly how that function is provided. APIs hide information and simplify a programmer's task. If the underlying structure of one of the lower layers is changed, the programmer does not have to rewrite the application program.

The specification is in principle neutral regarding the language used for APIs. Of course, the APIs must be instantiated for any specific language, and the instantiations will be different for different languages. However, the semantics of what is being passed across the interface will be common, and to the extent feasible, the APIs will be conceptually similar.

The APIs are discussed in detail in Section 5 of this document.

1.3.3: Gido layer

Independent of programming language, network protocols, etc., CIDF defines common formats for intrusion detection data. This data comes in discrete packages called gidos (generalized intrusion detection objects). The organization of the data, its semantics for an IDS component, and a way to encode it in bytes are all defined at this level.
The rationale for this is to separate the issue of how data is organized and what it means (gido layer) from how it is gotten in and out of components (API layer) and moved across networks (message layer).

In the case of components that are linked together into a single executable, there may be no layer below the gido layer.

Gidos are discussed in sections 2 and 3 of this document.

1.3.4: Message layer

Gidos must be moved across networks. Certain features of this process must be present for CIDF purposes and may not be provided by underlying transport mechanisms (such as cryptography, CIDF addressing, etc.). The CIDF message layer is intended to provide this functionality. This layer is addressed in section 4.

Use of this layer is mandated for CIDF components that are to be interoperable across a network.

1.3.5: Transport layer

Figure 1.3 illustrates the notion of two independently developed CIDF modules that build to a common interface specification. For the two modules to communicate, they are required to employ the same transport protocols that will establish the communication channel and handle message passing. The introduction of the transport layer is handled during the integration phase, as module developers negotiate and agree upon a common transport channel. For example, both developers may agree that sockets will be used for this communication session. Other developers may decide they wish to employ secure RPC for a different session. CIDF provides the flexibility to use different transport mechanisms, and a negotiation mechanism to choose amongst them.

The reason for having an independent transport layer below the message layer is that our only requirement is that the components understand the messages. This is independent of the way in which messages are transmitted. Different applications will require different transport mechanisms.

All components are required to support a default transport mechanism, namely UDP.
This is necessary in order to guarantee that two components can talk at least enough to negotiate about which other transport mechanism they might prefer.

------------------------------------------------------------------------
Interoperation Among Independently-Developed Intrusion-Detection Modules

+-------------+   +---+                       +---+   +-------------+
|  Intrusion  |   | T |                       | T |   |  Intrusion  |
|  Detection  |   | R |                       | R |   |  Detection  |
|  Module X   |   | A |     communication     | A |   |  Module Y   |
|             |   | N |       interface       | N |   |             |
| Developer 1 |   | S | <-------------------->| S |   | Developer 2 |
|             |   | P |      negotiated       | P |   |             |
| Language A  |   | O |        during         | O |   | Language B  |
|    OS X     |   | R |      integration      | R |   |    OS Y     |
+-------------+   | T |         phase         | T |   +-------------+
       ^          +---+                       +---+          ^
      / \                                                   / \
       |                                                     |
       |                                                     |
       | Build-to        +------------------+       Build-to |
       |                 |                  |                |
       +-----------------| Common Interface |----------------+
                         |  Specification   |
                         +------------------+
------------------------------------------------------------------------
                              Figure 1.3

========================================================================
= 1.4: Naming and Locating Components
========================================================================

1.4.1: Background

In an intrusion-detection system of any scale, naming components has the potential to become a boundless headache. Components that "know" the identity of other components will require modification to work with other partners if redeployed in new contexts. Such components might not be informed at all of changes in the system (such as the addition or removal of components of interest to them) that could affect their own operation.

So each component has the option of specifying classes of gidos that it is interested in, rather than naming other components. A producer of gidos can announce the classes of gidos it produces. A gido consumer can request the classes of gidos it wants to receive.
Communication between components is then characterized, not by the endpoints involved (e.g., addresses or other identifying information), but by the data the communication is to carry.

Components do also have the option of naming other components explicitly, either with or without the use of the CIDF infrastructure described in the next sections.

1.4.2: Associations

Enabling the data-directed communication described above is the role of the CIDF Directory Component (CDC), which will form and maintain associations between components. To discuss associations further, it is useful to divide CIDF components into gido producers (E-boxes, A-boxes, D-boxes) and gido consumers (A-boxes, D-boxes, R-boxes). Also note that gidos may enter the ID & R system from other sources (e.g., humans) and leave the system bound for other recipients.

A component contacts the CDC to announce its presence and ask for associates. This call returns a set of communications endpoints identifying potential partners for the transfer of gidos of interest to the caller. Each producer-consumer pair subsequently established is the basis for an association.

The caller has the option of being notified (via a callback) when new potential partners enter the system or when old ones leave. Note, though, that individual components or CIDF platforms may choose not to support dynamic addition of associations, e.g. due to resource constraints. Components may also restrict the number of concurrent associations they will enter into.

Associates are sought by a single API call, and individual associations may be torn down with a second call. However, each of these calls may induce a larger number of lower-level interactions. At the message layer, setting up an association involves directory operations (optionally including authentication). Maintaining a request for associates may involve keepalive functions also implemented in the message layer.
Negotiation between producer and consumer, to determine what kinds of gidos the consumer will receive, occurs in the API layer. The specification returns to these lower-level operations in later sections.

1.4.3: Gido Classes

For data-driven communication to work on a large scale, ways of classifying gidos must be established in advance. Every legal class of gido will fall under a category, which is a set of values for a particular attribute of the gido. The attribute need not necessarily be an explicit field in the gido; it could be an attribute of the gido producer, or of the host the producer monitors. When an attribute does form part of a gido, it corresponds to a semantic ID (SID), as defined elsewhere in this document.

So a category can denote something like:

   * an IP subnet
   * a DNS domain or subdomain
   * a physical subnet
   * a functional grouping of hosts, like:
      + a department
      + a project

Wildcarding will be allowed, but not arbitrary wildcarding. For instance, the last two elements of an IP address (only) might be allowed to be wildcarded.

Each kind of category is hierarchical and will be used to organize the set of CDC servers in the system. Each category will be the responsibility of a single server.

A gido class may specify more than one category; it may also specify attributes that are not categories at all. These will be applied in negotiating what a given producer will actually send to a consumer.

The CDC can be significantly simplified by building it atop LDAP-compliant infrastructure. Basically, the CDC then becomes a set of LDAP-compliant servers (DSAs), plus an LDAP client (DUA) and additional intelligence local to each component. This "normal" client environment will have to be replaceable where required with a simpler equivalent. Appendix C says more about our proposed use of LDAP.
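
The restricted wildcarding and class-based matching described in section 1.4.3 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the dotted-quad text form, the use of "*" as the wildcard mark, and the names parse_category, category_matches and find_producers are hypothetical and are not defined by this specification. The sketch simply takes the example in the text literally, so that only the last two elements of an IP address may be wildcarded.

```python
def parse_category(text):
    """Split a dotted-quad category into four elements and validate it.

    Assumption (not fixed by the specification): "*" marks a wildcarded
    element, and only the last two elements may be wildcarded.
    """
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated elements: %r" % text)
    for index, part in enumerate(parts):
        if part == "*":
            if index < 2:
                raise ValueError("only the last two elements may be wildcarded")
        elif not (part.isascii() and part.isdigit() and int(part) <= 255):
            raise ValueError("bad address element: %r" % part)
    return parts


def category_matches(pattern, address):
    """Return True if a concrete address falls under a category pattern."""
    wanted = parse_category(pattern)
    actual = parse_category(address)
    if "*" in actual:
        raise ValueError("a concrete address may not contain wildcards")
    # A wildcarded element matches anything; others must be numerically equal.
    return all(w == "*" or int(w) == int(a) for w, a in zip(wanted, actual))


def find_producers(request, announcements):
    """Return the sorted names of producers matching a consumer's request.

    announcements maps a (hypothetical) producer name to the address of
    the host that producer monitors.
    """
    return sorted(name for name, address in announcements.items()
                  if category_matches(request, address))
```

For instance, a consumer asking for "192.168.*.*" would be associated with a producer announced at 192.168.4.20 but not with one at 172.16.0.5, while a request such as "10.*.0.1" would be rejected outright. A real CDC would resolve such requests through the directory hierarchy rather than through an in-memory table.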
1.4.4: Limitations

Though the CDC can do a great deal to make connections between components intuitive and flexible, two key limitations of the approach (or indeed of any approach built atop a hierarchical directory service) should be noted.

Looking up a target in a hierarchical directory service is appropriate if the target of the lookup is susceptible to hierarchical naming and if its part of the directory hierarchy is believed to be trustworthy (or at least not believed a priori to be untrustworthy). However, there are interesting classes of gidos for which one or both of the above assertions are not true.

First are gidos that concern things lacking hierarchical names. Some examples are:

   * public keys
   * programs or other bad bundles of bits
   * attack profiles, like Stephanie Forrest's tuples of system calls

Second are gidos that describe some characteristic of an attack or of an attacker. If one wants to know about attacks emanating from a given subnet, or authored by a given principal, use of a hierarchical directory service to locate related gidos would lead one to a server operating out of the (hypothetical) attacker's domain, and hence likely to be compromised.

In either case, to cope with gidos that describe an object lacking a hierarchical name, or for which the name leads into an administrative domain that cannot be trusted to provide accurate information, a hierarchical directory service seems inappropriate.

========================================================================
= 2.1: Introduction to the gido format
========================================================================

2.1.1: Overview

This document specifies a standard gido format for use by CIDF components. These components shall use this standard for disseminating event records, analysis results, and countermeasure directives to IDS modules.
The document defines both the syntactic structure of these messages and a method for defining the semantic content necessary for interpreting the various data elements embedded within the structure.

2.1.2: Organization

This section is organized as follows. Section 2.2 discusses the distinctions and similarities between various kinds of IDS data, and explains why all these types of data are encoded under a single GIDO specification. Section 2.3 summarizes the strategy followed in developing this encoding specification. Section 2.4 (along with Appendices A and B) defines the GIDO structure, syntax, and semantics encoding/decoding scheme. Section 2.5 identifies the recommended set of GIDOs (primarily internal status information) that all CIDF-compliant modules should be able to produce. Section ??.?? enumerates a variety of example message encodings that help illustrate the usage of this specification.

### Examples needed. ###

========================================================================
= 2.2: GIDOs: The Various Kinds.
========================================================================

Under the CIDF data sharing model, components receive an input stream, use this input to drive their internal analytical processing, and pass the results to other components within an overall intrusion detection architecture. The output of one component may be the input of another component. Therefore, this specification closely coordinates the structures of event records, analysis reports, and countermeasure prescriptions. This adoption of a single standard for E-, A-, and R-boxes provides significant advantages in the reduction of interface complexity. In addition, this approach provides great flexibility as intrusion-detection objectives move from component analysis, to systems analysis, to system of systems analysis.
However, this relationship between event records and analysis results does not necessarily extend beyond the specification of identical gido structures. Event records, analysis results, and countermeasure prescriptions remain dissimilar in significant ways:

   o Event records represent the operational activity of the analysis target, and may be produced in large volumes. Minor losses of event records, while potentially damaging, will not necessarily imply a significant compromise to operational security.

   o Analysis results represent significant conclusions derived from an analytical review of an event stream, and should represent a significant reduction in volume from that of the event stream. Minor losses of analysis results are far more critical to the operational security of the target system than losses of event records.

   o Countermeasure prescriptions likewise should be low volume and sensitive to loss.

Thus, while gidos encode events, analysis results, and countermeasure prescriptions identically, other processing layers such as transport may handle them differently. For example, specifications for event transport may derive requirements that emphasize performance (e.g., stateless UDP transmission), while analysis results dissemination protocols may emphasize ensured delivery and accurate reassembly over issues of performance (e.g., TCP transmission). Protocols for event dissemination and analysis results reporting may also handle other issues differently, such as security requirements.

========================================================================
= 2.3: GIDO format requirements and rationale
========================================================================

The GIDO structure contains the actual data representing the event record, analysis results, and countermeasure directives produced by their respective CIDF components.
The encoding scheme requires the ability to express complex, self-defining data structures, while providing efficient high-volume transmissions of predefined structures.

This specification derives its payload format from the general specification of S-expressions. S-expressions are a self-defining formatting scheme for representing arbitrarily complex data structures. This message encoding specification employs a very simplified form of S-expressions for event record, analysis report, and countermeasure directive representation. However, S-expressions in general provide an impressive degree of reasoning and formalism that, in future revisions, will support the construction of highly complex messages.

The goals of this message encoding scheme are:

   -- generality: Payloads should be expressed efficiently, and capable of representing arbitrarily complex data.

   -- self-defining: Extensions to payload formatting should be semantically defined within the payload itself. Consumers should be able to learn or adjust to alterations in the expected format or comprehend entirely new payload formats.

   -- simplicity: The encoding scheme should produce messages that do not force complex parsing logic upon IDS module developers. The encoding scheme should be easily understandable and ideally messages should be human readable.

   -- efficiency: Payload expressions should represent data compactly. The overhead of semantic self-definitions should be removable when predefined messages are transported in bulk.

   -- flexible: Payload expressions must be open to modification and extensions to new data types, semantic information, and new data structures.

   -- independent of call semantics: Payload expressions must support both embedded data (call by value) messages and data independent (call by reference) messages.
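
To make the "self-defining" and "simplicity" goals above concrete, the following is a minimal sketch of reading and writing a generic S-expression. It is not the gido syntax, which is defined in Section 2.4; it assumes atoms are plain tokens without whitespace, quoting, or typed values, and the identifiers used in the usage note (Delete, Initiator, UserName, FileName) are hypothetical placeholders rather than SIDs taken from this specification.

```python
def tokenize(text):
    """Split S-expression text into parentheses and whitespace-free atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(text):
    """Parse one S-expression into nested Python lists of string atoms."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while True:
                if pos >= len(tokens):
                    raise ValueError("missing ')'")
                if tokens[pos] == ")":
                    pos += 1
                    return items
                items.append(read())
        if token == ")":
            raise ValueError("unexpected ')'")
        return token

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return expr


def dump(expr):
    """Write nested lists back out as S-expression text."""
    if isinstance(expr, list):
        return "(" + " ".join(dump(item) for item in expr) + ")"
    return expr
```

Parsing a string such as "(Delete (Initiator (UserName joe)) (FileName /etc/passwd))" yields a nested list whose first element in each sublist names what the remaining elements mean, which is the sense in which the payload carries its own structure. Because the reader needs only a tokenizer and one recursive function, a consumer can walk an unfamiliar payload without format-specific parsing logic.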
2.3.1 Mostly legacy text about schedule and goals

##################################################################
#
# Stuart note:
# It seems to me this section is mostly dead wood, but I don't
# have the temerity to remove it altogether while editing. If
# anyone wants it, send me a better version or I'll axe it next
# time I edit the document.
##################################################################

A key motivation behind this specification is to provide DARPA IDS researchers an immediate encoding standard from which to implement CIDF-compliant event and analysis boxes. The specification must also be flexible enough to be extended toward future, more sophisticated protocols, such as those that might provide dynamic registration and subscription (including the eventual development of CIDF Registration Modules).

This subgroup proposes a two-phase plan for developing this specification. First, there will be a short-term effort focusing on the development of an initial message specification. This phase will define the syntactic structure of event and analysis messages, and will enumerate a core set of semantic identifiers that will allow client modules to parse and understand arbitrary data embedded within messages. Second, a longer-term phase will focus on specifying a sophisticated negotiation protocol between the various CIDF modules and their clients. This protocol will support the dynamic negotiation of new message formats, and dynamic subscription to events and analysis results of interest, as well as countermeasure directives.

#####################################################################
# Editor's Comment:
# Our subgroup is currently working on a discussion of the
# direction that our dynamic negotiation protocol will take. However,
# we don't expect to introduce this aspect of the specification before
# the 2.0 version of this specification.
#####################################################################

Phase One: Core Event and Analysis Report Encoding Specification
Time Frame: 1-3 months, draft 1.0 available December 1997.

Phase one involves the development of the final draft of this document. This message specification defines the data structure(s) used to encode events and analysis results. The message structure consists of a constant header definition, followed by a variable payload field that will encode the event record or analysis report. The specification will enumerate core sets of data type primitives and semantic identifiers (SIDs) for data fields. Each field in the message payload will be encoded using one of the available data type primitives, and will have an associated SID that indicates its semantic content. Client modules can parse the SID to determine how to interpret the field.

During Phase one, the event subgroup will:

 o specify an extensible, self-defining message structure capable of representing arbitrarily complex data structures.

 o specify a list of primitive data types for representing data within the message fields.

 o specify a list of semantic identifiers that are associated with message fields. These SIDs will provide client IDS modules a means of interpreting the content and syntactic structure of each field.

 o enumerate examples of how various types of events (e.g., audit data, SNMP messages, MIB data, packet data, DBMS transactions), analysis results (e.g., signature reports, statistical profiles, GrIDS graphs), and countermeasure directives can be encoded in this structure.

 o define an example set of core messages that all CIDF-compliant modules may use to interoperate.

#####################################################################
# Editor's Comment:
# We had originally conceived of a mandatory set of messages that
# modules could use to express warnings, errors, and status information.
# However, it was later decided that the imposition of mandatory
# messages was outside the scope of our current effort.
#####################################################################

 o publish by 15 December 1997, a 1.0 specification of this CIDF Event Record and Analysis Report Encoding Specification.

Phase Two: CIDF Dynamic Message Negotiation Protocol
Time Frame: 6 months and beyond, draft 2.0 available TBD

Phase two will develop a dynamic message negotiation protocol, which will allow IDS clients to negotiate and subscribe to customized CIDF messages with minimal human interaction. Using this protocol, a client IDS module instantiated into a new environment can discover E-boxes, A-boxes, or R-boxes operating within the same environment. The client module can then, for example, probe a target box to determine the target's event generation capabilities or analysis reporting capabilities. From there the client would negotiate with the target box to subscribe to a subset of the target box's messages.

#####################################################################
# Editor's Comment:
# To date we have omitted discussions of the D-box and the occasionally
# mentioned registration box.
#####################################################################

#####################################################################
# Editor's Comment:
# In addition, we have begun work to extend this specification to
# the encoding of countermeasure directives produced by
# R-Boxes. Though not fully addressed in this draft, the intention
# is to ultimately define SIDs and examples of directives that
# are efficiently encoded under this format.
#####################################################################

========================================================================
= 2.4 GIDO Format
========================================================================

2.4.1 Preamble

In addition to questions of encoding format, this specification also enumerates a set of CIDF-compliant default primitive data types and semantic-identifiers (SIDs) used when expressing individual payload fields. The primitive data types, presented in Appendix A, define the available encoding used for field representation. They are the default list of primitive data types; this list may grow as needed in future revisions. Semantic-identifiers (SIDs), in Appendix B, provide standard identifiers that gido consumers may use to interpret the various data fields within a payload expression. As a rule, one SID is associated with each data field in the payload expression, and consumers use this semantic identifier when determining whether and how to process the field.

2.4.2 GIDO Grammar

Following is the grammar for the language in BNF. Terminal symbols are represented in upper case. Literal characters are enclosed in quotes (").

   ::= | "(" item-list ")" |
   ::= "(" ")" | "(" "@" DATA-LOCATOR | "(" "def" SID SEMANTICS ")"
   ::= | "(" ")"
   ::= |
   ::= SID | TYPE | NAME
   ::= |
   ::= DATA | "(" ")"

Using this grammar, data fields are coupled with semantic identifiers parenthetically. A SID indicates how its associated data element is syntactically represented as well as the data element's semantic content. A collection of parenthetical SID/Data tuples can themselves be grouped together in outer parentheses, indicating an explicit *association* of the SID/Data tuples (i.e., they represent attributes of a larger element in the expression). SID grouping is discussed further, with illustrations, in Section ??.??.
A DATA-LOCATOR provides information for locating additional expressions that are described by the given sid-exp. A SID is a unique token for a semantic identifier. TYPE is one of the primitive types specified in Appendix A. NAME identifies a named element of a structure. DATA is a data literal.

2.4.3: GIDO S-expression Examples

The following sections illustrate four ways of using S-expressions to encode payload data structures. Other, more complex S-expressions are possible. However, to reduce parsing complexity these four methods are recommended, and employed throughout the remainder of this document. Using these four basic payload forms, gido consumers can decode and comprehend the individual fields of gidos, even though application developers may vary in the ways they order the payload or include/exclude various field elements.

2.4.3.1: Embedded Semantics and Data Payload Example

This form is used for expressing field-oriented lists of data, where the data is embedded within the message. The format consists of a series of tuples, one tuple per data field. Each tuple consists of a semantic identifier followed by its associated data item:

   Format: (SID-1 data-exp-1)(SID-2 data-exp-2) . . . (SID-N data-exp-N)

In this format, the semantic definition of each data field is embedded as the first token in the field, providing a self-defining message format. A consumer can parse the message for those SIDs it understands and desires to analyze, and discard data fields containing unknown or unwanted SIDs. As discussed in Appendix B, each SID has an associated data type, which completes the self-definition of the message. Thus, by parsing the SID tokens, the consumer knows both how to interpret each data element, and how the data elements are syntactically represented.
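
As an illustration of the embedded form, the following Python sketch splits a textual payload into (SID, data) tuples and keeps only the fields whose SIDs the consumer understands. It is a minimal sketch under stated assumptions, not part of the specification: it assumes a plain-text rendering of the payload in which a SID is separated from its data by a single space, it does not handle quoted strings that contain parentheses, and the function names and the SID "FooBar" are hypothetical.

```python
def parse_fields(payload):
    """Split '(SID data)(SID data)...' into a list of (sid, data) pairs.

    Assumption: textual payload, balanced parentheses, no quoted strings
    containing parentheses. Nested data expressions are kept as raw text.
    """
    fields = []
    depth = 0
    start = None
    for i, ch in enumerate(payload):
        if ch == "(":
            if depth == 0:
                start = i + 1          # first character after the opening paren
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
            if depth == 0:
                # The first token of the tuple is the SID, the rest is its data.
                sid, _, data = payload[start:i].strip().partition(" ")
                fields.append((sid, data.strip()))
    if depth != 0:
        raise ValueError("unbalanced '('")
    return fields


def select_known(fields, known_sids):
    """Keep the fields whose SID the consumer understands; drop the rest."""
    return {sid: data for sid, data in fields if sid in known_sids}


payload = "(HostName alpha.example.org)(Port 23)(FooBar 99)"
fields = parse_fields(payload)
known = select_known(fields, {"HostName", "Port"})
print(known)
```

The consumer in this sketch never needs to understand "FooBar"; the tuple is parsed structurally and then discarded, which is the behavior the paragraph above describes.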
2.4.3.2 Separation of Data from Semantics Payload Example

This form is used for expressing field-oriented lists of data, where the data is not necessarily embedded within the message. Rather, the message contains the semantics for interpreting the data, and the data-location field points to the actual storage location where the consumer may find the target GIDO.

   Format: (@ sid-exp-1 DATA-LOCATOR )

Note: the data location is indicated by [data-location], where the data is encoded as (data-exp-1 data-exp-2 ... data-exp-N)

This message form is detected by the "@", which indicates that the message is a "call by reference" expression. The message continues with a sid-exp that defines the form and semantics of the data. The actual data structure is then located by consulting the DATA-LOCATOR. Using this format, E- and A-boxes concisely specify the semantics of the event record or analysis report, and then point the consumer to where the record or report is located. An example DATA-LOCATOR is presented in Section 2.4.3.2.1.

#####################################################################
# Editor's Comment: The DATA-LOCATOR is inherently a very
# implementation dependent mechanism. In a future revision, we
# will discuss guidelines or recommendations for how developers
# should present DATA-LOCATORs in CIDF message payloads.
#####################################################################

2.4.3.2.1: Data Locator Example

This section suggests an example data locator implemented as a URL. In this example, a data locator is implemented as a URL whose content is interpreted according to the preceding SID-stream.
For instance, if the interpreter sees

   ( @ ( IPAddr Port ) http://host.net/data.cidf )

and the data at that location is

   ( 0x80098057 23 )

then the interpretation of this data is

   IP address 128.9.128.87, port 23

On the other hand,

   ( @ ( UID Instance ) http://host.net/data.cidf )

might be interpreted as

   UID 0x80098057, instance 23

(or whatever).

Protocol Sketch:

The idea is that the data at the location should be encoded as:

   application/x-cidf-compressed

In other words, one or more S-expressions reside at that location in one lump. The interpreter should make an HTTP request to the named location, expecting a document of either type. As a fallback, the type application/octet-stream can be used to express the compressed CIDF S-expressions. This allows us to leverage the existing HTTP work, while supporting widely distributed data management.

Security Considerations:

We might need to address the desire to encrypt and sign the data to prevent corruption and information leakage. This can be done within the content (using something like MOSS?) or it might involve SHTTP. Not sure what the best thing to do is here.

2.4.3.3 Pre-defined Constant Payload Format

This form allows for semantics of predefined message structures to be conveyed to consumers once. From that point forward, consumers can receive and interpret raw data structures without the overhead of embedded SIDs. This form is highly efficient for transporting high volumes of the same message type. This form is also used for enumerating a pre-defined set of CIDF E/A-box messages (see Section 2.5).

A gido producer begins the message exchange by sending the consumer a message definition statement. The "def" defines a new SID that can be used subsequently. SID indicates the semantic identifier being defined. SIDs are special identifiers in the language. Attempting to define a SID that is already defined is an error.
NAME provides an identifier for the SID that is used to display the SID. sid-exp-1 defines the SID in terms of SIDs and TYPEs that are already defined. sid-exp-1 may only contain SIDs that have been predefined, either because they are included in an appendix to this document or because they have been defined in a prior definition. SEMANTICS provides any additional semantics not already provided in sid-exp-1. Currently SEMANTICS is a literal character string that provides a natural language description of the additional semantics.

   Format1: (def SID sid-exp-1 SEMANTICS)

#####################################################################
# Editor's Comment: The event subgroup has not resolved the
# issue of scope for dynamically defined SIDs.
#####################################################################

2.4.3.4 Arbitrary Payload Forms and Extensions

Payload expressions can also contain complex data structures such as variant records or arrays. In fact, future extensions of the message format could employ the full expressive power of S-expressions, providing arbitrarily complex records, arrays, arrays of records, messages within messages, verbatim content, etc. (Full enumeration of these advanced formats should be completed in a future revision of this specification.) The following are examples of available formats:

Record Format Example:

   ((record (SID-1 SID-2 SID-3)) (data1 data2 data3))

Array of Records Format Example:

   ((array (record (SID-1 SID-2 SID-3 SID-4)))
    ((data1 data2 data3 data4)
     (data1 data2 data3 data4)
     (data1 data2 data3 data4)))

Verbatim Format Example.
(All ASCII literals such as spaces, tabs, and carriage returns are embedded within the verbatim string):

   (verbatim "root:*:0:0:Johnny &:/root:/bin/tcsh
   Super:*:0:0:Bourne-again Superuser:/root:
   daemon:*:1:1:Owner of system processes:/root:
   bin:*:3:7:Binaries Commands Source,,,:/:/none
   games:*:7:13:Games pseudo:/usr/games:
   man:*:9:9: Man Pages:/usr/share/man:
   ingres:*:267:74:& Group:/usr/ingres:/bin/csh
   nobody:*:65534:65534:Unprivileged :/none:/none
   ftp:*:20:20:Anonymous :/homes/ftp:/usr/bin/false
   +:*:0:0:::")

========================================================================
= 2.5: Example CIDF Module GIDO Sets
========================================================================

This section enumerates example sets of internal status messages that each CIDF-compliant E-box, A-box, and R-box may choose to support. These message sets are not mandatory, but recommended as a consistent way of conveying internal module information.

#####################################################################
# Editor's Comment: Recommendations for R-box message sets are
# forthcoming.
#####################################################################

2.5.1 Recommended E-Box Message Set

E-boxes can employ the following messages for basic internal information transfer to consumers. These messages are all formatted using pre-defined constant payload expressions (see Section 2.4.3.3, Format1), and contain E-box internal operation information. (See Appendix A for the SID to data type listing, and Section 3.2.3 for the list of Class ID codes.)

Message ID:  EB-Owner
Description: Returns the hostname of the machine where the E-box is running, the machine's IP address, the port number assigned to the E-box (-1 if NA), E-Box process ID, identification of E-box developer, and revision number of the E-box.
Priority:    5
Msg. Format: (def EB-Owner (struct HostName IP_Address Port PID DeveloperID RevisionNo))

Message ID:  EB-Target
Description: Returns the hostname of the monitoring target, the IP address of the target, the port number assigned to the target if a network service, the process ID of the target, and an identifier indicating the type of event stream through which the target is being monitored.
Priority:    5
Msg. Format: (def EB-Target (struct HostName IP_Address Port PID EventStreamID))

Message ID:  EB-Status
Description: Returns a timestamp indicating uptime for the E-box, the method used to transfer messages to the consumer (synchronous polling, asynchronous forwarding, trap, other), events parsed per second, bytes parsed per second, records sent since uptime, bytes sent since uptime, internal E-Box errors produced since uptime.
Priority:    5
Msg. Format: (def EB-Status (struct UpTime ReportMethod RecsPerSec BytesPerSec SentRecsCnt SentByteCnt ErrorCount))

Message ID:  EB-Transport
Description: Returns an identifier for the current transport mechanism being used, the revision number of the transport software, and the list of available transport mechanisms for this E-box.
Priority:    5
Msg. Format: (def EB-Transport (struct CurrentTrans RevisionNo AvailableTrans))

Message ID:  EB-Error
Description: Returns an internal error code produced by the E-Box, a textual description of the error, and a severity code (e.g., fatal, non-fatal, potential data loss).
Priority:    3
Msg. Format: (def EB-Error (struct EB-ErrorCode ErrorDesc Severity))

Message ID:  EB-Warning
Description: Returns an internal warning code produced by the E-Box and a textual description of the warning.
Priority:    4
Msg. Format: (def EB-Warning (struct EB-WarnCode WarnDesc))

Message ID:  EB-FilterStatus
Description: Returns the current filter array that identifies which of the available events the E-box is currently generating and returning to the consumer.
Priority:    5
Msg. Format: (def EB-FilterStatus CurrentFilterArray)

2.5.2 Recommended A-Box Message Set

A-boxes can employ the following messages for basic internal information transfer to their consumers. These messages are all formatted using pre-defined constant payload expressions (see Section 2.4.3.3, Format1), and contain A-box internal operation information.

Message ID:  AB-Owner
Description: Returns the hostname of the machine where the A-box is running, the machine's IP address, the port number assigned to the A-box (-1 if NA), A-Box process ID, identification of A-box developer, and revision number of the A-box.
Priority:    5
Msg. Format: (def AB-Owner (struct HostName IP_Address Port PID DeveloperID RevisionNo))

Message ID:  AB-Target
Description: Returns the hostname of the analysis target, the IP address of the target, the port number assigned to the target if a network service, the process ID of the target, and the module identity of the E-box through which the target's operational activity is being monitored.
Priority:    5
Msg. Format: (def AB-Target (struct HostName IP_Address Port PID ModuleIdentity))

(Question: ModuleIdentity assumes a single E-to-A relationship. Need to handle multi-E-box analyses?)

Message ID:  AB-Status
Description: Returns a timestamp indicating uptime for the A-box, the method used to transfer messages to the consumer (synchronous polling, asynchronous forwarding, trap, other), event records parsed per second, bytes parsed per second, reports sent since uptime, bytes sent since uptime, internal A-Box errors produced since uptime.
Priority:    5
Msg. Format: (def AB-Status (struct UpTime ReportMethod RecsPerSec BytesPerSec SentRecsCnt SentByteCnt ErrorCount))

Message ID:  AB-Transport
Description: Returns an identifier for the current transport being used, the revision number of the transport software, and the list of available transport mechanisms for this A-box.
Priority:    5
Msg. Format: (def AB-Transport (struct CurrentTrans RevisionNo AvailableTrans))

Message ID:  AB-Error
Description: Returns an internal error code produced by the A-Box, a textual description of the error, and a severity code.
Priority:    3
Msg. Format: (def AB-Error (struct AB-ErrorCode ErrorDesc Severity))

Message ID:  AB-Warning
Description: Returns an internal warning code produced by the A-Box and a textual description of the warning.
Priority:    4
Msg. Format: (def AB-Warning (struct AB-WarnCode WarnDesc))

Message ID:  AB-FilterStatus
Description: Returns the current filter array that identifies which of the available analysis reports the A-box is currently building and returning to the consumer.
Priority:    5
Msg. Format: (def AB-FilterStatus CurrentReportingArray)

========================================================================
= 2.6 Semantic Identifiers
========================================================================

2.6.1 Rules and Guidelines for Defining SIDs

Other specifications MAY define SIDs for use with the CIDF framework. If a CIDF component generates or uses those SIDs, those SIDs MUST be defined in conformance to the rules here and SHOULD be defined in conformance with the guidelines here.

 o Every SID MUST have a unique name.

 o Every SID's definition MUST include precise syntax.

 o Every SID's definition SHOULD include precise semantics.

 o The SID description must fully explain the intended use of the SID (i.e., the intended data arguments must be described).

#####################################################################
# Editor's Comment: The Event Subgroup is investigating naming
# conventions and rules for SID enumeration to eliminate the
# potential for SID ID reuse.
#####################################################################

Specifiers SHOULD avoid defining a SID whose meaning overlaps another, unless one SID P is strictly more specific than another SID Q.
If P is strictly more specific, then whenever (P X) is true, (Q X) must also be true.

A SID, its arguments, and the parentheses that surround them are together called a "sentence". If one sentence contains another, the outer sentence is called the "super sentence" and the inner one is called the "sub sentence".

A SID MUST be so defined that when the SID occurs in an event record, the truth of its sentence is independent of the peer sentences, the super sentence's peer sentences, the super super sentence's peer sentences, and so on. Thus, a sentence cannot *modify* the meaning of a peer sentence. It can only augment the peer sentence. (The logical relationship between peer sentences is conjunction.) This is critical because a consumer may ignore some peer sentences.

Specifiers should be wary when defining a set of closely related SIDs, since a consumer may understand some of the SIDs and not others. If two data items can be properly understood together but cannot be properly understood singly, then it is advisable to define a single SID that takes both data items as arguments.

2.6.2: Rules and Guidelines for Using SIDs

Whenever a component puts a SID into an event record/GIDO, the SID MUST be used with the number of arguments (usually one) that the SID's definition calls for. The SID's argument(s) MUST have the syntax and meaning that the SID's definition calls for. Otherwise the component is OUT OF CONFORMANCE with the SID's definition.

A component that generates GIDOs MUST generate them in conformance with all of the SID definitions in this specification.

Whenever the above rule permits, a component generating a GIDO SHOULD use a SID from this specification and SHOULD avoid the SIDs defined in the Syntactic Tokens section. If the only suitable SID in this specification is in the Syntactic Tokens section, then an implementation MAY use it or define a new SID; defining a new SID is usually better.
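
The argument-count rule stated at the start of this section can be checked mechanically before a GIDO is emitted. The Python sketch below is only an illustration under stated assumptions: a sentence is modelled as a list whose first element is the SID name, and the table SID_ARITY, the SID "TimeRange", and the function check_sentence are hypothetical names that do not come from this specification. It checks argument count only; the syntax and meaning of each argument would need per-SID checks.

```python
# Hypothetical table: SID name -> number of arguments its definition calls for.
SID_ARITY = {"HostName": 1, "Port": 1, "TimeRange": 2}


def check_sentence(sentence, sid_arity):
    """Return None if the sentence has the right argument count, else a message."""
    sid, *args = sentence
    if sid not in sid_arity:
        return f"{sid}: no definition available"
    expected = sid_arity[sid]
    if len(args) != expected:
        return f"{sid}: expected {expected} argument(s), got {len(args)}"
    return None


print(check_sentence(["Port", "23"], SID_ARITY))
print(check_sentence(["TimeRange", "10"], SID_ARITY))
```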
If a component generating GIDOs uses a SID from a particular specification, and if that specification defines two applicable SIDs, one of which is strictly more specific than another, then the component SHOULD use the more specific one.

If CIDF component X creates a GIDO and CIDF component Y later has a copy of the GIDO and passes it verbatim to CIDF component Z, then Y MAY do so even if the GIDO violates the above rules and guidelines. This provision frees D-boxes and such from having to thoroughly understand and validate every GIDO they process. Further, Y may combine the GIDO with other GIDOs into a compound GIDO and still, as long as the original GIDO is included verbatim and clearly ascribed to its originator, Y MAY send it even if the record violates the above rules and guidelines.

#####################################################################
# Editor's Comment: The Event Subgroup has not yet specified
# compound GIDOs, but intends to propose such structures shortly.
#####################################################################

However, if the CIDF component modifies any part of the GIDO, then it is responsible for the GIDO's compliance with the above rules and guidelines.

2.6.3: Rules for Interpreting SIDs

If a component receives a GIDO containing a SID whose definition the component does not know, then the component MUST ignore the SID's subsentence. The component MUST process the GIDO as if the subsentence were not there. This rule allows new producers to add new information to a GIDO with the assurance that they will not break legacy consumers.
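
The interpretation rule above can be sketched in Python as a recursive filter. This is a minimal sketch under stated assumptions, not a definitive implementation: a sentence is modelled as a nested list whose first element is the SID name, and the function name prune_unknown and the SIDs "Login", "UserName", "Host", "FrobLevel", and "Mood" are hypothetical.

```python
def prune_unknown(sentence, known_sids):
    """Return a copy of the sentence with unknown sub-sentences removed.

    A sentence is modelled as [sid, arg, ...]; each arg is either an atom
    or another sentence. Returns None when the sentence's own SID is unknown.
    """
    sid, *args = sentence
    if sid not in known_sids:
        return None                      # ignore the whole subsentence
    kept = [sid]
    for arg in args:
        if isinstance(arg, list):        # nested sentence: filter recursively
            sub = prune_unknown(arg, known_sids)
            if sub is not None:
                kept.append(sub)
        else:                            # plain data atom: keep unchanged
            kept.append(arg)
    return kept


gido = ["Login",
        ["UserName", "joe"],
        ["FrobLevel", 7],
        ["Host", ["HostName", "alpha"], ["Mood", "grumpy"]]]
known = {"Login", "UserName", "Host", "HostName"}
print(prune_unknown(gido, known))
```

A legacy consumer that knows only the four SIDs in `known` processes the GIDO as if the "FrobLevel" and "Mood" sentences were never there, which is why a newer producer can add them safely.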
========================================================================
========================================================================
=
= 3: Encoding Gidos
=
========================================================================
========================================================================

========================================================================
= 3.1: Introduction to Gido Encoding
========================================================================

In encoding a gido into actual bytes for storage, transmission, etc., two things are involved. Firstly, every gido is accompanied (in perpetuity) by a static format header which contains basic information about that gido. This header format is described in section 3.2. Secondly, the S-expression which forms the payload of the gido must also be encoded. The method for doing this is covered in section 3.3.

========================================================================
= 3.2: Gido Header
========================================================================

3.2.1: Introduction

The header definition, presented in this section, consists of a series of constant fields that gido consumers can reliably parse to read basic data common across all gidos. The gido s-expression payload, presented in a preceding section, contains the actual IDS component-specific data structures, including semantic identifiers that allow gido consumers to decode and interpret individual fields.

The gido header is used to convey information about the gido itself, rather than details of the event, analysis report, or response prescription (which are captured in the payload). Each CIDF-compliant gido generated by any component MUST contain these fields in this order (for this version). Consult Appendix A for details on type definition.

3.2.2: The Header Fields

1. Version ID (type revision).
Indicates the format revision used to encode this gido. Initially, the Version ID will indicate CIDF Version 1.0 (major = 1, minor = 0). This Version ID will be incremented as future versions are introduced. All current and future versions of this specification must reserve the first field of the gido header for the Version ID. Gido consumers may reliably use this field to detect the format of the remainder of the gido.

#####################################################################
# Editor's Comment: This field suggests that CIDF revision
# identifiers will follow a major.minor format. The CIDF working
# group must decide if this is the proper revision format, and must
# then define the meaning of major and minor revision indicators.
#####################################################################

2. Gido Length (4 octets, big-endian). Indicates the byte length of the entire gido, including this header but excluding any optional digital signature. This field may be used to cross-check gido completeness.

3. Time Stamp (8 octets, big-endian). The time stamp field comprises two subfields. The first four octets indicate the seconds since 00:00:00 UTC, January 1, 1970. The second four octets provide millisecond granularity. The purpose of this field is to enable CIDF components to determine the age of the information contained in the gido. This time is used to sort gidos based on their age in support of discarding "old" information or finding all information in a database between two times. This time refers to the time of the latest system event that contributed to the gido. For example, if the information is an analysis result, the header time stamp field refers to the time of the latest system event that was used in computing the analysis result. For gidos containing single events or sequences, the time stamp would be the time the last event in the gido occurred.
For internal gidos (e.g., database queries or intermediate analysis results) the time stamp value should be set to the value that most closely represents the age of information contained in the gido.

4. Thread ID (4 octets). Used to identify gidos with some common thread; all gidos about a given event (e.g., first report followed by successive updates) would share the same Thread ID.

5. Class ID (2 octets). Formerly named Priority. Indicates the category that the event, analysis, or response generator believes the gido falls under. Class IDs are defined in Section 3.2.3. This field is intended to allow receivers to process high-priority gidos in a given field of expertise before all others. Note that some codes are reserved for user-defined Class IDs; the receiver must check to see if prior agreement exists between sender and receiver on these codes.

6. Originator ID (unknown type). A unique identifier associated with the component generating this gido.

#####################################################################
# Editor's Comment: The format and semantics of the Originator ID
# is an open issue that requires resolution by the CIDF working group.
# Specifically, how will CIDF modules be uniquely identified from other
# CIDF modules?
#####################################################################

7. Flags (1 octet). The bits of this flags octet are to be interpreted according to the following table:

   Bit         Meaning
   ---         -------
   0 (LSB)     set = optional signature present (see below).
               clear = no optional signature
   1-7 (MSB)   reserved

   (MSB = most significant bit)

The gido payload, plain or compressed, immediately follows the header. If bit 0 in Flags is cleared, indicating no optional signature, the gido ends with the payload (indicated by the Gido Length header field).
Otherwise, if bit 0 is set, indicating that a digital signature of the content is present, this signature is contained in a structure following the gido payload. Recall that the Gido Length header field indicates the end of the gido payload, not including the signature structure.

The signature structure has the following fields in it:

1. Signature Length (2 octets). Indicates the length, including this field (signature length), of the signature structure, in octets.

2. Key ID (type unknown). Uniquely identifies the key used to generate the signature. This ID may be understood only by a given receiver if the gido is to be sent one-to-one. This field also implies the signature algorithm.

#####################################################################
# Editor's Comment: This issue is tied up with that of originator-id
#
#####################################################################

3. Signature data. The entire gido represented by the Gido Length header field is passed through a gido digest, resulting in a short, fixed-length quantity. This quantity is then signed using the applicable encryption/signature algorithm, and the result of this operation placed in this field.

3.2.3 Class ID Codes

The following default Class ID codes are defined for events and analysis results. Under this scheme, class ids 0 thru 15 are reserved for CIDF event priorities, and 16 thru 31 are reserved for analysis report priorities. In addition, class ids 32 thru 127 are reserved for future CIDF extensions. IDS developers may use the remaining range (128 thru 255) for application-specific purposes.
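
The four ranges named in the paragraph above can be expressed as a small lookup. The Python sketch below is an illustration only; the function name class_id_range and the returned labels are hypothetical. Because the text defines ranges only up to 255 although the Class ID header field is 2 octets wide, the sketch assumes that any value outside 0 thru 255 is an error.

```python
def class_id_range(class_id):
    """Map a Class ID code to the range it falls in, per Section 3.2.3."""
    if not 0 <= class_id <= 255:
        # Assumption: codes above 255 are not defined by this section.
        raise ValueError(f"Class ID {class_id} is outside the defined ranges")
    if class_id <= 15:
        return "event"          # 0 thru 15: CIDF event priorities
    if class_id <= 31:
        return "analysis"       # 16 thru 31: analysis report priorities
    if class_id <= 127:
        return "reserved"       # 32 thru 127: future CIDF extensions
    return "application"        # 128 thru 255: application-specific use


for code in (3, 16, 40, 200):
    print(code, class_id_range(code))
```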
(Default Event Class IDs)
    00 - Complete Event
    01 - Intermediate Event
    02 - Incomplete Event
    03 - E-box Internal Error Report
    04 - E-box Internal Warning Report
    05 - E-box Internal Status Message
    06 - Reserved for E
    07 - Reserved for E
     :
    15 - Reserved for E

(Default Analysis Class IDs)
    16 - Critical Security Violation
    17 - Potential Security Violation
    18 - Suspicious Report
    19 - Warning Report
    20 - Intermediate Result
    21 - Informational Report
    22 - A-box Internal Error Report
    23 - A-box Internal Warning Report
    24 - Reserved for A
    25 - Reserved for A
     :
    31 - Reserved for A

(Reserved Priority Code Range)

#####################################################################
# Editor's Comment: Class ID code range 32-48 is reserved for
# R-Box countermeasure directives.
#####################################################################

    32 - Reserved for future use
    33 - Reserved for future use
     :
   127 - Reserved for future use

(Undefined Priority Codes)
   128 - Undefined
     :
     :   (Undefined values may be employed for
     :    application-specific purposes.)
   255 - Undefined

========================================================================
= 3.3: Encoding S-Expressions
========================================================================

GIDO payloads consist of S-expressions. However, these S-expressions are translated to an octet encoding format for efficient transmission or storage.

#####################################################################
# Editor's Comment: The subgroup considered the optional use of plain
# text message bodies, that would support verbose but human-readable
# encodings. Plain text, human-readable encodings would still retain
# the full expressive power of the octet encoding presented here. We
# may reconsider the
# utility of plain text messages for purposes such as debugging.
#####################################################################

The octet encoding of message payloads supports highly efficient
transmission of messages. This section describes how to transform an
S-expression into the appropriate octet encoding. This encoding is
designed to meet the following objectives:

   * It must indicate the structure, so that a component ignorant of
     the elements within the S-expressions will still be able to parse
     the S-expressions.

   * It must allow for pre-defined and distributed-out-of-band SIDs.

   * It must allow for both abstract and concrete syntax.

#####################################################################
# Stuart Note: What does this last requirement mean? Surely the
# byte encoding is a concrete syntax - by definition.
#####################################################################

   * It should be as compact as possible.

3.3.1: Octet Codes

The following codes will be used to represent various octet values in
the succeeding encoding specifications. They are *not* S-expression
atoms.

   Code    Value   Interpretation
   ----    -----   --------------
   SEP     0xff    Used as separator.
   SOPEN   0xfe    S-expression open.
   PTR     0xfd    Pointer (referred to as @).
   SID     0xfc    Prelude to SID 2-octet code.
   TYPE    0xfb    Indicates concrete syntax type.

3.3.2: Encoding of S-Expression Grammar

What follows is the grammar for CIDF S-expressions. After each line we
give the encoding applicable to that line.
   ::=
      E() = E()

   ::= ( )
      E() = E()

   ::=
      E() = E() E()

   ::= ( )
      E() = SOPEN E(length{E() E()}) E() E()
      E(length{X}) = short_encode(X)

   ::= ( @ )
      E() = SOPEN PTR E() E()
      E() = ascii_encode()

   ::= ( def )
      E() = SOPEN E(length{E(def) E() E() E()}) E(def) E() E() E()
      E() = SID sid_encode()
      E() = ascii_encode()

   ::=
      E() = sid_encode()

   ::= ':
      E() = TYPE type_encode() sid_encode()

   ::= ( )
      E() = SOPEN E(length{E()}) E()

   ::=
      E() = E()

   ::=
      E() = E() E()

   ::=
      E() = E()

   ::=
      E() = E() E()

   ::=
      E() = E()

   ::= ( )
      E() = SOPEN E(length{E() E()}) E() E()

3.3.3: Auxiliary Functions

The following functions are used in the above syntax and encoding:

   ascii_encode(X)   returns the ASCII encoding of X.

   short_encode(X)   returns the two-octet big-endian representation
                     of X. (E.g., short_encode(1234) = 0x04d2.)

   sid_encode(X)     returns the 2-octet code for the SID X.

   type_encode(X)    returns the SEP-terminated code for the type X.

3.3.4: Encoding Data

Data may be encoded in one of two ways. If the applicable SID was
concretely typed (that is, preceded with a ': sequence), then the data
is encoded exactly as specified by the type; e.g., a ulong is encoded as
four octets in big-endian order.

Otherwise, the data is encoded in ASCII. This means that octet streams
that use the 8th bit must be marked as byte streams using some
declaration such as 'array(32,byte):. ASCII-encoded data is
variable-length; hence, it must be SEP-terminated.

3.3.5: SID Codes

SIDs are encoded as 2-octet values. A list of pre-defined SIDs is given
in the appendix; if one exists for the purpose, it SHOULD be used.
However, this encoding furnishes the ability to define new SIDs should
no applicable one exist, using the "def" operative. As noted in Section
3.3.2, this requires one to define a new SID code. These SID codes may
be unrestricted, but they should conform to the following standard:

   * The code is a 2-octet value, as stated above.

   * The MSB (bit 7) of the first octet is the DYNAMIC bit.
     If this bit is set, this is a dynamically-defined SID, and the
     code for the actual SID is given by bit 5 of the first octet
     through the LSB (bit 0) of the second octet. If it is clear, this
     is a statically-defined SID, and the code for the SID is as given
     in the appendix.

   * If the DYNAMIC bit is set, then the next bit (bit 6 of the first
     octet) is the EXPERIMENTAL bit. If this bit is set, then the SID
     is ephemeral and cannot be relied on in future encodings. If it
     is clear, then this is a stable SID.

========================================================================
========================================================================
=
= 4: CIDF Communication
=
========================================================================
========================================================================

========================================================================
= 4.1: Message Layer
========================================================================

4.1.1: Rationale for Message Layer

The CIDF message layer was developed to solve problems of
synchronization (i.e., blocking vs. non-blocking processes) and problems
of different data formats for different operating systems. It also
solves the problem that different groups will use different programming
languages. In other words, the use of a messaging format makes CIDF
communication:

   * Independent of blocking/non-blocking processes
   * Data format independent
   * Operating system independent
   * Programming language independent

4.1.2: Objectives of the CIDF Message Layer

The top-level objectives for the CIDF message layer are to

   * Provide an open architecture.

   * Avoid imposing architectural constraints or assumptions on the
     systems or modules.

   * Allow messaging independent of language, operating system, and
     network protocol.

   * Support easy addition of new components to the CIDF.

   * Support security requirements for authentication and privacy.
   * Support devices that don't want to fully support CIDF.

4.1.3: Message Format

This message structure resides on top of the negotiated transport layer
service. Note that all reserved fields are set to 0 on transmission and
ignored on receipt.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    | Control Byte  |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Next Header  |                   Reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Length                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Time Stamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Destination Address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Options (variable)                       |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Payload Data (variable)                    |
   ~                                                               ~
   |                                                               |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |          Privacy Trailer* (variable)          |
   +-+-+-+-+-+-+-+-+                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * If the privacy option is used.

Options all have a common type-length-value format described below.

   * Version - 1 octet. CIDF message-layer version (1 for this initial
     version).

   * Control Byte - 1 octet. Used by the message layer to support
     reliable transmission, flow control, and security association
     management.
        - Acknowledgement of a delivered message (1).
        - Message received, but not delivered because of lack of
          resources (2).
        - Message received, but the supplied security association was
          not available to allow processing (4).

   * Checksum - 2 octets. A checksum across the entire CIDF message,
     prior to application of cryptographic mechanisms (i.e., privacy
     and authentication transforms).
     The checksum is computed as specified in the TCP standard
     (RFC 793).

   * Next Header - 1 octet. Defines the type of either the next message
     layer option or application. The following are the currently
     defined types.
        - Application Header (1)
        - Route List (4)
        - Privacy Header (50)
        - Authentication Header (51)

   * Length - 4 octets. Length of the CIDF message, including message
     header.

   * Sequence Number - 4 octets. Message layer sequence number used for
     message reliability (acknowledgement and duplicate removal) and to
     support protection against message replay.

   * Time Stamp - 4 octets. Used to provide loose time synchronization
     between CIDF communicating parties and to support tardy delivery
     detection (from denial of service).

   * Destination Address - 4 octets. IP address of the target of this
     message. This field identifies the eventual recipient of the CIDF
     message and is used to route CIDF messages through intermediate
     CIDF nodes that cannot be traversed by normal network routing
     (e.g., firewalls).

4.1.4: Message Layer Protocol Options

Except for the CIDF privacy option, CIDF message options use the
following format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Next Header  |    Length     |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                    Option Data (variable)                     |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * Next Header - 1 octet. Defines the type of either the next message
     layer option or application, with the same permitted values as
     defined above.

   * Length - 1 octet. Specifies the number of 32-bit words for this
     option, including the Next Header and Length fields.

   * Option Data - variable length. The option data field is always
     padded to a 32-bit aligned size.
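The fixed header and the generic option format can be sketched in a few
lines of Python. This is an illustrative sketch only, not a reference
implementation. The names internet_checksum, pack_option, and
pack_message are hypothetical. We also assume that the Checksum field is
zero while the checksum is computed, that no TCP-style pseudo-header is
included, that option padding octets are zero, and that the Time Stamp
is handled as an opaque 32-bit integer; the text above does not settle
these points.

```python
import socket
import struct

# Version, Control Byte, Checksum, Next Header, 3 reserved octets,
# Length, Sequence Number, Time Stamp, Destination Address (24 octets).
HEADER = struct.Struct("!BBHB3xIII4s")

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style used by TCP."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def pack_option(next_header: int, data: bytes) -> bytes:
    """Next Header, Length (in 32-bit words), zero-padded Option Data."""
    body = bytes([next_header, 0]) + data
    body += b"\x00" * (-len(body) % 4)       # assumption: pad with zeros
    words = len(body) // 4
    if words > 255:
        raise ValueError("option too long for a 1-octet Length field")
    return body[:1] + bytes([words]) + body[2:]

def pack_message(next_header: int, seq: int, timestamp: int,
                 dest_ip: str, body: bytes, control: int = 0) -> bytes:
    """Build a message: fixed header followed by options/payload."""
    length = HEADER.size + len(body)         # Length includes the header
    draft = HEADER.pack(1, control, 0, next_header, length, seq,
                        timestamp, socket.inet_aton(dest_ip)) + body
    checksum = internet_checksum(draft)      # computed with field = 0
    return draft[:2] + struct.pack("!H", checksum) + draft[4:]
```

With these assumptions a receiver can validate a message by running
internet_checksum over the whole message as received; a result of zero
indicates that the checksum matches.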
4.1.4.1: Route List Option

Route List is a variable length field that specifies the CIDF nodes
through which the message is to be routed for source routing, and
through which the message has been routed for recorded routing. The
Subtype field indicates whether this is a source or recorded route. The
route list option is used when the message destination and source are
separated by CIDF nodes that cannot be traversed by normal network
routing (e.g., firewalls). The Route List has the following format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Next Header  |    Length     |    Subtype    |     Index     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Route Data (variable)                     |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * Next Header and Length are defined above.

   * Subtype - 1 octet. Specifies whether this is a recorded route or a
     source route.
        - Recorded Route (1)
        - Source Route (2)

   * Index - 1 octet. Index into the array of addresses specifying the
     current address to be processed. For source routing, this is the
     address of the next CIDF hop. For recorded routes, this is the
     address of the last transmitting CIDF node.

   * Route Data - variable length. This field is an array of Internet
     addresses. Each Internet address is 32 bits or 4 octets. For a
     source route, if the index is greater than the length, the source
     route is empty and the routing is to be based on the destination
     address field. For a recorded route, if the index is greater than
     the length, the recorded route list is full.

4.1.4.2: Privacy Option

The CIDF privacy option supports both unicast and multicast privacy. For
multicast privacy, one node of the multicast group is selected to
generate the keys. The keys are then distributed to each multicast group
member.
For unicast privacy, each node generates its own privacy keys which are
distributed to the remote party.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Key Generator Identity                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Security Parameters Index (SPI)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Payload Data* (variable)                    |
   ~                                                               ~
   |                                                               |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |             Padding (0-255 bytes)             |
   +-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |  Pad Length   |  Next Header  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * If the cryptographic algorithm requires use of an initialization
     vector, then that vector is placed as clear text between the SPI
     and Payload Data.

   * Key Generator Identity - 4 octets. This value identifies the CIDF
     entity that generated the key. The initial use of this field is to
     specify either the key generator's IP address or, for multicast
     applications, the multicast address for the multicast group using
     this security association.

   * Security Parameters Index (SPI) - 4 octets. The SPI is an
     arbitrary 32-bit value that uniquely identifies the Security
     Association for this message, relative to the key generator
     identity.

   * Padding - variable length. The transmitter may add up to 255 bytes
     of padding if required to support the block size of the
     cryptographic algorithm. Padding is required to ensure that after
     the privacy option is applied, the message ends on a 4-byte
     boundary.

   * Pad Length - 1 octet. The number of padding bytes immediately
     preceding it. The range of valid values is 0-255, where a value of
     zero indicates that no Padding bytes are present.

   * Next Header is defined above.
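The padding rule can be made concrete with a short calculation. The
Python sketch below is illustrative only; the names privacy_pad_length
and add_privacy_trailer are hypothetical. It assumes that the Pad Length
and Next Header octets are counted together with the payload and padding
when aligning to the cipher block size, that any initialization vector
is itself a multiple of 4 octets, and that padding octets are zero. None
of these points is fixed by the text above.

```python
from math import gcd

def privacy_pad_length(payload_len: int, block_size: int = 1) -> int:
    """Padding octets needed so that payload + padding + the 2 trailer
    octets (Pad Length, Next Header) end on a 4-octet boundary and on a
    cipher block boundary."""
    align = 4 * block_size // gcd(4, block_size)   # lcm(4, block_size)
    pad = -(payload_len + 2) % align
    if pad > 255:
        raise ValueError("padding does not fit the 1-octet Pad Length")
    return pad

def add_privacy_trailer(payload: bytes, next_header: int,
                        block_size: int = 1) -> bytes:
    """Append Padding, Pad Length, and Next Header to the payload."""
    pad = privacy_pad_length(len(payload), block_size)
    # Assumption: padding octets are zero; the text does not specify them.
    return payload + bytes(pad) + bytes([pad, next_header])
```

For example, a 5-octet payload with no block cipher constraint needs one
padding octet, so that payload, padding, Pad Length, and Next Header
together occupy 8 octets.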
4.1.4.3: Authentication Header Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Next Header  |    Length     |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Key Generator Identity                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Security Parameters Index (SPI)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Authentication Data (variable)                 |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * Next Header and Length are defined above.

   * Key Generator Identity - 4 octets. This value identifies the CIDF
     entity that generated the key. The initial use of this field is to
     specify either the key generator's IP address or for multicast
     applications the multicast address for the multicast group using
     this security association.

   * Security Parameters Index (SPI) - 4 octets. The SPI is an
     arbitrary 32-bit value that uniquely identifies the Security
     Association for this message, relative to the key generator
     identity.

   * Authentication Data - variable number of 32-bit words. The data
     (e.g., digital signature or keyed hash) used to provide
     cryptographic authentication.

4.1.5: Cryptographic Mechanisms

The CIDF message layer protocol provides data integrity and source
authentication services for the negotiation phase of CIDF communication.
This enables components to reliably establish communications with
minimal security overhead. During the negotiation phase, the client and
server determine the specific cryptographic services to be provided for
further communication.

The message layer provides the cryptographic mechanisms as options,
enabling use of lower-level services (e.g., IPSEC), CIDF-specific
mechanisms, or no cryptographic services, depending on application
requirements.
The mechanisms used are determined by the client based on the mechanisms
supported by the server. The message layer mechanisms provide the fields
necessary to (1) determine the cryptographic services applied (if any),
(2) determine the cryptographic context, and (3) provide timeliness and
replay protection.

4.1.6: Negotiation Mechanism

4.1.6.1: Introduction

Our approach is to use the simplest reliable transport mechanism
available (i.e., reliable CIDF messaging over UDP) as the default CIDF
transport protocol. This simple protocol can then be used to negotiate a
more or less complex protocol for those components requiring additional
transport-layer services. This allows simple devices to participate
easily, while allowing complex devices to take full advantage of other
transport-layer mechanisms.

The message layer provides optional services to compensate for
weaknesses in the transport layer. The combination of the CIDF message
layer with transport-layer options provides a range of communication
capabilities that can be used to support different application
requirements. The following types of transport/messaging are initially
envisioned:

   * No assured delivery over a connection-less transport. That is, the
     CIDF message layer without acknowledgement and retransmission
     directly over UDP.

   * Assured delivery over a connection-less transport. That is, the
     CIDF message layer with reliable delivery (acknowledgement,
     retransmission, and duplicate removal) over UDP.

   * Assured delivery over a connection-oriented transport. That is,
     the CIDF message layer directly over TCP.

   * Object-oriented transport. That is, the CIDF operations over
     CORBA.

To enable support for components that must use minimal communication
infrastructure, the default transport mechanism is based on UDP. The
following sections define the default transport layer protocol, CIDF
security services, and the transport negotiation mechanisms.
4.1.6.1.1: Rationale for negotiated transport layer

The simplest approach would be to mandate the use of a single transport
protocol. But there is no one protocol that can adapt to the varying
requirements of all anticipated CIDF applications. Depending on whether
an application is concerned with real-time traffic or simple accrual of
a database of events, different transport mechanisms are appropriate.

Specifically, some CIDF applications require a very light-weight
communication channel that does not have the resource usage required by
current TCP implementations, while other applications require a flexible
and robust communication channel such as TCP. Other requirements include
application-specific support for multicast, which is not supported by
TCP. Therefore, we have requirements for connectionless communication,
reliable connectionless communication, and reliable connection-oriented
communication.

Additionally, we have varying requirements for security services. In
some applications and environments, the infrastructure provides adequate
security services. In other applications, we require CIDF-layer security
services for authentication, privacy, or both.

Nevertheless, communications clearly cannot begin between two specific
components until a channel is agreed upon. At the very least, this
implies that if we don't agree on a single channel for all transport, we
need to agree on a single channel for transport negotiation. This
channel needs to be widely supported and freely available. Components
are allowed to share data on whatever channel they wish, but they must
support channel negotiation on the common mechanism.

To support this range of requirements we provide a protocol based on the
reliable UDP variant of CIDF that enables applications to agree upon the
desired transport protocol, plus the desired CIDF message-layer security
services.
This exchange is only necessary if the participants have not previously
agreed upon a transport mechanism through external mechanisms (e.g.,
local configuration settings or through the CIDF directory service).

4.1.6.2: Default Transport Layer

The default transport layer protocol for CIDF messages is reliable CIDF
messaging over UDP. Other transport layer protocols may be used
following a negotiation, carried out over the default transport, of the
protocols and services required and supported by the CIDF client and
server.

Until we acquire a well-known CIDF port number, we will use 0x0CDF as
the CIDF port. The CIDF message layer will listen on the CIDF well-known
port for incoming CIDF messages.

4.1.6.3: Conformant transport options

   * CIDF message layer without acknowledgement and retransmission
     directly over UDP.

   * CIDF message layer with acknowledgement and retransmission over
     UDP.

   * CIDF message layer directly over TCP.

4.1.6.4: Option Negotiation Message Formats

The negotiation for more advanced communication services occurs over a
UDP channel using only the CIDF message layer with authentication
mechanisms enabled. This enables components that do not support TCP to
participate in CIDF.

Negotiation occurs by the client querying the server's capabilities. In
response, the server specifies the class of CIDF operations supported,
message services supported, and whether extensions are supported. The
client then selects the services and message mechanisms. This
information can also be provided by the directory server.

The CIDF transport negotiation protocol resides directly over the CIDF
message layer. The query-response data format is shown below. We assume
that for cryptographic services, the negotiation of the specific
algorithms and modes is handled by the key distribution mechanism.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Option Request (variable)                   |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * Type - 1 octet. Specifies the type of request. For option
     negotiation messages, this value is 1.

   * Length - 1 octet. Specifies the number of 32-bit words for this
     message, including the type and length fields.

Option Requests are formatted as follows.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Request    |    Length     |    Option     |   Selection   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Option Parameters (variable)                  |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * Request - 1 octet. Specifies the type of request. The following
     request types are currently supported.
        - Want (1) - Preferred service.
        - Can (2) - Sender is capable of using this service.

   * Length - 1 octet. Specifies the number of 32-bit words for this
     option request, including the request and length fields.

   * Option - 1 octet. The option being negotiated. The following
     option types are currently supported.
        - Transport (1)
        - Privacy (2)
        - Authentication (3)

   * Selection - 1 octet. The option value being negotiated. The
     meaning of this field depends on the option being negotiated. The
     following selection values are currently supported.

     For Transport negotiation.
        - None (0). Used to reject communication with another CIDF
          node when no acceptable options are received.
        - UDP (1)
        - Reliable UDP (2)
        - TCP (3)

     For Privacy negotiation.
        - None (0)
        - IPSEC (1)
        - SSL (2)
        - CIDF (3)

     For Authentication negotiation.
        - None (0)
        - IPSEC (1)
        - SSL (2)
        - CIDF (3)

Currently, the only option parameter specified is the selection of
TCP/UDP port number for transport negotiation, which is formatted as
follows.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |     Transport Port Number     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   * Type - 1 octet. Specifies the type of option parameter. For port
     numbers, this value i