I've been looking through the dfilter code a bit, and would like to
propose some fairly substantial changes.
Currently we have ranges only for FT_BYTES fields, and we do not
have non-ranged relations for FT_BYTES. In general I would say that
ranges would be useful for any variable whose type represents a
sequence. This would include FT_STRING, FT_ETHER, FT_IPv4, FT_IPv6,
and FT_BYTES. Of these, I would say that at a minimum we should
extend ranges to FT_STRING. I would also like FT_BYTES fields to be
comparable as a whole, in the same way that FT_STRING fields are.
Currently ranges are of one of two forms:
1) [i:j]
Where i is the offset and j is the length.
or
2) [i]
Where i is the offset and the length is implied to be to
the end of the RHS value given (or the length of the field
types being compared).
The first problem with form 2 is that the range operator [] doesn't
actually bind to the LHS variable to which it appears to be
attached, but rather binds both to the LHS (through the offset) and
to the RHS (through the implied length of the RHS).
I find this quite counterintuitive. Second, it would
be nice to be able to use the range operator [] to specify a
single particular element in a sequence for use in a relation.
For example it would be nice if
bootp.hw.addr[3]
actually referred to the element at offset 3 in the sequence of bytes
making up the variable bootp.hw.addr. Instead it appears to compare
whatever the RHS value in the relation is to the bootp.hw.addr variable,
starting at offset 3 and continuing out to the length of the RHS. This
seems highly counterintuitive to me.
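
To make the current behaviour concrete, here is a rough Python sketch of
how I read it. It is only a model of the description above, not the
actual dfilter code; the function name and the sample address are made
up for illustration.

```python
from typing import Optional


def current_range_matches(field: bytes, offset: int, rhs: bytes,
                          length: Optional[int] = None) -> bool:
    """Model of the current range comparison as described in this mail.

    Hypothetical helper, not taken from the dfilter sources.
    """
    if length is None:
        # Form [i]: the length is taken from the RHS value, so the
        # range depends on both sides of the relation.
        length = len(rhs)
    # Form [i:j]: j is a length, not an end offset.
    return field[offset:offset + length] == rhs


# Made-up hardware address 00:08:20:0a:0b:0c.
hw_addr = bytes([0x00, 0x08, 0x20, 0x0A, 0x0B, 0x0C])

# bootp.hw.addr[3] == a:b:c compares three bytes, not one.
print(current_range_matches(hw_addr, 3, bytes([0x0A, 0x0B, 0x0C])))  # True
```

Note that the same [3] covers one byte or three depending only on what
is written on the right-hand side.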
I would propose a move towards a python like standard for ranges through
the following:
1) Ranges of the form [i] denote the element of the
sequence at offset i, so
bootp.hw.addr[3]
would refer to the byte at offset 3 (the fourth byte, counting
offsets from 0) in the bootp.hw.addr variable.
2) Ranges of the form [i:] denote all elements in the sequence from
the offset i to the end of the sequence, so
bootp.hw.addr[3:]
would refer to all bytes in the bootp.hw.addr variable
from offset 3 to the last byte (inclusive) of the bootp.hw.addr
variable.
3) Ranges of the form [i:j] denote the elements of the sequence from
the offset i to the offset j-1 (if j is positive). So
bootp.hw.addr[3:6]
would denote the byte sequence from the byte at offset 3 of
bootp.hw.addr to the byte at offset 5 (inclusive) of the bootp.hw.addr
variable.
4) Ranges of the form [i:-j] where j is positive will denote
the elements of the sequence from offset i to an offset
j from the END of the sequence. So
bootp.hw.addr[3:-1]
would refer to the byte sequence from the byte at offset 3
to the next-to-last byte in the sequence (inclusive). So given a
bootp.hw.addr of length 6, bootp.hw.addr[3:-1] would be
equivalent to bootp.hw.addr[3:5].
5) Ranges of the form [-i:j] will simply denote an offset into
the bytes before the variable's field in the frame, as they
currently seem to do.
6) Ranges will be bound to the variable directly to their left.
There will no longer be binding partially to the variable
in a relation and partially to the value in a relation.
7) Variables for sequence types without any range operator
will refer to the entire sequence making up that variable.
So
bootp.hw.addr
will refer to all bytes in bootp.hw.addr and any relation
will be between all bytes in bootp.hw.addr and whatever
else is being compared to. So
bootp.hw.addr == 0:8:20:A:A:A
would only be true for proto_trees for which bootp.hw.addr
was exactly 0:8:20:A:A:A
8) All variables which are composed of sequences shall filter
according to this standard (or whatever standard comes out
of the ensuing disagreement from this email). Consistency
is important. This definitely includes FT_STRING, FT_BYTES,
and probably includes FT_ETHER, FT_IPv4, and FT_IPv6. Suggestions
as to other types which people consider a sequenced type
which they think should have these principles applied to them
would be welcome.
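
As a sanity check on rules 1 to 4 and 7, here is a small Python sketch
of the proposed semantics. It leans directly on Python's own slicing,
which is the model I have in mind. The apply_range() helper, its string
argument, and the sample address are invented for this mail; rule 5
(negative start offsets into the frame) is deliberately not modelled,
since it needs the surrounding frame bytes.

```python
def apply_range(field: bytes, spec: str) -> bytes:
    """Apply a proposed-style range such as "3", "3:", "3:6" or "3:-1".

    Hypothetical helper for illustration only, not dfilter code.
    """
    start_text, sep, end_text = spec.partition(":")
    start = int(start_text)
    if start < 0:
        # Rule 5 reaches outside the field, so it is left out here.
        raise ValueError("negative start offsets are not modelled")
    if not sep:
        # Rule 1: [i] is the single element at offset i.
        return field[start:start + 1]
    # Rule 2: an empty end means "to the end of the sequence".
    # Rules 3 and 4: the end is exclusive, and a negative end counts
    # back from the end of the sequence, exactly as in Python.
    end = int(end_text) if end_text.strip() else None
    return field[start:end]


# Made-up hardware address 00:08:20:0a:0b:0c (length 6).
hw_addr = bytes([0x00, 0x08, 0x20, 0x0A, 0x0B, 0x0C])

print(apply_range(hw_addr, "3").hex())     # 0a
print(apply_range(hw_addr, "3:").hex())    # 0a0b0c
print(apply_range(hw_addr, "3:6").hex())   # 0a0b0c
print(apply_range(hw_addr, "3:-1").hex())  # 0a0b

# Rule 7: with no range, the whole sequence is compared.
print(hw_addr == bytes([0x00, 0x08, 0x20, 0x0A, 0x0B, 0x0C]))  # True
```

The range here depends only on the field it is attached to (rule 6);
nothing about the RHS value changes what [3] or [3:-1] selects.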
I think that this set of actions would significantly clean up the
filtering process and how it deals with sequenced types. I am willing to
code towards this, but there are substantial enough changes proposed
here that I didn't want to make them without at least putting out an
RFC so people could raise complaints.
Ed