[Ethereal-dev] a modest proposal (range filtering RFC)

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Ed Warnicke <hagbard@xxxxxxxxxxxxxxxxxxxx>
Date: Sun, 17 Dec 2000 18:37:32 -0500 (EST)
I've been looking through the dfilter code a bit, and would like to 
propose some fairly substantial changes.

Currently we have ranges only for FT_BYTES fields, and we do not have
non-ranged relations for FT_BYTES.  In general I would say that ranges
would be useful for any variable whose type represents a sequence.  This
would include FT_STRING, FT_ETHER, FT_IPv4, FT_IPv6, and FT_BYTES.  Of
these, at a minimum we should extend ranges to FT_STRING.  I would also
like to see FT_BYTES fields comparable as a whole, in the same way that
FT_STRING fields are.

Currently ranges are of one of two forms: 

1)	[i:j]
	Where i is the offset and j is the length.

or

2)	[i]
	Where i is the offset and the length is implied by the length
	of the RHS value given (or by the length of the field types
	being compared).


The problem with form 2 is that the range operator [] doesn't
actually bind to the LHS variable to which it appears to be
attached; rather, it binds to both the LHS (through the offset) and
the RHS (through the implied length of the RHS).
I find this quite counterintuitive.  Second, it would
be nice to be able to use the range operator [] to specify a
single element of a sequence for use in a relation.
For example, it would be nice if

bootp.hw.addr[3] 

actually referred to the element at offset 3 in the sequence of bytes
making up the variable bootp.hw.addr.  Instead, it appears to compare
whatever RHS value appears in the relation to the bootp.hw.addr variable,
starting at offset 3 and continuing for the length of the RHS.  This
seems highly counterintuitive to me.
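To make the binding problem concrete, here is a minimal Python sketch of the two current range forms as this message describes them. It is not taken from the dfilter code: current_range_equals is a hypothetical helper, and the offset is assumed to be non-negative. It shows that the same bootp.hw.addr[3] compares a different number of bytes depending on the value on the right-hand side.

```python
from typing import Optional


def current_range_equals(field: bytes, rhs: bytes, offset: int,
                         length: Optional[int] = None) -> bool:
    """Hypothetical model of the current range forms described above.

    Form 1, field[i:j] == rhs: j is an explicit length.
    Form 2, field[i] == rhs:   the length is implied by the RHS, so the
                               range depends on both sides of the relation.
    """
    if length is None:
        length = len(rhs)  # form 2: the implied length comes from the RHS
    window = field[offset:offset + length]
    # A window that runs past the end of the field is shorter than
    # requested and therefore cannot match.
    return len(window) == length and window == rhs


# bootp.hw.addr = 00:08:20:0a:0a:0a
addr = bytes([0x00, 0x08, 0x20, 0x0A, 0x0A, 0x0A])

# The same "[3]" compares three bytes against a 3-byte RHS...
print(current_range_equals(addr, bytes([0x0A, 0x0A, 0x0A]), 3))  # True
# ...but only one byte against a 1-byte RHS.
print(current_range_equals(addr, bytes([0x0A]), 3))              # True
```

Both relations are true, even though "[3]" on the left looks identical in each; the number of bytes compared is decided by the right-hand side.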

I propose moving towards a Python-like standard for ranges, as
follows:
1)	Ranges of the form [i] denote the element of the 
	sequence at offset i, so 
	bootp.hw.addr[3] 
	would refer to the byte at offset 3 (the fourth byte) of the bootp.hw.addr variable.

2)	Ranges of the form [i:] denote all elements in the sequence from
	offset i to the end of the sequence, so
	bootp.hw.addr[3:]
	would refer to all bytes in the bootp.hw.addr variable
	from offset 3 through the last byte (inclusive) of the
	bootp.hw.addr variable.

3)	Ranges of the form [i:j] denote the elements of the sequence from
	offset i to offset j-1 (if j is positive), so
	bootp.hw.addr[3:6]
	would denote the bytes of bootp.hw.addr from the byte at
	offset 3 through the byte at offset 5 (inclusive).

4)	Ranges of the form [i:-j], where j is positive, will denote
	the elements of the sequence from offset i up to, but not
	including, the last j elements of the sequence.  So
	bootp.hw.addr[3:-1]
	would refer to the byte sequence from the byte at offset 3
	through the next-to-last byte in the sequence.  So, given a
	bootp.hw.addr of length 6, bootp.hw.addr[3:-1] would be
	equivalent to bootp.hw.addr[3:5].

5)	Ranges of the form [-i:j] will continue to denote an offset into
	the bytes before the field in the frame, as they currently
	seem to do.

6)	A range will be bound to the variable directly to its left.
	It will no longer bind partially to the variable
	in a relation and partially to the value in a relation.

7)	Variables of sequence types without any range operator
	will refer to the entire sequence making up that variable.
	So
	bootp.hw.addr
	will refer to all bytes in bootp.hw.addr, and any relation
	will be between all bytes in bootp.hw.addr and whatever
	it is being compared to.  So
	bootp.hw.addr == 0:8:20:A:A:A
	would only be true for proto_trees in which bootp.hw.addr
	was exactly 0:8:20:A:A:A.

8)	All variables that are composed of sequences shall filter
	according to this standard (or whatever standard comes out
	of the ensuing disagreement over this email).  Consistency
	is important.  This definitely includes FT_STRING and FT_BYTES,
	and probably includes FT_ETHER, FT_IPv4, and FT_IPv6.  Suggestions
	of other types that people consider sequence types, and to
	which they think these principles should apply,
	would be welcome.
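Rules 1 through 4 and 7 match Python's own indexing and slicing of sequences, so a short Python sketch can show the proposed semantics side by side. Everything in it is hypothetical: FieldValue, element, and subrange are invented names, not dfilter code. For rule 5 the sketch also assumes that j in [-i:j] keeps its rule-3 meaning (an end offset measured from the start of the field), which the proposal does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldValue:
    """Hypothetical model of a sequence-typed field inside a captured frame."""
    frame: bytes  # all bytes of the frame
    start: int    # offset of the field within the frame
    length: int   # length of the field in bytes

    @property
    def value(self) -> bytes:
        # Rule 7: with no range operator, the variable is the whole sequence.
        return self.frame[self.start:self.start + self.length]

    def element(self, i: int) -> int:
        # Rule 1: [i] is the single element at offset i.
        if not 0 <= i < self.length:
            raise IndexError("offset outside the field")
        return self.frame[self.start + i]

    def subrange(self, i: int, j: Optional[int] = None) -> bytes:
        field_end = self.start + self.length
        # Rules 2-4 use a non-negative i.  Rule 5 (assumed reading): a
        # negative i reaches back into the frame bytes before the field.
        begin = self.start + i
        if j is None:
            end = field_end                       # Rule 2: [i:] runs to the end
        elif j >= 0:
            end = min(self.start + j, field_end)  # Rule 3: [i:j] stops before offset j
        else:
            end = max(field_end + j, self.start)  # Rule 4: [i:-j] drops the last j elements
        if begin < 0:
            raise IndexError("range starts before the beginning of the frame")
        # As in Python, a range whose end precedes its start is empty.
        return self.frame[begin:end]


# Two frame bytes precede bootp.hw.addr = 00:08:20:0a:0a:0a.
frame = bytes([0xAA, 0xBB, 0x00, 0x08, 0x20, 0x0A, 0x0A, 0x0A])
addr = FieldValue(frame, start=2, length=6)

print(addr.value == bytes.fromhex("0008200a0a0a"))  # rule 7: True
print(addr.element(3))                              # rule 1: 10
print(addr.subrange(3).hex())                       # rule 2: 0a0a0a
print(addr.subrange(3, 6).hex())                    # rule 3: 0a0a0a
print(addr.subrange(3, -1) == addr.subrange(3, 5))  # rule 4: True
print(addr.subrange(-2, 0).hex())                   # rule 5: aabb
```

Because each range is computed from the field alone (rule 6), the relation itself reduces to an ordinary comparison of two byte strings; the right-hand side no longer influences how many bytes are taken from the left.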

I think that this set of changes would significantly clean up the
filtering process and how it deals with sequence types.  I am willing to
code towards this, but the proposed changes are substantial enough
that I didn't want to make them without at least putting out an
RFC so people could raise complaints.

Ed