Ethereal-dev: Re: [Ethereal-dev] a modest proposal (range filtering RFC)

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Ed Warnicke <hagbard@xxxxxxxxxxxxxxxxxxx>
Date: Tue, 19 Dec 2000 00:10:04 -0500 (EST)
Comments inline...

On Mon, 18 Dec 2000, Gilbert Ramirez wrote:

<snip> 
> > 2)	[i ] 
> > 	Where i is the offset and the length is implied to be to 
> > 	the end of the RHS value given (or the length of the field 
> > 	types being compared).  
> > 
> > 
> > The problem with 2 is that the range operator [] doesn't 
> > actually bind to the LHS variable to which it appears to be 
> > attached, but rather binds to both the LHS (through the offset) and 
> > to the RHS (through the implied length of the RHS).
> > I find this quite counterintuitive.  Second, it would 
> > be nice to use the range operator [] to be able to specify a 
> > single particular element in a sequence for use in a relation.
> > For example it would be nice if 
> > 
> > bootp.hw.addr[3] 
> > 
> > actually referred to the element at offset 3 in the sequence of bytes
> > making up the variable bootp.hw.addr.  Instead it appears to compare whatever the 
> > RHS value in the relation is to the bootp.hw.addr variable, starting at 
> > offset 3 and continuing out to the length of the RHS.  This
> > seems highly counterintuitive to me.
> 
> Part of my thinking on implementing that was the fact that the length (or
> final offset - 1) in the slice operator on the LHS does not have to be specified
> when there are a countable number of values on the RHS. You're right that it
> did come out somewhat counterintuitive.
> 
> When testing a slice of an ethernet address, e.g., I could say:
> 
> bootp.hw.addr[0:3] == 00:00:f6
> 
> But since the RHS is countable by the computer, I could just say:
> 
> bootp.hw.addr[0] == 00:00:f6
> 
> I do agree that it is counterintuitive w/ regard to other programming languages,
> but we need to come up with a way to code this. This is different than:
> 
> bootp.hw.addr[0:] == 00:00:f6
> 
> since [0:] would mean "from 0 to the end of the field", which would produce
> 6 bytes, whereas "00:00:f6" is only 3 bytes long. See the subtle difference?

I absolutely see the subtle difference.  Given two frames, frame A and 
frame B with frame A having a bootp.hw.addr of 00:00:f6 and frame B 
having a bootp.hw.addr of 00:00:f6:ab:ac:ad the filter 

bootp.hw.addr[0 ] == 00:00:f6 

matches both frames A and B while the filter 

bootp.hw.addr[0:] == 00:00:f6 

matches only frame A.
I sympathize with the desire for a shorthand that led to your 
[0 ] notation.  I'll say more below about my opinion of its future.
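
The difference between the two filters can be sketched in Python. This is
only an illustration of the semantics described above, not the filter
engine's code; the function names are made up for the example.

```python
def match_implied_length(field: bytes, offset: int, rhs: bytes) -> bool:
    """field[offset] == rhs, where the slice length is implied by the RHS."""
    return field[offset:offset + len(rhs)] == rhs


def match_to_end(field: bytes, offset: int, rhs: bytes) -> bool:
    """field[offset:] == rhs, comparing everything from offset to the end."""
    return field[offset:] == rhs


frame_a = bytes.fromhex("0000f6")        # bootp.hw.addr of frame A
frame_b = bytes.fromhex("0000f6abacad")  # bootp.hw.addr of frame B
rhs = bytes.fromhex("0000f6")

# The [0 ] form matches both frames.
print(match_implied_length(frame_a, 0, rhs), match_implied_length(frame_b, 0, rhs))
# The [0:] form matches only frame A.
print(match_to_end(frame_a, 0, rhs), match_to_end(frame_b, 0, rhs))
```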

<snipping those sections in which we are in agreement>
> > 
> > 3)	Ranges of the form [i:j] denote the elements of the sequence from 
> > 	the offset i to the offset j-1 (if j is positive).  So 
> > 	bootp.hw.addr[3:6] 
> > 	would denote the byte sequence from the byte at offset 3 of
> > 	bootp.hw.addr to the byte at offset 5 (inclusive) of the bootp.hw.addr
> > 	variable.
> 
> This might be a point of contention; this might be a religious point.
> Should j refer to length or final offset? What are the advantages/
> disadvantages of both? 

Yes, I admit it.  This is a religious point.  Having been caught up 
in the elegance with which Python thinks about sequences, I tried to slip
its notion of slices in whole.  I would say that the advantage of 
the [offset:finaloffset+1] notation is that it is in harmony with 
Python, a language growing in popularity.  Retaining the 
[offset:length] notation could be confusing, as Python users would think they
recognized a familiar convention and then discover they were wrong.
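
To show how far the two readings of the same range can diverge, here is a
small Python sketch.  The 10-byte field is made up for the illustration.

```python
data = bytes(range(10))  # 00 01 02 ... 09, a made-up 10-byte field

# Python reading of [3:6]: offsets 3, 4 and 5 (the end offset is exclusive).
python_style = data[3:6]

# [offset:length] reading of [3:6]: six bytes starting at offset 3.
offset, length = 3, 6
length_style = data[offset:offset + length]

print(python_style.hex())  # 030405
print(length_style.hex())  # 030405060708
```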

On the other hand, switching would break existing filters and confuse 
existing users.

I propose the following compromise on the [i:j] case:

1)	Ranges of the form [i:j] (j positive) will be considered to be 
	[offset:length], as currently.

2)	Ranges of the form [i:-j] will be considered to be 
	[offset:maxoffset - j].  In other words, [i:-j] 
	denotes the range from offset i to j elements short of the end 
	of the sequence.  

Examples: 

If for frame A bootp.hw.addr is 00:00:f6:00:00:01

then 

bootp.hw.addr[2:2] 

would be f6:00 and 

bootp.hw.addr[2:-1] 

would be f6:00:00
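
A minimal Python sketch of the compromise, checked against the two examples
above.  The helper name slice_field is hypothetical, and since the proposal
does not say what j == 0 should mean, the sketch simply rejects it.

```python
def slice_field(field: bytes, i: int, j: int) -> bytes:
    """Hypothetical helper for the proposed [i:j] compromise."""
    if j > 0:
        # Positive j: [offset:length], as in the current syntax.
        return field[i:i + j]
    if j < 0:
        # Negative j: from offset i, stopping |j| bytes short of the end.
        return field[i:len(field) + j]
    raise ValueError("j == 0 is not covered by the proposal")


addr = bytes.fromhex("0000f6000001")  # bootp.hw.addr of frame A

print(slice_field(addr, 2, 2).hex())   # f600
print(slice_field(addr, 2, -1).hex())  # f60000
```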

> 
> Most of the time, the RHS of a byte slice comparison will be countable
> by the computer, so explicitly specifying a j argument wouldn't
> be necessary (if we can come up with a good syntax for that).

This is true.  The trick is coming up with a good, consistent syntax for it.

> 
> Perhaps it's just me, but I find the [offset:length] slice easier to
> comprehend, at least in the context of packet analysis, than
> [start_offset:final_offset-1]. But this is subjective.

I agree that it is subjective.  In the absence of the Python 
influences I might even agree with you.  See the compromise proposed above.

> 
> > 
> > 6)	Ranges will be bound to the variable directly to their left.
> > 	There will no longer be binding partially to the variable 
> > 	in a relation and partially to the value in a relation. 
> 
> (see my note above about letting the computer do the counting)

This still feels somewhat wrong to me.  Not the part about the compiler 
doing the counting, but the general feeling that the [] operators 
SHOULD bind to the variable to their left.  It's one of those nagging 
aesthetic certainties that defies all logic and reason. 

It seems that we can agree on the following:

1)	The [i] form should refer to single elements at offset i in 
	the sequence.

2)	variable[i:] should refer to the sequence from offset i 
	in variable to the end of the sequence.
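
In Python terms, the two agreed forms might look like the sketch below.  The
function names are hypothetical, and returning the single element as a
one-byte sequence is an assumption made so it can be compared directly with
a byte-string RHS.

```python
addr = bytes.fromhex("0000f6000001")  # example address used earlier in this message


def element(field: bytes, i: int) -> bytes:
    """variable[i]: the single element at offset i, kept as a 1-byte sequence."""
    return field[i:i + 1]


def tail(field: bytes, i: int) -> bytes:
    """variable[i:]: everything from offset i to the end of the sequence."""
    return field[i:]


print(element(addr, 2).hex())  # f6
print(tail(addr, 2).hex())     # f6000001
```

With this reading the range is evaluated without looking at the RHS, so
element(addr, 2) can equal f6 but never a two-byte value such as f6:00.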

I've put out a compromise for consideration on the [i:j] form.  

We agree that having the compiler do the counting is a good thing, but 
that it requires a better syntax than what is in use currently.
I stipulate that I want desperately to cling to my left binding of 
the [] operator but am willing to admit that this may be slightly 
childish.

You have not commented on my points 7 and 8 which suggested 

7)	To have the variable name without a range operator 
	refer to the whole variable.

8)	Extend ranges to all sequence types.  

If you could either accept, reject, or throw back for further discussion
my [i:j] compromise suggestion and my points 7 and 8 then I think we can 
formulate a plan to move forward.

I look forward to hearing from you.

Ed