Draft:  draft-arberg-pppoe-mtu-gt1492-03
Reviewer: Harald Alvestrand [harald@alvestrand.no]
Review Date: Wednesday 4/19/2006 9:33 AM
IETF LC Date: 3/02/2006

Summary: Problem description is wrong. Solution is OK.

Unlike some of the commentary on the IETF list, I see no general problem 
with the solution, as long as we have decided that violating an IEEE 
standard is OK. Also, I see no big problem with publishing an Info 
document showing a somewhat-distasteful fix to a somewhat-distasteful 
problem.

But I really dislike the fact that the problem description is wrong.

This is the problem description, from section 2:

   <----- IPoE -----> <--------- PPPoE session --------->

                                      +-----+            +-----+
    +--+              +---+           |     |            |     |
    |PC|--------------| RG|-----------|DSLAM|------------| BRAS|
    +--+  <Ethernet>  +---+   <ATM>   |     |   <GigE>   |     |
                                      +-----+            +-----+

    Fig. 2: Next generation broadband network designs with PPPoE.

    In the network design shown in figure 2, fragmentation becomes a 
    major problem since the subscriber session is a combination of
    IPoE and PPPoE. The IPoE typically use a MTU of 1500 octets.
    However, when the Residential Gateway and the BRAS are the PPPoE 
    session endpoints, and therefore negotiate a MTU/MRU of 1492 octets 
    resulting in a large number of fragmented packets in the network.

But in this case, there is no big problem. Both ends of the session 
shown in the picture know what's going on, and can adjust expectations 
accordingly.

The REAL problem occurs in this configuration:

<----- IPoE -----> <------ PPPoE session --------------><-----Internet---->

                               +-----+         +-----+   +----+   +-------+
+--+              +---+        |     |         |     |   |    |   |       |
|PC|--------------| RG|--------|DSLAM|---------| BRAS|---| FW |---| Server|
+--+  <Ethernet>  +---+  <ATM> |     | <GigE>  |     |   |    |   |       |
                               +-----+         +-----+   +----+   +-------+

"FW" is a firewall. "Server" is a server that "PC" wants to access.
"RG" and "BRAS" set up the PPPoE session, and limit the MTU to 1492 octets. They see no problem.

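For reference, the 1492 figure is just Ethernet's 1500-octet payload minus
the PPPoE encapsulation overhead. A minimal sketch of the arithmetic,
assuming the header sizes from RFC 2516 (the constant and function names
are mine, not the draft's):

```python
ETHERNET_PAYLOAD = 1500  # standard maximum Ethernet payload, in octets
PPPOE_HEADER = 6         # PPPoE header: ver/type, code, session id, length
PPP_PROTOCOL_ID = 2      # PPP protocol field carried inside the PPPoE payload


def pppoe_mtu(ethernet_payload: int = ETHERNET_PAYLOAD) -> int:
    """Largest IP packet that fits in one PPPoE frame of the given payload size."""
    return ethernet_payload - PPPOE_HEADER - PPP_PROTOCOL_ID


print(pppoe_mtu())      # 1492
print(pppoe_mtu(1508))  # 1500
```

The second call shows why slightly oversized Ethernet frames make the
problem go away: 8 extra octets of payload restore a 1500-octet MTU.
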
But - "PC", "FW" and "Server" have no way to find out that there is a PPPoE session on the path at all.

In a normal Web experience, "PC" sends out a small query (a Web
retrieval), which gets to "Server". "Server" sends back a reply in a
1500-byte packet (with DF set). That packet hits "BRAS", which cannot
forward it over the 1492-octet session and sends back an ICMP "packet
too big". "Server" lowers its path MTU estimate for that session,
retransmits, and everyone's happy. One extra RTT per session; no big deal.

The failure mode is when "FW" is configured (stupidly) to drop incoming
ICMP "packet too big" messages.

"Server" never learns that its packet was too big and never gets an ack
to it. It retransmits at the same size, never gets an ack to that
either, and remains stuck in that mode forever.
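
To make the two cases concrete, here is a toy model of "Server" sending
one DF-marked reply across the path. It is a sketch under simplifying
assumptions, not a real TCP stack: the function name, the retry limit and
the return tuple are made up for illustration.

```python
PPPOE_MTU = 1492  # MTU negotiated between "RG" and "BRAS"


def server_send(size: int, fw_drops_icmp: bool, max_tries: int = 5):
    """Return (delivered, tries, final_size) for one DF-marked packet."""
    for attempt in range(1, max_tries + 1):
        if size <= PPPOE_MTU:
            # The packet fits through the PPPoE session and reaches "PC".
            return True, attempt, size
        # "BRAS" can neither forward nor fragment the packet (DF is set),
        # so it answers with ICMP "packet too big".
        if not fw_drops_icmp:
            # The ICMP gets through, and "Server" lowers its packet size.
            size = PPPOE_MTU
        # Otherwise "FW" discards the ICMP, and "Server" retransmits
        # at the same size on the next pass through the loop.
    return False, max_tries, size


print(server_send(1500, fw_drops_icmp=False))  # (True, 2, 1492)
print(server_send(1500, fw_drops_icmp=True))   # (False, 5, 1500)
```

With the ICMP delivered, the cost is one extra attempt. With it dropped,
no number of retries helps, because nothing ever changes the packet size.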

I've tried to live in that mode; at the time (2003), it seemed that about 
20% of the Web was configured this way.

I believe the document would make a much clearer case for applying its 
fix if it explained that this is the scenario for which the PPPoE user
experience is mind-bogglingly bad.

Apart from that - I say go for it.