version 1.0 2011.10.11
NANOG53 Tuesday afternoon session notes

Dave Temkin gets the ball rolling at 1434 hours, reminding people to fill out the surveys!

Will start with "How many tiers; pricing in the Internet" from Georgia Tech.
Vytautas and others from Georgia Tech and Stanford; thanks to Todd for warming people up to math.

Conventional transit pricing: a provider like Cogent, a customer like Georgia Tech; customer gets a blended rate from the provider for 95th percentile traffic.
Charged each month on aggregate throughput, regardless of destination of the flow.
Price should recover total costs plus some margin.

Can be inefficient: uniform price, yet diverse resource cost.
  Client side: no incentive to conserve resources to costly destinations.
  ISP side: lack of incentive to connect to costly destinations.
Can result in a Pareto-inefficient allocation: loss of ISP profit, and of client surplus, due to congestion to faraway places.

Alternative: tiered pricing. Price flows based on cost and demand; flows to EU cost more, flows to US cost less.
Some ISPs already use tiered pricing:
  paid peering
  backplane peering
  regional pricing
Cogent with Georgia Tech example: perhaps use different VLANs, one bill, two lines, charge differently for different types of traffic.

Internet cases seen have limited use, with a limited number of tiers. Other transport networks, like parcel delivery or trains, have many, many more pricing tiers; why does the Internet have so few tiers?
How efficient is such tiered pricing? Can ISPs benefit from more tiers? At the other extreme, could charge differently for every destination in the world.

How can you test this? Construct an ISP profit model that accounts for:
  traffic demand of different flows
  servicing costs of different traffic flows
Need real data to drive the model; feed in real traffic and topology data from ISPs.
Then test the effects of tiered pricing.
So: model first, then map data, then do number crunching.
Certain assumptions going in:
  profit == revenue - cost (summed over all flows)
  Flow revenue: price * traffic demand (95th percentile)
    traffic demand is a function of price; over time, demand may change based on price.
    How do we model and discover demand functions?
  Flow cost: servicing cost * traffic demand
    servicing cost is a function of distance

Model has 3 big boxes: traffic demands, current prices, and network topologies; that data comes from ISPs.
  traffic demands: generate demand models, derive demand functions
  cost models: use network topologies, derive relative costs based on distances in that topology. Don't know absolute costs.
So, with demand functions + relative costs, plus current market pricing, you can find the actual pricing model.

Demand = F(price, valuation, elasticity)
  valuation == how valuable the flow is
  elasticity == how fast demand changes with price
  inelastic: change price, demand doesn't shift; elastic: change price, demand changes a lot.
How do you find demand function parameters? We know the current price, we know the current flow; the only thing they don't know is elasticity, so they just run through a range of elasticities and see if it matters in the output.

Modeling costs is another hard question; everyone has different ideas about what the cost of their network is.
They took many cost models: linear, concave, region, destination type. X axis is distance, Y axis is cost in the model.
Destination type may be discontinuous; sending to customers is cheaper than sending to peers.
Even the linear model starts off with some fixed costs; none of the models start at zero.
The different models, along with the topologies, show how far the traffic flowed, and give relative servicing costs.
In the model, the real cost is the relative cost scaled by gamma.

Normalize costs: assume the ISP is rational and profit maximizing.
  Profit = Revenue - Costs = F(prices, valuations, elasticities, real_costs)
  real_costs = gamma * relative_costs
You can solve for gamma in each flow, and get a data mapping that shows demand plus cost.
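A minimal sketch of how these pieces could fit together, not taken from the slides. It assumes a constant-elasticity demand of the form demand = (valuation / price) ** alpha with alpha > 1, and it assumes the current blended price is already profit-maximizing, which lets a single gamma be backed out from observed data. All function names and sample numbers are hypothetical.

```python
# Sketch only: the demand form and the single-gamma calibration are
# assumptions made for illustration, not the speakers' exact procedure.

def valuation(price, demand, alpha):
    """Back out a flow's valuation from its observed price and demand."""
    return price * demand ** (1.0 / alpha)


def ced_demand(price, v, alpha):
    """Constant-elasticity demand: drops as price rises, faster for larger alpha."""
    return (v / price) ** alpha


def calibrate_gamma(blended_price, demands, rel_costs, alpha):
    """Gamma that makes the observed blended price the profit-maximizing one."""
    total = sum(demands)
    weighted = sum(q * r for q, r in zip(demands, rel_costs))
    return blended_price * (alpha - 1.0) / alpha * total / weighted


def profit(prices, valuations, costs, alpha):
    """Sum over flows of (price - cost) * demand at that price."""
    return sum((p - c) * ced_demand(p, v, alpha)
               for p, v, c in zip(prices, valuations, costs))


def best_price(cost, alpha):
    """Profit-maximizing price for a single flow under this demand form."""
    return alpha * cost / (alpha - 1.0)


if __name__ == "__main__":
    demands = [100.0, 50.0, 10.0]      # made-up 95th percentile demands
    rel_costs = [1.0, 2.0, 5.0]        # made-up relative (distance-based) costs
    alpha, p0 = 2.0, 10.0              # assumed elasticity and blended price
    gamma = calibrate_gamma(p0, demands, rel_costs, alpha)
    vals = [valuation(p0, q, alpha) for q in demands]
    costs = [gamma * r for r in rel_costs]
    print("blended profit:", profit([p0] * 3, vals, costs, alpha))
    print("per-flow profit:",
          profit([best_price(c, alpha) for c in costs], vals, costs, alpha))
```

Sweeping alpha over a range and re-running the calibration is one way to check whether the unknown elasticity changes the conclusions.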
Last step is number crunching:
  select the number of pricing tiers to test
  map flows into pricing tiers (optimal mapping and mapping heuristics)
  then find the profit-maximizing price for each pricing tier, and compute profit.
Repeated the above for:
  2x demand models
  4x cost models
  3x network topologies and traffic matrices

Result: constant elasticity demand with linear cost model; EU ISP, Internet2, and global CDN.
Using 6 pricing tiers is almost identical to charging every flow differently; 2 tiers gets you to 80% of max.
Basically, tier 1 is local traffic, tier 2 is "all other" traffic.
The traffic matrices are very different from the parcel/train model; there's a lot of local traffic. CDNs are very good at delivering content: a lot is local, and then it falls off with distance.
If your network has a different topology, it might look different.

Big picture: looked at many more setups (linear cost, concave cost, logit demand); all graphs look very similar. They all end up showing that the "local" vs "other" tier buckets get you to 80% of your profit maximum.
Having more than 2-3 pricing tiers only adds marginal benefit to the ISP.
Results hold for a wide range of inputs.
Current transit pricing strategies are close to optimal; if you have 2 or 3 tiers, you're good. If you have only 1, the client may migrate to someone who does have 2 or 3 tiers.

Questions?

Yi Liu, AT&T: IPv4 and IPv6 interworking on lawful intercept.
You'll want to read the presentation in more detail; the talk will be just high level.

We just exhausted IPv4 in February of this year; however, migration to IPv6 won't happen overnight, due to the complexity of the Internet.
There's strong business in IPv4: Microsoft acquired 600,000 IPs for $11.25 per IP address, which is more costly than a domain name.
Plus, some customers may never want to retire IPv4, and some apps may not be able to migrate off.
So, IPv4 and IPv6 coexistence will last for a very long time.
ISPs need to support interworking, so clients can talk to each other.
This has impacts on lawful intercept: the ability to restore and report the original traffic characteristics for LEAs.
CACmII is the information that needs to be reported for CALEA.
Need to be able to figure out identifying information; two approaches to consider:
  integrated DNS64 and NAT64
  DS-Lite (v4-in-v6 and CGN combination)
Goal here is to stimulate brainstorming, not present solutions.

Overview of integrated DNS64 and NAT64:
Basically, the edge devices of the network become the intercept access points from which the identifying elements can be captured in the traffic.
But with DNS64, the DNS modifies the address by adding an address pointing to the translator. And then the NAT64 does substitutions on the headers, which makes the resulting data not accurate anymore.
The altered data is not of interest to the LEA, as it no longer identifies the original stream.
To fix this, you need to report the headers of *both* IP versions in the same CACmII message; but the standard doesn't allow that.

Option 1: IASPs can do extra magic to resolve the original data: strip off the extra data added by DNS64, and then trace back the v6 address from the translated v4 address by going through the logs.
You can represent the v4 address in v6 format, so both headers exist in the same version for the CACmII requirement.
Option 2 is to push the job onto LEAs with IASP assistance; they would need the definition rule of Pref64::/n from the IASP.
Or, we get the standard revised to allow different version information in the same CACmII message.

Open questions:
  Is the v4-mapped v6 address understood by LEAs?
  Are LEAs willing to restore the v4 address by themselves?
    The original 1994 CALEA law has been extended to require more and more work from the IASP; LEAs want the IASP to do more and more. It used to be allowed to send encrypted data; now they've pushed the IASP to decrypt the data first.
  Realtime traceback (retrieve the v6 address from before translation, given source port and timestamp).
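A small sketch of the address representations discussed above, using Python's standard ipaddress module. It assumes the translator uses the RFC 6052 well-known prefix 64:ff9b::/96; a real deployment may use its own Pref64::/n with a different length, which this sketch does not handle. Function names are hypothetical.

```python
import ipaddress

# Assumption: the RFC 6052 well-known NAT64 prefix. An operator-specific
# Pref64::/n would have to be supplied by the IASP instead.
WELL_KNOWN_PREF64 = ipaddress.IPv6Network("64:ff9b::/96")


def embed_v4(v4, prefix=WELL_KNOWN_PREF64):
    """Build the synthesized v6 address a DNS64 would hand out for a v4 host."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4_int = int(ipaddress.IPv4Address(v4))
    return ipaddress.IPv6Address(int(prefix.network_address) | v4_int)


def extract_v4(v6, prefix=WELL_KNOWN_PREF64):
    """Strip the Pref64 part and recover the original v4 destination."""
    addr = ipaddress.IPv6Address(v6)
    if prefix.prefixlen != 96 or addr not in prefix:
        raise ValueError("address is not inside the expected /96 prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


def to_v4_mapped(v4):
    """Represent a v4 address in v6 format (::ffff:a.b.c.d)."""
    return ipaddress.IPv6Address("::ffff:" + str(ipaddress.IPv4Address(v4)))


if __name__ == "__main__":
    synthesized = embed_v4("192.0.2.33")
    print(synthesized)                      # synthesized AAAA-style address
    print(extract_v4(synthesized))          # original v4 address
    print(to_v4_mapped("192.0.2.33"))       # v4-mapped v6 form
```

Whoever does the restoring, IASP or LEA, needs to know the prefix in use; that is the "definition rule of Pref64::/n" dependency in option 2.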
Data to be logged in the translator is huge, which raises costs. This isn't currently available from hardware vendors, and there's no testing for it yet.

DS-Lite: tunneling and CGN combined. v4-in-v6 tunnel starts at the CPE in the access network, and terminates on the AFTR.
Each end is on a private IPv4 network; the private end isn't known, only the public IP address on the CPE is known.
  CPE is assigned an IPv6 address, which is what the LEA uses to target the endpoint
  intercepted traffic (v4 packet) is hidden in the v4-in-v6 softwire
  CGN (NAT44) alters the original source IP of the communication session
Two possible solutions:
  option 1: decapsulate the softwire and report the v4 headers (initiator is a private v4 address)
  option 2 is fancy magic: generate the CACmII message based on a combination of inner and outer v4 headers; the tunnel v6 endpoint header is the only public address known. Represent the v4 address in v6 format, to keep them in the same version, to support the letter of the law.

DS-Lite open questions:
  1) Realtime traceback: is it feasible, technically and in terms of cost?
  2) Is reporting IP headers for private endpoints required by CALEA? Normally, private endpoints are invisible to ISPs.
  Will reporting of the v4 address lead to confusion when the subject is targeted by v6 address?
  Is the cost required by the complex mediation process justified?
  Is the v4-mapped v6 address understood by the LEA?
The only difference with DS-Lite is the v4-in-v6 tunnel introduced by it.

This only talks about two transition mechanisms; there are other transition technologies being deployed out there.
Goal is to provide lawful intercept in as transparent a method as possible.
Goal is to deliver the most specific, accurate identifying information possible; but cost is always a constraint, and it must make technical sense as well.
If we get to a single, unified guideline, this confusion will disappear.

Q: How does this relate to the illegal wiretapping of US citizens?
A: All the wiretapping has to be launched by lawful court orders; service providers cannot launch such a process by themselves.

Q: Chris Morrow, Google--did CALEA for 4 years; not sure how v4 vs v6 matters. You tell the edge device "this subscriber needs to go over there", you copy packets to the mediation device, they get shipped over; he's confused by how the transition mechanism makes a difference. Is the mediation device too dumb to know more than just the /32 of the v4 address?
A: You're right, the mediation device should be able to restore the original identifying information.

Q: Fred Baker did the Cisco work; it understands v4, it should understand v6.
A: But with the v4 and v6 captured by the mediation device, how do you know it's the original customer information if you don't have the identifying information visible? It may be as simple as stripping off headers.

Q: Alain from Juniper; why not overlay the two sets of data, and send them to the LEA?
A: If you pass it as is, it may have been changed in translation; the LEA may not know it's the right information.

Q: Why can't we push it onto the LEA?
A: We used to be able to; decryption used to be the burden of the LEA. A law proposed last year wants the ISP to do data decryption for them.

Q: Chris Donelly, CableLabs; it depends on which side of the NAT device you do lawful intercept on. If you do it in front of the NAT, all the headers are still intact.
A: Right.

Q: Marty; in terms of pushing costs onto the IASPs from the LEA, they're done on a cost recovery basis.
A: They are proposing changes to the law, but they haven't passed yet; they keep asking Congress to pass them into law.

David Saccon, Ericsson: unified MPLS.
Going from multiple parallel networks supporting mobile, wireline, etc. to a single converged network.
Mobile handsets are authenticated with SIM-based technology; wireless hotspots use mobile authentication, but a packet core with fixed backhaul. There are elements of both networks coming together.
The IP core is carrying all types of traffic today already.
Metro backhaul: look at using packet networks in areas that used to be TDM, microwave, and fiber. End up with one set of gear, one set of technologies to support them.
Essentially, we're seeing a slow collapsing of layers.
The IP services layer has to be there; MPLS started off in the IP core, and is moving into the metro network, aggregation, and even into the access network. It can serve as a packet transport function that goes from core to access.
We have fiber technologies, radio technologies, many different transport mechanisms in place; now we can start to reduce the number of layers.
If we have a packet transport/MPLS layer, we still need to scale the network, so combine with OTN, WDM, and MPLS-TP to scale the bandwidth layers.

Unified MPLS: it's still label swapping as we know it. Bidirectional circuit-switched model still available, with enhanced OAM and traffic monitoring.
Use IP topology signalling mechanisms to calculate paths.
Unified MPLS is based on MPLS-TP at the access and metro layer, supporting bidirectional circuits with OAM and co-routed/fate-shared LSPs.
At the edge and in the core, use IP/MPLS; allow for multi-segment pseudowires, with LSP stitching and LSP tunneling.

Different connection models come into the service nodes; from there, the service node determines whether traffic can be handled locally or handed off to the access node.
Access Node to Access Node may need some interworking between MPLS flavours to make it work.
  AN-to-AN partitioned model: LSP-to-LSP stitching, or multi-segment pseudowires.
  Or layer it: tunnel data from one side through the core, pop out the other side. IP/MPLS islands over an MPLS-TP core, or vice versa.
If you have an IP/MPLS network and want to build out new networks on it, MPLS-TP can be tunneled over the IP/MPLS core.

LTE backhaul example: IP is the transport end to end, over 3GPP evolved packet core.
Hub-spoke model: backhaul traffic to the hub PE, then hand off to other sites?
Actual bandwidth for these connections for 3GPP for LTE isn't very high.
With no control plane involved, can easily meet requirements for 50ms failover.
We can have a provider edge device which is the termination for the transport network, but which also processes IP packets and is the next hop for the eNodeB; no external NNI needed.
Unified PE architecture: a fully functioning edge device; can re-use it if it makes sense for other services, like video delivery, wifi integration.
Service mappings: bring together the transport and IP sides.
There are different ways of doing it; start to think about ways to collapse the network and have different elements work together.

Announcements!
  bylaws; program committee nominations accepted until 5pm
  break ends at 4:30; then IPv6 part 2, DNS, and peering BOFs. For those in the thick of v6, come to part 2 of the transition talk.
  newcomers debrief at 8:30 for those who came to the breakfast; tell us what you got out of this, and what we can do better.
  NOG lab is open
  thanks to break sponsor, NTT...video cuts Dave off mid-sentence.

BREAK time.

DNS track is up next on the video.

Matt, Verisign is up to talk about DNSSEC deployment.
Verisign had a hand in signing:
  root signed July 15, 2010
  edu signed July 28, 2010
  net signed Dec 9, 2010
  com signed March 31, 2011
So, well over half of all domains worldwide can be reached with a chain of trust now.

Slow rollout of DNSSEC-capable nameservers (Atlas); let the new code bake for a long time.
Rolled out an unvalidatable zone first: get the larger records out there, get DNSSEC tests rolling. That way you can do a gentle rollout, until all sites are up with the unvalidatable zone.
Then you unblind the unvalidatable zone, one site at a time; serve the properly signed zone with the right key. Then, once all sites are ready, add DS records to the root zone for the subdomain.
Provisioning interface deployment steps: the OTE environment for registrars was rolled out well in advance of everything else; the sandbox was ready to roll for a long time.
Always allow time at each step for things to bake, and problems to surface.
DNSSEC in .com: served a signed, unusable zone for a month; unblinded the zone over a two-day period (with no DS entry); tested carefully; then, when satisfied, put the DS entry in the root.
Issues: a bug in some versions of BIND; one person reported it. Affected name servers required a restart after signing.
About 62% of queries have the DO (DNSSEC OK) bit set; that figure hasn't changed. Even if they're not doing DNSSEC, they still get the data back.
Overall bandwidth went up 2x with signed info.
Increase in TCP queries was negligible; from single digits/sec to hundreds/sec, nothing they worried about.
Possible TCP failovers (UDP query followed by TCP): a few dozen per second; negligible.

36 registrars have registered at least one signed delegation.
900 registrars, 300 families of registrars; so about 10% of families have at least one signed delegation.
One enterprise has signed 500 of its own; one registrar has almost 1000 signed delegations.
http://scoreboard.verisignlabs.com/

Lessons learned:
  the Internet didn't break.
  incremental deployment is possible (the unvalidatable technique)
  registrar test environment (with resolvable signed zone) is helpful for every party
  monitoring is critical, esp. surrounding key rollovers
  issues with the installed base of hardware and software are possible.

Questions?
Q: John Christoff, from Team Cymru; there were 36 registrars. Was there one who had 1000 and the rest had just 1? Do you have a breakdown on it, and how many are moving to it?
A: Verisign has done everything it can to encourage registrars to go to DNSSEC; they have whitepapers, deployment guides, bump-in-the-wire support for them. Some large registrars are DNSSEC capable.

Chris Griffiths, Comcast is up next with their DNSSEC work.
At NANOG51, 5% of traffic was doing validation, for residential and commercial. Now roughly 25% of the footprint is doing validation.
Several thousand Comcast zones are signed, with a goal of signing all of them by end of year.
Working with the registrar to automate it.
Example of a validated lookup from the NOGlab, looking up against their anycast resolvers.
xfinitytv.com: now a fully signed zone; this is what you see in production.
Recently launched Comcast IPv6 anycast resolvers for customers:
  2001:558:FEED::1 and 2001:558:FEED::2
  dual stack caching servers, which also perform DNSSEC validation
  glue records have been updated to support v6 as well.
You can validate that in the NOGlab today.
  www.dnssec.comcast.net
  www.comcast6.net
  dns.comcast.net
  chris_griffiths@cable.comcast.com
Take the plunge, sign your zones.

Q: Peter, ISC; if you type www.comcast.net and you run a validating resolver, should it get a validated reply?
A: Not yet, that's not done yet; they're still working on the very high value domains. Expect those soon.

Q: Any DNSSEC helpdesk stories you can share? Any issues?
A: On the validation side, there are some top level domains that haven't spent enough time on their operational practices. They've had to disable validation for a few domains; they do see it, and notify folks to fix their domains. It's more issues with key management or record management than with DNSSEC itself.

Q: Peter, ISC again; do your recursive or caching servers allow EDNS0, and support 4k responses, UDP and TCP?
A: Yes.

Q: Have you informed your customers that their resolvers may not be able to resolve domains because the responses are suddenly much larger?
A: Yes, they've written up how to troubleshoot and deal with these issues when they come up; but it'll still be a surprise to them.

Neal Shelly, Dyn
Problem: they had an issue with BIND's load time with 110,000 zones; servers were taking a really long time to start up, 45+ minutes to finish starting.
Any patching or rebooting was painful for the team. And they're looking to import even more zones, up to 600,000 zones.
Ideas:
  mount configs/zone files on a partition with noatime set
  upgrade hardware
  put configs and zone files on SSD
  BIND 9.8.1 patch
There's a link to a blog post about it; 98% improvement in load time.
Mount option: noatime improved things, but improved more with the patch.
Faster hardware helped: the old dual-core box with SATA mirror and 4G RAM went to dual six-core Xeons, 12G RAM, SSD.
Using SSD on the new hardware sped it up even more; sub 10 minutes, and with the patch, sub 2 minutes.
The easy, one-line patch to BIND got a huge win as well; it changes how BIND does multithreading, instead of hard-coding 8 threads the way it was before.
The blog post said zones/100, then find the nearest prime number; any big number really helped a lot. Dropped it down to less than 2 minutes.
With 12G holding the whole disk cache, that's sub one minute.
Improvements:
  noatime: 11%, 25% with patch
  hardware: 58%, 69% with patch
  SSD storage: 48%, 73% with patch
  overall: 96% improvement.

Questions?

Steve Gibbard, Nominum. Here, but no slides; usually goes to the peering BOFs.
Historically, they do big DNS servers for big carriers: Vantio, the resolver; ANS, the authoritative name server.
Hosted DNS services: sky authority, yet another anycasted DNS network. Solid NA and EU coverage, 9 multihomed, independently routed POPs; looking for customers.
Does health monitoring, does A record failover if a host goes down; supports DNSSEC, including doing conversions for you; can slave zones itself, or from others.
All new management from the CEO on down; they scared off customers with industry-leading pricing and inflexible terms. If you've been scared off in the past, please come back.

Duane talks about tracing a DNS reflection attack via anycast changes.
  big bang
  earth cooled
  TCP/IP invented
  DNS invented
  people discover it's a great attack platform.
Bad guy spoofs packets and sends them to an auth server, which sends the response to the victim.
These are attacks reflecting off root nameservers, starting in 2010.
Typical load 6k qps, rising to about 15k qps by end of year.
  10-30k qps normal operation
  attacks lasted 1-2 days
  consistent query names.
Normally it's difficult to get good data out of root nameservers.
DURZ deployment, DNSSEC to root: lots of data collection for DNS-OARC during this time period.
Slide shows the data collections, with root servers listed on the y axis; some letters are better at participating than others.
Extracted queries with the 2 names that showed up in attack traffic; queries always ended with one of 2 second-level domain names.
dnscap filtered on the names; took a LONG time to run. Good thing servers stay up for months.
Resulted in a 275G dataset.
Populated an SQL database; count number of queries by name, type, source, etc.

An attack event begins when a server sees at least 50 qps for either of the two names; it ends when none of them see that level of traffic. A gap of 5 or more minutes separates attacks.
March 23, 2010: four different attacks. name1 and name2 would be two separate events.

Traffic levels during an attack: an early one in January, around 6500 qps.
See drops where data collection was lost; a few dips here and there.
All type 1 (A) queries.
Source addresses: 1-5 unique sources, usually spoofed; sources changed between attacks.
A lot of consistency in which servers were hit during attacks; I root in Washington, J root in Bombay, for example.

March 23rd, one attack was different; interesting, they saw traffic at sites they'd never seen it at before.
Closer look at the nodes getting traffic: can see where traffic shifted from one site to another for a period of time, for all affected nodes.
The glitch at exactly the same time pointed to a routing problem; it must have happened close to the real source of the traffic.
Made a fingerprint of the times when traffic shifted, and where it shifted to, for each letter.
Then, looking for real traffic that shifted in the same way, they could see real addresses that moved at the same time.
AS21788 was the common AS for them; also known as BurstNET, does managed servers and hosting.
Location makes sense for hitting East Coast and Europe servers.
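The attack-event definition given earlier in this talk (at least 50 qps for a name at some server; 5 or more quiet minutes separates events; each name tracked separately) can be sketched roughly as follows. The input layout, a list of (timestamp_seconds, server, name, qps) samples, is an assumption for illustration; the talk used an SQL database and did not show its schema.

```python
from collections import defaultdict


def find_events(samples, threshold=50, gap=300):
    """Group samples into attack events per query name.

    samples: iterable of (timestamp_seconds, server, name, qps) tuples
             (hypothetical layout). Returns a list of
             (name, first_active_ts, last_active_ts) sorted by start time.
    """
    # Timestamps at which at least one server saw >= threshold qps for a name.
    active = defaultdict(set)
    for ts, _server, name, qps in samples:
        if qps >= threshold:
            active[name].add(ts)

    events = []
    for name, stamps in active.items():
        ordered = sorted(stamps)
        start = prev = ordered[0]
        for ts in ordered[1:]:
            # A quiet period of `gap` seconds or more closes the current event.
            if ts - prev >= gap:
                events.append((name, start, prev))
                start = ts
            prev = ts
        events.append((name, start, prev))
    return sorted(events, key=lambda e: (e[1], e[0]))


if __name__ == "__main__":
    demo = [
        (0, "i-root", "name1", 60), (60, "i-root", "name1", 70),
        (60, "j-root", "name1", 10), (30, "j-root", "name2", 55),
        (600, "i-root", "name1", 80),
    ]
    for event in find_events(demo):
        print(event)
```

Per-server start and end times from the same samples would be the raw material for the traffic-shift fingerprint described above.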
Looked at other attack events to see if their traffic hit the same servers as BurstNET.
The attacker kinda inflicted their own "outing": because root servers have diverse locations around the world, traffic shifts allowed identifying who the source was.
Could this be done in realtime? If you had enough hardware! You'd need something like the root servers to be able to see it.

Q: Matt, Afilias--you'd need more routing glitches to do this in realtime; the glitch is what tipped you off, but if you had enough data, couldn't you do it even without routing glitches, and see the correlation in where the traffic hit?
A: Yeah, that was the plan originally; stumbling on the glitch really just cemented it. At the very least, it would narrow down the number of NOCs to call.

Q: Team Cymru; you could do some triangulation yourself by adjusting peerings, or announce prefixes with injected ASNs to cause fake loops back to selected sources...
A: Yes, but it would take someone who's more of a routing guru than him to do it.

Q: Kyle, Merit; were you only looking at A records? Aren't there larger attack types they could launch?
A: These attacks were only using A queries; they weren't counting on size, just on reply count.

Anything else before we have to clear out of the room?
And you can go talk to Peter about the f.root situation.
If not, you're dismissed, have a great night, and a wonderful rest of NANOG!

Stream ends at 1724 hours eastern time.