version 1.0 2011.10.10 NANOG 53 afternoon session

Welcome back--hope you all had a nice lunch! Don't forget to fill out your surveys! If you ever wished to serve on the program committee, you have until Tuesday at 5pm to get your nominations in!

Srikanth S from Georgia Tech is up next.

Broadband networks continue to grow; content providers want to know if their optimizations are bringing a better experience. Consumers want to know if ISPs are delivering on their promises.

But measuring access characteristics is not easy; think of a home user with multiple computers on a wireless network.
Ran speedtest for a DSL user in Atlanta: 4.4 down, 140K up; speedalyzer showed 4.8 down, 430K up.
Measurements vary, and differ from what the ISP had promised.
These measurements are done at different times, and have to deal with confounding factors, like the wireless network and cross traffic.

Instead, run it from the router itself; that measures 5.6 down, 480K up; closer to what the ISP promised. This takes the wireless network and cross traffic out of the picture.

BISmark platform, consisting of the BISmark gateway, which measures performance to other gateways.
Based on OpenWRT, and measures throughput, latency, loss, and jitter; supports the Netgear 3700 router, plan to support other platforms.
Measurements upload to Georgia Tech, visible at networkdashboard.org

Netperf to measure throughput.
Example of a 3mbit ATT user; measurements every 2 hours for 3 weeks.
Do both single-threaded and multi-threaded tests, run for 15 seconds.

Also do latency measurements to Georgia, California, and Italy.
Around Sept 20th, spikes across all 3 latency measurements.

Last mile latency, to first hop: find the first non-NAT'd IP, and run ping to that IP.
8ms, log scale, up to 80ms; that latency is seen by every packet leaving the home router.

Also conduct DNS lookup and traceroute measurements; duration and frequency of tests configured by server.
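A rough sketch of how the last-mile probe target described above (the first non-NAT'd hop) might be picked from a list of traceroute hops. This is not the BISmark code; the function name and the sample hop addresses are made up for illustration, and "non-NAT'd" is approximated here as "globally routable" using Python's ipaddress module.

```python
import ipaddress

def first_public_hop(hops):
    """Return the first hop that is globally routable, or None.

    Assumption: "non-NAT'd" is approximated by ipaddress's is_global,
    which excludes RFC 1918 and shared (100.64.0.0/10) address space.
    """
    for hop in hops:
        if ipaddress.ip_address(hop).is_global:
            return hop
    return None

# Hypothetical traceroute hops: home router, a second private hop, then the ISP.
sample_hops = ["192.168.1.1", "10.5.0.1", "68.86.90.1", "4.2.2.2"]
target = first_public_hop(sample_hops)
print(target)
```

The returned address would then be the ping target for the last-mile latency test.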
Also have iperf, curl, tcptraceroute, paris-traceroute, DITG, all configurable by control server.
Routers download scripts via SSL from the server, and run them every 10 minutes; each router could in theory run custom tests.

Routers administered from central control server at Georgia Tech.
Collects stats, pushes configs out; data stored in a Postgres database.
Measurement servers at Georgia Tech, Naples, and Cape Town.

BISmark router:
  OpenWRT, LuCI web interface
  IPv6 capable
  Atheros chipset
  gigE, MIPS processor, 16MB flash, 64MB RAM

Traffic shaping graph, showing Comcast PowerBoost customers in Atlanta.
Some users see different peaks when PowerBoost is enabled, dropping back after 14 seconds; all 3 users use about 8MB, but with differently shaped graphs.
Benefits are good for a short-term web session; but long term, almost no impact.

Last mile latency: for cable ISPs, most see low latency, less than 10ms.
For DSL ISPs, a significant fraction see much higher last-mile latencies, up to 40ms.
DSL ISPs use interleaving on the last mile, which spreads errors out so error correction can recover them; reduces loss, but increases latency.

Modem buffers in home networks:
profile the buffering in modems by running ping to the last-mile server with no traffic, and again while the uplink is saturated.
30 seconds of ping, then saturate uplink with iperf for 60 seconds, and measure again.
ATT user with 2Wire modem; 10ms latency jumps to 800ms.
Swap 2Wire for Motorola; latency goes to 1200ms.
Westell modem, up to 10 seconds of latency.
When one user uploads a large file, the network becomes unusable for everyone else.

About 20+ BISmark nodes in US, 10 in South Africa. Plan to deploy into Europe and Asia, once they get measurement servers there.
Plan to support Atom routers and TP-Link routers in future.

Want to understand:
  transit and access ISPs; effect of peering on performance
  IPv6 performance
  effect of CDN location, traffic engineering on application performance

Get involved!
Host a BISmark router
  get a high-end wireless router for free
Host measurement servers in your network
  geographic diversity is important
Contribute measurement tests
  http://github.com/bismark-test
Please contact them if you'd like to collaborate.

Any questions?

Q: With ATT, you were measuring at the IP layer, so there was probably overhead.
A: ATT DSL synchronizes at a slightly higher rate, so there shouldn't be an impact.

Q: Levels for DSL weren't really high enough to impact gaming; but above that...
A: There have been issues when people move to the US and are shocked at performance issues.

Q: Ben, Medias, what about looking at passive measurements, looking at traffic going through the modem?
A: Yes, but there are privacy issues; would need to get explicit permission from the user to look at their data first, and then do measurements based on it.

BGP slow table transfers is up next
Pei Chen, PhD student from UCLA

Work is more about data crunching and analysis; will focus on the tool, TCP delay analyzer.
No single study can explain all the BGP problems in the wild.
Maybe you can use the tool to analyze your own BGP traffic.
Joint effort with Level3, Cisco, and UCLA.

BGP table transfer: an update affecting a large portion of a router's BGP table, due to session resets or major route changes; can take 5-50 minutes to complete.
How can you know if your table transfers have problems or not?
Goal is to identify the delay time more efficiently and systematically.

Look at the TCP layer; BGP is just an app running over TCP, really.
Look at a TCP transfer: 5MB of data, but it takes 560 seconds.
Many retransmissions, and periods with different slopes, due to receiver awnd or sender cwnd.
If you look at the BGP messages, the spacing between updates is just due to retransmissions, i.e. transport-induced delays.

Need to look at the TCP trace; looking at visual data can show single issues, but that doesn't scale; so how do you analyze the data without having to look at each one?
T-DAT, TCP delay analysis tool
automatically identifies delay contributors
  which application (BGP, TCP, network)
  where (sending router, receiving router, network)
series-based approach; convert the TCP trace into a bunch of event series
record each event as a two-tuple: event duration and data
Can have multiple packets being tracked in the series; analyze the sequence to look for delay contributors.

Slide shows T-DAT operation in brief; the tool converts the trace to a bunch of event series; for some series, conversion is easy, a one-to-one extraction. For others, need to look at outstanding packets, window, etc.
Result is a series of square waves, showing delay and duration.
Retransmission is shown by 9 square waves in the transmission line.
Then calculate the delay contribution from each series.
In this case, 8 delay contributors, from 3 groups: 2 BGP, 2 TCP, 4 network.
Then figure out the ratio of time spent in each contributor in the delay vector.
Take the union of the time series, rather than simple addition; this takes overlaps into consideration.
Apply the delay vector contributors to BGP updates to model the overall update time.

Apply the method to BGP data (ISPa and routeviews data):
collect ISP data to vendor, and to collector via sniffer, and also collect routeviews to vendor.
Need to identify BGP table transfers separately from normal updates.
Use pcap2bgp to reconstruct the BGP byte stream from the capture; extract BGP updates, store in MRT files; then apply the method to them.

Find contributors for the data; for ISPa, apply the T-DAT tool, and develop the group delay ratio for each group.
Network delay is usually low.
The failing side of a session tends to have more impact.
For the vendor, delay is more often due to a slow sender than a slow receiver.
For Quagga, both sides contribute equally.
For each transfer, look to see which side triggered the transfer.
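A minimal sketch of the "union rather than simple addition" step described above, assuming each delay series is a list of (start, end) intervals in seconds. The function name, the interval representation, and the sample numbers are illustrative assumptions, not T-DAT's actual code or data.

```python
def union_duration(intervals):
    """Total time covered by (start, end) intervals, counting overlaps once."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Hypothetical series: retransmission periods and a small-window period that overlap.
retransmit = [(0, 10), (20, 30)]
small_window = [(5, 15)]
transfer_time = 50.0  # made-up total transfer duration in seconds

simple_sum = sum(end - start for start, end in retransmit + small_window)
union = union_duration(retransmit + small_window)
print(simple_sum, union, union / transfer_time)
```

With these made-up numbers the simple sum is 30 seconds but the union is 25 seconds, because the 5 seconds where the two series overlap are counted only once; the delay ratio for the group is then the union divided by the transfer time.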
Slide shows the case where the sending router triggers the update; it seems that the sending router more often contributes to delay, probably because it has to update many, many different neighbors.
From routeviews data, no visible trend; it's got sessions from all over the internet.

But if you pick one slow transfer, in this case a slow receiver, and look at it, it looks like 86% was due to a small window on the receiving side.
Picking a slow sender: took 160 seconds, sending router pauses often; idle 92%; could be timer interaction on the router.
Final case is one with a lot of retransmissions; neither sender nor receiver; 67% of time spent recovering from lost packets.

Screenshot of the software in action.
Use tcptrace -G file to analyze data into streams.
Tool requires tcptrace/tcpdump; feel free to grab it, test it out, and give them suggestions.
http://irl.cs.ucla.edu/bgpmicro

If you have BGP monitoring, and a sniffer, these tools can help you analyze the performance.
Can help reduce transport-induced latency.

Ongoing work
  improving the tool
    parameter/threshold settings
    support other TCP variations?
  collect BGP/TCP traces
    ISPa
    routeviews
    UCLA
  explore the tool usage for other TCP apps
If you can share your BGP/TCP data with them, it would really help!

Q: You made a couple of statements...you work with Lixia, so you might have details; you used a Quagga Linux version; could you post your Quagga Linux specification, since they don't have the best TCP in-house? Also, TCP in routers is probably more tuned for BGP; that assumption might bear review?
A: Need to check with the data source to see if he can post the specs on the Linux and Quagga box. Second question: no, he hasn't looked at each vendor's TCP implementation.

Q: Has he done any testing with any commercial routers with this tool?
A: One side is the vendor router; the other side is the collector.

Q: Would be good to post parameters from the router and collector sides. And has he done this with any of the newer TCPs, like S-TCP?
A: No, he's just worked with Quagga and vendor boxes; so it comes down to the TCP implementation in the boxes. They only observe the output with tcpdump; they don't dig into the internal TCP settings as part of this.

Lightning talks are up next

Facebook neteng person comes to podium...and the laptop crashes; we go to generic background slide while they fix it.

Peter Hoose, network engineering rant
Why are you complaining? Troubleshooting is easy with a simple network topology:
  confirm issue--ping
  validate the path--traceroute
  check counters--show int
  call vendor.
But that's probably not what your network really looks like.
He shows the real set of paths in their network, which is much more painful.

ECMP and LAG make ping and traceroute less useful;
show int, show log, shutting down ports becomes more haphazard.

"Must be a network issue!"
Can't really blame people for spewing that, because you can't really prove it one way or the other.
Often, later, you find it is a network issue.

Wish list:
  TraceFlow/TracePath (traceroute with path including physical links; need the full tuple you're looking for)
  hping | ping v2 that shows the full tuple
  full fabric monitoring (every host checking each other)
  BFD is cool (but need to see inside LAG groups)
  more granular counters (30 seconds doesn't cut it)
  need to poll more (if you can't get data from a router, it sucks)
  need to poll less (don't want to wait to pull data from router, want router to tell you when it has an issue) (but routers are insane)
  let me pop the hood (let me connect via openflow, run perl, python, run tests on it)

Now what? Do you have suggestions on how we can break this open, and bring sanity to it?
http://facebook.com/phoose
ph@fb.com

Q: tkap -- can you talk about active measurements, compare what exists now, and what's lacking? A congested link exists somewhere, ping is flying back and forth; do you see them correlate, do you decide if they're telling the right story?
A: They take every error counter on every device, and roll them up; sometimes the line doesn't go up because there isn't an error counter, or the device doesn't report it, etc.

Q: So you need both, active polling, and passive reporting.
A: Yes.

Linda Dunbar, Huawei
IETF working group ARMD: address resolution for massive numbers of hosts in the datacenter
ARMD track at June NANOG
Goal of talk: solicit operators to challenge the generic DC network designs and the associated pain points.

Scenario #1: L3 to access (ToR)
  single rack has its own L2 domain, has its own subnet
  benefits: ARP/ND scale very well, no problem
  but when a server is loaded with new applications, it has to inherit the same IP subnet.
  IP addresses have to be reconfigured when VMs move to a different rack.

Scenario #2A: L3 to aggregation
  L3 routed domain, L2 for ToR
  VMs can be moved from rack to rack without IP reconfiguration within L3 domains.

Scenario #2B: L3 to gateway only
  bigger L2 domain, VMs can be moved anywhere, minimal re-IP needed for any applications loaded.
  triggered by demand changes
    reduce or increase number of racks when demand changes
    allow servers to be reloaded with different applications under different subnets without any physical moving or IP reconfiguration
    (like online gaming, which comes and goes and shifts with popular fads--allows for reconfiguring the same servers with different games as demand shifts)

Pain point A for scenario #2
  when external peers initiate communication with hosts inside the datacenter, the router needs to hold the data frame while it tries to locate the host.
Pain point B for scenario #2
  hosts send ARP/ND to default gateways frequently
  v4 solution--frequent gratuitous ARPs by gateways
  v6--no solution, need unicast ND communication with gateway

Pain point C for scenario #2
  hosts in two different subnets communicate with each other within the datacenter
  gateway router impacted twice, once for each subnet (coming in, and going out)

Overlay networks: TRILL, MAC-in-MAC, GRE encaps in hypervisor; but ToR or hypervisor still needs to perform network address encapsulation; you still have a bottleneck in the gateway for external communication.

http://tools.ietf.org/wg/armd
Please come talk to them; they want to help us find an operational solution; they'll be at the beer and gear tonight, so find them, and talk to them!!

Q: If this is your network, maybe it's just that easy; if you can help write text for their problem statement, he'll buy you a beer!

Samim Akhtar is up next
Welcome to Philly, his hometown; hopefully everyone had a good lunch!

100G check point
Has everyone tried 100G yet?
What does it take to land a technology with an operator?
  solid understanding of operator's network lifecycle
  understanding of the end-to-end use case
  sound familiarity with operator's current architecture
  laundry list of operator architectural constraints
  good understanding of cross-layer interdependencies
Need to articulate our use cases; include details on our specific cases.

100G: let it be four characters, not a four-letter word.
Connectivity efficiency, spectral efficiency, and MAC efficiency; need to be aligned on all three axes for mass deployment.
OIF-based 1st gen on the optical side, second gen superchannel for 100G+.
For MAC efficiency, need to shorten the gap between 2x50G ASIC and 1x100G ASIC; gap between what was first released and what is needed.
Outages that affect millions of customers have big impacts; migrations must go smoothly.

IEEE started with LR4, then the cost curve of 10G came into play; LR4 wasn't in the same range as what people were looking for.
LR10 has been tried, still trying; multimode active cable could have solved many issues, but we didn't pay attention to it. Now we're looking at it for low-cost connectivity, about 9 months too late.
Trying to land all 3 layers at the same time for it to be a hit.

Going into 1T needs "non-linear" thinking and new toolsets.
  current 100G and 400G is linear thinking
  linear approach and current tools break on 1T
  need out-of-the-box thinking: think beyond IO speed, think about end to end, power, space, cooling, etc.
25G VSR will help; but Shannon's limit in fiber requires a deeper look; SuperChannel is a temporary fix.
Empower content providers to help guide traffic, and carriers can work with them to dampen the bandwidth floods, and prolong the life of the fiber.

We need 1T for LAN connectivity.
We left LAN to IEEE; piecemeal ownership doesn't help, we need a more unified approach to this.

For newcomers, a few notes: separate tracks, IPv6, ISP security, best current practices.
Take some time during the break to fill out the survey, sign up to be a NANOG member, enjoy the break food from XKL, check out the NOGlab, and don't forget the beer and gear in the Regency, and the Microsoft social at the Triumph Brewery.

BREAK TIME!

Oh. Apparently the after-break stuff isn't going to be streamed, so that's it for today's notes, sorry. :(