(version 1.4) 2010.06.15 NANOG49 day 2 notes

Dave Meyers starts things off at 0933, thanking everyone for the wonderful party last night; thanks to Google, Netflix, and Equinix for that party!

Kevin thanks people for being here, esp. the sponsors and the primary sponsor, Netflix. Don't forget to fill out the daily surveys! That's how the program committee, as well as the steering committee, decides whether they're doing the right things or not.

To start things off, Juniper Networks will talk about the heart of what makes the packets go, the ASICs.
http://nanog.org/meetings/nanog49/abstracts.php?pt=MTYwMCZuYW5vZzQ5&nm=nanog49

Chang-Hong Wu
Thanks to the NANOG program committee for letting him talk about ASIC technology.
The exponential growth of the internet is driven by growth in storage, networking, and computing; that growth in turn makes the overall information system grow faster, so it's a feedback system.
Computer system performance over the last 20 years has grown almost 2x per year, while microprocessors grew at 1.3x per year; the overall system is growing faster than the microprocessors inside it, which shows the importance of the interconnects between systems.
Router performance improvement in the last 20 years: before the mid 90s, before ASICs were used, growth was only 1.6x/year, since routers were based on general computer systems. After ASICs became widely used in routing systems, the pace of improvement increased dramatically, to about 2.2x/year; compounded over many years, that far outstrips computer growth.
Silicon inside the system is categorized by how much processing it does per packet, from a simple packet forwarding engine (least processing per packet) all the way up to an edge services engine (most processing per packet).
ASIC = application specific integrated circuit; it is designed specifically for your task.
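To see why a few tenths of a point in annual growth rate matter so much, here is a small compounding sketch using the rates quoted above. The 10-year horizon is an arbitrary choice for illustration, not a figure from the talk.

```python
# Compound the annual growth rates quoted in the talk.
# The rates are from the notes; the 10-year horizon is an arbitrary
# illustrative choice, not a figure from the presentation.

def compound(rate_per_year: float, years: int) -> float:
    """Total improvement factor after `years` of growth at `rate_per_year`."""
    return rate_per_year ** years


RATES = {
    "microprocessor": 1.3,
    "router, pre-ASIC": 1.6,
    "computer system": 2.0,
    "router, post-ASIC": 2.2,
}

if __name__ == "__main__":
    years = 10
    for name, rate in RATES.items():
        print(f"{name:>18}: {rate}x/year -> {compound(rate, years):,.0f}x in {years} years")
```

With these inputs, the post-ASIC router rate ends up roughly 24x ahead of the pre-ASIC rate after ten years, since 2.2^10 / 1.6^10 = 1.375^10.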
Table showing the different types of devices that go into building a system: general purpose CPUs, field programmable gate arrays (FPGAs), off-the-shelf network processors, and finally tailored, custom ASICs. Along that axis, you get less and less flexibility in the tasks they can do, and increasing cost, in exchange for performance. ASICs are fastest, but have high NRE development cost, long design cycles, and are less flexible; but you have to do them to keep up in the high end routing marketplace.

How do you design them? Start with the system architecture.
Factors that influence the system architecture:
market requirements
software/hardware interactions
silicon process technology evaluation
memory chosen
chip partitioning

ASIC process technology
The first step is fabrication technology. Over time the processes improve, and Moore's law (transistor density doubling every 18 months) still holds true, in spite of people predicting its end for the past decade.
In the last five years, though, performance has started to level off even though density is still doubling. Power reductions are leveling off too; we're reaching the limits of improvement. Dynamic power reductions slow down because voltage can't drop any faster, and static power consumption due to leakage increases. So we can't rely on fabrication technology improvements alone to keep delivering gains. NRE costs have gone up for every chip, every step of the way.

Network ASICs and memories
Diagram showing the ASIC connecting to various memory systems: inputs, outputs, packet buffers (to store packets during congestion), link memory to chain the cells of a packet together, and control memory to store forwarding tables, ACLs, and all the configs and control structures.
Chart with different memory technology characteristics is shown, tagged with high/medium/low for each box in the matrix.
Embedded memory is within the ASIC itself; external memory resides off-chip, separate from the ASIC.
Embedded memory is faster and burns less power, but is smaller in capacity and more expensive per bit.
SRAM: 6-transistor cell device; higher performance, lower latency, smaller capacity, more power hungry.
DRAM: dynamic memory; high capacity, lower performance, higher latency, lower cost.
TCAM: ternary content addressable memory. You give it content, and it comes back with which entry matches that content; backwards from regular memory, where you give it an address and it tells you what's there. TCAM has predictable latency: you give it content, it returns the address in fixed time. Smaller capacity, much higher power, very high cost.
These affect scaling of your system: scalability, features, and cost are all affected by the choice of memory. And being silicon devices, memories follow Moore's law as well, so they change over time too.
Back at NANOG39 or so, there was a talk about FIB scaling that talked a lot about SRAM. Since then, since microprocessors put L2 cache on die, the market for SRAM has shrunk, so improvements in SRAM have almost completely stopped. So doing a design requires predicting what's likely to be available in the future.

So, for different tasks, choose different memory:
packet buffers: need high throughput, high density; long bursts OK. SDRAM, or RLDRAM (reduced latency DRAM)
queueing/link memory: need high throughput, low latency; shorter bursts. SRAM, RLDRAM, or SDRAM
control memory: need high throughput, low latency, even smaller access quantum (need just a little bit of control information at a time). SRAM, TCAM, or RLDRAM

How many chip partitions does the architecture need?
Fewer chips does not mean less overall cost; bigger chips may reduce overall die yield, so there is a crossover point between one huge die with a huge transistor count and multiple chips. Using multiple chips may increase yield and reduce power dissipation for each chip.
Chips use their periphery for I/O; with limited periphery, you balance I/O needs against edge area and chip density. You need a clean interface between the chips.
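The TCAM behavior described above (give it content, get back the matching entry's address) can be modeled in a few lines. This is a minimal software sketch, not how any vendor's part works: entries are value/mask pairs, where a 0 mask bit is the ternary "don't care", and the first matching entry wins. Ordering the hypothetical forwarding table longest-prefix-first is a common way to get longest-prefix match out of first-match semantics.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TcamEntry:
    value: int  # bit pattern to compare against
    mask: int   # 1 = "care" bit, 0 = "don't care" (the ternary X)


def tcam_lookup(entries: Sequence[TcamEntry], key: int) -> Optional[int]:
    """Return the index of the first (highest-priority) matching entry, or None.

    A real TCAM compares the key against every entry in parallel and returns
    the winning address in fixed time; this loop only models the result,
    not the timing or the power cost.
    """
    for index, entry in enumerate(entries):
        if (key & entry.mask) == (entry.value & entry.mask):
            return index
    return None


def prefix_entry(cidr: str) -> TcamEntry:
    """Encode an IPv4 prefix as a TCAM entry (network bits = care bits)."""
    net = ipaddress.IPv4Network(cidr)
    return TcamEntry(value=int(net.network_address), mask=int(net.netmask))


# Hypothetical forwarding table, longest prefix first so that the first
# match is also the longest match.
TABLE = [prefix_entry(p) for p in ("10.1.2.0/24", "10.1.0.0/16", "0.0.0.0/0")]


def lookup(address: str) -> Optional[int]:
    """Return the TCAM address (table index) that matches an IPv4 address."""
    return tcam_lookup(TABLE, int(ipaddress.IPv4Address(address)))
```

For example, `lookup("10.1.9.9")` hits the /16 entry, while an address outside 10.1.0.0/16 falls through to the default route.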
IP1, IP2 (M40 routers): 250nm process, 4 chips per chipset, split due to system partitioning; used SRAM (125Mbits) and SDRAM.
IP3 (2002): 180nm process, 10 chips per PFE, 20G PFE, split due to memory bandwidth; 5 of the 10 chips are interfaces to memory. Used SRAM for control memory, and RAMBUS for data memory (the only time they used it!)
T640: 90nm, 9-chip complex (2 chips combined); used SRAM as control memory and RLDRAM for packet memory, since DDR SRAM wasn't mature enough to use. The SRAM disappeared from the vendor's roadmap halfway through development, so they had to improvise.
2006, I-chip: 1-chip 10G PFE, 90nm process; used RLDRAM for control memory, DDR2 SDRAM for packet memory.
Trio, NISSP: 65nm process, 4 chips (2 are 90nm, 2 are 65nm), transistor count increased; RLDRAM for control, DDR3 SDRAM for packet memory.
Why change them all the time? To keep providing value to their customers. From 1998 to 2008, 4 generations of routers improved slot capacity and increased the bandwidth delivered per unit of power, from 13Gb/kW to 96Gb/kW. That's the power of ASIC improvements.

OK, that's architecture; now it's time for the design phase itself.
Humans design them by divide and conquer: divide into separate chips, those chips into subsystems, then blocks, sub-blocks, and down to basic logic elements.
Document the functionality and architecture, and the inputs and outputs needed.
Translate the microarchitecture into register transfer level (RTL) code.
Those functional designs take the most time; *everything* gets translated into RTL code, and each chip has hundreds of thousands or millions of lines of RTL code. It looks like C, but it isn't software; it is translated into physical devices. Simple fixed shifts are cheap in hardware, no logic involved; a barrel shifter is easy in software, but in hardware it's much more complex to build. So the logic steps do matter!
Synthesis is the mapping of RTL to gates in the technology of choice. It also needs timing requirements for the gates.
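The shift example above can be made concrete. The sketch below is a Python model, not RTL, and the 32-bit width is an assumption for illustration: a shift by a constant is just a rewiring of bits, while a shift by a variable amount is commonly built as log2(width) stages of 2:1 multiplexers, each stage shifting by a power of two or passing through.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def fixed_shift_left_by_3(x: int) -> int:
    # In hardware this is pure wiring: output bit i is input bit i-3.
    return (x << 3) & MASK


def mux2(select: int, a: int, b: int) -> int:
    """2:1 multiplexer: returns b when select is 1, else a."""
    return b if select else a


def barrel_shift_left(x: int, amount: int) -> tuple[int, int]:
    """Variable left shift built from log2(WIDTH) stages of 2:1 muxes.

    `amount` must be in [0, WIDTH). Returns (result, number_of_mux_stages).
    Each stage either passes its input through or shifts it by a power of
    two, depending on one bit of `amount`.
    """
    stages = WIDTH.bit_length() - 1  # 5 stages for a 32-bit shifter
    value = x & MASK
    for stage in range(stages):
        bit = (amount >> stage) & 1
        value = mux2(bit, value, (value << (1 << stage)) & MASK)
    return value, stages
```

In this model each of the 5 stages needs one 2:1 mux per output bit, so a 32-bit barrel shifter costs 160 muxes, while the fixed shift costs none.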
Inputs to synthesis:
RTL code
specification of clocks and cycle times (frequency)
input and output constraints
Physical synthesis is becoming more and more important over time.

Verification: once the code is done, how do you make sure the chip is correct? The NRE is huge if you spin it wrong. Simulations are easier to debug than real chips; it's very hard to see why physical silicon isn't working. There are at least as many verification engineers as design engineers per chip.
Perform simulations at many levels:
block level
chip level
subsystem level
system level
software and hardware co-simulation
Look at physical waveforms and logic flows; use Verilog and other tools to run the simulations.

Physical design
Power and clock planning are really important; very high power consumption in total.
Perform high-level floor planning; place I/O, SRAMs, and register arrays (those steps are mostly done by humans, by hand).
random logic placement
perform congestion analysis, see if it works
wire up all logic and I/Os
run timing with physical element placement
do many, many iterations of the above to try to get it correct.
Slides show an example of memory placement on the die, then the logic placement and clocks, then the M1 routing (many layers of wiring going in different directions: M1 through M10, about 10 layers of metal wiring between blocks).

Finally, it's done, and it goes to ASIC tapeout:
all functionality has to be complete
all verification has to be complete
performance simulations meet goals
chip is error-free from the testability perspective
chip meets timing under all process, temperature, and other conditions.
Archive all the data, send it to manufacturing, build masks for photolithography; the ASICs are then built layer by layer.
Each wafer has several dies; each die is tested on the wafer; working ones are cut out, put in a package, and tested again. The sealed chips are put on boards, and then tested *again*. After that, ship product, and you're done.

So, ASIC technology has transformed the network industry.
Silicon process technology is evolving at an impressive pace, but architectural innovations will still be needed to keep up with demands. Due to the vast number of tradeoffs in the architecture and design phases, they need feedback from the community to know what directions to go.

Q: Tony Kapela, 5 nines. Can you comment on the trajectory you see for power budgets and their distribution among systems? Where do you see TCAM, OpenFlow systems; where will power be going more, and less, in the systems to come?
A: Wow. That's hard to answer. In terms of TCAM, the market is consolidating pretty fast. Currently, he sees challenges in how they will scale performance without requiring more power.
Q: Do you see TCAM scaling running against the natural limits of what a chip can do, vs. doing repartitioned chips; will power be the bounding condition versus other chips?
A: That's one reason Juniper doesn't use TCAMs, to avoid that power demand. By its nature, TCAM burns more power than SDRAM. Vendors are trying to improve that, but it's a tough road to make TCAMs more scalable. He doesn't want to alienate TCAM vendors, but it's probably hard to scale a pure TCAM solution; a hybrid model might work. The number of vendors serving the market is decreasing rapidly as well, which may affect pricing too.
Q: You talked a lot about the design and manufacturing process; how do you go from there to upper-level things like doing ACLs and tries? Do they have a set of primitives that the higher-level languages can access?
A: Memory choices depend on your ACL algorithm, for example; some use TCAM, it's a natural lookup choice. Juniper uses a trie to do the lookup. In future, some lookups may be internal, and some external.
Q: In many ways, this is more like graphics acceleration: you start with the algorithm, and then figure out what you want to accelerate.
A: It's a bit of both; you need to know the technologies available, and their strengths, when you are working on your algorithms. And they both change over time.
Q: Chris Woodfield, Yahoo. At what point do you find operations that are better accelerated by ASICs vs. general purpose chips? Were there operations that were once the domain of general purpose chips that are now in ASICs, or vice versa?
A: Demands on bandwidth are always increasing exponentially; ASICs are generally always faster, but cost and needs come into play in the architecture and design phase.
Q: Jeff Sacks, Blue Ridge. Thank you for describing this; how do you simulate the chips in software to look at timing, and interactions between blocks? How fast can you simulate that; is it at 1 millionth of realtime? How close to real silicon can you get?
A: There are many simulation techniques; software simulations are millions of times slower than real silicon. Emulation systems compile the chip design into hardware; they're still slow, thousands of times slower than silicon, and it takes more effort to put designs into emulators, but your simulations run faster.
Q: Scott Whyte, Google. Because networking changes, the ASIC technology we buy today won't be useful in 3-5 years; is that what he's saying?
A: No; the protocols won't change, but the way the implementation in chips happens will change.
Q: But network designs change all the time, so are chips limiting how we can build networks? Is building in all this complexity, in fact, the wrong way to go, if nobody in the room wants it...
A: Let me speak; these devices are all remarkably programmable. They try to anticipate these changes, so devices from 1998 still work today. They have to try to anticipate what will come; the devices have long lifespans.
Q: You can use an IP2 in place of Trio, then?
A: Yes, the IP2 systems are still in place today, and still work.
Q: If I don't need all the features and won't use them, it's a lost cost, and it constrains the way networks can be built, so ASICs may be the wrong way to go.
A: If microprocessors work for you for building gigabit networks, then be my guest; performance and features at that speed are hard to get.
Q: Joel Jaeggli: it's good that IP forwarding is done in the thin part of the hourglass; as long as we don't crap up that part, it should be a good decision on our part.

Thanks to the presenter! This is much more impressive and much less messy than seeing sausage being made!!

Lightning talk submission is still open, if you'd like to get in for Wednesday.
Social tonight at the Clift Hotel, with Hurricane Electric and CoreSite sponsoring it. Science fiction theme, should be fun.

http://nanog.org/meetings/nanog49/abstracts.php?pt=MTYwMSZuYW5vZzQ5&nm=nanog49
In DR, a talk about the difficulties of doing internet in Haiti; after the earthquake, things got much worse.
The AHTIC IXP panel emergency response update, Reynold Guerrier, treasurer of AHTIC
Internet infrastructure before the earthquake:
1Gb capacity for international links
3 providers, microwave connections
10Gb submarine link, joint venture with the incumbent telco
4 ISPs in Haiti running WiMax and other wireless
3 mobile operators
3 million subscribers out of a population of 9 million
incumbent is in a recapitalization process with Viettel
May 6, 2009: IXP launched. Before the earthquake, 4 ISPs were connected: 2 via fiber, 1 via radio link, and one via ethernet cable. The IXP is the result of cooperation between UofO and IHT.
Management of .ht has been done locally since the redelegation in 2004.
Jan 12, 2010, 4:53: the earthquake struck; the incumbent, Teleco, collapsed, and telecommunications were badly hit. The core internet infrastructure survived, and the IXP survived.
40% of clients were lost; about 20% have been restored.
Image of the collapsed Teleco NOC, the only one that collapsed in the Google Earth view.
The IXP in Boutilliers is on a hill; the ISPs and cell phone operators all have facilities in the area. Thanks to the distributed architecture, and thanks to the ICANN community, the .ht domain never stopped working.
The IXP never stopped working during the earthquake, but the 2 fiber-connected ISPs had fiber cuts during aftershocks. The wireless- and ethernet-connected ISPs stayed connected, but traffic dropped from 2Mb down to 128k. One fiber provider has since come back up.
People were able to send messages from cell phones while trapped under the rubble. Messages appeared on Twitter and Facebook from people receiving messages from those trapped under the rubble. Stephane Bruno started collecting the messages and sent them to Steven Huter, who, with help from the State Department, passed them to rescue workers.
The earthquake emphasized the need to decentralize communications for the state. An ICT task force was created to lead a project with the IDB to address this.
Time to develop local content and a local hosting market, and implement geographically distributed systems. Explore the potential of other cities, like Cap-Haitien and Jacmel for example.
Build more resilient infrastructure; work with the NREN.
The landing station was completely destroyed. Viettel was concluding the deal and is still interested; Viettel will restore the landing station, will put the cable back in service, and will put 5,000km of fiber in place to build a national backbone. A second fiber is to be brought to Haiti.
Working to convince big operators like Digicel, Voila, and Viettel to join the IXP. Telephone and internet licenses are different.
Recovery initiative: Eric Brunner-Williams had an adopt-a-Haitian technician or facility program.
An f-root instance will be shipped there, thanks to ISC.
With help from the US military and from NANOG list members, his family was able to be evacuated to Miami.
Thanks to Google, who sent a lot of support tents for technicians whose houses collapsed.
Assistance vs. business
Time to move forward with bringing business to Haiti; due to the earthquake, there is a lot of opportunity for reconstruction, and we need to rebuild it in a better way.
Changes are happening in the telecom sector: Digicel is acquiring ACN, one of the ISPs that participates in the IXP.
Need to build synergy between US investors and local companies. The best way to build local capacity is with investment and knowledge transfer.
At the end of June there will be a round table in Barbados with the telecom sector: ITU, Caricom, and the Haitian regulator (Conatel); matchmaking for B2B.
A contacts slide is listed for people who would like to talk to them, or discuss business propositions.

Max Larson Henry is up next.
The Haitian ccTLD is managed by a consortium: University of Haiti, others.
Before the earthquake, 2 of the auth servers were hosted at the Teleco facility; other ones are in NJ, Paris, and Canada. The last one is part of the 50 anycast nodes hosted by PCH. Shadow master in Australia.
After the quake, the Teleco building collapsed; it was also the landing station for one optical fiber and the microwave links to other areas, so 2 servers were destroyed.
Telecommunications were unavailable to reach the secondary operators, and some ccTLD staff were injured or needed to help their families. But service for .ht stayed up, thanks to the managers of the secondaries, who made sure service didn't stop.
Learned about geographic diversity (avoid SPOF):
shadow master configured to feed the secondaries
also configured as master for nic.ht, with data from dns.princeton.edu (secondary for nic.ht)
secondaries reconfigured to pull .ht and nic.ht data from the primary in Australia
People networking is very important: it was impossible to contact the secondary operators, but they took the initiative to make sure service continued to function.
Learn from previous incidents: the SOA expire field was changed from 1 to 2 weeks after a two-day outage at Teleco in 2006.
DNS is strong/robust by design, but the community still needs to apply BCPs.
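The expire change matters because of how secondaries treat an unreachable master. Per RFC 1035, the SOA EXPIRE value is the upper limit on how long can elapse before the zone is no longer authoritative on a secondary that cannot refresh it. The sketch below is a simplified model that ignores the refresh and retry timers; the 10-day outage and the date are hypothetical, not figures from the talk.

```python
from datetime import datetime, timedelta


def secondary_still_authoritative(last_successful_refresh: datetime,
                                  now: datetime,
                                  expire: timedelta) -> bool:
    """Simplified SOA EXPIRE rule: a secondary keeps answering for the zone
    only while less than `expire` has elapsed since it last refreshed the
    zone from its master. Refresh/retry timers are ignored here."""
    return now - last_successful_refresh < expire


# Hypothetical scenario: the master becomes unreachable right after the
# secondaries' last refresh and stays unreachable for 10 days.
last_refresh = datetime(2010, 1, 1)
ten_days_later = last_refresh + timedelta(days=10)

one_week = secondary_still_authoritative(last_refresh, ten_days_later, timedelta(weeks=1))
two_weeks = secondary_still_authoritative(last_refresh, ten_days_later, timedelta(weeks=2))
```

In this model, the one-week setting would have the secondaries stop answering before the master came back, while the two-week setting keeps the zone served.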
.ht servers at the IXP with the ccTLD
acquire number resources
contingency planning
geographic diversity: more than one shadow master
develop more projects with the local internet community
anycast copy of f.root
capacity building
CSIRT
Thanks to everyone who helped during the earthquake!

Thanks to Force10 Networks for sponsoring the break!
Oh, one more tiny point: the surprise, which some might already know. He brought a couple of bottles of the famous Haitian rum with him; there is one extra, and it will be with him tonight at the social, so see him for a drink of it! :)

Break now until 11:30. Prizes for the Monday survey are drawn. Drat. Forgot to turn in my survey yet again.

Barry Greene, 3 talks on BGP next.
Announcement: don't leave your equipment unless you lock it down!!

Sharon is up first: how secure are BGP security protocols?
http://nanog.org/meetings/nanog49/abstracts.php?pt=MTU3NSZuYW5vZzQ5&nm=nanog49
Not proposing a new protocol; looking at proposals from the last 10-15 years.
BGP traffic attraction attacks can cause major issues: the Pakistan/YouTube hijack; man-in-the-middle attacks (DEFCON, Pilosov and Kapela traffic interception).
If we had BGP security, these problems would go away, right? But different protocols prevent attacks in different ways, and all have different holes:
origin authentication, rPKI
defensive filtering (prefix lists)
sBGP, soBGP
Look at simulations of what happens if everyone used these and attacks were being launched.
AS-level graphs of the internet, with business relationships; CAIDA, Cyclops, etc. as sources.
We don't know the routing policies used in practice on the internet at the moment, so they look at what happens if a given set of relationships and protocols is in use. To run simulations against traffic, you need a model of what might be in place on the internet. All results in the paper are based on the model.
2 relationships:
1) customer to provider (money flows from customer to provider)
2) settlement-free peering (no money flows between the two ASes)
The model assumes an AS prefers cheaper paths: prefer a path to a customer over a peer, and a peer over a provider.
The whole model assumes a single IP prefix is being attacked. V is the victim; assume its prefix is being hijacked.
Data from CAIDA, with ASes removed to anonymize the data.
ASes only transit traffic for their own customers (where they get money from them).
Start with a simple prefix hijack against BGP: look at a single attacker and a single victim, using CAIDA AS relationship data from 11/20/2009. Later, will look at all attackers/all victims.
In this example, if the attack comes from a multihomed customer of larger ISPs, the larger ISPs will follow the attacker's announcement instead of the legitimate source. 62% of ASes in the source data will route through the attacker network.
If you have ROA/RPKI, this attack can't happen; ROA/RPKI gives a secure mapping between a prefix and who owns it. But if the attacker is smarter, he can simply *claim* he is connected to the owner of the prefix, i.e. announce the victim's AS as origin, followed by his own AS. The attack still works, based on the route preference for customers; the longer AS path doesn't matter. 58% of the ASes in the source data will still go to the attacker in this model.
Announcing paths that don't really exist shouldn't be allowed; soBGP and Secure BGP would prevent that.
Secure BGP: can it stop attacks? No, it still allows route leaks. Secure BGP still has the mapping of prefix to owner, and also prevents an AS from announcing paths that were not announced to it: you sign the announcements you receive with your key before passing them along, so you can't announce paths that don't exist. But you can still leak prefixes along paths they shouldn't normally go along, causing traffic hijacks. That still attracts 16% of the ASes on the internet; it may not be an "attack" per se, but it does attract traffic, and is quite effective.
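The route-preference model above (often described as the Gao-Rexford conditions) is what makes the "claim adjacency to the victim" attack work. A minimal sketch, assuming only the two tie-breakers the talk mentions (relationship first, then AS path length); the AS numbers are documentation ASNs chosen for illustration.

```python
from dataclasses import dataclass

# Lower rank = more preferred, following the talk's model: customer routes
# earn money, peer routes are settlement-free, provider routes cost money.
RELATIONSHIP_RANK = {"customer": 0, "peer": 1, "provider": 2}


@dataclass(frozen=True)
class Route:
    learned_from: str          # "customer", "peer" or "provider"
    as_path: tuple[int, ...]   # neighbor AS first, claimed origin AS last


def best_route(candidates: list[Route]) -> Route:
    """Cheapest relationship first, then shortest AS path. Real BGP has more
    tie-breakers; min() simply keeps the first of any remaining ties."""
    return min(candidates,
               key=lambda r: (RELATIONSHIP_RANK[r.learned_from], len(r.as_path)))


def may_export(learned_from: str, to_neighbor: str) -> bool:
    """Only transit for paying customers: customer-learned routes go to
    everyone; peer- and provider-learned routes go only to customers."""
    return learned_from == "customer" or to_neighbor == "customer"


# Documentation ASNs (RFC 5398): 64510 = victim V, 64511 = attacker M.
legit = Route("peer", (64510,))              # direct peering with the victim
forged = Route("customer", (64511, 64510))   # M claims a link to V
chosen = best_route([legit, forged])
```

Here `chosen` is the forged customer route even though its AS path is longer, and its origin is still the victim's AS, so an origin-only check would accept it.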
Interlude: finding the optimal attack.
What is the best attack strategy? The attacker can't change relationships, so one strategy is to announce the shortest path possible. But it turns out that shortest isn't always best; announcing a longer path that matches route preferences better works better. Announcing to everyone would seem to make sense, but it turns out longer paths to fewer neighbors can make more sense. It's hard to find the optimal attack strategy; in fact, finding it is NP-hard. So the "smart" attack strategy used in the simulations underestimates the damage! Sometimes longer paths are better. Announce a path that will be believed, even if longer, towards the person wh
Q: Jeff Sacks, Blue Ridge: it looks like you want to purchase connectivity from different large providers, but announce different paths in different directions, so that each party sees it as advantageous to send traffic to you. If you pick both paths and send them in each direction, you can pull traffic from both sides! (caveat: expirations at some point when you pull traffic from *all* upstreams)

Stub ASes: what if we do filtering on stubs via prefix lists? Assume providers filter their stubs, and only accept announcements for prefixes the stubs own. In the Pakistan hijack, if its upstream had done filtering, the attack would not have happened. These attacks would not work if filtering was in place. 85% of the internet is stubs; if upstreams filtered their downstream stubs, most attacks could be stopped.
Graph of simulation results on the whole internet data set, using random sampling of random attacker and random victim pairs; an attack counts as successful if 10% of the internet routes through the attacker after the attack. Count the attacker/victim pairs where the attack is successful, for each protocol:
BGP alone: 90% of pairs succeed
OrAuth: 60%
soBGP, Secure BGP: 15%
With filtering in place, there is still 15% attack success from the remaining 15% of the internet, the non-stubs.
The black line is the hard upper bound for filtered attacks, vs. unlimited attack potential for unfiltered attacks.
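The stub filtering the talk assumes is an ordinary customer prefix list. A minimal sketch, assuming the provider knows each stub customer's prefixes and chooses a maximum accepted prefix length; the prefixes below are documentation ranges, and `accept_from_stub` is a hypothetical helper, not a router feature.

```python
import ipaddress
from typing import Iterable


def accept_from_stub(announced: str, customer_prefixes: Iterable[str],
                     max_length: int) -> bool:
    """Prefix-list check for a stub customer: accept the announcement only if
    it is one of the customer's own prefixes (or a more-specific of one) and
    is no longer than `max_length` bits."""
    net = ipaddress.ip_network(announced)
    if net.prefixlen > max_length:
        return False
    for allowed in map(ipaddress.ip_network, customer_prefixes):
        # subnet_of() requires matching address families, so check first.
        if net.version == allowed.version and net.subnet_of(allowed):
            return True
    return False
```

With this in place, a stub holding 198.51.100.0/24 can announce that prefix, but an announcement of someone else's 203.0.113.0/24 (the YouTube-style hijack) is dropped at the first upstream.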
Secure BGP isn't a replacement for filtering; you need *both* in place. Secure BGP drops attack success by a factor of 10, and filtering drops it by another factor of 10. At that point, the remaining 'attacks' are just non-stubs announcing prefixes too widely.
The simulated data agrees very well with what CAIDA and Cyclops data capture; at least on average, the curves line up very well.
It turns out tier 2 ASes are still the most effective attackers. With filtering, stub ASes are no longer effective, but upstreams still are: non-stubs with roughly 25 to 250 customers.
Tier 1s don't get as much traffic as tier 2s: you have to pay to route through tier 1s, and they don't attract other tier 1 traffic. But the tier 2 networks are customers of the tier 1s, and the tier 1s will *prefer* the paths to the tier 2s.
So, we need secure BGP as well as filtering: secure BGP constrains the paths announced, but not the export policies.
How do we do filtering properly? Right now, it relies on altruism: you filter your customers to prevent the rest of the internet from being attacked. This relies on everyone trusting other providers to have a clue, and to care. Keeping prefix lists up to date right now is annoying and hard. ROA/RPKI infrastructure would help; use it as input to the filter lists for your customers. In the meantime, use prefix lists as well as you can.
Small providers: if small providers don't filter because they don't have the resources, how bad does it get? If providers with fewer than 5 customers don't filter, about 14% of stubs can still leak routes. If only networks with more than 500 customers filter, 86% can leak. If ISPs with more than 10 customers filter, 55% of attacks can be stopped.
https://www.cs.bu.edu/~goldbe
Q: Todd, Google: first half, great; second half, horribly optimistic. After years of bellyaching, we've seen that getting accurate prefix lists for even a single large ISP seems not to be possible. Relying on that to solve this problem seems not to be possible either.
A: Actually, filtering transitive customers was not assumed; it was only talking about the direct customers of an ISP. That is, this is only about filtering stub customers, NOT non-stub customers; it's up to those non-stub customers to do their own filtering.
Q: To clarify, that's the number of sources for attacks; the same number of attacks could still come.
Q: Sandra Murphy, Sparta. In the model, the reason that what M was doing was considered an attack was because M was a stub. What if someone just lies about a business relationship, to make it appear they are not a stub?
A: Yes, that would attack this filtering model by allowing a stub to bypass filtering. This assumes that transiting traffic for non-customers doesn't happen.
Q: Danny McPherson: this is good work; look at Jared's leaks page, for example; we do see a lot of real-world examples. At InternetMCI, they filtered everyone, but people stopped because route refresh didn't exist, and dynamic filters didn't exist. Now that those exist, and ROAs are coming into existence, people need to start filtering!
Q: Heather Schiller: it sounds like you define a stub as someone who doesn't transit for external entities. You can have a provider with a customer that does not have its own ASN; it's hard, higher up, to validate what relationships exist further down the tree. It's very hard to determine from RIR data what relationships exist, to tell if someone is a stub or not. It's an ideal, but there's no database of relationships to validate against.
Q: You seem to assume that garnering the largest volume of traffic is the goal. But what if you want specific traffic? You may not want to make waves; you look at who connects to whom, to pull off just a small portion of traffic from a specific victim, and you can pull that off relatively undetected.
A: To both points... if people can talk about what the rPKI world is going to do, that's an important piece of this puzzle.
To the second point: the goal was to look at the broad scope of attacks, not pinpoint attacks.
Q: If people can push the RIRs to do a better job of matching prefixes and organizations, that would help; Verizon Business, for example, has many OrgIDs, which makes it hard to tell which prefixes really belong to downstreams.
Q: One quick comment: if ROAs and rPKI exist, that excuse no longer exists; at that point, people should look at pulling that data to build filter lists.

Qing Ju, Large Route Leak Detection, University of Arizona
http://nanog.org/meetings/nanog49/abstracts.php?pt=MTYwMiZuYW5vZzQ5&nm=nanog49
A large route leak is the same as a prefix hijack, just with a different attribution given for it: an unauthorized network announces prefixes of other networks. Both the prefix owner and other networks are victims of these leaks/attacks.
Current practice is that only the prefix owner deals with leaks/hijacks. Right now, you can't see a leak yourself; you have to look at remote monitors, work out which alerts are real incidents, and call upstreams and attackers directly, which takes time. It took more than 2 hours to stop the YouTube attack, for example.
Picture shows the different parties in the leak/attack.
Protecting your traffic: how do other networks protect their traffic when an origin change happens, without having to wait for the prefix owner to take action? There are some cases where you can decide, with high confidence, that a prefix change is not valid: when a network hijacks prefixes of many other networks at the same time. The worst case is leaking the whole table.
Detecting this: leaking the whole table is trivial to detect. Leaking sub-portions of the table is a bit harder; the goal is to minimize false positives while allowing an ISP to protect its traffic. If one AS hijacks a prefix of another AS, it's very hard to say if it's legitimate or not. But if prefixes of multiple ASes change at the same time, it becomes more suspicious.
Process raw BGP data and look at recent changes. Most of them are legit changes.
Filtering: if the AS has announced this prefix for more than one day in the past year, consider it valid; likewise if it announced a superset of the prefix for more than one day, or if the AS is a stable neighbor of the upstream.
Filter out exchange point prefixes.
This gets rid of most recent changes; the goal is to eliminate much of the data first.
If an AS is still announcing a prefix normally announced by another network, it is 'offensive'; the number of networks it is announcing for gives an offense number. If you are offending 10 or more other networks, it is a route leak or prefix hijack.
From 2009 data, a whole year of Route Views data collectors: most events have an offense value of 1 or 2; they cut the long tail at 10.
Across seven years of Route Views data, using this filter, they see 5-10 hijack events a year.
How accurate is this? They sent email to the victim networks to try to confirm; for 9 events in 2009 and 6 in 2008, all events were confirmed by victims as route leaks or hijacks. They didn't go back earlier, since people are less likely to have records for them.
From NANOG discussions, what we usually hear about are full table leaks; only 1, in 2008, was a full table leak, and it was talked about on operator mailing lists. The other 14 did not get discussed. Even though they aren't trying to catch all hijacks, this data is still useful to people.
Many of them last only a few minutes, but a few last for many, many hours, which causes big outages.
Feb 14, 2009: an AS in Saudi Arabia announced prefixes for 34 other local ASes. Those local ISPs had switched to another ISP, but AS8895 announced its ex-customers' prefixes due to misconfiguration. Plot of offense number over time: usually zero, then it suddenly jumps to 20 for a few hours.
Duration of large route leaks: 20% last more than 3 hours. A median of 76 prefixes is affected during leaks.
Percentage of Route Views monitors that see events: more than 75% of attacks are seen by multiple monitors.
Pretty Good BGP catches many of these, but you'll have many false positives.
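The filtering and offense-number steps above can be sketched as follows. This is a simplified reading of the talk, not the authors' code: `history` stands in for "(prefix, origin) seen for more than one day in the past year", ownership is matched on exact prefixes only, and the stable-neighbor filter is omitted because it needs topology data. The threshold of 10 is from the talk; the function names are hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Iterable

Announcement = tuple[str, int]  # (prefix, origin AS)


def offense_numbers(current: Iterable[Announcement],
                    history: set[Announcement],
                    ixp_prefixes: frozenset[str] = frozenset()) -> dict[int, int]:
    """Map each suspect origin AS to the number of distinct ASes whose
    prefixes it is newly originating (its offense number)."""
    owners = defaultdict(set)      # prefix -> ASes that normally originate it
    covering = defaultdict(list)   # origin AS -> networks it has originated
    for prefix, origin in history:
        owners[prefix].add(origin)
        covering[origin].append(ipaddress.ip_network(prefix))

    offended = defaultdict(set)    # suspect AS -> ASes it is "offending"
    for prefix, origin in current:
        if prefix in ixp_prefixes or (prefix, origin) in history:
            continue  # exchange point prefix, or a long-standing announcement
        net = ipaddress.ip_network(prefix)
        if any(net.version == h.version and net.subnet_of(h)
               for h in covering.get(origin, [])):
            continue  # origin has historically announced a covering prefix
        offended[origin] |= owners.get(prefix, set()) - {origin}
    return {asn: len(victims) for asn, victims in offended.items() if victims}


def large_leaks(current, history, threshold: int = 10,
                ixp_prefixes: frozenset[str] = frozenset()) -> dict[int, int]:
    """Flag ASes whose offense number reaches the threshold."""
    scores = offense_numbers(current, history, ixp_prefixes)
    return {asn: n for asn, n in scores.items() if n >= threshold}
```

A single origin change scores 1 and is left alone; only an AS that suddenly originates prefixes of many different networks crosses the threshold, which is what keeps false positives low enough to consider automatic reaction.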
This is trying to catch fewer cases, but with very high confidence. Since it has high confidence, hopefully it can drive an automatic reaction: individual networks can run this to see if leaks happen, and drop bad announcements. You can adjust the threshold for when automatic action is taken vs. just alerting. Or incorporate it into Cyclops, so people get alerts not just about their own prefixes, but about other ongoing events.

No Q's, so the next talk is BGP prefix origin validation with Pradosh Mohapatra
http://nanog.org/meetings/nanog49/abstracts.php?pt=MTYxMyZuYW5vZzQ5&nm=nanog49
IETF SIDR working group
Motivations: previous talks gave a good introduction for this. For origin validation, right now anybody can announce anything. Mostly these are mistakes, config errors. The goal is to provide a solution to human errors as well as malicious attacks, which can be either a full prefix hijack or more-specific path hijacking.
Origin validation framework:
rPKI: framework to create a trusted data store.
Object format: use already-defined extensions for the objects of interest (IP prefixes, AS numbers) and a way to bind them. The ROA is the way to bind the elements together.
RPKI active side: the allocation hierarchy flows from the root on down.
Database maintenance: how do we get data out of the database to validate? Transaction semantics, certificate checks, etc.
Borrowed some slides from Randy to explain how the RPKI infrastructure works.
Slide shows how allocation should work: resources are given from RIR to LIR to ISP, etc.; a front end provides a GUI for getting certificates and ties into the secure RPKI store. Once objects are issued, they go into a distributed framework.
A secure protocol fetches objects from the secure store into a cache store of rPKI data for routers to make use of.
The RPKI-to-router protocol has a secure session to the end device, to build the database securely, locally, so it can be used for validation. Create a hierarchy, with caches in different regions, to scale, on down towards ISP local caches.
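The "allocation hierarchy flows from the root on down" idea can be illustrated with a toy Python model: each resource certificate's prefixes and AS numbers must be contained within its issuer's, all the way up to the trust anchor, and a ROA is only usable if its issuer actually holds the prefix. This is a hedged sketch under heavy simplifying assumptions: there is no cryptography, signature checking, expiry, or revocation, and every class and function name here is hypothetical rather than part of any real RPKI software.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


def nets(*prefixes):
    """Parse prefix strings into ipaddress network objects."""
    return [ipaddress.ip_network(p) for p in prefixes]


def covered(child, parents):
    """True if network `child` lies inside at least one network in `parents`."""
    return any(child.version == p.version and child.subnet_of(p) for p in parents)


@dataclass
class ResourceCert:
    name: str
    prefixes: list                           # ip_network objects held by this entity
    asns: set = field(default_factory=set)   # AS numbers held by this entity
    issuer: Optional["ResourceCert"] = None  # None for the trust anchor (root)


@dataclass
class Roa:
    asn: int           # AS authorized to originate the prefix
    prefix: str        # e.g. "10.1.2.0/24"
    max_length: int    # longest prefix length the ROA permits
    issuer: ResourceCert


def chain_ok(cert):
    """Walk up to the trust anchor, checking that every certificate's
    resources are contained within its issuer's resources."""
    while cert.issuer is not None:
        parent = cert.issuer
        if not all(covered(n, parent.prefixes) for n in cert.prefixes):
            return False
        if not cert.asns <= parent.asns:
            return False
        cert = parent
    return True


def roa_ok(roa):
    """A ROA is usable only if its issuer's chain is consistent, the issuer
    holds the ROA prefix, and max_length is sane for that prefix. Only the
    prefix is checked against the issuer's holdings, not the ROA's AS."""
    net = ipaddress.ip_network(roa.prefix)
    return (chain_ok(roa.issuer)
            and covered(net, roa.issuer.prefixes)
            and net.prefixlen <= roa.max_length <= net.max_prefixlen)
```

For example, with an RIR holding 10.0.0.0/8, an LIR holding 10.1.0.0/16 issued by it, and an ISP holding 10.1.2.0/24 issued by the LIR, a ROA from the ISP for 10.1.2.0/24 (max length 24) passes, while a certificate under the LIR claiming 10.2.0.0/24 fails the containment check.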
A route origin authorization (ROA) explicitly ties a prefix to the ASN that can originate it: (AS, prefix/mask, max prefix length).
Once the object store is in place, the BGP design needs to be able to query the local cache to validate information. Once the router gets data, it creates a tree of AS number to prefix mappings. Then, when it receives updates from peers, it looks at the prefix validation database to make sure they're valid. Based on that, it can take action.
The cache-to-router protocol is a secure SSH session with a serial-based binary exchange. PDUs are formatted as TLVs and passed back and forth, tracked with serial numbers. The router can ask for new data, or the cache can send an asynchronous notify PDU with the latest serial number to signal the router to fetch new data.
Pseudo-code validation logic is listed; since a ROA has prefix/length and max length, an update can be in one of 3 states:
valid: prefix is within range, and origin matches
invalid: prefix is in the database, but the AS # does not match
not found: no matching entry; cannot always assume the database is complete
What actions can the router take?
BGP update arrives; check and mark its origin state.
Validate against the database.
Take action on the results: either drop updates that are invalid, or change attributes, etc. Then send out updates that pass validation.
Possibly set a higher local pref for valid vs. invalid updates (prefer valid over invalid, for example), or set MED on routes based on their state, or just drop updates that are not valid.
So the decision process changes somewhat for the best path algorithm. Could assign different states, and change the BGP best-path algorithm to evaluate those states automatically. Can do policy overrides for specific peers, or sets of prefixes.
Prefix validation happens only on the eBGP side; what do you do for iBGP peers? How do you convey the valid, invalid, or not found state? Put an extended community in place for iBGP peers to evaluate.
Prototype code is on IOS and IOS-XR; bug Ed Kern to play around with it.
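The three-state validation logic can be sketched in a few lines of Python. This is a minimal illustration, not the pseudo-code from the slide: a route is "not found" if no ROA covers it, "valid" if some covering ROA has a matching origin AS and permits the prefix length (the "within range" part), and "invalid" otherwise, which includes a route more specific than the ROA's max length. The local-pref values and the drop-invalid policy are hypothetical choices, and the iBGP extended community is not modeled.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "not found"


@dataclass(frozen=True)
class Roa:
    asn: int         # AS authorized to originate the prefix
    prefix: str      # e.g. "192.0.2.0/24"
    max_length: int  # longest prefix length the ROA permits


def validate(prefix, origin_asn, roas):
    """Classify a route as VALID, INVALID, or NOT_FOUND against the ROAs."""
    route = ipaddress.ip_network(prefix)
    covering = []
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if roa_net.version == route.version and route.subnet_of(roa_net):
            covering.append(roa)
    if not covering:
        # No ROA covers this prefix; the database may simply be incomplete.
        return State.NOT_FOUND
    for roa in covering:
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return State.VALID
    # Covered by at least one ROA, but wrong origin AS or too specific.
    return State.INVALID


# Hypothetical policy values: prefer valid over not-found; invalid is dropped.
LOCAL_PREF = {State.VALID: 200, State.NOT_FOUND: 100}


def apply_policy(prefix, origin_asn, roas):
    """Return (state, local_pref); a local_pref of None means drop the update."""
    state = validate(prefix, origin_asn, roas)
    return state, LOCAL_PREF.get(state)
```

With ROAs for 192.0.2.0/24 (AS64500, max length 24) and 198.51.100.0/22 (AS64501, max length 24), 192.0.2.0/24 from AS64500 comes out valid, the same prefix from any other AS or a /25 from AS64500 comes out invalid, and 203.0.113.0/24 is not found. Treating "not found" as usable rather than dropping it follows the point in the talk that the database cannot be assumed complete.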
A full RPKI implementation is available as open source.
Open test bed setup slide is displayed.
Router configuration commands:
router bgp XXXXX
 bgp rpki cache
 bgp orgin-validation
 bgp bestpath compare-
good lord.
The slide of router show commands is a real eyechart.
No questions, let's get some food! Thanks to Netflix, this is a fantastic NANOG, and thanks to Force10 for the great break!
LUNCH BREAK!!