
09 - P2P Systems

2011-11-05 · 50 pages · ppt · 6 MB

ch9 P2P system

Definition of P2P
1) Significant autonomy from central servers
2) Exploits resources at the edges of the Internet
○ storage and content
○ CPU cycles
○ human presence
3) Resources at the edge have intermittent connectivity and are continually added & removed

It's a broad definition
□ P2P file sharing
○ Napster, Gnutella, KaZaA, eDonkey, etc.
□ P2P communication
○ Instant messaging
○ Voice-over-IP: Skype
□ P2P computation
○ seti@home
□ DHTs & their apps
○ Chord, CAN, Pastry, Tapestry
□ P2P apps built over emerging overlays
○ PlanetLab
(Wireless ad-hoc networking is not covered here)

Tutorial Outline (1)
□ 1. Overview: overlay networks, P2P applications, copyright issues, worldwide computer vision
□ 2. Unstructured P2P file sharing: Napster, Gnutella, KaZaA, search theory, flash crowds
□ 3. Structured DHT systems: Chord, CAN, Pastry, Tapestry, etc.

1. Overview of P2P
□ overlay networks
□ P2P applications
□ worldwide computer vision

Overlay networks / Overlay graph
Virtual edge
□ a TCP connection
□ or simply a pointer to an IP address
Overlay maintenance
□ periodically ping to make sure a neighbor is still alive
□ or verify liveness while messaging
□ if a neighbor goes down, may want to establish a new edge
□ a new node needs to bootstrap

More about overlays
Unstructured overlays
□ e.g., a new node randomly chooses three existing nodes as neighbors
Structured overlays
□ e.g., edges arranged in a restrictive structure
Proximity
□ not necessarily taken into account

Overlays: all in the application layer
Tremendous design flexibility
○ topology, maintenance
○ message types
○ protocol
○ messaging over TCP or UDP
Underlying physical net is transparent to the developer
○ but some overlays exploit it

Examples of overlays
□ DNS
□ BGP routers and their peering relationships
□ Content distribution networks (CDNs)
□ Application-level multicast
○ an economical way around the barriers to IP multicast
□ And P2P apps!

Overview of P2P
□ overlay networks
□ current P2P applications
○ P2P file sharing & copyright issues
○ Instant messaging / voice over IP
○ P2P distributed computing
□ worldwide computer vision

P2P file sharing
□ Alice runs a P2P client application on her notebook computer
□ Intermittently connects to the Internet; gets a new IP address for each connection
□ Registers her content in the P2P system
□ Asks for "Hey Jude"
□ The application displays other peers that have a copy of Hey Jude
□ Alice chooses one of the peers, Bob
□ The file is copied from Bob's PC to Alice's notebook: P2P
□ While Alice downloads, other users are uploading from Alice

Millions of content servers

Killer deployments
□ Napster
○ disruptive; proof of concept
□ Gnutella
○ open source
□ KaZaA/FastTrack
○ today there is more KaZaA traffic than Web traffic!
□ eDonkey / Overnet
○ becoming popular in Europe
○ appears to use a DHT
Is success due to the massive number of servers, or simply because the content is free?

P2P file sharing software
□ Allows Alice to open up a directory in her file system
○ anyone can retrieve a file from the directory
○ like a Web server
□ Allows Alice to copy files from other users' open directories
○ like a Web client
□ Allows users to search nodes for content based on keyword matches
○ like Google
Seems harmless to me!

Copyright issues (1)
Direct infringement:
□ end users who download or upload copyrighted works
Indirect infringement:
□ hold an individual accountable for the actions of others (the direct infringers)
□ Contributory
□ Vicarious

Copyright issues (2)
Contributory infringer:
□ knew of the underlying direct infringement, and
□ caused, induced, or materially contributed to the direct infringement
Vicarious infringer:
□ able to control the direct infringers (e.g., terminate user accounts), and
□ derived a direct financial benefit from the direct infringement (money, more users)
(knowledge not necessary)
Copyright issues (3)
Betamax VCR defense
□ manufacturer not liable for contributory infringement
□ "capable of substantial non-infringing use"
□ but in the Napster case, the court found the defense does not apply to vicarious liability
Guidelines for P2P developers
□ total control, so that there is no direct infringement
Or
□ no control over users – no remote kill switch, no automatic updates; actively promote non-infringing uses of the product
□ disaggregate functions: indexing, search, transfer
□ no customer support

Instant Messaging
□ Alice runs an IM client on her PC
□ Intermittently connects to the Internet; gets a new IP address for each connection
□ Registers herself with the "system"
□ Learns from the "system" that Bob in her buddy list is active
□ Alice initiates a direct TCP connection with Bob: P2P
□ Alice and Bob chat
□ Can also be voice, video and text
We'll see that Skype is a VoIP P2P system

P2P Distributed Computing
seti@home
□ search for ET intelligence
□ a central site collects radio telescope data
□ data is divided into work chunks of 300 KBytes
□ a user obtains the client, which runs in the background
□ the peer sets up a TCP connection to the central computer and downloads a chunk
□ the peer does an FFT on the chunk, uploads the results, and gets a new chunk
Not peer-to-peer, but exploits resources at the network edge

Worldwide Computer Vision
□ overlay networks
□ P2P applications
□ worldwide computer vision

Worldwide Computer Vision
Alice's home computer:
□ working for a biotech company, matching gene sequences
□ its DSL connection is downloading telescope data
□ contains encrypted fragments of thousands of non-Alice files
□ occasionally a fragment is read; it's part of a movie someone is watching in Paris
□ her laptop is off, but it's backing up others' files
□ Alice's computer is moonlighting
□ payments come from the biotech company, the movie system, and the backup service
Your PC is only a component in the "big" computer

Worldwide Computer (2)
Anderson & Kubiatowicz: Internet-scale OS
□ a thin software layer running on each host & a central coordinating system running on the ISOS server complex
□ allocating resources, coordinating currency transfer
□ supports data processing & online services
Challenges
□ heterogeneous hosts
□ security
□ payments
Central server complex
□ needed to ensure privacy of sensitive data
□ the ISOS server complex maintains databases of resource descriptions, usage policies, and task descriptions

2. Unstructured P2P File Sharing
□ Napster
□ Gnutella
□ KaZaA
□ BitTorrent
□ search theory
□ dealing with flash crowds

Napster
□ a paradigm shift
□ not the first (cf. probably Eternity, from Ross Anderson in Cambridge)
□ but instructive for what it gets right, and
□ also wrong…
□ also had a political message… and an economic and legal one…

Napster
□ a program for sharing files over the Internet
□ a "disruptive" application/technology?
□ history:
○ 5/99: Shawn Fanning (freshman, Northeastern U.) founds the Napster online music service
○ 12/99: first lawsuit
○ 3/00: 25% of UWisc traffic is Napster
○ 2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
○ 7/01: # of simultaneous online users: Napster 160K, Gnutella 40K, Morpheus (KaZaA) 300K

Napster
□ judge orders Napster to pull the plug in July '01
□ other file sharing apps take over!

Napster: how did it work
□ application-level, client-server protocol over point-to-point TCP
□ centralized directory server
Steps:
□ connect to the Napster server
□ upload your list of files to the server
□ give the server keywords to search the full list with
□ select the "best" of the correct answers (pings)

Napster (napster.com centralized directory)
1. File list and IP address are uploaded to the centralized directory
2. User requests a search at the server
3. User pings the hosts that apparently have the data, looking for the best transfer rate
4. User chooses a server and downloads directly
Napster's centralized server farm had a difficult time keeping up with the traffic
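The four Napster steps above amount to a keyword index that maps registered file lists to the peers currently online. A minimal sketch of such a centralized directory (class and peer names are invented for illustration; real Napster used its own wire protocol):

```python
# Sketch of a Napster-style centralized directory (hypothetical API).
# Peers register their shared file lists; searches return (peer, ip, file) hits.
class CentralDirectory:
    def __init__(self):
        self.peers = {}  # peer id -> (ip address, list of shared file names)

    def register(self, peer_id, ip, files):
        # Steps 1-2: a connecting peer uploads its file list
        self.peers[peer_id] = (ip, list(files))

    def unregister(self, peer_id):
        # Peers have intermittent connectivity; drop them when they leave
        self.peers.pop(peer_id, None)

    def search(self, keyword):
        # Step 3: naive substring keyword match over all registered file lists
        keyword = keyword.lower()
        return [(pid, ip, f)
                for pid, (ip, files) in self.peers.items()
                for f in files
                if keyword in f.lower()]

d = CentralDirectory()
d.register("alice", "10.0.0.5", ["hey_jude.mp3", "let_it_be.mp3"])
d.register("bob", "10.0.0.9", ["hey_jude.mp3"])
print(d.search("jude"))  # both peers holding the file are returned
```

The server only handles this metadata; the client then pings each candidate (step 3 on the slide) and copies the file directly from the chosen peer, which is why the directory, not the transfers, was Napster's bottleneck and its single point of legal and technical failure.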
2. Unstructured P2P File Sharing
□ Napster
□ Gnutella
□ KaZaA
□ BitTorrent
□ search theory
□ dealing with flash crowds

Distributed Search/Flooding

Gnutella
□ Gnutella history:
○ 3/14/00: released by AOL, almost immediately withdrawn
○ became open source
○ many iterations to fix the poor initial design (the poor design turned many people off)
□ issues:
○ how much traffic does one query generate?
○ how many hosts can it support at once?
○ what is the latency associated with querying?
○ is there a bottleneck?

Gnutella: limited scope query
Searching by flooding:
□ if you don't have the file you want, query 7 of your neighbors
□ if they don't have it, they contact 7 of their neighbors, up to a maximum hop count of 10
□ reverse-path forwarding for responses (not for the files themselves)
Note: play the Gnutella animation at http://www.limewire.com/index.jsp/p2p

Gnutella overlay management
□ a new node uses a bootstrap node to get the IP addresses of existing Gnutella nodes
□ the new node establishes neighboring relations by sending join messages

Gnutella in practice
□ Gnutella traffic << KaZaA traffic
□ a 16-year-old daughter said "it stinks"
○ couldn't find anything
○ downloads wouldn't complete
□ fixes: do the things KaZaA is doing: hierarchy, queue management, parallel download, …

Gnutella discussion
□ researchers like it because it's open source
○ but is it truly representative?
□ architectural lessons learned?
□ more details in Kurose and Ross, 3rd edition
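The limited-scope flooding described above can be simulated in a few lines. This sketch (toy graph and parameters invented for illustration) forwards a query to up to 7 neighbors per hop with a hop limit, suppressing duplicates and counting the messages generated:

```python
from collections import deque

# Sketch of Gnutella-style limited-scope flooding over a toy overlay graph.
# neighbors: node -> list of neighbor nodes; have: set of nodes holding the file.
def flood_search(neighbors, have, start, fanout=7, max_hops=10):
    seen = {start}                  # nodes that already handled this query id
    frontier = deque([(start, 0)])  # (node, hop count)
    messages = 0
    hits = []
    while frontier:
        node, hops = frontier.popleft()
        if node in have:
            hits.append(node)       # in Gnutella the response follows the reverse path
        if hops == max_hops:
            continue                # hop limit reached: stop forwarding
        for nbr in neighbors.get(node, [])[:fanout]:
            if nbr not in seen:     # duplicate suppression via the query id
                seen.add(nbr)
                messages += 1
                frontier.append((nbr, hops + 1))
    return hits, messages

# Tiny line topology 0 - 1 - 2 - 3, with the file held at node 3
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hits, messages = flood_search(neighbors, have={3}, start=0)
print(hits, messages)  # [3] 3
```

Even on this toy graph the message count grows with the number of reachable nodes, which is exactly the "how much traffic does one query generate?" concern raised on the Gnutella slide.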
2. Unstructured P2P File Sharing
□ Napster
□ Gnutella
□ KaZaA
□ BitTorrent
□ search theory
□ dealing with flash crowds

KaZaA: The service
□ more than 3 million peers online, sharing over 3,000 terabytes of content
□ more popular than Napster ever was
□ more than 50% of Internet traffic?
□ MP3s & entire albums, videos, games
□ optional parallel downloading of files
□ automatically switches to a new download server when the current server becomes unavailable
□ provides estimated download times

KaZaA: The service (2)
□ the user can configure the max number of simultaneous uploads and the max number of simultaneous downloads
□ queue management at server and client
○ frequent uploaders can get priority in the server queue
□ keyword search
○ the user can configure "up to x" responses to keywords
□ responses to keyword queries come in waves; the search stops when x responses are found
□ from the user's perspective, the service resembles Google, but provides links to MP3s and videos rather than Web pages

KaZaA: Technology
Software
□ proprietary
□ control data encrypted
□ everything in HTTP request and response messages
Architecture
□ hierarchical
□ a cross between Napster and Gnutella

KaZaA: Architecture
□ each peer is either a supernode (SN) or an ordinary node (ON) assigned to a supernode
○ 56 min average connection time
○ each SN has about 100-150 children
○ roughly 30,000 SNs
□ each supernode has TCP connections with 30-50 other supernodes
○ 0.1% connectivity
○ 23 min average connection time

Measurement study
● tap point on an SN-SN session
○ tap point on an ON-SN session

Evolution of connections at an SN

KaZaA: Architecture (2)
□ nodes that have more connection bandwidth and are more available are designated as supernodes
□ each supernode acts as a mini-Napster hub, tracking the content and IP addresses of its descendants
□ does a KaZaA SN track only the content of its children, or does it also track the content under its neighboring SNs?
○ testing indicates only its children

KaZaA metadata
□ when an ON connects to an SN, it uploads its metadata
□ for each file:
○ file name
○ file size
○ ContentHash
○ file descriptors: used for keyword matches during a query
□ ContentHash:
○ when peer A selects a file at peer B, peer A sends the ContentHash in the HTTP request
○ if the download of a specific file fails (partially completes), the ContentHash is used to search for a new copy of the file

KaZaA: Overlay maintenance
□ a list of potential supernodes is included within the software download
□ a new peer goes through the list until it finds an operational supernode
○ it connects and obtains a more up-to-date list with 200 entries
○ nodes in the list are "close" to the ON
○ the node then pings 5 nodes on the list and connects with one of them
□ if the supernode goes down, the node obtains an updated list and chooses a new supernode

KaZaA queries
□ a node first sends the query to its supernode
○ the supernode responds with matches
○ if x matches are found, done
□ otherwise, the supernode forwards the query to a subset of supernodes
○ if a total of x matches are found, done
□ otherwise, the query is forwarded further
○ probably by the original supernode rather than recursively

Kazaa-lite
□ a hacked version of the KaZaA client
□ no spyware; no pop-up windows
□ everyone is rated as a priority user
□ supernode hopping
○ after receiving replies from an SN, the ON often connects to a new SN and re-sends the query
○ the SN does not cache the hopped-out ON's metadata

Parallel downloading; recovery
□ if a file is found at multiple nodes, the user can select parallel downloading
○ identical copies are identified by their ContentHash
□ the HTTP byte-range header is used to request different portions of the file from different nodes
□ automatic recovery when a server peer stops sending the file
○ via the ContentHash

KaZaA corporate structure
□ software developed by Estonians
□ FastTrack originally incorporated in Amsterdam
□ FastTrack also deploys the KaZaA service
□ FastTrack licenses the software to Music City (Morpheus) and Grokster
□ later, FastTrack terminates the licenses, leaving only KaZaA with the killer service
□ summer 2001: Sharman Networks, founded in Vanuatu (a small island nation in the Pacific), acquires FastTrack
□ board of directors, investors: secret
□ employees spread around, hard to locate

Lessons learned from KaZaA
□ KaZaA provides a powerful file search and transfer service without server infrastructure
□ exploit heterogeneity
□ provide automatic recovery for interrupted downloads
□ powerful, intuitive user interface
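The parallel download mechanism above rests on standard HTTP range requests: peers holding identical copies (same ContentHash) serve disjoint byte ranges that the client reassembles. A sketch of the range-splitting step only (no real network I/O; the sizes are invented):

```python
# Sketch: split a file of known size into byte ranges, one per source peer,
# as a KaZaA-style parallel downloader might do before issuing HTTP requests
# with "Range: bytes=start-end" headers (the end offset is inclusive in HTTP).
def plan_ranges(file_size, num_peers):
    chunk = file_size // num_peers
    ranges = []
    for i in range(num_peers):
        start = i * chunk
        # The last peer takes the remainder so the ranges cover the whole file
        end = file_size - 1 if i == num_peers - 1 else (i + 1) * chunk - 1
        ranges.append((start, end))
    return ranges

size = 1_000_000
for start, end in plan_ranges(size, 3):
    print(f"Range: bytes={start}-{end}")
# Range: bytes=0-333332
# Range: bytes=333333-666665
# Range: bytes=666666-999999
```

If one peer stops sending, only its missing range needs to be re-requested from another peer advertising the same ContentHash, which is the automatic-recovery behavior described on the slide.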
Measurement studies by Gribble et al., 2002 (U. Wash campus)
□ campus traffic: P2P 43%; Web 14%
□ KaZaA objects are fetched at most once per client
□ the popularity distribution deviates substantially from a Zipf distribution
○ flat for the 100 most popular objects
□ the popularity of objects is short-lived

KaZaA users are patient
□ small objects (<10 MB): 30% take more than an hour to download
□ large objects (>100 MB): 50% take more than 1 day
□ KaZaA is a batch-mode system; downloads are done in the background

Pollution in P2P
□ record labels hire "polluting companies" to put bogus versions of popular songs in file sharing systems
□ a polluting company maintains hundreds of nodes with high-bandwidth connections
□ user A downloads a polluted file
□ user B may download the polluted file before A removes it
□ how extensive is pollution today? anti-pollution mechanisms?

2. Unstructured P2P File Sharing
□ Napster
□ Gnutella
□ KaZaA
□ BitTorrent
□ search theory
□ dealing with flash crowds

BitTorrent

BitTorrent: Pieces
□ the file is broken into pieces
○ typically a piece is 256 KBytes
○ upload pieces while downloading pieces
□ piece selection
○ select the rarest piece
○ except at the beginning: then select random pieces
□ tit-for-tat
○ BitTorrent uploads to at most four peers
○ among the candidates, upload to the four that are uploading to you at the highest rates
○ a little randomness too, for probing

NATs
□ the nemesis of P2P
□ a peer behind a NAT can't be a TCP server
□ partial solution: reverse call
○ suppose A wants to download from B, and B is behind a NAT
○ suppose A and B each maintain a TCP connection to a server C that is not behind a NAT
○ A can then ask B, through C, to set up a TCP connection from B to A
○ A can then send its query over this TCP connection, and B can return the file
□ what if both A and B are behind NATs?
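BitTorrent's rarest-first rule from the slide above can be sketched directly: count how many known neighbors hold each piece, then pick a piece you lack with the lowest nonzero count (the random selection used at startup is omitted; all data here is invented):

```python
import random

# Sketch of BitTorrent-style rarest-first piece selection.
# peer_pieces: list of sets, each the piece indices one neighbor advertises.
# mine: set of piece indices we already hold.
def rarest_first(peer_pieces, mine, num_pieces):
    counts = {i: 0 for i in range(num_pieces) if i not in mine}
    for pieces in peer_pieces:
        for i in pieces:
            if i in counts:
                counts[i] += 1
    # Only pieces some neighbor actually has can be downloaded
    candidates = [i for i, c in counts.items() if c > 0]
    if not candidates:
        return None
    rarest = min(counts[i] for i in candidates)
    # Break ties randomly among the equally rare pieces
    return random.choice([i for i in candidates if counts[i] == rarest])

peers = [{0, 1, 2}, {0, 1}, {0}]
print(rarest_first(peers, mine={0}, num_pieces=4))  # 2: held by only one neighbor
```

Downloading rare pieces first keeps every piece well replicated in the swarm, so the departure of a single peer is less likely to make some piece unavailable.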
2. Unstructured P2P File Sharing
□ Napster
□ Gnutella
□ KaZaA
□ search theory
□ dealing with flash crowds

Modeling Unstructured P2P Networks
□ in comparison to DHT-based searches, unstructured searches are
○ simple to build
○ simple to understand algorithmically
□ little concrete is known about their performance
□ Q: what is the expected overhead of a search?
□ Q: how does caching pointers help?

Replication
□ scenario
○ nodes cache copies of (or pointers to) content
• object info can be "pushed" from nodes that have copies
• more copies lead to shorter searches
○ caches have limited size: they can't hold everything
○ objects have different popularities: different content is requested at different rates
□ Q: how should the cache be shared among the different content?
○ favor items under heavy demand too much, and the lightly demanded items will drive up search costs
○ favor a more "flat" caching (i.e., independent of popularity), and the frequent searches for heavily requested items will drive up costs
□ is there an optimal strategy?

Model
□ given
○ m objects, n nodes, each node can hold c objects, total system capacity = cn
○ qi is the request rate for the ith object, q1 ≥ q2 ≥ … ≥ qm
○ pi is the fraction of total system capacity used to store object i, ∑ pi = 1
□ then
○ expected length of a search for object i = K / pi for some constant K (the search stops as soon as the object is found)
○ network "bandwidth" used to search for all objects: B = ∑ qi K / pi
□ goal: find the allocation {pi} (as a function of {qi}) that minimizes B
□ goal 2: find a distributed method to implement this allocation of {pi}

Some possible choices for {pi}
□ consider some typical allocations used in practice
○ Uniform: p1 = p2 = … = pm = 1/m
• easy to implement: whoever creates the object sends out cn/m copies
○ Proportional: pi = a qi, where a = 1/∑ qi is a normalization constant
• also easy to implement: keep the received copy cached
□ what is B = ∑ qi K / pi for these two policies?
○ Uniform: B = ∑ qi K / (1/m) = Km ∑ qi = Km/a
○ Proportional: B = ∑ qi K / (a qi) = ∑ K/a = Km/a
□ B is the same for the Proportional and Uniform policies!

In between Proportional and Uniform
□ Uniform: pi / pi+1 = 1; Proportional: pi / pi+1 = qi / qi+1 ≥ 1
□ in between: 1 ≤ pi / pi+1 ≤ qi / qi+1
□ claim: any in-between allocation has a lower B than the B for Uniform / Proportional
□ proof: omitted here
□ consider the Square-Root allocation: pi = sqrt(qi) / ∑ sqrt(qi)
□ Thm: Square-Root is optimal
□ proof (sketch):
○ noting pm = 1 − (p1 + … + pm−1), write B = F(p1, …, pm−1) = ∑i<m qi/pi + qm/(1 − ∑i<m pi)
○ solving dF/dpi = 0 gives pi = pm sqrt(qi/qm)

Distributed Method for Square-Root Allocation
□ assumption: each copy in the cache disappears from the cache at some rate independent of the object cached (e.g., object lifetimes are i.i.d.)
□ Algorithm Sqrt-Cache: cache a copy of object i (once found) at each node visited while searching for object i
□ claim: Alg …
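The claim that Uniform and Proportional yield the same bandwidth B, while Square-Root does strictly better, is easy to check numerically. A sketch with an invented Zipf-like popularity distribution, taking K = 1:

```python
# Numerically compare B = sum_i q_i / p_i (with K = 1) under the three cache
# allocations from the slides: Uniform, Proportional, and Square-Root.
def bandwidth(q, p):
    return sum(qi / pi for qi, pi in zip(q, p))

m = 50
q = [1.0 / (i + 1) for i in range(m)]      # Zipf-like request rates q_1 >= ... >= q_m

a = 1.0 / sum(q)                           # normalization constant from the slides
uniform      = [1.0 / m] * m               # p_i = 1/m
proportional = [a * qi for qi in q]        # p_i = a * q_i
s = sum(qi ** 0.5 for qi in q)
square_root  = [qi ** 0.5 / s for qi in q] # p_i = sqrt(q_i) / sum_j sqrt(q_j)

b_u = bandwidth(q, uniform)
b_p = bandwidth(q, proportional)
b_s = bandwidth(q, square_root)
print(abs(b_u - b_p) < 1e-6)  # True: both equal K*m/a, as derived above
print(b_s < b_u)              # True: Square-Root gives strictly lower B
```

The second line is an instance of the Cauchy-Schwarz inequality: B under Square-Root equals (∑ sqrt(qi))², which is at most m ∑ qi = the Uniform/Proportional value, with equality only when all qi are equal.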
