A wonderful new piece of software developed by Nullsoft (now 0wned by the man @ AOL), the makers of Winamp. It's Napster's daddy, with the ability to share not only MP3s but also all other mainstream media and archive files. Based on a distributed client/server model, it, unlike Napster, is basically unstoppable in that it eats firewalls for breakfast. All computers on Gnutella are interconnected and act as servers themselves, so you never have to worry about being disconnected.

Currently a Windows-only beta, it was designed to go Open Source at v1.0, but thanks to AOL, Gnutella will probably never see the light of day as anything but betaware. Thankfully, a number of clones are already sprouting up, and soon you too will be downloading MP3s, warez, media, porn and just about any other high-bandwidth medium from this little jewel. Problems with the current client include memory leaks, high CPU utilization, and spamming by wannabe hackers.

Head to gnutella.nerdherd.net or #gnutella for the skinny.

The problem with gnutella's distributed layout is one of scalability. Consider this:

1) The Gnutella network has U users
2) Each user adds (on average) B bandwidth to the network
3) Each user makes S searches

The total bandwidth available can be calculated with:

Total Bandwidth = U * B

Since Gnutella doesn't have a central server indexing songs, your request has to be sent to many other users, so that each of them can check whether it has the file you want. Therefore, when you make S searches and each one reaches all U users:

Search Bandwidth = S * U

But you are not the only person wanting to make searches. There are U users, hence:

Total Search Bandwidth = S * U * U = S * U^2

As you can see, the Total Bandwidth rises linearly as users increase, but the Total Search Bandwidth rises with the square of the number of users: double the users and the search traffic quadruples.

A graph would be nice here, but I can't put one in. Instead, here are some figures:

I will assume that each user brings on average 10 units of bandwidth (throughput per second) to the network. I will also assume that each user makes, on average, 0.01 searches per time unit (one search every 100 time units), and that each search uses 1 bandwidth unit at every user it reaches.

Users   Total throughput   Searches per user   Search bandwidth
        per time unit      per time unit       per time unit
        (users * 10)       (1/100)             (1/100 * users * users)

10      100                0.01                1
20      200                0.01                4
30      300                0.01                9
50      500                0.01                25
100     1000               0.01                100
150     1500               0.01                225
200     2000               0.01                400
300     3000               0.01                900
500     5000               0.01                2500
1000    10000              0.01                10000
1500    15000              0.01                22500
2000    20000              0.01                40000
4000    40000              0.01                160000
6000    60000              0.01                360000
10000   100000             0.01                1000000
15000   150000             0.01                2250000
20000   200000             0.01                4000000

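As you can see, at 1,000 users search traffic already uses all of the network's throughput (both columns read 10,000), and above that our imaginary network can't support itself; by 10,000 users searches would need ten times the bandwidth the network has. The real Gnutella would be more complex than this, but that's basically the maths behind it, as far as I know.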
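To play with the numbers, here is a minimal Python sketch of the model above. It only encodes the assumptions already stated (10 bandwidth units per user, 0.01 searches per user per time unit); the constant COST_PER_SEARCH_PER_USER = 1 is my reading of "1 bandwidth unit" per search at each user, and all names are made up for illustration. Solving U * B = S * U^2 gives the break-even point U = B / S.

```python
from fractions import Fraction

# Assumptions taken from the text (names are illustrative):
BANDWIDTH_PER_USER = 10               # B: units of throughput each user adds
SEARCHES_PER_USER = Fraction(1, 100)  # S: searches per user per time unit
COST_PER_SEARCH_PER_USER = 1          # bandwidth one search costs at each user


def total_bandwidth(users):
    # Total Bandwidth = U * B
    return users * BANDWIDTH_PER_USER


def total_search_bandwidth(users):
    # Every user's searches reach every user: S * U * U
    return SEARCHES_PER_USER * users * users * COST_PER_SEARCH_PER_USER


def break_even_users():
    # U * B = S * U^2  =>  U = B / S  (ignoring the trivial U = 0)
    return BANDWIDTH_PER_USER / (SEARCHES_PER_USER * COST_PER_SEARCH_PER_USER)


if __name__ == "__main__":
    for users in (10, 100, 1000, 10000, 20000):
        print(users, total_bandwidth(users), total_search_bandwidth(users))
    print("searches outgrow the network above", break_even_users(), "users")
```

Using Fraction keeps the arithmetic exact, so the break-even point comes out as exactly 1000 users. Changing BANDWIDTH_PER_USER or SEARCHES_PER_USER only moves that point; the quadratic term always wins eventually.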

This writeup is an editorial/discussion on what this author perceives as the oncoming death of Gnutella.

Besides the fundamental flaws in its scalability and its tendency to kill connections, Gnutella has a tiny community problem. It was recently observed that 50% of Gnutella's content was served up by 5% of its users. Most people on these sorts of networks share nothing, but are willing to quietly steal from them.

This is the downfall of any "file sharing" community: many people take, but few give. The few people who stick their necks out supplying the repositories of information (whether they be pirates or freedom advocates) get no incentive to keep going. This will kill a community. Gnutella is, in a way, dying. There are splinters on the network (making it perform better, but not helping the user), and fewer and fewer people are using it and contributing to it. It will eventually fall into a kind of obscurity, I believe. Justin Frankel, its author, has stopped working on the project, and thus cannot improve any of its fundamentals. Many Open Source and port projects are underway to reverse engineer the sharing application and make it better, but that progresses only slowly. How much longer can such a network hold up before people leave?

It's a shame. Gnutella had a great chance of becoming something really big and innovative, and in a way it did, but now I think it will be just another part of the internet fallen into ruin. Distributed networks have their place in theory, but are they ready for practice? Perhaps that will be the new model of the internet one day: people sharing information via their own repositories, rather than the client/server topology we stand by today. Perhaps it is time for the great minds to rethink this problem and get back to us.

Some thoughts on using the Gnutella network

It sucks bandwidth. As mentioned above, the simple act of connecting to a Gnutella network* will eat up a fair portion of your bandwidth. Add to this the seemingly instantaneous downloading of your obscure files by many, many people, and it becomes apparent that this is not an application for a dial-up internet connection. While you can restrict the bandwidth a downloading user takes from you, that's not enough if you are on the end of a modem, because your web surfing and other net activities become slow as heck.

It's hard to find anything other than popular stuff. If you're after a zipped copy of Photoshop 6 or an illegal copy of a Britney Spears song, then fine. If you're searching for a video of Peter Gabriel in concert, or a high quality picture of mountains, then you are out of luck, because...

You can only search on filenames. Gnutella doesn't support any kind of metadata in its searches. So, just like E2, where you can only search on a node's title, not its contents, with Gnutella you are relying on the file sharer to provide a verbose and meaningful filename. This also leaves the system open to abuse: I searched for "Photoshop", and while most of the results returned were Photoshop-specific, I also got results like "anime video japanese lolita chugakusei school sex preteen porn cute hentai rape girls idol asian avi mov windows photoshop traffic .mpg". Another problem with filename-only searching is that, using MP3s as an example, unless the sharer includes the bitrate in the filename, you cannot search on that attribute.

Downloads often fail. In my experience, downloads on Napster fail about 50% of the time. On Gnutella, I've found they fail more like 90% of the time.

It's not anonymous. As mentioned above, your IP address is there for everyone to see, making it a haven for wannabe hackers.


* I say "a" Gnutella network here, because there is no single network, as there is with Napster.

Gnutella is one of those historic things that could completely change the way copyrights are enforced. When it took off, it far outdid Napster.

Nullsoft (yes, the makers of WinAmp) published an open-source beta release of Gnutella 0.1. I believe Slashdot covered it, and the program and source were removed from the site within hours.

Why yank it so fast? Nullsoft is fully owned by AOL, and very shortly before the release, AOL had merged with Time Warner. Time Warner owns a large record company, so it's not in their interests to release this type of program.

Luckily, so many people had already downloaded it that the program wound up on FTP sites around the world. It's now completely out of Nullsoft's hands, has improved vastly, and is on every platform. Clones such as LimeWire and BearShare are very popular now.

Gnutella did better than Napster and Scour, and continues to this day, thanks to a simple concept: decentralization. Some clever programmer at Nullsoft realized that Napster and Scour had a simple flaw: shut down the main server and the network stops. The answer was a decentralized network of nodes that not only downloaded files peer to peer, but also searched and found other nodes peer to peer. Every client that signed on was able to locate other clients on the network, and share files and search across the network without a central authority.

Once that sort of network went up, it never went down. Open source developers augmented the code, adding support for NAT and firewalls, faster searches, the ability to autoconnect to other clients, and interoperability with the other Gnutella distributions that were making their presence known. Today, it has copied many ideas from its competitor Kazaa and implemented a two-level peer system, in which some nodes can become Ultrapeers and forward traffic. This helps alleviate what was Gnutella's biggest problem at the time: scalability.

There are many popular gnutella applications that implement the protocol and can share amongst themselves. Many make use of the same Open Source libraries:
LimeWire
BearShare
Gnucleus
Acquisition for OS X
Shareaza
giFT
Phex
Furi

Doesn't a grassroots movement that grows exponentially give you a warm, tingly feeling?

How it works:


The Gnutella protocol makes every node connected to the network equal. Each node is a server, and each is a client. The term coined for this was "servent", and it prevents the entire network from going down because one node is shut down, or because of government regulation (in theory; see JayBonci's writeup for the problem).

A broadcast packet on the network begins at a single node and is broadcast to each connected servent. Each of these servents broadcasts the same packet to all of its connected servents, and so on. What's to stop a packet from circulating through the network forever, you ask? Each packet has a TTL, or "Time To Live", value, which is reduced by 1 each time the packet is forwarded; once it reaches 0, the packet is not forwarded any further. Some clients will drop a packet on the floor if the TTL is too big. In theory, if every servent is connected to 8 other servents and a packet has a TTL value of 7, it can reach up to 8^7 servents, which is 2,097,152 individual nodes.
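The 8^7 figure treats every hop as fanning out to 8 brand-new servents. A quick sketch comparing it with a slightly tighter estimate, assuming a tree-shaped network where each servent forwards to every connection except the one the packet arrived on and nobody is reached twice. Both functions are hypothetical helpers for this arithmetic, not code from any client.

```python
def naive_reach(degree: int, ttl: int) -> int:
    """The figure used in the text: degree ** ttl."""
    return degree ** ttl


def tree_reach(degree: int, ttl: int) -> int:
    """Servents reached in a loop-free network where each servent forwards
    to all its connections except the one the packet came from.

    Hop 1 reaches the originator's `degree` neighbours; every later hop
    multiplies the newest layer by (degree - 1).
    """
    total = 0
    frontier = degree  # servents reached on the first hop
    for _ in range(ttl):
        total += frontier
        frontier *= degree - 1
    return total


if __name__ == "__main__":
    print(naive_reach(8, 7), tree_reach(8, 7))
```

tree_reach(8, 7) comes to 1,098,056, roughly half the naive figure. Loops in a real network, where the same servent hears a packet more than once, shrink the number of distinct servents reached further still.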

A reply packet begins as a response to a broadcast packet. It's forwarded back along the same path until it reaches the servent that sent the original broadcast packet.

To keep track of where a packet came from, instead of throwing IP addresses around, each packet has a 16-byte message ID, which is just random data. Each servent keeps a hash table of the most recent few thousand packets it received. The hash table works like a TCP/IP routing table: it stores each message ID and matches it with the IP address it came from. Anonymity is provided because every servent only knows about the servents it's connected to, unless they announce themselves by answering a ping or replying to a query.
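The routing table can be sketched as a bounded dictionary from message ID to the connection the packet arrived on. This is an illustration only: the RouteTable class, the 4096-entry capacity standing in for "a few thousand", and oldest-first eviction are my assumptions. In this sketch the same table doubles as a duplicate filter, so a broadcast that loops back around is dropped instead of being forwarded again.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class RouteTable:
    """Remembers which connection each recent message ID arrived on.

    Connections can be any hashable handle (a socket, a peer address...).
    """

    def __init__(self, capacity: int = 4096):
        self.capacity = capacity
        self._routes = OrderedDict()  # message ID -> connection it came from

    def remember(self, message_id: bytes, came_from: Hashable) -> bool:
        """Record a broadcast. Returns False if the ID was already seen,
        meaning the packet is a duplicate and should be dropped."""
        if message_id in self._routes:
            return False
        self._routes[message_id] = came_from
        if len(self._routes) > self.capacity:
            self._routes.popitem(last=False)  # forget the oldest ID
        return True

    def route_reply(self, message_id: bytes) -> Optional[Hashable]:
        """Connection a reply to `message_id` should be sent back on,
        or None if the ID is unknown (the reply is then dropped)."""
        return self._routes.get(message_id)
```

A servent would call remember() on every incoming broadcast and route_reply() on every incoming reply; nothing in the table ever needs the original sender's IP address.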

How it works (but more in depth):

Connecting:

The initiator opens a TCP connection, and sends
"GNUTELLA CONNECT/0.4\n\n".

Then the receiver sends
"GNUTELLA OK\n\n"

Then it's all packets
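A minimal sketch of this 0.4 handshake over an already connected socket, using only Python's standard socket module. The function names are hypothetical. Reading one byte at a time is a deliberate choice, so that the handshake never swallows the first bytes of the packets that follow it.

```python
import socket

CONNECT = b"GNUTELLA CONNECT/0.4\n\n"
OK = b"GNUTELLA OK\n\n"


def _read_until_blank_line(sock: socket.socket, limit: int = 1024) -> bytes:
    """Read byte by byte until the terminating blank line ("\\n\\n")."""
    data = b""
    while not data.endswith(b"\n\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("connection closed during handshake")
        data += chunk
        if len(data) > limit:
            raise ConnectionError("handshake line too long")
    return data


def initiate(sock: socket.socket) -> None:
    """Initiator side: send CONNECT, expect OK."""
    sock.sendall(CONNECT)
    if _read_until_blank_line(sock) != OK:
        raise ConnectionError("peer refused the connection")


def accept(sock: socket.socket) -> None:
    """Receiver side: expect CONNECT, answer OK."""
    if _read_until_blank_line(sock) != CONNECT:
        raise ConnectionError("not a Gnutella 0.4 connect request")
    sock.sendall(OK)
```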

Header:
Bytes 0 to 15: Message ID:
The message ID I described earlier. 16 bytes of random data

Byte 16: Function
Byte 17: TTL Remaining:
How many hops left before this packet should be dropped

Byte 18: Hops taken:
How many hops this packet has taken

Bytes 19-22: Data Length:
The length of the function-dependent data that follows the header. A 32 bit unsigned integer, little-endian, which is the opposite of network byte order.
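The 23-byte header maps directly onto Python's struct module. In this sketch the function codes (0x00 Ping, 0x01 Pong, 0x40 Push, 0x80 Query, 0x81 Hits) come from the published Gnutella 0.4 protocol document rather than from this writeup, and next_hop is a hypothetical helper showing the TTL/hops bookkeeping a servent does before forwarding a packet.

```python
import os
import struct

# Message ID (16 bytes), function, TTL, hops (1 byte each), payload length
# (32-bit). "<" means little-endian with no padding: 23 bytes in total.
HEADER = struct.Struct("<16sBBBI")

# Function codes as given in the Gnutella 0.4 protocol document.
PING, PONG, PUSH, QUERY, HITS = 0x00, 0x01, 0x40, 0x80, 0x81


def pack_header(function, ttl, hops, payload_len, message_id=None):
    if message_id is None:
        message_id = os.urandom(16)  # "just random data"
    return HEADER.pack(message_id, function, ttl, hops, payload_len)


def unpack_header(data):
    """Returns (message_id, function, ttl, hops, payload_len)."""
    return HEADER.unpack(data[:HEADER.size])


def next_hop(header_bytes):
    """Header as it should be forwarded: TTL down by one, hops up by one.
    Returns None when no TTL would be left, i.e. the packet is not forwarded."""
    message_id, function, ttl, hops, length = unpack_header(header_bytes)
    if ttl <= 1:
        return None
    return HEADER.pack(message_id, function, ttl - 1, hops + 1, length)
```

A Ping is just this header with a payload length of 0.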

Packet types:

Ping:
A ping has no body, and is routed to every available connection.

Pong:
Bytes 0-1: Servent port:
The port of the listening servent. 16 bit, unsigned int in little-endian

Bytes 2-5: Servent IP
The IP address of the listening servent, unsigned integer in big-endian

Bytes 6-9: File Count:
The number of files shared by the servent. 32 bit unsigned integer in little-endian

Bytes 10-13: Total Files Size:
Total size of the files shared by the servent, in KB (1024 bytes). 32 bits, unsigned integer in little-endian.

Pong packets are forwarded through the connection the ping came from
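Packing and unpacking a Pong body, as a sketch (function names are illustrative). Note the mix of byte orders: the port and the counters are little-endian, but the IP address stays in network order, so it is handled as four raw bytes via socket.inet_aton rather than as an integer.

```python
import socket
import struct

# Port (LE uint16), IP (4 raw bytes, network order), file count (LE uint32),
# total size in KB (LE uint32): 14 bytes in all.
PONG = struct.Struct("<H4sII")


def pack_pong(port, ip, file_count, total_kb):
    return PONG.pack(port, socket.inet_aton(ip), file_count, total_kb)


def unpack_pong(payload):
    """Returns (port, dotted-quad IP string, file count, total KB)."""
    port, ip, files, kb = PONG.unpack(payload[:PONG.size])
    return port, socket.inet_ntoa(ip), files, kb
```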

Query:

Bytes 0-1: Minimum speed:
The minimum speed of servents which should perform the search and send results. 16 bit unsigned integer in little-endian

Bytes 2+: Search String:
A NUL (zero) terminated character string.

Broadcast to each available connection, except the one it came from.
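Building and reading a Query body, as a sketch. The writeup doesn't say which character encoding search strings use, so this assumes Latin-1. broadcast_targets is a hypothetical helper for the forwarding rule just above.

```python
import struct


def pack_query(search, min_speed=0):
    """Minimum speed (LE uint16) followed by the NUL-terminated search string."""
    return struct.pack("<H", min_speed) + search.encode("latin-1") + b"\x00"


def unpack_query(payload):
    """Returns (min_speed, search string)."""
    (min_speed,) = struct.unpack_from("<H", payload)
    end = payload.index(b"\x00", 2)  # the terminating NUL
    return min_speed, payload[2:end].decode("latin-1")


def broadcast_targets(connections, came_from):
    """Every connection except the one the Query arrived on."""
    return [c for c in connections if c != came_from]
```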

Hits (Query Response):

Byte 0: Number of Items:
The number of Hit Items (see below) which follow this header. An 8 bit unsigned integer (byte order is irrelevant with one byte).

Bytes 1 - 2: Servent Port:
The listening port number of the servent which found the results. A 16 bit unsigned integer in little-endian byte order.

Bytes 3 - 6: Servent IP:
The IP address of the servent which found the results. A 32 bit unsigned integer in network byte order.

Bytes 7 - 8: Servent Speed:
The speed of the servent which found the results. A 16 bit unsigned integer in little-endian byte order.

Bytes 9 - 10: Unknown:
Unknown.

Bytes 11 +: List of Items:
A Hits Item (see below) for each result found.

Last 16 Bytes: Response ID:
The Response ID of the servent which found the results. 16 bytes of random data.

Forward packet only through the connection from which the Query came.

Push Request:

Bytes 0 - 15: Response ID:
The Response ID of the servent from which the requester wishes to receive a push.

Bytes 16 - 19: File Index:
The File Index of file requested. See Hit Items for more info. A 32 bit unsigned integer in little-endian byte order.

Bytes 20 - 23: Requester IP:
The IP address of the servent requesting the push. A 32 bit unsigned integer in network byte order.

Bytes 24 - 25: Requester Port:
The Port number of the servent requesting the push. A 16 bit unsigned integer in little-endian byte order.

Forward packet only through the connection from which the Hits came.
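A Push Request body is a fixed 26 bytes. Here is a sketch in the same struct style, again keeping the IP in network order and everything else little-endian; the function names are illustrative.

```python
import socket
import struct

# Response ID (16 bytes), file index (LE uint32), requester IP (4 raw bytes,
# network order), requester port (LE uint16): 26 bytes.
PUSH = struct.Struct("<16sI4sH")


def pack_push(response_id, file_index, ip, port):
    return PUSH.pack(response_id, file_index, socket.inet_aton(ip), port)


def unpack_push(payload):
    """Returns (response_id, file_index, requester IP string, requester port)."""
    rid, index, ip, port = PUSH.unpack(payload[:PUSH.size])
    return rid, index, socket.inet_ntoa(ip), port
```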

Hits Items:
Bytes 0 - 3: File Index:
Each file shared by a servent has an integer value associated with it. A 32 bit unsigned integer in little-endian byte order.

Bytes 4 - 7: File Size:
The size of the file in octets. A 32 bit unsigned integer in little-endian byte order.

Bytes 8 +: Pathname:
The pathname of the found file, terminated by two NUL (zero) bytes.
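Parsing a whole Hits payload, following the layout given in this writeup (including the two unknown bytes). As far as I know, the published 0.4 protocol document describes bytes 7 - 10 as a single 32-bit little-endian speed field, so those "unknown" bytes are probably its high-order half. The decoder also assumes pathnames contain no NUL bytes and are Latin-1 text; both are assumptions, and unpack_hits is a hypothetical name.

```python
import socket
import struct

# Fixed part of a Hits payload as laid out above: item count (uint8),
# port (LE uint16), IP (4 raw bytes, network order), speed (LE uint16),
# 2 unknown bytes. 11 bytes, so the items start at byte 11.
HITS_HEADER = struct.Struct("<BH4sH2s")
ITEM_HEADER = struct.Struct("<II")  # file index, file size (both LE uint32)


def unpack_hits(payload):
    """Returns (port, IP string, speed, [(index, size, name)], response_id)."""
    count, port, ip, speed, _unknown = HITS_HEADER.unpack_from(payload)
    offset = HITS_HEADER.size
    items = []
    for _ in range(count):
        index, size = ITEM_HEADER.unpack_from(payload, offset)
        offset += ITEM_HEADER.size
        end = payload.index(b"\x00\x00", offset)  # double-NUL terminator
        items.append((index, size, payload[offset:end].decode("latin-1")))
        offset = end + 2
    response_id = payload[-16:]  # last 16 bytes
    return port, socket.inet_ntoa(ip), speed, items, response_id
```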

Downloading

Downloading is all HTTP. A GET request is sent, with a URI built from the information in the Hit: it starts with /get/, then the File Index number, then the filename.
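A sketch of the download request. This writeup only says the URI is /get/, the File Index, then the filename; the slash separators, the trailing slash, and the extra headers below follow the Gnutella 0.4 protocol document as I remember it, so treat them as assumptions. The filename is sent unescaped, exactly as it appeared in the Hit, and download_request is a hypothetical helper.

```python
def download_request(file_index, filename, start=0):
    """HTTP/1.0 request for one Hit item; `start` lets a download resume."""
    return (
        f"GET /get/{file_index}/{filename}/ HTTP/1.0\r\n"
        "Connection: Keep-Alive\r\n"
        f"Range: bytes={start}-\r\n"
        "User-Agent: Gnutella\r\n"
        "\r\n"
    ).encode("latin-1")
```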

Uploading

Uploading is done in response to a Push Request. The uploader establishes a TCP connection to the requester and sends GIV, then the File Index number, a colon, the Response ID of the uploader, a slash, the filename, and finally two newlines.
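And the GIV line for a push upload, as a sketch. The writeup doesn't say how the 16-byte Response ID is written into a text line; this assumes a lowercase hexadecimal string, and also assumes a single space after GIV. giv_line is a hypothetical helper.

```python
def giv_line(file_index, response_id, filename):
    """First thing the uploader sends after connecting back for a Push."""
    # Assumption: the 16-byte Response ID is written as 32 hex digits.
    return f"GIV {file_index}:{response_id.hex()}/{filename}\n\n".encode("latin-1")
```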
