Smaller, and faster.

"Linux is bloated." That's the reaction many have when they see the six CD-ROM set of Red Hat Linux Deluxe 7.1 alongside the single CD of Windows 2000 Professional, its closest proprietary competitor. This is a misconception; a typical GNU/Linux distribution includes much more functionality than a Windows distribution. (But just try to outmarket Microsoft.) Even so, I've figured out a slick way to reduce the binary footprint of a distribution:

Don't ship as many binaries.

For example, Red Hat for a given architecture fills about two or three CDs with binaries and one CD with SRPMs (source packages). Why not take advantage of the fact that a distribution containing software licensed under the GNU General Public License will ship with source code and a compiler anyway? Ship binaries only for the kernel, the compiler, essential system libraries, and a bare-bones userland compressed with UPX so that it decompresses itself at runtime. Ship the rest of the distribution only as tightly compressed source code, which saves one CD right off the bat. This also allows one set of CDs to work, to some extent, on multiple architectures. When installing the OS, copy this minimal binary distribution to the destination partition. While the files are copying, compile the basic packages needed for a working command-line system; don't optimize higher than -O, so the initial build stays fast.
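To make the idea concrete, here is a rough sketch of what that install-time build step could look like. The paths, package names, and .tar.bz2 layout are invented for illustration; this is not how any real installer works:

    import os
    import subprocess
    import tarfile

    SOURCE_CD = "/mnt/cdrom/src"          # compressed source tarballs on the CD (hypothetical path)
    BUILD_ROOT = "/mnt/target/usr/src"    # the freshly copied destination partition (hypothetical path)

    # A made-up minimal set of packages for a working command-line system.
    BASE_PACKAGES = ["bash", "coreutils", "util-linux", "ncurses"]

    def build(package, opt="-O"):
        """Unpack one package's compressed source and build it with modest optimization."""
        tarball = os.path.join(SOURCE_CD, package + ".tar.bz2")
        with tarfile.open(tarball) as tar:
            tar.extractall(BUILD_ROOT)
        srcdir = os.path.join(BUILD_ROOT, package)
        env = dict(os.environ, CFLAGS=opt)    # nothing fancier than -O at install time
        subprocess.run(["./configure"], cwd=srcdir, env=env, check=True)
        subprocess.run(["make"], cwd=srcdir, env=env, check=True)
        subprocess.run(["make", "install"], cwd=srcdir, check=True)

    # Build the base system while the binary files are still being copied over.
    for pkg in BASE_PACKAGES:
        build(pkg)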

Now the fun begins.

When you restart your box, you'll have a barely usable system. The compiler will generate the rest of the system in the background, using idle CPU time à la distributed.net. Until it's done, you can play around on the command line, go to work, or let your box sit overnight (assuming a recent Athlon system; you do not want to build Linux from scratch on a 486) until the applications you told the installer to prioritize are finished. If you try to start an app that isn't compiled yet, it gets pushed to the front of the queue. By the time everything's done, every package will have been compiled and optimized specifically for your processor's microarchitecture, producing a faster overall system. If you told the installer that you don't want to keep source around, the source will be deleted.
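A minimal sketch of how such a build queue might behave, assuming a simple priority heap and a background worker (the priorities and idle-time scheduling are assumptions, not a description of any existing installer):

    import heapq
    import threading

    class BuildQueue:
        """Background compile queue: everything starts at low priority and is built
        with idle CPU time; a package the user actually asks for jumps the line."""

        def __init__(self, packages):
            self.heap = [(100, pkg) for pkg in packages]   # (priority, name); lower builds sooner
            heapq.heapify(self.heap)
            self.lock = threading.Lock()

        def bump(self, package):
            """Called when the user tries to start an app that isn't compiled yet."""
            with self.lock:
                self.heap = [(p, n) for p, n in self.heap if n != package]
                heapq.heapify(self.heap)
                heapq.heappush(self.heap, (0, package))    # front of the queue

        def worker(self, build):
            """Drain the queue in the background, one package at a time."""
            while True:
                with self.lock:
                    if not self.heap:
                        return
                    _, pkg = heapq.heappop(self.heap)
                build(pkg)    # compile with full optimization for this CPU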

The general concept for this already exists as the Linux From Scratch distribution. All that remains is to automate the build process and put the distribution on CDs; an effort to do this is underway (http://alfs.linuxfromscratch.org). (In January 2002, I discovered that Gentoo Linux (http://www.gentoo.org/) and Sorcerer GNU Linux (http://sorcerer.wox.org/) had accomplished this.)

One size does not fit all.

As the others mentioned, another way to eliminate bloat is to eliminate unnecessary packages. Do you really need fifteen text editors, when most of the features you need can be found in Nano or Emacs? There are versions of Slackware that fit on 15 floppies (thanks Gone Jackal). Even if you don't want to go that far, you don't need workstation packages in a server distribution or server packages in a workstation distribution (unless the workstation is the server). Why include an office suite and its associated clip art when all you want is the basic userland, Apache, MySQL, Perl, and the Everything Web System? If you're building a firewall for NAT, you need even less; xerces told me about Linux Router Project and Tom's Really Tiny Boot Disk.

don't like this? /msg me; i'll address the issues
Linux distributions are already small; they can easily be smaller than the smallest Win2K installation with equivalent or greater functionality, depending on purpose.

Compare the contents of the six CDs of Red Hat to the one CD of Windows 2000. The OS itself is equivalent to Windows 2000 Server almost out of the box, and resides on one CD.
The other five are simply EXTRA. Often, those CDs include the source code, which itself can be several times the size of the binaries alone. The source may compress well, but that just adds another step in the installation process.

When you install an OS, whether it's Linux or Win2K, you usually expect it to be entirely functional after installation (barring some minor tweaking).
I'd hate to have to wait a day while my system compiled itself, unless I planned on doing that (which is why you get the source code on those other five CDs!).
On my Linux box, MySQL takes over 4 hours to compile (AMD K5-100). I can't imagine how long the kernel would take, not to mention the entire complement of libraries, binaries, and various other utilities that go toward making a distro functional.

Making distros smaller isn't really necessary. Making the install on your own machine smaller requires only a bit of planning, and skill, on your part. Or you can pick one of the many pre-planned tiny distros out there; I believe there are a couple of floppy-based distros. This is an advantage of freely-available-source operating systems.

And yes, as yerricide noted, Red Hat is not the only option.

The people who will decide between Linux and Win2K will, hopefully, do some research beforehand and realize that, in the end, they ARE getting shortchanged by getting only one disc!

Small Distros:
A few gotchas:
Compressed source code is about the same size as object code (Debian and SuSE manage to fit their source and binaries on the same number of CDs), but it often expands by a factor of five or more once it's unpacked and compiled (MAME goes from 7 MB in a solid archive to 220 MB unpacked and compiled on my machine).

Compilation takes ages, especially on a 486. It could take weeks to install a CD's worth of source code, and some packages (the X Window System, Mozilla, and the kernel, to name a few) are extremely large and difficult to compile (GCC takes hours to build, even on high-end systems). This would effectively rule out installing on systems with less than 64 MB of RAM. (And one of Linux's big plus points is that it can be installed on a 486 and still be usable.)

Having source code sitting on the hard disk while the system installs itself means the install takes up at least twice the disk space it normally would, and once it's finished, the hard disk would end up half-full. Reading the source off the CD would be better, but it would mean the CD was whirring all night.

Looking at things like MAME, where the majority of the time is spent in a tight loop and you'd expect compiler options to have a significant impact, the difference in speed between builds for different processor versions is negligible (less than 10 percent). You'd see roughly the same performance just by using binaries optimised for i586 instead of i386 (which distributions like Mandrake provide).

The whole point of distributions is that they come compiled 'so you don't have to'. The kind of person who would use something like this is quite capable of getting the source and installing it themselves.

A better, but more time-consuming, solution to 'Linux bloat' would be for the distributor to put some effort into hand-picking packages, so that it doesn't install 15 different shells, 20 different Tetris games, 5 different window managers, et cetera by default. Fewer packages rather than smaller packages. Who could be arsed to choose between 25 different text editors (especially with Red Hat's installer, or dselect)?

It would be nice to see a Linux distribution that contained only a few of each type of package, and a wide variety of types, that all fitted well together, all used the same widgets and command-line syntax, and could install sensible combinations of these packages with a minimum of fuss. If Microsoft can do this with Windows and Office, why can't a company like Red Hat?

ssd - Under most open-source licenses, it is sufficient to provide an offer to supply source code in lieu of the actual source, which is how companies like Red Hat can release binary-only distributions.
2003 - And plus ça change. Now there is Gentoo Linux, which does this very thing. Installations can take up to a week on non-bleeding-edge hardware. But it's optimized, which makes it all the more 1337. On the flip side come heavyweights like Debian woody, available on 5 CDs of binaries, or 1 DVD (turn it over for the source code). Same 5-inch disc Red Hat 5.2 took up, but now there's five times more stuff.
Several points here...

what is bloat?

Is having 6 CDs bloat, or is having a minimum installation footprint of 600 MB bloat? Is a forced installation of sendmail bloat on a machine that will never be an e-mail server? Is it bloat to make the print spooler (and thus Ghostscript) a dependency of multiple packages that are merely able to print but don't require it (especially when the user owns no printer)?

Is it bloat to include multiple packages that serve the same purpose (KDE vs. GNOME, sendmail vs. exim, bison vs. yacc, shell fun such as ksh, zsh, bash, tcsh, ash, and sash), or is that just offering a wide set of flavors for users? I've always thought the large variety of choices is one of the things that makes Linux fun to use.

Perhaps it is bloat to include the source. (Ok, so it's a legal requirement. Does that make it not bloat?)

background installation is cool

I'd like to see the installer do more stuff in the background. Wouldn't it be neat if it could actually start installing packages before you are done selecting them all? Perhaps it could install the required base packages while you are choosing the rest.
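Purely as a sketch, the installer's main loop could look something like this, where ask_user and install are stand-ins for whatever a real installer actually does:

    import threading

    def run_installer(base_packages, ask_user, install):
        """Install the mandatory base set in the background while the user is
        still clicking through the package-selection screens."""
        background = threading.Thread(
            target=lambda: [install(pkg) for pkg in base_packages])
        background.start()
        extras = ask_user()        # the user keeps choosing packages meanwhile
        background.join()          # wait for the base set to land first
        for pkg in extras:
            install(pkg)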

Of course, while this would be very neat, it isn't really possible on small-memory machines, and on high-end machines the whole install typically takes less than 5 minutes anyway, so perhaps it's just silly. I still think it'd be a cute trick.

automatic compilation during installation

lj points out that source is larger than binary. On the surface, this is true; however, many of the binaries legally require the source to be included--so if you include one, you have to include both anyway. Is the source still larger?

Also, when you say the binaries are smaller, are you considering the binaries for every architecture, or just your favorite one? Wouldn't it be cool to have a distribution with minimal binaries for every architecture, plus source and an automated compilation process for the rest?

Of course, this wouldn't work for packages whose compilation process can't be automated, but then, I don't think that really applies to most things that have RPMs.

As for the CD whirring all night or the HD being full, the obvious solution is to copy the source you need from the CD-ROM, compile it, and delete it before moving on to the next package. I don't think this would be a big deal or a significant performance penalty.
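Something like this copy/compile/delete cycle, with placeholder paths and package names:

    import os
    import shutil
    import subprocess
    import tempfile

    def build_from_cd(package, cd_path="/mnt/cdrom/src"):
        """Copy one package's source off the CD, build it, then delete the tree
        so the disk never holds more than one unpacked source tree at a time."""
        workdir = tempfile.mkdtemp(prefix=package + "-")
        try:
            shutil.unpack_archive(os.path.join(cd_path, package + ".tar.gz"), workdir)
            srcdir = os.path.join(workdir, package)
            subprocess.run(["./configure"], cwd=srcdir, check=True)
            subprocess.run(["make", "install"], cwd=srcdir, check=True)
        finally:
            shutil.rmtree(workdir)           # reclaim the space before the next package

    for pkg in ["apache", "mysql", "perl"]:  # illustrative package names
        build_from_cd(pkg)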

The argument about instant usability and slowness on many machines is a very good point. It costs less than a dollar to produce and mail two CD-ROMs anyway, so what are the savings?
