What is the arXiv?
The arxiv is an electronic data storage system, accessible via email, ftp and the WWW and based at Cornell (formerly at Los Alamos / LANL), for the use of physicists of all persuasions. If you have an account there, you can submit (upload) your preprint, and its title and abstract will appear the next day (or after the weekend) in a listing that is read by thousands of physicists worldwide. (Check it out at http://arxiv.org.) If any of them are interested, they can download the entire paper and print it out at their leisure. Thanks to the SPIRES HEP database, the references in your paper will (after a few days) be accessible as hypertext links from its automatically created webpage. The electronic listing also shows the time and date of the preprint, the authors and the email address from which it was sent. If in the fullness of time anyone decides to cite your paper, you will also be able to link to your citations and compare them with your competitors' - for in the publish-or-perish world of physics, you not only have to publish, you have to be seen to be publishing.
To Everythingians, this sort of remote electronic storage of cross-linked documents should sound familiar...
When was it born? - A bit of history
The first 'eprints' appeared on the Los Alamos server in late summer 1991. At that time, everything had to be done via ftp, and the daily listings were sent out by email. (And I was barely out of short trousers.) The system was put together because theoretical high-energy physicists had got fed up with the slow and selective workings of the traditional preprint distribution system, namely putting a few dozen copies in the mail to your scientific buddies. They wanted to know what other people were working on RIGHT HERE, RIGHT NOW. Opinion is divided on whether this was because the pace of change was getting so fast that work became outdated after a couple of months, or just because people were getting nosier. The earliest eprints were in 'hep-th', short for "high energy physics - theory". This meant string theory, black holes, supergravity, supersymmetry and that sort of stuff. The first ever was hep-th/9108001:
Title: Exact Black String Solutions in Three Dimensions
Authors: James H. Horne, Gary T. Horowitz
Comments: 17 pages
Journal-ref: Nucl.Phys. B368 (1992) 444-462
9108 is, obviously, the year and month; 001 is the number, which starts from scratch at the beginning of each month. (The journal reference was added later: the point was to read the other guys' work before it was even submitted to the journal.) Before very long, the number of eprints per month was up in the low hundreds, and the other kinds of physicists wanted in: 'hep-lat' (lattice) also started in '91 (but averaged under ten a month for some time); 'hep-ph' (phenomenology¹), 'astro-ph' (astrophysics) and 'cond-mat' (condensed matter) started in spring '92; 'gr-qc' (general relativity and quantum cosmology) in summer '92; 'nucl-th' (nuclear theory) that autumn; 'hep-ex' (experiment) in '94; and so on. Go to http://arxiv.org for the full list.
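The old-style identifier scheme described above (archive name, a slash, two digits each for year and month, then a three-digit running number that resets every month) is regular enough to pull apart mechanically. Here is a minimal Python sketch, not anything the arxiv itself runs; the name `parse_old_id` and the rule that two-digit years from 91 upwards mean the 1900s (since the archive started in 1991) are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class OldArxivId(NamedTuple):
    archive: str  # e.g. "hep-th"
    year: int     # four-digit year
    month: int    # 1..12
    number: int   # running number within that month


# Archive name (lowercase words joined by hyphens), "/", YY, MM, NNN.
_OLD_ID = re.compile(r"^([a-z]+(?:-[a-z]+)*)/(\d{2})(\d{2})(\d{3})$")


def parse_old_id(ident: str) -> OldArxivId:
    """Split an old-style identifier such as 'hep-th/9108001' into its parts."""
    m = _OLD_ID.match(ident)
    if m is None:
        raise ValueError(f"not an old-style arXiv identifier: {ident!r}")
    archive, yy, mm, num = m.groups()
    # Assumption: the archive began in 1991, so 91-99 -> 19xx, anything lower -> 20xx.
    year = 1900 + int(yy) if int(yy) >= 91 else 2000 + int(yy)
    month = int(mm)
    if not 1 <= month <= 12:
        raise ValueError(f"bad month {month} in {ident!r}")
    return OldArxivId(archive, year, month, int(num))


print(parse_old_id("hep-th/9108001"))
```

Running it on the very first eprint gives `OldArxivId(archive='hep-th', year=1991, month=8, number=1)`.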
For much of its existence so far, the arxiv's URL was http://xxx.lanl.gov. The story behind the x's is intriguing. In the mid-'90s, the founding fathers saw web-crawling robots come along and try to download all of the huge amount of data that lay on the server. If unchecked, the traffic generated by robots would have dwarfed that of legitimate users and led to some awkward questions at LANL's computing service. The problem was made worse by the fact that retrieving a preprint actually required CPU time: papers are stored mostly as LaTeX source, which has to be compiled and converted into PostScript or PDF for the convenience of users.
A two-tier defence against robot web-crawlers was implemented. First, use an 'xxx' domain name, which discouraged the robots because they thought it was porn. (Actually, preprints about string theory would get me pretty horny if I were a web crawler robot. But I digress.) Second, block access from any site that sends multiple rapid-fire requests. Typically this might happen while the server is compiling some bit of LaTeX, during which time a page is displayed asking (human) users to wait a few seconds. So if you get impatient to see that paper and press reload a bit too enthusiastically, you could in theory get banned from the source of all knowledge. See http://arxiv.org/RobotsBeware.html. The robots are allowed to see the abstracts but not the papers themselves.
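The second line of defence (ban anyone who fires off requests too quickly) is essentially a rate limiter. What follows is a hypothetical Python illustration of the idea, not the arxiv's actual code: it keeps a sliding window of recent request times per client and bans a client outright once it exceeds a threshold. The class name, the thresholds and the choice of a permanent ban are all assumptions.

```python
from collections import defaultdict, deque


class RapidFireBlocker:
    """Hypothetical sliding-window blocker (illustration only, not arXiv code).

    A client making more than `max_requests` requests within any
    `window_seconds` span is banned, and stays banned.
    """

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.recent = defaultdict(deque)  # client -> timestamps of recent requests
        self.banned = set()

    def allow(self, client: str, now: float) -> bool:
        """Record a request from `client` at time `now`; return False if refused."""
        if client in self.banned:
            return False
        times = self.recent[client]
        # Drop timestamps that have slid out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        times.append(now)
        if len(times) > self.max_requests:
            self.banned.add(client)  # the over-enthusiastic reloader is shut out
            return False
        return True


blocker = RapidFireBlocker(max_requests=3, window_seconds=10.0)
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, blocker.allow("impatient-physicist", t))
```

Passing the clock in as `now` rather than reading it inside `allow` keeps the sketch deterministic and easy to test; the fourth reload at t = 3.0 is refused, and every request after that is refused too.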
The arxiv, like all good institutions, has had many names over the years: 'electronic bulletin board', 'xxx eprint archive', 'Los Alamos preprint server', etc. Recently, the server moved to Cornell when its guiding spirit, Paul Ginsparg, moved there; he is himself a physicist with notable contributions to the 'hep-th', 'hep-lat' and 'cond-mat' fields. But the flood tide of preprints continues unabated.
For many (mostly high energy) physicists, 'checking the arxiv' is the first thing they do when getting into work. Well, maybe after getting a coffee. If someone else has had your idea, you want to know. Ditto if they forgot to reference your paper, or (greatest of joys) got something stupendously wrong so that you can email them about it in a concise but devastating manner.
When the latter happens - or in general when people realise that the preprint has a few typos or needs revising for some reason (e.g. you submitted it to a journal and the referee demands changes) - the result is a replacement. Unlike E2 writeups, every replacement gets a date stamp and version number. It's quite embarrassing if you get to v4 or greater. The arxiv administration don't like replacements (they clutter up the place), but it's too much to ask physicists to wait until they have a definitive, completely and utterly correct version before posting.
You can submit 'comments', but only on your own papers. If you use any curious macros or have a huge number of pictures, then it's best to mention them here. It's generally not a good idea to put things like 'Bitchin'' or 'Steven Weinberg can kiss my ass' in the 'Comments' section. Or even 'This paper has been rejected twice by well-known journals please email me with your comments', which translates as 'I am a paranoid crank, please ignore me'. The nicest I ever read was for hep-th/9908142:
Title: String Theory and Noncommutative Geometry
Authors: Nathan Seiberg, Edward Witten
Comments: 100 pages, sorry.
Meaning (for the rest of us): we've found the answers to all the things you were working on and now, sorry, you'll have to read our enormous paper just to keep up. And believe me, 100 pages by Ed Witten take some reading.
1. Phenomenology in this context means the bit of high energy physics (the subject previously known as particle physics) concerned with trying to bring theory together with the experimental results. This is needed because the theories are so difficult to solve that it takes a full-time corps of theorists just to work out the mathematics. Conversely, the experiments are so big and complicated that it takes a small army of people to convert the signals that come out into physically enlightening form. Phenomenologists live in the no man's land between the two.