Sound is probably much simpler than most people think it is. It is caused by something moving back and forth.

Acoustic sounds

When you pluck a string, it vibrates. The vibration causes the pressure of the air to change, which in turn makes your ear drum vibrate at the same frequency (the same number of back-and-forth movements per second) as the string. The same thing happens when you hit a drum: it moves back and forth, which moves your ear drum back and forth. Human ears can pick up sounds between about 20 Hz and 20 kHz, meaning they can pick up physical vibrations that repeat between roughly twenty and twenty thousand times a second.

Resonance

Plucking a string makes it vibrate at a particular frequency, known as its natural frequency, but plucking isn't the only way to set it going. You could instead play the same note on a different musical instrument, right next to the string, and the string would start to vibrate too. Every object has a natural frequency, and it tends to vibrate when something close to it is already vibrating at that frequency. This isn't necessarily a good thing: it's the reason why opera singers can break glass, and it is commonly cited as the reason why wind can make bridges collapse. It is also the reason why certain frequencies are boosted more than others by the hollow body of an acoustic guitar.
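To put rough numbers on this, here is a minimal sketch that treats the string as a damped, driven oscillator, which is a textbook simplification rather than a model of any real instrument. The function name response_amplitude, the damping ratio of 0.01 and the 440 Hz natural frequency are all assumptions chosen for illustration.

```python
import math

def response_amplitude(drive_hz, natural_hz, damping_ratio=0.01, force_per_mass=1.0):
    """Steady-state amplitude of a damped oscillator pushed at drive_hz.

    Assumed model: x'' + 2*z*w0*x' + w0^2*x = (F/m)*cos(w*t),
    where z is the damping ratio and w0 the natural angular frequency.
    """
    w = 2 * math.pi * drive_hz
    w0 = 2 * math.pi * natural_hz
    return force_per_mass / math.sqrt((w0**2 - w**2) ** 2 + (2 * damping_ratio * w0 * w) ** 2)

NATURAL_HZ = 440.0  # assumed natural frequency of the string

# Compare each response with the deflection caused by a steady (0 Hz) push.
static = response_amplitude(0.0, NATURAL_HZ)
for drive_hz in (220.0, 430.0, 440.0, 450.0, 880.0):
    ratio = response_amplitude(drive_hz, NATURAL_HZ) / static
    print(f"{drive_hz:6.1f} Hz: {ratio:6.2f} times the static deflection")
```

In this model, driving the string exactly at its natural frequency gives a response 1 / (2 x damping ratio) times larger than a steady push of the same strength, so the lighter the damping, the more dramatic the resonance.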

Electric sounds

Artificial sounds and recordings are usually played using electric loudspeakers (a notable exception being the acoustic phonograph). A speaker contains a permanent magnet, an electromagnet and a cone-shaped surface. When an electric signal is fed through the speaker, the strength of the electromagnet increases and decreases, pushing it away from the permanent magnet, then pulling it back towards it. This causes the cone attached to the electromagnet to move back and forth. Because the cone can follow complex, non-repeating movements, it can replicate pretty much any sound, which makes it very versatile.

Digitally stored sounds

Computers and consumer electronics (such as CD players) take electric sounds one stage further. The position the speaker cone should be in at any given time is stored as a number, which the device keeps track of. In the case of CDs, it is a sixteen-bit number, meaning sixteen 1s and 0s make up a number between 0 and 65535, so there are 65,536 subtly different positions to choose from. By reading the series of numbers in the right order, and moving the speaker to the appropriate position each time, the device can reproduce a very close approximation of the original sound (with CD players, the speaker position is updated 44,100 times every second). This process is known as pulse-code modulation, or PCM for short.

While records and tape recordings can already reproduce sounds quite accurately, PCM enables recordings to be easily stored and shared. This is good news for people who collect recordings of sounds, such as music or audiobooks. It's bad news for people who own the copyright on such recordings, if their licenses don't permit such sharing. This problem affects all media that can be digitised, however, not just sound. The various attempts at preventing the unauthorised copying of such materials are known as digital rights management.
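The PCM idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names encode_pcm, decode_pcm and tone_440 are made up for this example, and the samples are stored as numbers from 0 to 65535 to match the description above (audio CDs actually store the same sixteen bits as signed values, from -32768 to 32767).

```python
import math

SAMPLE_RATE = 44100   # position updates per second, as on a CD
BITS = 16             # bits per sample, as on a CD
LEVELS = 2 ** BITS    # 65536 possible stored values

def encode_pcm(signal, duration_s):
    """Measure signal(t) at regular intervals and store each reading as an integer.

    signal(t) is assumed to return the cone position in the range -1.0 to 1.0.
    """
    count = round(duration_s * SAMPLE_RATE)
    samples = []
    for n in range(count):
        t = n / SAMPLE_RATE
        position = max(-1.0, min(1.0, signal(t)))  # clip anything out of range
        samples.append(round((position + 1.0) / 2.0 * (LEVELS - 1)))
    return samples

def decode_pcm(samples):
    """Turn the stored integers back into positions between -1.0 and 1.0."""
    return [s / (LEVELS - 1) * 2.0 - 1.0 for s in samples]

def tone_440(t):
    """A hypothetical test sound: a pure 440 Hz tone."""
    return math.sin(2 * math.pi * 440 * t)

samples = encode_pcm(tone_440, 0.01)   # one hundredth of a second of sound
restored = decode_pcm(samples)
worst_error = max(abs(r - tone_440(n / SAMPLE_RATE)) for n, r in enumerate(restored))
print(len(samples), "samples, worst rounding error:", worst_error)
```

The rounding error is never more than half the gap between two neighbouring stored values, which is why sixteen bits are enough for the reproduction to be a very close approximation.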

The technical side of sounds

A sound can be described as a waveform. This means that you can draw a sound as a graph, with the x-axis (the horizontal one) representing time and the y-axis (the vertical one) representing the position of whatever's making the sound. It also means that any sound can be broken down into a sum of sine waves of different frequencies and amplitudes. See Fourier analysis and Fourier synthesis for more details about how to break sounds down into their component sine waves, or how to build up sounds from these building blocks. Pipe organs have long been making musical timbres this way.
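As a rough illustration of both directions, the sketch below builds a waveform by adding three sine waves together (synthesis) and then measures how much of each frequency the result contains (analysis). The function names, the 8000 Hz sample rate and the choice of partials are assumptions made for this example, and the analysis step is a plain, slow correlation against a sine and cosine rather than the fast Fourier transform a real program would normally use.

```python
import math

def synthesize(partials, sample_rate, count):
    """Fourier synthesis: add up sine waves given as (frequency_hz, amplitude) pairs."""
    return [
        sum(a * math.sin(2 * math.pi * f * n / sample_rate) for f, a in partials)
        for n in range(count)
    ]

def amplitude_at(samples, frequency_hz, sample_rate):
    """Fourier analysis: estimate the amplitude of one frequency within the samples."""
    count = len(samples)
    re = sum(s * math.cos(2 * math.pi * frequency_hz * n / sample_rate)
             for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
             for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / count

SAMPLE_RATE = 8000  # assumed sample rate for this example
# The first three partials of a square wave: odd harmonics with falling amplitudes.
partials = [(100, 1.0), (300, 1 / 3), (500, 1 / 5)]

wave = synthesize(partials, SAMPLE_RATE, SAMPLE_RATE)  # one second of sound
for frequency_hz, amplitude in partials:
    measured = amplitude_at(wave, frequency_hz, SAMPLE_RATE)
    print(f"{frequency_hz} Hz: built with {amplitude:.4f}, measured {measured:.4f}")
```

Adding more odd harmonics in the same pattern would bring the waveform closer and closer to a square wave, which is the same principle an organ uses when it combines pipes to build up a timbre.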

So there you have it. Sound can be as simple as something moving back and forth very quickly, or as complex as Fourier transforms. Just be careful not to accidentally break your window.