The 'voices' of a language may be regarded as different possibilities for assigning semantic roles to syntactic positions. All sentences have to have a subject, so one of the semantic roles (or θ-roles, with θ for 'thematic') is placed in the subject position. The choice of which goes there gives you active voice and passive voice. In some languages there are further possibilities, and 'middle voice' is a grab-bag term for other ways of assigning roles beyond active and passive.

The θ-roles available depend on the verb. An intransitive verb such as 'laugh' has a single θ-role, typically the AGENT (or it may be the EXPERIENCER). Necessarily, this is the subject of the sentence: Mary laughs. A canonical transitive verb such as 'hit' or 'kiss' has two, an AGENT to do it and a PATIENT that it's done to. In the active voice the agent is the subject; in the passive voice the patient is. (An ergative language often has a construction called the antipassive.)

In English, 'middle voice' is used to mean the construction in which the patient is the subject of a verb that is intransitive but active in form. Compare:

Active: John burns down the house.
Passive: The house is burnt down (by John).
Middle: The house burns down.

The agent is optional in the passive: The house is burnt down or The house is burnt down by John. When it is expressed, the agent role is assigned to a prepositional phrase. But the middle has no agent slot at all. You can't say *The house burns down by John. Moreover, you can't use qualifiers that focus attention on the agent's role either:

John burns down the house deliberately.
The house is burnt down deliberately.
(ungrammatical): *The house burns down deliberately.

John burns down the house for the insurance money.
The house is burnt down for the insurance money.
(ungrammatical): *The house burns down for the insurance money.

The English middle is therefore often used in situations where there is no agent, as opposed to an unknown agent, which may be indicated by the passive: Paint was smeared all over the walls (no agent expressed but we know someone did it), cf. This paint smears easily (not talking about any particular act of smearing).
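The role-to-position mapping described so far can be sketched as a toy program. This is only an illustration of the standard analysis, not a grammar of English: the names Clause, realize, render and BURN are hypothetical, and the verb forms are supplied by hand rather than derived.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    subject: str
    verb: str
    obj: Optional[str] = None        # direct object, if any
    by_phrase: Optional[str] = None  # optional agent in a passive
    agent_modifiers_ok: bool = True  # may 'deliberately' etc. be added?


# Hand-supplied verb forms (an assumption; nothing is conjugated here).
BURN = {"active": "burns down", "passive": "is burnt down"}


def realize(voice, agent, patient, forms):
    """Assign AGENT and PATIENT to syntactic positions for one voice."""
    if voice == "active":
        if agent is None:
            raise ValueError("the active needs an agent as subject")
        return Clause(subject=agent, verb=forms["active"], obj=patient)
    if voice == "passive":
        # Patient is subject; the agent is optional, in a by-phrase.
        return Clause(subject=patient, verb=forms["passive"], by_phrase=agent)
    if voice == "middle":
        # Patient is subject of an active-form verb; no agent slot at all.
        if agent is not None:
            raise ValueError("the middle has no agent slot")
        return Clause(subject=patient, verb=forms["active"],
                      agent_modifiers_ok=False)
    raise ValueError("unknown voice: " + voice)


def render(clause, agent_modifier=None):
    """Spell out the clause; prefix '*' if an agent-oriented modifier
    is attached to a clause that does not allow one."""
    parts = [clause.subject, clause.verb]
    if clause.obj:
        parts.append(clause.obj)
    if clause.by_phrase:
        parts.append("by " + clause.by_phrase)
    if agent_modifier:
        parts.append(agent_modifier)
    sentence = " ".join(parts)
    sentence = sentence[0].upper() + sentence[1:] + "."
    if agent_modifier and not clause.agent_modifiers_ok:
        sentence = "*" + sentence
    return sentence


print(render(realize("active", "John", "the house", BURN)))
print(render(realize("passive", "John", "the house", BURN)))
print(render(realize("middle", None, "the house", BURN), "deliberately"))
```

The sketch hard-codes the claim that middles reject every agent-oriented modifier, so it stars any such sentence regardless of what the subject denotes.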

The rearrangement of θ-roles characterizes the Greek and Swedish uses mentioned above. Instead of A doing something to B, or B having something done to them by A, the middle is used for causative orientation (A gets B to do something to C), or reciprocal (A and B do things to each other), or benefactive (A does something for B, or for A).

The analysis of English middles as disallowing any reference to agent roles is given in my sources on current theories of syntax, but I am noding this now because I have just noticed an example where it is the semantics that seems to determine whether this holds. The exception appears to arise when agentive intention is present in the object by its nature. Buildings aren't designed to burn down for the insurance money. However, suppose you have a safety catch or a fire access window that's designed to break easily. Then you could say, questionably:

This catch breaks easily deliberately.
This catch deliberately breaks easily.
This window breaks easily to let people in.