Okay, so let me try and make sense of the theoretical logic behind thermodynamics and statistical physics. My intent is to develop the basic equations from first principles without resorting to either quantum mechanics (which is what Landau and Lifshitz are doing) or weird Carnot-machines. Let's see how far I can go.

The Axiomatic Way

Equilibrium thermodynamics is based on the principle that the state of a system is completely determined by its volume $V$ and pressure $p$. Furthermore, there is the concept of equilibrium (the zeroth law). Two systems are in equilibrium when a function $F$ (unknown for the moment) is zero: $F(p_1,V_1,p_2,V_2)=0$. Being in equilibrium is also considered a transitive relationship: thus, if $F(p_1,V_1,p_2,V_2)=0$ and $F(p_1,V_1,p_3,V_3)=0$, then $F(p_2,V_2,p_3,V_3)=0$. Solving the last two of these equations for $p_3$ and denoting the solution function as $f(p,V,V_3)$, we get

$p_3=f(p_1,V_1,V_3)=f(p_2,V_2,V_3).$

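Since this equality must hold for any value of $V_3$ (whether systems 1 and 2 are in equilibrium with each other does not depend on system 3 at all), $V_3$ must drop out, and we conclude that there exists a function $\phi(p,V)$, an equation of state, that has the same value for systems that are in equilibrium: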

$\phi(p_1,V_1)=\phi(p_2,V_2).$

We define the empirical temperature $\theta$ as

$\theta=\phi(p,V).$
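As a toy illustration of this construction (an assumption made for the sake of example, not part of the axiomatic argument): for samples containing the same amount of an ideal gas, equilibrium means $p_1V_1=p_2V_2$, so one may take $F(p_1,V_1,p_2,V_2)=p_1V_1-p_2V_2$. Then $f(p,V,V_3)=pV/V_3$, the equality $f(p_1,V_1,V_3)=f(p_2,V_2,V_3)$ holds for every $V_3$, and $\phi(p,V)=pV$ serves as an empirical temperature. The function names in the sketch are illustrative only.

```python
# Toy model (assumption): equal amounts of ideal gas are in equilibrium
# when p1*V1 == p2*V2, so F(p1, V1, p2, V2) = p1*V1 - p2*V2.
def F(p1, V1, p2, V2):
    return p1 * V1 - p2 * V2


def f(p, V, V3):
    """Solve F(p, V, p3, V3) = 0 for p3."""
    return p * V / V3


def phi(p, V):
    """Empirical temperature theta = phi(p, V) for this toy model."""
    return p * V


p1, V1 = 2.0, 3.0
p2, V2 = 1.5, 4.0  # chosen so that systems 1 and 2 are in equilibrium
for V3 in (0.5, 1.0, 7.0):
    p3 = f(p1, V1, V3)
    # system 3 at (p3, V3) is in equilibrium with both 1 and 2 ...
    assert abs(F(p1, V1, p3, V3)) < 1e-12 and abs(F(p2, V2, p3, V3)) < 1e-12
    # ... and f yields the same p3 from either system, whatever V3 is
    assert abs(f(p1, V1, V3) - f(p2, V2, V3)) < 1e-12

print(phi(p1, V1), phi(p2, V2))  # 6.0 6.0
```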

From the basic definition of work, $W=\int\vec{F}\cdot\vec{dr}$, we derive the expression

$W=-\int p dV,$

where the negative sign is used conventionally to indicate that work is invested into the system when it is compressed (i.e., its volume decreases.)
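A minimal numerical sketch of the sign convention, assuming (beyond anything said so far) an ideal gas held at constant temperature, with $p=T/V$ in units where $Nk=1$: compressing it from $V=2$ to $V=1$ gives $W=-\int p\,dV=T\log 2>0$, i.e., work is invested into the system. The helper `work_on_gas` is hypothetical and uses the trapezoidal rule.

```python
import math


def work_on_gas(p_of_V, V_start, V_end, steps=100_000):
    """W = -(integral of p dV from V_start to V_end), by the trapezoidal rule."""
    h = (V_end - V_start) / steps
    total = 0.5 * (p_of_V(V_start) + p_of_V(V_end))
    for i in range(1, steps):
        total += p_of_V(V_start + i * h)
    return -total * h


T = 1.0  # assumed isotherm p = T / V, units with N k = 1
W = work_on_gas(lambda V: T / V, 2.0, 1.0)  # compress from V = 2 to V = 1
exact = -T * math.log(1.0 / 2.0)            # = T log 2
print(W, exact)  # both about 0.693: positive, work done on the gas
```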

In a thermally isolated system (one that exchanges no heat with its surroundings), energy is conserved: any work done on the system must be accompanied by an equal change in the system's internal energy $U$:

$\delta U=W.$

For a system not in isolation, its interaction with its surroundings can be described by the difference,

$Q=\delta U-W,$

or, in infinitesimal notation:

$dQ=dU-dW,$

where $dQ$ is the infinitesimal quantity of heat exchanged between the system and its environment.

Now I must stop for a moment and remark on something. Many textbooks use some notation to distinguish between $dQ$ and $dW$ on the one hand, and $dU$ on the other. To $dU$, there corresponds a quantity $U$, the internal energy, that changes by $dU$, but the same is not true for $Q$ or $W$! There is no "reservoir of work" from which we take, or to which we put back, some infinitesimal amount of work $dW$. Similarly, there is no "reservoir of heat" (indeed, this would be the 19th century concept of caloricum). Some authors use a crossed $\eth$ symbol to denote $\eth Q$ and $\eth W$, while other authors avoid using a "$d$-notation" altogether, and use the language of differential forms instead. In this language, denoting the heat form by $\psi$ and the work form by $\omega$, we would say that neither $\psi$ nor $\omega$ is closed, meaning that there are no quantities $Q$ or $W$ such that $\psi=dQ$ or $\omega=dW$. The form $dU$, on the other hand, is closed, which (on a simply connected region of state space) implies the existence of $U$.

I will be pragmatic (if it quacks like a duck...) and I'll continue to use the $d$-notation simply because it is practical. Just keep in mind that writing $dX$ doesn't always imply that there exists an $X$.

We can rewrite our previous equation using the definition of work as

$dQ=dU+pdV.$

This equation is the first law of thermodynamics, expressing the idea of energy conservation.
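The remark about $dQ$ and $dW$ can be made concrete. Assuming, for illustration only, a monatomic ideal gas with $U=\frac{3}{2}pV$, the sketch below carries the gas between the same two states along two different paths built from constant-volume and constant-pressure segments. The change $\Delta U$ is the same, but the heat absorbed is not, so there can be no function $Q$ of the state. The helper names are hypothetical.

```python
def internal_energy(p, V):
    # Assumed model: monatomic ideal gas, U = (3/2) p V.
    return 1.5 * p * V


def heat_along(path):
    """Heat absorbed along a path of isobaric/isochoric segments, Q = dU - W per segment."""
    Q = 0.0
    for (p0, V0), (p1, V1) in zip(path, path[1:]):
        if p0 != p1 and V0 != V1:
            raise ValueError("each segment must be isobaric or isochoric")
        W = -p0 * (V1 - V0)  # work done on the gas, -(integral of p dV); zero if isochoric
        dU = internal_energy(p1, V1) - internal_energy(p0, V0)
        Q += dU - W  # first law: dQ = dU + p dV
    return Q


start, end = (1.0, 1.0), (2.0, 3.0)
path_1 = [start, (2.0, 1.0), end]  # heat at constant volume, then expand at constant pressure
path_2 = [start, (1.0, 3.0), end]  # expand at constant pressure, then heat at constant volume
dU = internal_energy(*end) - internal_energy(*start)
print(dU, heat_along(path_1), heat_along(path_2))  # 7.5 11.5 9.5
```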

The second law of thermodynamics states, simply put, that there are irreversible processes. From this, two conclusions follow. First, that we can express $dQ$ in the form $dQ=\lambda d\phi$ (here $\phi$ denotes a new function of state, not the $\phi(p,V)$ used above to define the empirical temperature); and second, that in this formulation, we can replace $\lambda$ with an "absolute temperature" that is uniquely determined up to a multiplicative constant. Let me demonstrate why these statements are true.

What is an irreversible process? Here's one way to think about it. The state of a system is fully determined by $p$ and $V$, as per our definition. In other words, if $p$ and $V$ remain constant, we do not consider a change in the internal energy $U$ to be a change of state. So here's an excellent question, then. Suppose you burn some fuel. The internal energy $U$ changes as chemical energy is converted into what? A change in pressure and/or volume, which means the ability to do mechanical work? Or a change in heat?

A process is called adiabatic if $dQ=0$, i.e., the system does not exchange heat with its environment. The second law, in the form given by Carathéodory, states that arbitrarily close to any state of a system (represented by values of $p$ and $V$) there are states that cannot be reached from it by an adiabatic process, i.e., along a curve on which $dQ=0$.

That this simple statement has far-reaching consequences is due to Carathéodory's theorem, which states that in this case, there exist functions $\lambda$ and $\phi$ such that $dQ=\lambda d\phi$.
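Carathéodory's theorem only asserts that $\lambda$ and $\phi$ exist. For a concrete model (again an assumption: the monatomic ideal gas with $U=\frac{3}{2}pV$, so that $dQ=\frac{3}{2}V\,dp+\frac{5}{2}p\,dV$, and with $T=pV$ in units where $Nk=1$) we can check numerically that $\lambda=T$ does the job: around a closed rectangle in the $(p,V)$ plane, $\oint dQ$ is nonzero, while $\oint dQ/T$ vanishes, as it must for an exact differential $d\phi$. The midpoint-rule integrator is a hypothetical helper.

```python
def line_integral(form, path, steps=20_000):
    """Integrate a 1-form a(p,V) dp + b(p,V) dV along straight segments (midpoint rule)."""
    total = 0.0
    for (p0, V0), (p1, V1) in zip(path, path[1:]):
        dp, dV = (p1 - p0) / steps, (V1 - V0) / steps
        for i in range(steps):
            t = (i + 0.5) / steps
            p, V = p0 + t * (p1 - p0), V0 + t * (V1 - V0)
            a, b = form(p, V)
            total += a * dp + b * dV
    return total


def heat_form(p, V):
    # Assumed model: U = (3/2) p V, so dQ = dU + p dV = 1.5 V dp + 2.5 p dV.
    return 1.5 * V, 2.5 * p


def heat_over_T(p, V):
    # Candidate integrating factor 1/T, with T = p V (units with N k = 1).
    a, b = heat_form(p, V)
    T = p * V
    return a / T, b / T


loop = [(1.0, 1.0), (2.0, 1.0), (2.0, 3.0), (1.0, 3.0), (1.0, 1.0)]
Q_loop = line_integral(heat_form, loop)    # equals the enclosed area: no function Q exists
S_loop = line_integral(heat_over_T, loop)  # about zero: dQ/T is exact
print(Q_loop, S_loop)
```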

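Of course this alone does not determine the functions $\lambda$ and $\phi$. To take that step, it is necessary to make the assumption that heat is additive. In particular, consider two systems in thermal equilibrium with each other, so that they share the same empirical temperature $\theta$ and each can be described by $\theta$ and its own $\phi_i$, with $\lambda_i=\lambda_i(\theta,\phi_i)$. Ignoring interaction energies between the two, the total heat exchanged is the sum of the individual amounts, i.e., $dQ=dQ_1+dQ_2$, and hence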

$\lambda d\phi=\lambda_1d\phi_1+\lambda_2d\phi_2,$

or

$d\phi=\frac{\lambda_1}{\lambda}d\phi_1+\frac{\lambda_2}{\lambda}d\phi_2.$

Since no $d\theta$ term appears on the right-hand side, $\phi$ depends only on $\phi_1$ and $\phi_2$. But this means that there exist functions $f_1$ and $f_2$ such that $\lambda_1/\lambda=f_1(\phi_1,\phi_2)=\partial\phi/\partial\phi_1$, and $\lambda_2/\lambda=f_2(\phi_1,\phi_2)=\partial\phi/\partial\phi_2$. Consequently, $\lambda_1/\lambda_2=f_1/f_2$, or

$\log\lambda_1-\log\lambda_2=\log\frac{f_1}{f_2}.$

On the right-hand side, $f_1/f_2$ is not a function of the empirical temperature $\theta$, so if we take the partial derivative with respect to $\theta$, we get

$\frac{\partial(\log\lambda_1)}{\partial\theta}=\frac{\partial(\log\lambda_2)}{\partial\theta}.$

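Since $\lambda_1$ depends only on $\theta$ and $\phi_1$, while $\lambda_2$ depends only on $\theta$ and $\phi_2$, this can only happen if both sides of this equation depend on $\theta$ alone, i.e., there exists a function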

$g(\theta)=\frac{\partial(\log\lambda)}{\partial\theta}.$

What if we replace $\theta$ with another function $T(\theta)$? Not just any $T$, but one that satisfies the equation

$\frac{dT}{d\theta}=Tg(\theta).$

In this case, we have

$\frac{\partial(\log\lambda)}{\partial T}=g(\theta)\left(\frac{\partial T}{\partial\theta}\right)^{-1}=\frac{1}{T}=\frac{\partial(\log T)}{\partial T}.$
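The equation $dT/d\theta=Tg(\theta)$ is easy to integrate for any given $g$. As a purely hypothetical example, suppose the empirical thermometer yielded $g(\theta)=1/(\theta+\theta_0)$ with $\theta_0=273.15$ (as if $\theta$ were read on a Celsius-like scale). The exact solution is then $T=C(\theta+\theta_0)$; the sketch below integrates the equation with the classical fourth-order Runge-Kutta method and shows that the starting value only rescales $T$, which is precisely the freedom of the multiplicative constant $C$.

```python
THETA0 = 273.15  # hypothetical offset: theta read on a Celsius-like scale


def g(theta):
    # Hypothetical g(theta) = d(log lambda)/d(theta), chosen for illustration only
    return 1.0 / (theta + THETA0)


def integrate_T(theta_start, T_start, theta_end, steps=1000):
    """Solve dT/dtheta = T g(theta) with the classical 4th-order Runge-Kutta method."""
    h = (theta_end - theta_start) / steps
    theta, T = theta_start, T_start
    for _ in range(steps):
        k1 = T * g(theta)
        k2 = (T + 0.5 * h * k1) * g(theta + 0.5 * h)
        k3 = (T + 0.5 * h * k2) * g(theta + 0.5 * h)
        k4 = (T + h * k3) * g(theta + h)
        T += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        theta += h
    return T


# The starting value fixes the multiplicative constant C in T = C (theta + THETA0)
T100 = integrate_T(0.0, 273.15, 100.0)             # C = 1: close to 373.15
T100_scaled = integrate_T(0.0, 2 * 273.15, 100.0)  # C = 2: close to 746.3
print(T100, T100_scaled)
```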

The equation $dT/d\theta=Tg(\theta)$ determines $T$ up to a multiplicative constant: $T(\theta)=Ce^{G(\theta)}$, where $dG/d\theta=g(\theta)$. On the other hand,

$\log\lambda=\log T+\log K,$

where $K$ is independent of $T$. Or,

$dQ=TK(\phi)d\phi.$

Solving the equation $K(\phi)d\phi=dS$ for $S$ (determining $S$ up to an additive constant), we get

\begin{align}dQ&=TdS,~~~~{\rm or}\\\frac{dQ}{dS}&=T.\end{align}

In this equation, $T$ is called the absolute temperature (remember, our reasoning determined $T$ up to a multiplicative constant, and if we make this constant positive, $T$ will always be positive as well) and $S$ is called the entropy. We can rearrange our equation to read

\begin{align}TdS-dU-pdV&=0,~~~~{\rm or}\\dU&=TdS-pdV.\end{align}

This is a fundamental equation of thermodynamics that combines the first and second laws.

Remember our earlier discussion about $U$ being a "proper" function? The existence of $U$ implies that, if we write it as a function of $S$ and $V$, we have

$dU=\frac{\partial U}{\partial S}dS+\frac{\partial U}{\partial V}dV.$

From this, we obtain

\begin{align}T&=\frac{\partial U}{\partial S},~~~~{\rm and}\\p&=-\frac{\partial U}{\partial V}.\end{align}
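As a sanity check of these two relations, here is a sketch that differentiates an assumed $U(S,V)$ numerically. The model is the monatomic ideal gas, $U=V^{-2/3}e^{2S/3}$, with $Nk=1$ and all other constants set to one (an assumption, not derived here). Central differences give $T$ and $p$, and the ideal gas law $pV=T$ (i.e., $pV=NkT$) drops out.

```python
import math


def U(S, V):
    # Assumed model: monatomic ideal gas, N k = 1, other constants set to 1
    return V ** (-2.0 / 3.0) * math.exp(2.0 * S / 3.0)


def partial(f, x, y, wrt, h=1e-5):
    """Central-difference partial derivative of f(x, y) w.r.t. x (wrt=0) or y (wrt=1)."""
    if wrt == 0:
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


S, V = 0.7, 2.5
T = partial(U, S, V, wrt=0)   # T = dU/dS at constant V
p = -partial(U, S, V, wrt=1)  # p = -dU/dV at constant S
print(p * V, T)               # equal: the ideal gas law pV = NkT
```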

* * *

Here are two questions to which I have failed to provide an answer up to this point. That is to say, I know these statements to be true; I just don't know how to derive them rigorously, using only the axioms of thermodynamics and nothing else (nothing from statistical physics, in particular):
1. Why is $U$ expressible as $U(S,V)$? (I.e., why is the state describable by these two coordinates? Is it simply that the system is two-dimensional by definition, and $S$ and $V$ are independent, or is there something else needed to formally prove this bit?)
2. Why is $S$ additive? Does it follow from the fundamental equation when we divide a system into two parts, and from the fact that the temperatures in the two parts are equal, and the energies and volumes are additive?

* * *

Since the order of partial differentiation doesn't matter, we also have

$\frac{\partial T}{\partial V}=\frac{\partial(\partial U/\partial S)}{\partial V}=\frac{\partial(\partial U/\partial V)}{\partial S}=-\frac{\partial p}{\partial S}.$

This and other similar relations, called Maxwell's relations, are most compactly expressed by the Jacobian determinant:

$\frac{\partial(T,S)}{\partial(p,V)}=1.$
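The Jacobian identity can also be checked numerically in $(p,V)$ coordinates. Using an assumed monatomic ideal gas as a test case ($Nk=1$, $U=\frac{3}{2}pV=V^{-2/3}e^{2S/3}$, additive constant in $S$ set to zero), we have $T=pV$ and $S=\frac{3}{2}\log\left(\frac{3}{2}pV^{5/3}\right)$; the sketch evaluates $\partial(T,S)/\partial(p,V)=T_pS_V-T_VS_p$ by central differences at a few arbitrary points. (Analytically, $V\cdot\frac{5}{2V}-p\cdot\frac{3}{2p}=1$.)

```python
import math


# Assumed model: monatomic ideal gas, units with N k = 1, additive entropy
# constant chosen as zero.  In (p, V) coordinates: T = p V and
# S = (3/2) log((3/2) p V**(5/3)).
def T(p, V):
    return p * V


def S(p, V):
    return 1.5 * math.log(1.5 * p * V ** (5.0 / 3.0))


def jacobian_TS_pV(p, V, h=1e-5):
    """d(T, S)/d(p, V) = T_p S_V - T_V S_p, by central differences."""
    T_p = (T(p + h, V) - T(p - h, V)) / (2 * h)
    T_V = (T(p, V + h) - T(p, V - h)) / (2 * h)
    S_p = (S(p + h, V) - S(p - h, V)) / (2 * h)
    S_V = (S(p, V + h) - S(p, V - h)) / (2 * h)
    return T_p * S_V - T_V * S_p


points = [(1.0, 1.0), (0.3, 5.0), (4.0, 0.7)]
for p, V in points:
    print(p, V, jacobian_TS_pV(p, V))  # 1 up to rounding error at every point
```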

The Statistical Way

The results of axiomatic thermodynamics are derived from the basic postulates, or axioms, that are the "Laws of Thermodynamics", without making any assumptions about the properties of the underlying matter. So here's a good question: is it possible to derive at least some of those postulates if we make assumptions about the nature of matter?

A good starting point is to take a collection of $N$ identical particles of mass $m$. (A similar reasoning can be developed for particles of varying mass; it just gets more complicated.) We assume that the particles behave in accordance with the laws of classical mechanics, and that the system as a whole is closed. What this means is that the particles' motion is described by second-order differential equations that contain no explicit dependence on the first derivative or on time, i.e., $x''=f(x)$. For a closed system of bodies whose motion is governed by such equations, there are seven additive constants of motion: the energy $E$, the three components of the momentum vector $\vec{P}$, and the three components of the angular momentum bivector $\vec{M}$.

Now we divide the system of $N$ particles into two parts, with particle counts $N_1$ and $N_2$ ($N_1+N_2=N$). We can, in principle, measure the individual positions and momenta of each particle; we can, furthermore, establish a probability density function that tells us the likelihood that the system's particles are in a particular state (as described by their positions and momenta). We now make a crucial assumption: namely, that the probability density function $\rho_1$ of the first subsystem does not depend on the state of the second subsystem, and vice versa. In other words, the two subsystems are statistically independent from each other. (In particular, this means that there is no long-range interaction between the two, which means we're in trouble if we're trying to examine the behavior of a charged plasma or a self-gravitating gas. But, I digress.)

The combined probability density, then, for the system as a whole to be in a particular state is the product of the individual densities: $\rho=\rho_1\rho_2$, or $\log\rho=\log\rho_1+\log\rho_2$.

Next, we invoke Liouville's theorem, which states that $\rho$ stays constant along the trajectory of the system in phase space. For an equilibrium distribution, which is not explicitly a function of time, this means that $\rho$ is a constant of motion. Moreover, $\log\rho$ is an additive constant of motion. But we already know that for a mechanical system, the only additive constants of motion are $E$, $\vec{P}$, and $\vec{M}$, so $\log\rho$ must be expressible as a linear combination of these:

$\log\rho=\alpha+\beta E+\vec{\gamma}\cdot\vec{P}+\vec{\delta}\cdot\vec{M}.$

In other words, the statistical distribution of a system is completely determined by its macroscopic properties, namely its total energy $E$, momentum $\vec{P}$, and angular momentum $\vec{M}$.

Furthermore, by choosing a coordinate system that moves and rotates together with the body (a "comoving" system), we can eliminate $\vec{P}$ and $\vec{M}$, so we are left with

$\log\rho=\alpha+\beta E.$

One observation at this point is that we need not concern ourselves with the specific nature of the statistical distribution function $\rho$. As a matter of fact, any function will do, so long as it produces the appropriate macroscopic properties. In particular, we may choose

$\rho=C\delta(E-E_0),$

where $C$ is some constant and $\delta$ is the Dirac delta function. What this expression is saying is that a probability distribution that gives a probability of 1 for the system to be in the state $E=E_0$ and a probability of 0 for all other values of $E$ is consistent with the formulation so far.

Entropy is defined as the negative average of $\log\rho$:

$S=-\langle\log\rho\rangle=-\int\rho\log\rho dpdq,$

where the integration is meant to be across all possible values of all $p$ and $q$ components. (The total number of these components, i.e., the dimension of phase space, will usually be $2DN$, twice the number of degrees of freedom $DN$, where $D$ is the dimensionality of the system and $N$ is the number of particles involved.)

If $\rho$ were a discrete probability distribution, each of its values would lie between 0 and 1, the terms under the summation sign would be negative, and $S$ would be positive. A probability density, however, can exceed 1, so for a continuous $\rho$ the entropy can also be negative; in classical statistics it is only defined up to an additive constant that depends on the units in which $p$ and $q$ are measured.

Furthermore, while $\int\rho dpdq=1$ by definition since $\rho$ is a probability distribution function, there are macroscopic states of the system where $\rho$ is nearly flat, and there are cases where it produces a sharp peak. The expression for $S$ will be highest in the former case. I have no idea how to prove this in the general case (indeed, how to measure "peakiness" in the general case) but if $\rho(x)$ is a normal distribution, $\lambda\rho(\lambda x)$ forms a sharper peak if $\lambda$ is greater. In this case, the definite integral $\int\lambda\rho(\lambda x)\log[\lambda\rho(\lambda x)]dx$ will be $\log\lambda-(1+\log 2\pi)/2$, so $S=(1+\log 2\pi)/2-\log\lambda$, which decreases as $\lambda$ increases.
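The $\lambda$ dependence of the entropy of a rescaled normal density is easy to check numerically. The sketch below evaluates $S=-\int\rho_\lambda\log\rho_\lambda\,dx$ for $\rho_\lambda(x)=\lambda\rho(\lambda x)$, with $\rho$ the standard normal density, using a simple midpoint rule, and prints it next to the closed form $(1+\log 2\pi)/2-\log\lambda$. For large $\lambda$ the result is negative, which is possible because a density, unlike a probability, can exceed 1.

```python
import math


def entropy_scaled_normal(lam, n=20_000):
    """S = -(integral of rho log rho dx) for rho(x) = lam * N(lam x), N standard normal."""
    half_width = 12.0 / lam  # rho is negligible beyond 12 standard deviations
    h = 2 * half_width / n
    S = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h  # midpoint rule
        # log rho computed analytically, so the far tails never hit log(0)
        log_rho = math.log(lam) - 0.5 * math.log(2 * math.pi) - 0.5 * (lam * x) ** 2
        S -= math.exp(log_rho) * log_rho * h
    return S


for lam in (0.5, 1.0, 2.0, 10.0):
    exact = 0.5 * (1 + math.log(2 * math.pi)) - math.log(lam)
    print(lam, entropy_scaled_normal(lam), exact)  # sharper peak, smaller S
```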

So now take a system and divide it into two subsystems. For both subsystems, $\int\rho_idp_idq_i=1$ ($i=1,2$). The combined entropy of this system is

$S=-\langle\log\rho\rangle=-\int\rho_1\rho_2\log\rho_1\rho_2 dp_1dq_1dp_2dq_2.$

Since $\rho_1$ is a function of only $p_1$ and $q_1$, and $\rho_2$ is a function of only $p_2$ and $q_2$, the integral can be expanded and we get

\begin{align}S&=-\int\rho_1\rho_2\log(\rho_1\rho_2)dp_1dq_1dp_2dq_2\\
&=-\int\left(\rho_1\rho_2\log\rho_1+\rho_1\rho_2\log\rho_2\right)dp_1dq_1dp_2dq_2\\
&=-\int\rho_2\left(\int\rho_1\log\rho_1dp_1dq_1\right)dp_2dq_2-\int\rho_1\left(\int\rho_2\log\rho_2dp_2dq_2\right)dp_1dq_1\\
&=\int\rho_2S_1dp_2dq_2+\int\rho_1S_2dp_1dq_1\\
&=S_1\int\rho_2dp_2dq_2+S_2\int\rho_1dp_1dq_1=S_1+S_2.\end{align}

Hence, $S$ is additive.
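The same bookkeeping can be checked with discrete distributions standing in for the phase-space densities (sums replacing integrals, an assumption made for simplicity). For independent subsystems, the entropy of the product distribution equals $S_1+S_2$; the second example, two perfectly correlated coins, shows that additivity fails without statistical independence. The distributions are made up for illustration.

```python
import math
from itertools import product


def entropy(probs):
    """-sum p log p over a discrete distribution (zero-probability terms contribute nothing)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical discrete stand-ins for the two subsystems' distributions
rho1 = [0.5, 0.3, 0.2]
rho2 = [0.1, 0.6, 0.25, 0.05]
joint = [a * b for a, b in product(rho1, rho2)]  # independence: rho = rho1 * rho2
print(entropy(joint), entropy(rho1) + entropy(rho2))  # equal

# Without independence, additivity fails: two perfectly correlated fair coins
correlated = [0.5, 0.0, 0.0, 0.5]  # joint over (HH, HT, TH, TT); each marginal is [0.5, 0.5]
print(entropy(correlated), 2 * entropy([0.5, 0.5]))  # log 2 versus 2 log 2
```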

* * *

The second law of thermodynamics states that in an equilibrium system, $S$ is maximal. I am looking for a derivation of the second law from the idea that the system is in a maximum probability state. This means that $\rho=\rho_1\rho_2$ is maximal, which presupposes that the two subsystems and their respective probability distributions $\rho_1$ and $\rho_2$ are respectively in their "peakiest" states to allow $\rho$ to be maximal. But, I do not yet know how to express this mathematically.

* * *

So for now, let's just accept that $S$ is maximal. (Landau and Lifshitz do pretty much the same thing, so I am in respectable company.)

If $S$ is maximal in the equilibrium state, then for a system that is not in equilibrium, $dS/dt\gt 0$ as it tends towards equilibrium. (This, really, is the second law.)

So now let's say that a system is in equilibrium with maximal $S$. Divide the system into two parts. Remember, both $S$ and $E$ are additive, so $E=E_1+E_2$ and $S=S_1+S_2$. Since $S$ is maximal, it'd be maximal with respect to how we distribute the energy between the two subsystems: $dS/dE_1=0$. But, since the total energy $E$ is fixed, $dE_2/dE_1=-1$, and

$\frac{dS}{dE_1}=\frac{dS_1}{dE_1}+\frac{dS_2}{dE_2}\frac{dE_2}{dE_1}=\frac{dS_1}{dE_1}-\frac{dS_2}{dE_2}=0.$

The quantity $dS/dE$ is therefore the same for all subsystems of an equilibrium system. By definition, $dS/dE=1/T$, and we call $T$ the temperature.
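Here is a numerical sketch of this argument under an assumed model: each subsystem is ideal-gas-like, with $S_i=c_i\log E_i$ (additive constants dropped, $c_i=\frac{3}{2}N_i$ with $k=1$), so that $1/T_i=c_i/E_i$. Maximizing $S_1(E_1)+S_2(E-E_1)$ over $E_1$ with a ternary search (valid because the total is concave) lands where the two temperatures agree. The numbers and helper names are hypothetical.

```python
import math


def S_sub(E, c):
    # Assumed model: ideal-gas-like subsystem, S = c log E with c = (3/2) N and k = 1
    return c * math.log(E)


def best_split(E_total, c1, c2, iters=200):
    """Ternary search for the E1 that maximizes S1(E1) + S2(E_total - E1)."""

    def total(E1):
        return S_sub(E1, c1) + S_sub(E_total - E1, c2)

    lo, hi = 1e-9 * E_total, (1 - 1e-9) * E_total
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if total(m1) < total(m2):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return 0.5 * (lo + hi)


c1, c2, E = 1.5 * 100, 1.5 * 300, 10.0  # N1 = 100, N2 = 300 particles
E1 = best_split(E, c1, c2)
E2 = E - E1
T1, T2 = E1 / c1, E2 / c2  # from 1/T = dS/dE = c/E
print(E1, T1, T2)  # E1 close to 2.5, and T1 close to T2
```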

If the entropy is a function of not just the energy but other variables, we'll need to use partial derivatives: $\partial S/\partial E=1/T$, or

$\frac{\partial E}{\partial S}=T.$

There is another important relationship involving the energy and partial derivatives. By definition, pressure is the force perpendicular to a surface divided by area. Force is the change in energy over distance, i.e., $F=-dE/dl$ (the negative sign indicating that as force is being applied, the volume of a body decreases.) Dividing by a surface element $dA$ perpendicular to $dl$ (such that $dV=dAdl$), we get $p=-dE/dV$. As before, if the energy is a function of not just the volume but other variables, we need to use partial derivatives; in particular, the change in volume must take place without heat exchange, i.e., at constant entropy:

$\left(\frac{\partial E}{\partial V}\right)_S=-p.$

Combining these two equations, we obtain the fundamental equation of thermodynamics:

$dE=TdS-pdV.$

The Gibbs and Maxwell Distributions

Remember that earlier, we determined that $\log\rho$ is an additive constant of the motion, so that $\log\rho=\alpha+\beta E$? In what follows, I swap the names of the two constants and write $\log\rho=\alpha E+\beta$ instead. In this equation, $\alpha$ has the dimension of inverse energy, so its value depends on our choice of units for $E$; $\beta$, on the other hand, serves as a normalization constant to ensure that $\int\rho\,dp\,dq=1$.

As we discussed earlier, the macroscopic properties of the system do not change if we use $\rho=\delta(E-E_0)$. From this, two important things follow.

First, denoting the volume element in phase space, $dp\,dq$, with $d\Gamma$, we have

$1=\int\rho\,dp\,dq=\int\delta(E-E_0)d\Gamma=\int\delta(E-E_0)\frac{d\Gamma}{dE}dE=\left.\frac{d\Gamma}{dE}\right|_{E=E_0},$

from which $d\Gamma/dE=1$.

Second, for the entropy we obtain

$S=-\int\rho\log\rho d\Gamma=-\int\delta(E-E_0)(\alpha E+\beta)d\Gamma=-\int\delta(E-E_0)(\alpha E+\beta)\frac{d\Gamma}{dE}dE=-(\alpha E+\beta).$

Hence, $dS/dE=-\alpha$. But we also know that $dS/dE=1/T$, hence $\alpha=-1/T$. From this, the distribution function becomes

$\rho=Ae^{-E/T}.$

This is the Gibbs distribution. It has quite universal validity: it applies to any body, macroscopic or not, that is a relatively small part of a larger closed system in thermodynamic equilibrium.
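One can verify the relation $dS/dE=1/T$ for the Gibbs distribution numerically. The sketch below uses a hypothetical system with a handful of discrete energy levels (a discrete stand-in for phase space, which is an assumption), computes the mean energy $E$ and the entropy $S=-\sum\rho\log\rho$ at two nearby temperatures, and compares the ratio of their changes with $1/T$.

```python
import math

LEVELS = [0.0, 1.0, 1.5, 3.0]  # hypothetical energy levels


def gibbs(T):
    """Normalized Gibbs probabilities A exp(-E_i / T) for the hypothetical levels."""
    weights = [math.exp(-E / T) for E in LEVELS]
    Z = sum(weights)  # normalization: A = 1 / Z
    return [w / Z for w in weights]


def energy_and_entropy(T):
    rho = gibbs(T)
    E = sum(p * e for p, e in zip(rho, LEVELS))
    S = -sum(p * math.log(p) for p in rho)
    return E, S


T, h = 0.8, 1e-5
E_plus, S_plus = energy_and_entropy(T + h)
E_minus, S_minus = energy_and_entropy(T - h)
dS_dE = (S_plus - S_minus) / (E_plus - E_minus)
print(dS_dE, 1 / T)  # the two agree to within finite-difference error
```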

For a system in which the potential energy is a function only of the positions $q$ and the kinetic energy is a function only of the momenta $p$, this splits into two parts:

$\rho=A\exp\left(-\frac{U(q)}{T}-\frac{K(p)}{T}\right).$

The momentum-dependent factor, $\exp(-K(p)/T)$, is the Maxwell distribution. A lengthy, but not particularly difficult derivation (see Landau & Lifshitz) can be used to determine that the average kinetic energy of a particle will be $DT/2$, where $D$ is the number of spatial dimensions (with $T$ measured in energy units, as in the Gibbs distribution above).
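A Monte Carlo sketch of the $DT/2$ result, assuming free particles with $K=p^2/2m$: each momentum component drawn from $\exp(-K(p)/T)$ is then normally distributed with variance $mT$, so averaging $p^2/2m$ over many samples should give $DT/2$ up to sampling noise. The function name and sample size are arbitrary choices.

```python
import random


def mean_kinetic_energy(D, T, m=1.0, n=50_000, seed=1):
    """Average K = |p|^2 / (2m) over momenta sampled from exp(-K(p) / T).

    For free particles each momentum component is then normally distributed
    with mean 0 and variance m*T (temperature in energy units).
    """
    rng = random.Random(seed)
    sigma = (m * T) ** 0.5
    total = 0.0
    for _ in range(n):
        p_squared = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(D))
        total += p_squared / (2 * m)
    return total / n


T = 2.0
for D in (1, 2, 3):
    print(D, mean_kinetic_energy(D, T), D * T / 2)  # sample mean vs D T / 2
```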

The Maxwell distribution can be used, among other things, to describe collisionless systems, be they stars in a galaxy, or molecules in a gas.

1. P. Bamberg and S. Sternberg: "A Course in Mathematics for Students of Physics 2", Cambridge University Press, 1990
2. L. D. Landau and E. M. Lifshitz: "Theoretical Physics V: Statistical Physics I", Nauka, 1976
3. M. S. Longair: "Theoretical Concepts in Physics", Cambridge University Press, 1987