Partial Differential Equations and Neural PDEs

From the mathematics of fields to neural networks that learn physicsJune 10th, 2026
mathematicsdeep learningneural networksPDEsphysics simulation

In the previous post we built up Ordinary Differential Equations from the ground up and saw how a neural network can learn the dynamics of a system directly from data. But we ended on a deliberate cliffhanger: ODEs carry a built-in limitation, and it is worth restating precisely before we lift it.

An ODE describes how a single quantity evolves over time. The unknown function — which we wrote \(z(t)\) — depends on exactly one independent variable: time, \(t\). If you know the state at one instant and you know the rule for its rate of change, you can march the entire system forward. This works beautifully when space simply does not matter. A perfectly stirred liquid has the same temperature everywhere, so a single number describes it at each moment.

The canonical example is to picture a metal rod, heated sharply right in the middle. The middle is suddenly hot; the two ends are still cold. The far right end has no idea yet that anything happened — heat has to physically crawl along the rod, cell by cell, before that end warms up at all. The temperature is genuinely different at every point along the rod, and that whole spatial pattern keeps changing as time passes. A single number \(z(t)=34.5\) cannot possibly capture this.

Or think of a speaker switching on in a room. The sound does not arrive everywhere at once. It spreads outward from the speaker as a wave, reaching the near wall before the far wall, with some regions already vibrating while others are still perfectly silent. Again: the quantity we care about — air pressure — has a different value at every point in space, and that spatial pattern evolves over time.

For systems like these, we need an unknown function that depends on both space and time simultaneously. That is the jump from ODEs to Partial Differential Equations (PDEs).

What is a PDE?

The defining part is very simple. The unknown function now has more than one independent variable. Instead of \(z(t)\), we write something like \(u(x, t)\), where \(x\) denotes a position in space and \(t\) denotes time.

Make this concrete with the metal rod. Here \(u(x, t)\) denotes the temperature at position \(x\) along the rod, at time \(t\). Feed it a position and a moment, and it hands you back a single temperature. The whole object \(u\) is no longer a curve through time — it is a field, a value spread across space that also evolves.

Two clarifications matter before we go further, because both will come back later.

First, PDEs do not always involve time. The multiple independent variables can both be spatial. Imagine a flat 2D metal plate whose edges are held at fixed temperatures. You leave it alone for a very long time — long enough that the temperature across the plate has completely stopped changing. Every point has settled into its final value and nothing moves anymore. That fully settled state is described by the Laplace equation:

\(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0\)

Without going into too much detail, the Laplace equation describes a situation where the temperature at any point is the average of its neighbors. Therefore time becomes irrelevant as the system is already perfectly balanced.

Second, distinguish a single PDE from a system of PDEs. Everything above involves one equation describing one unknown function. But many physical systems require several equations describing several unknown functions simultaneously — and those unknowns are tangled together, meaning you cannot solve for one without already knowing the others. The Navier-Stokes equations, which describe how fluids like air and water move, are a classic example of this. They involve multiple unknowns — the velocity of the fluid and its pressure — and these two quantities are so intertwined that neither can be solved for independently. You have to solve all the equations together at the same time. For this post we will stay with a single clean PDE, but it is worth knowing that systems used in physics simulation research are often of this coupled kind.

Partial Derivatives

With a single independent variable, there was only one derivative to speak of: \(\frac{dz}{dt}\), the rate of change of the state as time passes. With several variables, you can suddenly ask several completely independent questions about the same function, and that is exactly what a partial derivative is for.

Think of a partial derivative as taking a slice of the full function. Consider \(\frac{\partial u}{\partial t}(x, t)\). This means: freeze \(x\) at one specific location on the rod — say, the exact middle — and then watch only how the temperature there changes as time ticks forward. While you do this, position is not a variable anymore; it is a fixed constant. What you are left with is a plain single-variable function of \(t\), and you differentiate it in exactly the ordinary way you already know.

Now consider the other slice, \(\frac{\partial u}{\partial x}(x, t)\). This means: freeze \(t\) at one specific moment — take a photograph of the rod at, say, exactly two seconds — and look at how the temperature varies as you walk along the rod in that frozen snapshot. Time is now the fixed constant. Again you are left with a plain single-variable function, this time of \(x\), and again you differentiate it the ordinary way.

The curly \(\partial\) symbol is purely a notational reminder that the function has several variables and that, for this particular derivative, all the other variables are being held fixed. A partial derivative is just an ordinary derivative taken on a slice of the full function.

One more thing worth noting. When writing \(\frac{\partial u}{\partial t}\), it is itself a function of both \(x\) and \(t\). The rate at which temperature changes over time is different at every position on the rod, and different at every moment in time. Writing \(\frac{\partial u}{\partial t}(x, t)\) makes this explicit — it means that you are always asking "how fast is temperature changing at this specific position, at this specific moment?" Change either argument and you get a different answer.

The Heat Equation — The Canonical Example

Let us make all of this concrete with the single most important example, the heat equation. Take a metal rod stretching from \(x = 0\) to \(x = 1\), heated sharply in the middle at the starting moment. The unknown \(u(x, t)\) denotes the temperature at position \(x\) at time \(t\). The equation governing how that temperature evolves is:

\(\frac{\partial u}{\partial t}(x, t) = \alpha \cdot \frac{\partial^2 u}{\partial x^2}(x, t)\)

where \(\frac{\partial u}{\partial t}(x, t)\) denotes how fast the temperature is changing over time at a fixed position \(x\), \(\alpha\) denotes the thermal diffusivity (a constant describing how heat spreads through this particular material), \(\frac{\partial^2 u}{\partial x^2}(x, t)\) denotes the spatial curvature of the temperature profile at position \(x\) at time \(t\).

Reading the equation as a sentence - The left side asks: how fast is the temperature at this point changing right now? The right side answers: it depends on the curvature of the temperature profile at that point. So the whole physical content of the heat equation lives in that second spatial derivative.

The second derivative measures how much the value at a point differs from the local average of its two immediate neighbors. Therefore, if you think logically about the physics, the heat equation is saying, that a point's temperature changes over time in proportion to how much it differents from the average temperature of it's neighbours. If the point is hotter than its neighbours, then according to the laws of thermodynamics, heat will flow out of it and into the cooler neighbors, so its temperature will drop. If the point is cooler than its neighbors, then heat will flow into it and its temperature will rise. If the point is exactly the same as its neighbors, then there is no difference to drive any flow at all (Laplace's equation), so the temperature at that point does not change at all.

We can make this exact with a finite difference approximation:

\(\frac{\partial^2 u}{\partial x^2} \approx \frac{u(x+h, t) + u(x-h, t) - 2u(x, t)}{h^2}\)

where \(\frac{\partial^2 u}{\partial x^2}\) denotes the spatial curvature being approximated, \(u(x+h, t)\) denotes the temperature at the right neighbor a small distance away, \(u(x-h, t)\) denotes the temperature at the left neighbor the same distance away, \(u(x, t)\) denotes the temperature at the point itself, \(h\) denotes the small distance to each neighbor, and \(t\) denotes time. Look at the numerator: \(u(x+h, t) + u(x-h, t)\) is the sum of both neighbors, and subtracting \(2u(x, t)\) is exactly the same as comparing the point twice against the combined total of its neighbors.

The magnitude of the curvature matters, not just its sign. A sharper temperature peak has a larger second derivative and therefore flows heat faster. A gentle, rolling hill in the temperature profile barely changes from one moment to the next. So the heat equation essentially smooths things out over time proportionally to the second derivative. Sharp differences get ironed flat. Peaks sink, valleys fill, and given long enough the whole rod drifts toward a single uniform temperature.

Initial and Boundary Conditions

In the ODE post we saw that an equation alone is not enough to pin down a unique solution — you also need an initial condition, a starting point. PDEs need two kinds of constraint, not just one.

The first is the initial condition, which specifies the complete spatial profile of the field at the single starting moment \(t = 0\). For the rod, a concrete choice might be:

\(u(x, 0) = \sin(\pi x)\)

where \(u(x, 0)\) denotes the temperature at position \(x\) at the starting time \(t = 0\), and \(\sin(\pi x)\) describes the particular starting shape — a smooth bump that is zero at both ends and rises to a maximum in the middle, so the rod begins hot in the centre and cold at the edges. The crucial point is that this is not just one number, the way an ODE's initial condition was. It is the entire spatial picture at the starting instant — a temperature specified at every single position along the rod at once.

The second is the boundary condition, which specifies what happens at the spatial edges of the domain — here \(x = 0\) and \(x = 1\) — for all values of time. A concrete choice is:

\(u(0, t) = 0 \quad \text{and} \quad u(1, t) = 0\)

where \(u(0, t)\) denotes the temperature at the left end of the rod at time \(t\), and \(u(1, t)\) denotes the temperature at the right end at time \(t\). Both ends are clamped to \(0^\circ\text{C}\) for all time — as if each end were permanently held in a bath of ice water. Without this, the equation simply does not know what to do once heat reaches the end of the rod: should the heat pile up there, escape, or bounce back? The boundary condition is what answers that question.

Solving PDEs Numerically

Essentially, if you pick one specific point on the rod, and focus only on this, the PDE at that single point is just an ODE. The temperature there is a single value changing over time (e.g. 20°C dropping towards 0°C), the heat equation tells you its rate of change, and that rate of change happens to depend on the neighboring points — but it is still, in the end, just a number. And a number changing over time is exactly what the Euler method handles.

So we discretize. Chop the rod into evenly spaced grid points \(x_0, x_1, x_2, \ldots, x_N\) separated by a fixed spacing \(\Delta x\). Chop time into small steps of size \(\Delta t\). And write \(u_i(t)\) to denote the temperature at grid point \(x_i\) at time \(t\). Now apply Euler to a single grid point, exactly as done for ODEs:

\(u_i(t + \Delta t) = u_i(t) + \Delta t \cdot \alpha \cdot \frac{\partial^2 u}{\partial x^2}(x_i, t)\)

where \(u_i(t + \Delta t)\) denotes the temperature at grid point \(x_i\) at the next time step, \(u_i(t)\) denotes its temperature now, \(\Delta t\) denotes the size of the time step, \(\alpha\) denotes the thermal diffusivity, and \(\frac{\partial^2 u}{\partial x^2}(x_i, t)\) denotes the spatial curvature at that grid point. This is the exact same structure as the ODE Euler formula from the previous post: current value, plus step size, times rate of change. The only genuinely new question is how to compute that spatial curvature from the grid values we have.

Let us derive it from scratch in two steps. First, the first derivative. A derivative is just slope — rise over run — so the slope between a grid point and its right neighbor is:

\(\frac{\partial u}{\partial x}(x_i, t) \approx \frac{u_{i+1}(t) - u_i(t)}{\Delta x}\)

where \(\frac{\partial u}{\partial x}(x_i, t)\) denotes the slope of the temperature profile at grid point \(x_i\), \(u_{i+1}(t)\) denotes the temperature at the right neighbor, \(u_i(t)\) denotes the temperature at the point itself, and \(\Delta x\) denotes the grid spacing (the run). That is just rise over run between two adjacent beads.

Now the second derivative is the derivative of this slope — how fast the slope itself is changing as you move along the rod. To measure a change in slope, you need two slopes: the slope just to the right of \(x_i\), which is \(\frac{u_{i+1}(t) - u_i(t)}{\Delta x}\), and the slope just to the left of \(x_i\), which is \(\frac{u_i(t) - u_{i-1}(t)}{\Delta x}\). The second derivative is the rate of change between those two slopes — their difference, divided once more by \(\Delta x\). Subtracting the left slope from the right slope and dividing through, the \(u_i(t)\) terms collect together and you are left with:

\(\frac{\partial^2 u}{\partial x^2}(x_i, t) \approx \frac{u_{i+1}(t) - 2u_i(t) + u_{i-1}(t)}{\Delta x^2}\)

where \(\frac{\partial^2 u}{\partial x^2}(x_i, t)\) denotes the spatial curvature at grid point \(x_i\), \(u_{i+1}(t)\) denotes the right neighbor's temperature, \(u_i(t)\) denotes the point's own temperature, \(u_{i-1}(t)\) denotes the left neighbor's temperature, and \(\Delta x^2\) denotes the grid spacing squared. This is the exact same shape as the finite-difference formula from the heat-equation section — the point compared against the combined total of both neighbors, never each one separately.

There is a clean way to feel what this formula is doing. Imagine walking along the rod, the temperature profile as the ground beneath your feet. The first derivative tells you how steep the ground is right now — how hard you are climbing or descending. The second derivative tells you whether that steepness is increasing or decreasing as you walk — whether the slope under your feet is getting steeper or flattening out (I know this is a cheesy AI explanation, but I think it really gets the point across). The formula above is simply what "derivative of the derivative" becomes when you only have discrete grid points to work with.

Plug the curvature back into the Euler update, applied across the whole grid at once, and you get the complete numerical scheme:

\(u(t + \Delta t) = u(t) + \Delta t \cdot \alpha \cdot \frac{u_{i+1}(t) - 2u_i(t) + u_{i-1}(t)}{\Delta x^2}\)

where \(u(t)\) denotes the full vector of temperatures at all grid points at the current time step, \(u(t + \Delta t)\) denotes the full vector of temperatures at the next time step, \(\Delta t\) denotes the time-step size, \(\alpha\) denotes the thermal diffusivity, \(u_{i+1}(t)\) denotes each point's right neighbor, \(u_i(t)\) denotes each point itself, \(u_{i-1}(t)\) denotes each point's left neighbor, and \(\Delta x^2\) denotes the grid spacing squared.

Euler only ever steps in the time direction. Space is not something you march through. You read the neighboring grid values to compute the rate of change at the current instant — that is what the spatial fraction is doing — and then Euler nudges the entire field forward by one small step in time. The structure is identical to the ODE case from the previous post; You are essentially computing an ODE for each grid point, but the rate of change for each point depends on the values of its neighbors. In an ideal world, you'd have to compute infinitly many ODEs. This is of course impossible, however, you try to approximate it with a finite grid and try to keep the deltas as close to zero as possible.

The heat equation, one Euler step at a time

step 0 · t = 0.0000
01x = 0x = 1

Hover the rod to inspect a grid point — see why it moves the way it does.

Pause and hover a point to compare it against the average of its two neighbours. The sign of that comparison is exactly the sign of the curvature ∂²u/∂x² — and that is what decides whether the point warms or cools.
u(xi, t) — temperature at this grid point, over time
Hover a point on the rod to plot how its temperature changes over time.

Neural PDEs — Learning the Physics from Data

Now we can lift this into the world of neural networks, and the parallel to Neural ODEs is almost exact. In a Neural ODE, the rate-of-change function \(f(z(t), t)\) was unknown, so we replaced it with a neural network \(F_\theta(z(t), t)\) that learned the dynamics from data. In a Neural PDE, we do precisely the same thing — but the input to the network is no longer a single number or a small state vector. It is the entire spatial field.

\(u(t + \Delta t) = u(t) + \Delta t \cdot F_\theta(u(t))\)

where \(u(t)\) denotes the full spatial grid of values at the current time step — a vector holding the temperature at every single grid point simultaneously — \(F_\theta\) denotes a neural network with learnable parameters \(\theta\), \(u(t + \Delta t)\) denotes the full spatial grid at the next time step, and \(\Delta t\) denotes the time-step size. The network receives the entire current spatial snapshot and, in a single forward pass, outputs the rate of change at every grid point at once.

What has this network actually learned? In the hand-written numerical scheme above, we computed the spatial operator analytically — we knew the heat equation, so we knew to compare each point against its neighbors. The neural network has instead approximated that operator from data. If you trained it on recordings of heat diffusing along a rod, it will have implicitly discovered that the rate of change at each point depends on how that point's temperature compares to its neighbors. It has never been shown the heat equation formula. It does not "know" the equation in any symbolic sense. But it has learned to behave as if it does — to produce the same smoothing dynamics the real equation produces. Therefore, the good thing about working with neural networks is that they will do the hard work of solving (or at least simulating) the PDE for you, without you having to derive it by hand. However, it is still a good thing to understand the underlying concepts of PDEs, which is why I went through the heat equation in such detail, even though it's just a single simple example.

The heat equation is only one PDE among many, and the thing that gives it its particular physical character is the spatial operator sitting on the right-hand side. Change that operator and you change the physics entirely. Different PDEs pair different spatial operators with different time derivatives, and the resulting behavior can be worlds apart.