How To Split Specular And Diffuse In Real Images

File this under things that people think are harder than they actually are. One thing I’ve got set up at home is the ability to use polarized light to split out the diffuse and specular components from real images. Artists spend lots of time looking at photo reference, but splitting the specular and diffuse really helps you understand a material. In the title image, the left is the diffuse light only, the middle is diffuse and specular light, and the right is specular light only. Btw, I’m not the expert here, but this is how I understand it.

1. Original Image:

2. Diffuse Only:

3. Specular Only:

If you want to do this yourself, you are going to need:

1. A camera that lets you shoot in manual mode. I.e. an SLR.
2. A remote shutter control so you can take multiple shots without touching the camera.
3. A light source. I use a lamp from Ikea.
4. Some polarizing film. You want the “Fully Laminated Linear Polarizer Sheets”. Make sure you get a linear polarizer, not a circular polarizer.

At the most basic level, in computer graphics we assume that objects have both diffuse and specular reflectance. Here is some light hitting a surface with a nearby camera. By the way, this model is a huge over-simplification.

Some of the light that hits the surface will skip right off it. If we have a white light and a blue surface, the light will remain white. We call this specular.

Some other light will be absorbed by the surface, some electrons will get excited, and a new photon will be emitted in a random direction. If we have a white light and a blue surface, this outgoing light will be blue. We call this diffuse.
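In shader terms, those two behaviors map straight onto the familiar diffuse and specular terms. Here is a minimal sketch of that split, assuming a generic Lambert plus Blinn-Phong model (the function name and constants are placeholders for illustration, not anything specific to this post):

float3 ShadeSketch(float3 N, float3 L, float3 V)
{
   const float3 lightColor   = float3(1.0f, 1.0f, 1.0f);   // white light
   const float3 surfaceColor = float3(0.1f, 0.2f, 0.8f);   // blue surface
   const float  specPower    = 32.0f;                      // arbitrary glossiness

   // Diffuse: absorbed and re-emitted, so it picks up the surface color.
   float3 diffuse  = surfaceColor * lightColor * saturate(dot(N, L));

   // Specular: bounces off the top, so it keeps the light's color (white here).
   float3 H = normalize(L + V);                            // Blinn-Phong half vector
   float3 specular = lightColor * pow(saturate(dot(N, H)), specPower);

   return diffuse + specular;
}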

Hopefully, you already know that. If you didn’t, you do now. The cool thing is that light can be polarized. You can learn more about it from your good friend wikipedia. Additionally, specular light retains its incoming polarization but diffuse light does not. Since diffuse light is retransmitted in random directions with random polarization, we can say that it is unpolarized.

Here is the setup.

It’s pretty basic. I have a camera with everything set to full manual mode. That means:

  1. Manual Exposure
  2. Manual Aperture
  3. Manual White Balance
  4. Manual ISO
  5. Auto Focus Off
  6. Make sure the color profile is sRGB instead of Adobe 98

Also, you see that I have a remote shutter release. That way I can take multiple shots without touching the camera. If you have to touch the camera to take multiple shots, I guarantee you will have misalignment. And we have a light. Let’s take a closer look at that light.

The light is just a halogen lamp from Ikea, but with a linear polarizer in front of it. I attached the polarizer with electrical tape because I’m a classy guy. As a warning, halogen lamps get pretty hot, so my polarizer has melted and warped a bit.

What we’re going to do is take two shots. But I’m going to hold a polarizer in front of the camera. Note that we have polarizers both in front of the camera and the light source. If you think I came to LA to be a video game programmer, you’re wrong. It’s just how I pay the rent while I pursue my dream of being a hand model.

Notice the orange tape on the polarizer? I’m now going to rotate the polarizer 90 degrees and take another shot. Check out how the orange tape has moved.

Here is what our first shot looks like. If our alignment is perfect, then we should have no specular in it. Of course, we usually will have a little because it’s impossible by hand to get our polarizers to align perfectly.

How does it work? Polarized light is hitting the surface. The specular light bounces off the surface and heads towards the camera. It hits the polarizer, which is aligned perpendicular to the polarization of the light, so it all gets absorbed by the polarizer. Meanwhile, the diffuse light that gets absorbed and retransmitted by the surface is unpolarized. The polarizer then absorbs half of that diffuse light, and the rest hits the camera. So that image has 50% of the diffuse and 0% of the specular. Now for the second image.

In this guy, the polarization of the specular light coming towards the camera is aligned with the polarizer, so all that light goes through. Meanwhile, the diffuse light is still unpolarized, so half of it gets absorbed. This image has 50% of the diffuse, and 100% of the specular.
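To put hypothetical numbers on it: suppose a pixel has true diffuse D = 0.4 and true specular S = 0.3 in linear space. The cross-polarized shot records A = 0.5*D = 0.2, and the aligned shot records B = 0.5*D + S = 0.5. Then 2*A = 0.4 gives back the diffuse and B - A = 0.3 gives back the specular.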

If the first image is image A and the second is image B, the diffuse image is 2*A and the specular image is B-A. Of course, these images are stored with the sRGB profile. So here is the shader code to compare the two images and separate them out, and store the result as an sRGB image. As always, this code is not actually tested.

sampler2D Texture0;   // shot A: camera polarizer crossed with the light (diffuse only)
sampler2D Texture1;   // shot B: camera polarizer aligned with the light (diffuse + specular)

int g_bSpecOrDiff;    // 1 = output the diffuse image, 0 = output the specular image

float LinearToSrgb(float val)
{
   float ret;
   if (val <= 0.0f)
      ret = 0.0f;
   else if (val <= 0.0031308f)
      ret = 12.92f*val;
   else if (val <= 1.0f)
      ret = (pow(val, 0.41666f)*1.055f) - 0.055f;
   else
      ret = 1.0f;
   return ret;
}

float SrgbToLinear(float val)
{
   float ret;
   if (val <= 0.0f)
      ret = 0.0f;
   else if (val <= 0.04045f)
      ret = val / 12.92f;
   else if (val <= 1.0f)
      ret = pow((val + 0.055f)/1.055f, 2.4f);
   else
      ret = 1.0f;
   return ret;
}

float3 LinearToSrgb(float3 val)
{
   return float3(LinearToSrgb(val.r), LinearToSrgb(val.g), LinearToSrgb(val.b));
}

float3 SrgbToLinear(float3 val)
{
   return float3(SrgbToLinear(val.r), SrgbToLinear(val.g), SrgbToLinear(val.b));
}

float4 ps_main( float2 texCoord : TEXCOORD0 ) : COLOR
{
   float3 srcA = tex2D(Texture0, texCoord).rgb;
   float3 srcB = tex2D(Texture1, texCoord).rgb;

   // work in linear space, not in the sRGB space the images are stored in
   float3 linA = SrgbToLinear(srcA);
   float3 linB = SrgbToLinear(srcB);

   float3 linDiff = linA*2;       // A is 50% diffuse, 0% specular
   float3 linSpec = linB - linA;  // B is 50% diffuse, 100% specular

   float3 texDiff = LinearToSrgb(linDiff);
   float3 texSpec = LinearToSrgb(linSpec);

   float3 ret = g_bSpecOrDiff ? texDiff : texSpec;
   return float4(ret, 1.0f);
}
If you do that right, your diffuse image should look like so:

And your specular image should look like this:

As you can tell, this process doesn’t work perfectly.

  1. You will usually have some extra specular in your diffuse-only image.
  2. Some objects don’t behave nicely. In the specular-only image, the handle looks a little blue. I think the handle might be dielectric, which screws up the frequencies a bit.
  3. If your specular or diffuse image is clamping at 1.0, it will mess up those pixels. It’s better to underexpose these images than to overexpose.
  4. Make sure you use a remote shutter or you will have alignment problems.

Hopefully that wasn’t too much trouble, and you can do specular/diffuse separation yourself. Whew, long post.

25 Responses to “How To Split Specular And Diffuse In Real Images”

  1. That’s pretty cool.
    Does this approach work with faces too? It could be a cheap technique to generate face textures, especially if you can take multiple pictures from multiple angles.
    Perhaps a rig which can rotate around the subject, and automatically rotate the filter.
    (which would all be a lot more complicated obviously)

  2. Yes, specular facial “texture” can be extracted using light polarization.

    This is the way Image Metrics can reproduce actors’ faces with high fidelity (see the section “Separating Subsurface and Specular Reflection”).

  3. That works great. Personally I just use a double circular filter and it works as well (maybe not quite as accurately). Since you can adjust the angle of the polarizing filter, and every light has a polarity, you can find a way to get the two layers and subtract the difference between them (that’s also from a Debevec paper).
    But your approach is probably better and way more accurate! I don’t get perfect results, and it is kind of hard to set the circular filter with the same settings every time!
    I did it for skin to separate the oil layer from the diffuse; works nicely!

  4. Yeah, well, not subtract exactly, but here is the paper:

  5. Francoisgfx: Circular polarizers work in most cases. The problem with them is that they don’t work as well at grazing angles. But if your shots are mostly straight on, they should be fine.

    Sander: Yep, it works fine for faces. But you need to have a very fast way of taking multiple exposures. In my setup, there are at least several seconds between shots. These light rigs are much faster, which you need because you can’t keep your head perfectly steady.

  6. Very clever, thanks John.
    Could you please explain what the conversion from sRGB to linear means? Why not just use the color values directly?


  7. Having searched a bit through Google, I assume that you wanted to do the computations in linear space.
    But why didn’t you then just use the simple gamma 2.2 curve to convert to and from linear space, as suggested here
    and in a previous entry of yours about tonemapping?


  8. Hi Dimi,

    Yes, the goal is to do the work in linear space instead of gamma space. Take a look at this other post, which has the visual comparison between sRGB and Gamma 2.2. They are pretty close, but the sRGB curve more accurately maps to what your camera is doing. Of course, most cameras’ response curves deviate quite a bit from that curve.
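    If you want to eyeball the difference yourself, here is a quick sketch of the two encodings side by side (my own illustration; the function names are just placeholders):

    float EncodeGamma22(float lin)
    {
       // simple gamma curve
       return pow(lin, 1.0f/2.2f);
    }

    float EncodeSrgb(float lin)
    {
       // piecewise sRGB curve: linear toe near black, then a power curve
       return (lin <= 0.0031308f) ? 12.92f*lin
                                  : 1.055f*pow(lin, 1.0f/2.4f) - 0.055f;
    }

    The two stay within a couple of 8-bit codes of each other over most of the range, but they diverge noticeably near black, which is where the sRGB linear toe kicks in.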

  9. [...] Des couches: How To Split Specular And Diffuse In Real Images [...]

  10. ha- this rules John
    Love it!

    I have an approximation to the original filmic data-fit that I gave you. It doesn’t hold the curve over all input values – but it’s really close (the R^2 error is 0.9998 when the dynamic range of the input is 0-2). Anyway, bang on it, see what you think:

    // apply filmic / gamma(1/2.2) transform
    float3 x = max(0,LinearColor-0.004);
    ColorOut = (x.rgb / ((1.0351 * x.rgb) + 0.16));

    Cheers big man,

  11. Hey Jim, long time no see. Coming to Siggraph?

    I looked at the curve, and it doesn’t really work for me. The bottom end doesn’t look right. Basically, you have a function x/(Ax+B). For the Reinhard function, R(x)=x/(x+1). If you multiply them out, your function turns into (1/A)R((A/B)x). In other words, your formula is equivalent to the following 3 steps.

    1. Exposure Bias: Multiply incoming intensity by A/B=6.469. That’s the same as increasing your exposure by about 2.7 F-Stops.
    2. Reinhard With No Gamma: x/(x+1)
    3. White Point: Multiply the result by 1/A=0.966.
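    Spelled out in code, that three-step equivalence looks like this (just a sanity-check sketch using A = 1.0351 and B = 0.16 from above; the function names are placeholders):

    float Approx(float x)
    {
       return x / (1.0351f*x + 0.16f);
    }

    float Reinhard(float x)
    {
       return x / (x + 1.0f);
    }

    float ThreeSteps(float x)
    {
       float exposed = x * (1.0351f/0.16f);   // 1. exposure bias of A/B ~= 6.469
       float mapped  = Reinhard(exposed);     // 2. Reinhard with no gamma
       return mapped * (1.0f/1.0351f);        // 3. white point scale of 1/A ~= 0.966
    }

    For any x >= 0, Approx(x) and ThreeSteps(x) give the same value (up to floating-point precision).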

    One more interesting sidenote: the formula doesn’t ever get to 1.0. If you set X to a million, you still converge to 0.966. You will never go above code 246 out of 255. That’s probably why it doesn’t hold up after 2.0.

    How are you computing the error metric? I assume you are doing even steps along X and calculating (Expected-Actual)^2? That would under-represent the results at the bottom end. Try doing the same metric with exponential steps along X and using the error metric (1-Expected/Actual)^2. So yes, I’m sticking to the original one, which is still awesome. (-:


  12. meh, I’ve just been trying to shave a few instructions off … and get down to 4 cycles/pixel. The polynomial ‘ratio’ approximation is 5 cycles/pix – (assuming perfect rcp pairing).

    Over the weekend, I also found the top-out at 0.966.

    I modified my testing harness to move linearly over the 0-1 domain, then log steps (up to 16777216). I also modified the error metric, as you suggested, to keep an eye on the low end.

    Some notes on the original approximation:

    The s-curve of HP’s shader requires *either* a ratio of 2 polynomials, or a 3rd degree spline. (His function has 2 inflection points). The ratio of polynomials has the same ‘mechanism of action’ as Reinhard: the denominator exerts influence as x increases. So, one polynomial creates the bottom curve, and the other (inverse-ish) polynomial creates the top curve — and limit at 1.0

    A cubic should be able to capture this easily, but I haven’t looked at it yet– I’ve been trying to force simpler approximations to fit ;-)

    Speaking of a cubic eval..
    I don’t think I ever gave you the vector/matrix form:
    (this is just a re-form of the polynomial, a generalization that works well for extending to 3rd order)

    //$$note: input color ‘x’ is assumed to be in linear space

    float3 x = max(0,LinearColor-0.004);

    float3 xx = x*x; // power-vec x^2

    // constants:
    const float3x3 powerMat = float3x3( xx.r, x.r, 1,
                                        xx.g, x.g, 1,
                                        xx.b, x.b, 1 );
    const float3 coffA = {6.2f, 0.5f, 0.00f};
    const float3 coffB = {6.2f, 1.7f, 0.06f};

    // evaluate filmic, apply gamma(2.2)

    Color = mul(powerMat,coffA)/(mul(powerMat,coffB));

    Like the polynomial evaluation, this vec/mat form evaluates in 5 cycles (on r6xx and above). Also: this form may be more optimal on ‘other’ hardware that prefers vector operations over scalar.

    This form can be extended to cubic evaluation – with minimal additional cost (it’s just float4 vectors, rather than float3..and the mad’s go from .xyz to .xyzw). As I mentioned above, a cubic *should* be able to approximate the entire func — with 2 inflection points. This evaluation would get rid of the pesky divide (scalar rcp’s)… and the approximation would become *completely* vector/matrix (dots or mads)

    Just some thoughts, sorry for filling up your comments, I’ll start my own blog … and we can ping-pong ideas.

    Cheers john!

  13. Hi Jim,

    Np, that’s what the comments are for. (-: Actually, the real reason you need to do error sampling in log space is because of the lower end of the curve. Suppose that we are in 2.2 gamma space. If a shot has a full histogram, half of the image will be below 128 and half will be above. But the linear intensity value that gets you to 128 is 0.218. If you estimate error linearly using input values between 0 and 1.0, you will significantly under-represent the first half of your perceptible range.

    It gets even worse the lower you go. The difference between .95 and 1.0, in gamma 2.2 space, is the difference between 249 and 255. The difference between 0 and 0.05 is the difference between 0 and 66. That’s why you really need to sample your error values in log-space, starting from the very bottom.

    Btw, it’s ok to top out above 1.0. That just means that your white point clamps a little earlier. Also, you don’t need to sample all the way to infinity. HPs original curve hit white around 67, so you want to stop your range around there.

    For performance, our post-processing runs on SPUs, so a recip() function is 4 instructions. When we write code, we essentially write all our code in scalar math, and do 4 pixels at a time. I think it ends up being about 12 cycles? But that’s 12 cycles on a 3.2GHz chip, with 6 cores, doing 4 pixels at a time. We have to remember that graphics cards are clocked slower (I think RSX is 600MHz). For SPUs, it’s plenty fast.
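    A rough sketch of the error sampling described above (log-spaced samples with the relative error term) might look like the following; Expected() and Actual() are placeholders for whichever two curves are being compared:

    float RelativeFitError(float xMin, float xMax, int numSteps)
    {
       float sum = 0.0f;
       for (int i = 0; i < numSteps; i++)
       {
          // exponential steps along x, starting from the very bottom of the range
          float t = (float)i / (float)(numSteps - 1);
          float x = xMin * pow(xMax / xMin, t);

          // relative error, so the low end counts as much as the high end
          float e = 1.0f - Expected(x) / Actual(x);
          sum += e*e;
       }
       return sum / (float)numSteps;
    }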


  14. “If you estimate error linearly using input values between 0 and 1.0, you will significantly under-represent the first half of your perceptible range.”

    Really good point! The metric should be *perceptual error* (aka, the error that you can perceive). And- perceptual space == log space. Got it.

  15. “it’s plenty fast”

    No. Such. Thing.

  16. Touche, salesman.

  17. ok, give this a run

    float3 x = max(0,LinearColor-0.004);
    Color = x/(0.1609134f+1.0165501f*x-0.0004327f*x*x);

    It compiles to mad/mul/mad/mad + the 3 scalar rcps
    This is a solid 4.0 cycles on my r8xx

    On the bottom end, this one comes out *slower*, giving more lows. Midway through, at code 116, the first approx is 0.404893 and this one is 0.404587 (where 1.0 is 255). On the high end, at 4.595 we have 0.9598 vs 0.9527

    It seems to hold up, and the expense reduction is statistically significant.

    No joking about the ‘fast’ thing — remember SSS? Each iteration can shave off a fraction. With enough iterations, it’s worthwhile (as we’ve seen)

    Let me know what you think. I’ll know I’ve succeeded when you switch approximations ;-)


  18. It seems a little unsafe. What do you do when x=1000? In real images, that does actually happen. In the shot of my cello in my GDC talk, the sky is literally 1000x brighter than the inside of my cello case, so we do need to account for really bright values.

    This equation is basically x/(Ax^2+Bx+C), but keep in mind that the old function reduces to (x+A)/(Bx^2+Cx+D)+E. The only difference in cost is actually adding A, because you can get E for free when you do the multiply-add with the denominator and E. So that adds three cycles (R,G,B) per 4 pixels. With 720*1280=921600 pixels, that difference costs 691k cycles. On the Cell at 3.2GHz, that comes to 0.216ms on 1 SPU, or 0.034ms spread over 6 SPUs.

  19. Not safe? I’m not following.

    your quote:

    “Btw, it’s ok to top out above 1.0. That just means that your white point clamps a little earlier. Also, you don’t need to sample all the way to infinity. HPs original curve hit white around 67, so you want to stop your range around there.”

    It doesn’t matter – I’m bored of this trajectory anyway. Pulling cycles off the polynomial solution feels like a dead-end.

    The vec/mat space (as I touched on above) is interesting. Link:

    Rolling all of the standard dials into one v*M is elegant. Concatenating the tonemap and gamma(2.2) into the evaluation sounds like it might be worthwhile. Not trivial though.

    something to think about


  20. Hey Jim,

    You’re right, that was a little cryptic. By unsafe, I mean that at the end it converges to 0. So that’s bad, because x goes up, and then comes down. There is also a point where the denominator is 0, so it would have a divide by zero at one very special value.

    With the original filmic curve with a white point, you have to do a saturate to keep the value between 0 and 1. But with a curve that comes back down, you would need to add an additional clamp before it comes back down, which would take away the speed gains.

    Cool link, especially the Hue rotation that preserves luminance! (-:

  21. [...] OOoooo, nice website. This is particularly nice! Thanks for that Ruud! Reply With Quote + Reply to [...]

  22. Great post! I will use this technique in my Computer Graphics course next semester to show students the diffuse and specular components of different objects.

    One small correction: in both images you capture, the diffuse light that reaches the camera is 38% of the original diffuse light, not 50%. Ideal linear polarizers let through 50%, but they don’t exist in real life. The ones you used have an HN38 factor, meaning that they let through 38% of all unpolarized light they receive.

  23. Hi Alejandro, good point. As I understand it, the filter basically acts partially as a neutral density filter. In other words, the filter allows 76% of the specular light and 38% of the diffuse light, correct? What matters of course is the ratio between the specular and diffuse light (2:1), but that’s still a good point.


  24. [...] Habble has written a very interesting article showing how to split specular and diffuse in real images. He later wrote another article showing that specular often contributes more than we expect for [...]

  25. Which is brighter? Scattered silver glitter or glossy paint that isn’t silver?