Satellite image data concepts

This page provides an organized list of ideas useful for understanding image data from satellites. It is intended for people with some background or practical knowledge who want to fill in the gaps. Since many concepts are intrinsically cross-cutting, they can’t be forced into a single perfectly hierarchical taxonomy; the goal is merely to keep related ideas reasonably near each other.

We might divide up the kinds of knowledge it’s useful to have when working with satellite data like this:

Layers of abstraction in remote sensing knowledge:

  • Practice – learning how to answer questions by actually using data in Photoshop, QGIS, numpy, etc.
  • This page – learning technical vocabulary and concepts that apply across sources
  • Theory – learning rigorously defined principles based in physics, geostatistics, etc.

All of these kinds of knowledge are important to an OSINT practitioner. This page only covers the middle range – ideas that are more abstract than what you can learn from the pixels themselves, but less abstract than what you would get in a higher-level college course.

Within those bounds, the organizational arc here is broadly from the more abstract (orbits) through the relatively concrete (how sensors work) to the practical (what a geotiff is).

Orbits and pointing

As an example of a typical optical Earth observation orbit, let’s take Landsat 9’s parameters from Wikipedia:

  • Regime: Sun-synchronous orbit. This means the orbit is designed to always pass overhead at about the same local solar time. Put another way, any two Landsat 9 images of a given spot at a given time of year will have the same angle of sunlight on the surface, and the same angle between the surface and the sensor. Specifically, it always crosses the equator on its southbound half-orbit at 10:00 (and, therefore, on its northbound half-orbit at 22:00). This mid-morning window is the sweet spot for most optical imaging purposes. In most climates where cumulus clouds are common, they generally form around midday as the mixed layer rises. It’s also claimed that this is the heritage of Cold War IMINT workers wanting shadows to estimate structure heights. (If you image around noon, you get vertical shadows in the tropics. This gives you depth perception problems, like you get walking through brush with a headlamp instead of a hand-held flashlight. Citation needed, though.) Virtually all the satellite imagery that you see on commercial maps has shadows that point west and away from the equator – in fact, as of 2022, this is so consistent that if you see a shadow pointing a different direction, it’s a good hint that the imagery is actually aerial (taken from a plane/UAV/balloon inside the atmosphere), not satellite.
  • Altitude: 705 km (438 mi). This is basically chosen to be as close to the surface as reasonably possible without grazing the atmosphere enough to perturb the orbit. It is substantially higher than the International Space Station, for example, but ISS has to constantly boost itself back up and that’s expensive. (ISS does occasionally underfly imaging satellites.) For comparison, if Earth were the size of a 30 cm (12 inch) desktop globe, Landsat 9’s orbit would be at 17 mm (2/3 inch) – grazing your knuckles if you held the globe like a basketball. (Developing some intuition about this relative size can help understand the practicalities of things like off-nadir imaging.)
  • Inclination: 98.2°. This is the angle between the orbital plane and the equator. Being slightly more than 90° makes the orbit slightly retrograde, which is part of the equation for staying sun-synchronous. A consequence is that although orbits like this one are sometimes called polar in a loose sense, they never exactly cross the poles – Landsat 9 always misses the south pole on its left and the north pole on its right. This leaves two relatively small polar gaps that are never imaged.
  • Period: 99.0 minutes. This is the time it takes to complete one full orbit. This is another variable constrained by the requirements of sun-synchrony and the lowest reasonable altitude.
  • Repeat interval: 16 days. Every 16 days, Landsat 9 is in exactly the same spot relative to Earth (± very small deflections due to space weather, micrometeorites, tides, maneuvers to avoid debris, etc.) and takes an image that can be exactly co-registered with the previous cycle’s. Furthermore, pairs (or mini-constellations) like Landsat 8 and 9 or Sentinel-2A and 2B are in identical orbits but half-phased such that, from a data user’s perspective, they act like a single satellite with half the repeat time. (Specifically, 8 days for Landsat 8/9 and 5 days for Sentinel-2A/B.) More or less by definition, constellations are designed to fill in each other’s gaps; for example, the wide-swath, low-resolution MODIS instruments are on a pair of satellites with near-daily coverage, but one mid-morning and the other mid-afternoon.

We used Landsat 9 here because it’s familiar to most people in the industry and is well documented. Other imaging satellites will have different sets of capabilities and constraints. For example, the Landsat series is on-nadir (looking straight down) more than 99% of the time. It only rolls to the side to look away from its ground track for exceptional events, e.g., major volcanic eruptions. But a high-res commercial satellite, e.g., in the Airbus Pléiades or Maxar WorldView constellations, is constantly looking off-nadir. One of these satellites might point its optics in easily half a dozen directions on a given orbit, and would only very rarely happen to look straight down.

Commercial users typically want images that are on-nadir and settle for images less than about 30° off-nadir. Around that angle, atmospheric and terrain correction starts getting hard, tall things are seen from the side as well as from above and block whatever’s behind them (an effect called layover), and the practical utility of imagery falls off for most purposes. But the area within 30° of nadir is quite large: about 400 km (250 mi) to either side of the ground track, according to some light trig.
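
As a quick check of that light trig (a flat-Earth simplification; Earth’s curvature adds a few percent), here’s the calculation in Python:

    import math

    altitude_km = 705          # Landsat-like orbit
    max_off_nadir_deg = 30.0   # typical practical limit for useful imagery

    # Flat-Earth approximation: ground distance from the nadir track that is
    # reachable at a given off-nadir (look) angle.
    reach_km = altitude_km * math.tan(math.radians(max_off_nadir_deg))
    print(f"~{reach_km:.0f} km to either side of the ground track")  # ~407 km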

High-resolution commercial satellites schedule collections in a process called tasking (as in “Tokyo is tasked for tomorrow”). This is in contrast to the survey mode collection used by Landsat, Sentinel, etc., which are essentially always collecting when they’re over land.

Resolutions

Satellite instruments can be thought of as identifying features (a deliberately abstract term) in any of a number of dimensions. The dimension(s) we think of most often is spatial: x and y, or equivalently longitude and latitude or east and north, on Earth’s surface. But a sensor needs a nonzero amount of resolving power in the other dimensions as well in order to be useful.

The idea of resolving power has formal definitions in optics, for example, but here we will be informal and common-sensical about what it means to actually resolve something. In particular, resolution is usually defined in terms of points (in some dimension), but in the real world we only rarely care about points of any kind; we’re usually more interested in objects and patterns.

As an example, imagine we’re looking for a bright white napkin left on a freshly paved asphalt runway. Even if our data is at a resolution of, say, 25 cm, and the napkin is only 10 cm across, we will probably be able to find the napkin because the pixels it overlaps will be noticeably brighter, assuming good radiometric resolution. In this case, we’ve beaten the nominal spatial resolution of the sensor – we haven’t technically resolved the napkin, but we’ve found it, which is what we wanted.

On the other hand, imagine that there are F-16s on the runway, and we want to know whether they’re F-16As or F-16Cs. Unless we have outside information (about markings, say), it’s entirely possible that we can’t tell. The details we need simply aren’t clearly visible from above. Therefore, we cannot determine whether there are F-16As at this airfield – despite the fact that F-16As are much larger than the resolution of the sensor. This seems painfully obvious when spelled out, but people who should know better routinely make versions of this mistake when working on real questions.

These two examples with spatial resolution illustrate that you can’t think of resolution (of any kind) as simply the ability to see a thing of a given size. Sometimes you’ll have better data than you’d think from looking at the number alone and sometimes you’ll have worse. Be skeptical of blanket statements that you definitely can or can’t see x at resolution y. Often, it’s really a situation where you can see some % of xs at resolution y under conditions z, and it’s just a question of whether trying is worth the time.

Resolutions are in a multi-way tradeoff in sensor design. As one of several important factors, increasing each kind of resolution multiplies data volumes, and getting data from a satellite to the ground is expensive and sometimes physically limited. In a sense, you can’t get satellite data that does everything (is super sharp and hyperspectral and …) for the same reason you can’t get a blender that’s also a toaster and a dishwasher. The laws of physics might not preclude it, but the constraints of sensible engineering absolutely do. What you see in practice are satellites that push for some kinds of resolution at the expense of others. Knowing how to mix and match to answer a particular question is a valuable skill.

Spatial

If someone says “this is a high-resolution sensor” we understand this by default to mean spatial resolution. This is also called ground sample distance (GSD) or ground resolved distance (GRD): the dimensions of the pixels of the data. (Theoretically, you could oversample your data and have pixels smaller than what’s actually resolvable, but that’s not an urgent consideration here.) We usually assume that the pixels are square or close enough, so you see this given as a single length dimension: 50 cm, 15 m, etc.

There’s some sleight of hand with definitions here. If we think about standard optical instruments, which are basically telescopes with CCDs, they do not have an intrinsic ground sample distance. They have an intrinsic angular resolution – a fraction of the arc that each pixel covers. This only becomes a distance on Earth’s surface if we assume the sensor is pointed at Earth at a given distance and angle. The nominal resolutions of optical satellite instruments are given for the altitude of the satellite (which can change) and looking on nadir (straight down). That’s a best case. When looking to the side, at rough terrain, the pixels can cover larger areas, inconsistent areas from one part of the image to another, and areas that are not square. Some of these problems get better and others get worse after orthorectification (see below).
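
To make that concrete, here’s a minimal sketch of how an angular pixel size becomes a ground sample distance, and how the footprint stretches off-nadir. The IFOV value is illustrative (chosen to give roughly a 15 m nadir GSD from a 705 km orbit), not a real instrument spec, and the geometry ignores Earth curvature and terrain:

    import math

    ifov_rad = 21e-6       # hypothetical angular pixel size (~21 microradians)
    altitude_m = 705_000   # Landsat-like orbital altitude

    def gsd(off_nadir_deg: float) -> tuple[float, float]:
        """Approximate (along-look, across-look) pixel footprint in meters."""
        theta = math.radians(off_nadir_deg)
        slant_range = altitude_m / math.cos(theta)   # longer path to the ground
        across = slant_range * ifov_rad              # stretched by range only
        along = across / math.cos(theta)             # plus the oblique incidence
        return along, across

    print(gsd(0))    # ~ (14.8, 14.8) m at nadir
    print(gsd(30))   # ~ (19.7, 17.1) m at 30 degrees off-nadir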

This is why it pays to be very cautious about measuring things based purely on pixel-counting, especially in imagery that’s been through some proprietary or undocumented processing pipeline. It’s more reliable to (1) have a very clear sense of what scale distortions are likely present in the image, and (2) reference measurements to objects of safely assumed dimensions.

An old-school IMINT way to measure what spatial resolution means in practice is the National Imagery Interpretability Rating Scale (NIIRS).

An often overlooked consideration on spatial resolution is that pixel area is the square of pixel side length, and area is what matters most. (We’ll assume square pixels for this discussion.) If you consider a square meter of ground, you can envision it covered by exactly 1 pixel at 1 m GSD. At “twice” that GSD, 50 cm, it’s covered by 4 pixels – but 4 is not twice 1. At 25 cm GSD, which sounds like 4× the resolution, it’s covered by 16 pixels, which is far more than 4× as clear. Perceived sharpness, information in a technical sense, and (most importantly) the practical ability to interpret fine details go up in proportion to pixel count, not as the inverse of pixel edge length. In other words, 10 m imagery is more than 3× as clear as 30 m imagery, all else being equal.
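
The arithmetic is trivial but worth internalizing; a one-liner makes the square relationship explicit:

    # Pixels covering a fixed patch of ground scale with the square of the GSD ratio.
    def relative_pixel_count(gsd_coarse_m: float, gsd_fine_m: float) -> float:
        return (gsd_coarse_m / gsd_fine_m) ** 2

    print(relative_pixel_count(1.0, 0.5))    # 4.0  ("2x" the GSD is 4x the pixels)
    print(relative_pixel_count(1.0, 0.25))   # 16.0
    print(relative_pixel_count(30.0, 10.0))  # 9.0  (10 m v. 30 m imagery)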

Spectral

Spectral resolution is the ability to distinguish different frequencies (wavelengths) of light or other energy. We often measure it as a number of bands, where bands are like the R, G, and B channels in everyday color imagery. Grayscale imagery has 1 band. RGB imagery has 3. RGB + near infrared (a common combination) has 4. Multispectral sensors on more advanced satellites often have about half a dozen to a dozen bands, typically covering the visible range and then parts of the near to moderate infrared spectrum.

We often measure into the infrared (IR) for three main reasons:

  1. Infrared light is scattered less by the atmosphere than visible light, and especially less than blue light. This allows for more clarity and contrast – basically, better radiometric resolution (see below). Another way of saying this is that IR light cuts through haze.
  2. Healthy plants strongly reflect near infrared (NIR) light. If we could see only slightly deeper shades of red, we’d see trees and grass glowing hot pink. This means infrared is useful for vegetation monitoring (for example, with NDVI), which is useful for agriculture but also for anything that affects plants. You can use infrared to spot subtle tracks and traces on vegetation that might be invisible in ordinary imagery. (For example, you might be able to detect a road under a forest canopy by noting that a line of trees is thriving slightly less than last year.)
  3. Things that are camouflaged in visible light, deliberately or not, are often easily distinguishable in infrared. Specifically, green paint tends to absorb IR (unlike plants) and stand out like a sore thumb. Since everyone knows this now, sophisticated actors no longer assume that you can hide a tank (for example) by painting it green, but you can still find things in infrared that you wouldn’t have in visible. You see more stuff when you have more frequencies available.

For these reasons, and others as well, optical satellites have always been biased toward the IR side of the spectrum.

Many optical sensors have one spatially sharp band with low spectral resolution, typically covering the visible range and some infrared, and multiple bands that are spectrally sharp but spatially coarse. These will be called the panchromatic or pan and (collectively) multispectral bands. They are merged for visualization in a process called pansharpening (see below). Sentinel-2, for example, does not have a pan band, but it collects different bands at different spatial resolutions roughly in proportion to their assumed importance – visible and NIR are 10 m, some other IR bands are 20 m, and then there are some “bonus” atmospheric bands at only 60 m.

Sensors that focus specifically on spectral resolution (sometimes with hundreds of bands) are called hyperspectral.

Here we’ve used optical and infrared wavelengths as examples, but the basic principles are similar for, e.g., radio frequency bands. In general, for any kind of observation, multiple spectral bands help resolve ambiguities in the scene and open up useful avenues for inter-band comparison.

Temporal

Temporal resolution is resolution in time. This is also called revisit time or cadence. As mentioned above, temporal resolution for medium-resolution open data survey-style satellites (Landsat 8 and 9, Sentinel-2A and 2B, Sentinel-1A, and others) is typically around two weeks per satellite or one week per constellation. For weather satellites (with very low spatial resolution) it can be as quick as 30 seconds in certain cases. PlanetScope and many low spatial resolution science satellites are approximately daily.

High-res commercial satellite constellations are a special case, because, as we’ve seen, their collections are based on tasking. This means that if there’s some point that they never have a reason to collect, their actual revisit time might be infinite. If there’s a major geopolitical crisis and every possible image is taken, even from extreme angles, it might be more often than once a day. Realistically, over moderately populated areas of no special interest, it might be once or twice a year; in deserts, it might be multiple years.

Radiometric

Radiometric resolution is often overlooked, but it’s especially interesting to OSINT. It’s essentially bit depth: the number of levels of light (or other energy) that the sensor can distinguish in a given band. Older or cheaper satellites might have a radiometric resolution of 8 or 10 bits; newer and better ones are typically 12 to 14.

High bit depth opens up many possibilities – for example:

  • You can stretch contrast to account for obscurations like haze, thin clouds, and smoke.
  • You can stretch contrast to find extremely faint traces on near-homogeneous backgrounds: wakes on water surfaces, paths on snowfields, offroading by light vehicles. Initial testing suggests Landsat 9 OLI (which has excellent radiometric resolution) can pick up the tracks of single trucks on the Sahara, despite the tracks being made out of sand on sand and much smaller than a single pixel of spatial resolution. It can also pick up bright city lights at night.
  • Band math, such as calculating band ratios or distances in spectral angle, gets more stable and accurate.

In OSINT we usually can’t afford much of the highest-spatial-resolution imagery. However, the excellent radiometric resolution of a lot of free data (since it was designed for science) gives us a side route into seeing things that someone hoped would not be noticed.

Radiometric resolution can be increased at the cost of spectral resolution by averaging bands. Under idealizing assumptions, the standard deviation of the noise of an image average is 1/sqrt(n), where n is the number of input images with unit standard deviation noise. (In practice, noise will be positively correlated between the bands of most sensors, so you’ll fall at least somewhat short.)
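
A quick numpy demonstration of that idealized 1/sqrt(n) behavior, with a synthetic “scene” and four bands that share it but carry independent unit-variance noise (the independence is the idealizing assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    n_bands, n_pixels = 4, 1_000_000

    scene = rng.uniform(0.0, 1.0, n_pixels)                      # the true signal
    bands = scene + rng.normal(0.0, 1.0, (n_bands, n_pixels))    # signal + noise

    averaged = bands.mean(axis=0)
    print(np.std(bands[0] - scene))   # ~1.0, the single-band noise level
    print(np.std(averaged - scene))   # ~0.5, i.e., 1/sqrt(4)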

Another way to look at radiometric resolution is to think about the total signal to noise ratio, or SNR, of the image. Some of the noise is what we usually mean by noise – semi-random grainy or streaky false signals inserted into the image by sensor flaws, cosmic rays, and so on. But some of it will be quantization noise, a.k.a. rounding errors or aliasing: output imprecision due to the inability to represent all possible values of real data. This latter kind of noise is the problem that increases as bit depth goes down. (This is analogous to the idea of talking about effective spatial resolution as a combination of the sampling resolution and the point spread function being sampled. But we’re getting off the main track here.)

Modalities

A sensor’s modality is the form of energy it senses and the general principles it uses to construct useful data. For example, microphones are sensors whose modality is measuring air pressure to record sound, barometers are sensors whose modality is using air pressure to record weather-scale atmospheric events, and everyday cameras are sensors whose modality is measuring visible light to record focused images.

Optical

Here we’ll define the optical domain as anything transmitted by Earth’s atmosphere in the windows between about 300 nm and 3 μm. This includes near ultraviolet (here, “near” means “near visible”, not “almost”), visible, near infrared, and shortwave infrared light, but not thermal infrared. You might also see this range described as, for example, VNIR + SWIR – visible, near infrared, and shortwave infrared. We’ll use Landsat as an example again, since its OLI sensor (on Landsat 8 and 9) is well-known and fairly typical of rich multispectral sensors. Its bands are:

OLI and OLI2 bands[1]
Name | Wavelength range in nm (FWHM) | Primary uses | Visible to human eyes
Coastal/aerosol | 435 to 451 | Deep blue-violet. Water is very transparent in this band, so it can see into shallows. Also picks up Rayleigh scattering from aerosols, helping model atmospheric effects and distinguish clouds v. dust v. smoke. | Yes
Blue | 452 to 512 | For true color. Useful for water. Better SNR than the coastal/aerosol band. | Yes
Green | 533 to 590 | For true color. Chlorophyll (land vegetation, plankton, etc.). Around the peak illumination of the sun. | Yes
Red | 636 to 673 | For true color. Absorbed well by chlorophyll. Shows soil. | Yes
NIR (near infrared) | 851 to 879 | Reflected extremely well by chlorophyll and healthy leaf structures. Often the brightest band. | No
SWIR1 (shortwave infrared 1) | 1,567 to 1,651 | Cuts through thin clouds well. Reflectivity correlates with dust/snow grain size – informative about surface texture. (In μm, this range is 1.567 to 1.651.) | No
SWIR2 (shortwave infrared 2) | 2,107 to 2,294 | Similar to SWIR1; some surfaces are easily distinguished by their differences in SWIR1 v. SWIR2. Flame/embers and lava glow strongly here. | No
Pan (panchromatic) | 503 to 676 | Twice the linear resolution of all the other bands, since its wide bandwidth can integrate more photons at a given noise level. Used for pansharpening. This and the next are given out of spectral order. | Yes
Cirrus | 1,363 to 1,384 | Deliberately not in an atmospheric window – almost entirely absorbed by water vapor in the lower atmosphere, but strongly reflected by high clouds. Allows for better atmospheric correction by spotting thin clouds. | No

Band names are semi-standard in the sense that, for example, green will always mean some version of visible green. However, exact bandpasses can vary quite a bit between sensors. Intercomparing bands from different sensors on the assumption that they must match will often lead to problems – check the actual numbers, not the names.

Bands can be processed and combined in many, many useful ways. For example, you can run statistics like principal component analysis on a set of bands to find correlations and outliers. You can use band ratios like NDVI, NDWI, or NBR, which index properties like vegetation health, surface moisture, and burn scars. You can treat multispectral values as vectors to be clustered, compared, or decomposed. You can derive a “contra-band” by subtracting some bands out of another band that covers them.
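
As a minimal example of band math, here’s NDVI, the classic vegetation ratio, computed as (NIR − red) / (NIR + red) from two reflectance arrays. The toy inputs are made up; real inputs would be co-registered bands in reflectance (or at least consistently scaled values):

    import numpy as np

    def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
        """Normalized difference vegetation index from two reflectance bands."""
        nir = nir.astype("float64")
        red = red.astype("float64")
        denom = nir + red
        out = np.zeros_like(denom)
        np.divide(nir - red, denom, out=out, where=denom != 0)  # avoid 0/0 at nodata
        return out

    # Healthy vegetation is bright in NIR and dark in red, so it scores high.
    print(ndvi(np.array([0.5, 0.3]), np.array([0.05, 0.25])))   # ~[0.82, 0.09]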

You almost always learn more by comparing bands than from one band alone. Features that are unremarkable in a single grayscale image can become meaningful if you notice that they don’t fit the usual relationship between that band and some other band(s).

True and false color

True color imagery puts red, green, and blue sensed bands in the red, green, and blue bands of the output image. It looks more or less like it would to an astronaut with binoculars. What’s called true color is often not quite, because the sensor bands don’t correspond exactly to the primaries used in standards like sRGB, but the difference is rarely important.

Humans have 30 million years of evolutionary hard-wiring and several decades of individual practice in interpreting true color images, and therefore you should favor true color whenever reasonably possible.

However, often false color is the way to go. This means putting anything but red, green, and blue bands (in that order) in the channels of the image you’re looking at. You might not even use bands directly at all; you might derive indexes or other more processed pseudo-bands. You could pull in data from another modality. Most often, however, people simply choose the bands that are most useful to them and put them in the visible channels in spectral order (i.e., the longest wavelength goes in the red channel and the shortest in blue). For any widely used sensor, a web search should give you a selection of “zoos” demonstrating popular band combinations – for example, here’s one for Landsat 8/9, but you can find dozens of others.

Band combinations are usually given by sensor-specific band numbers: 987 or 9-8-7 means band 9 is in the red channel and so on. (Annoyingly, this means that, e.g., Landsat 8/9 combination 543 and Sentinel-2 combination 843 are basically the same thing despite having different numbers.)
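
In code, composing a band combination is just stacking three bands into the red, green, and blue channels of an output array. The `bands` dictionary and `landsat_bands` below are hypothetical placeholders for however you’ve loaded the data:

    import numpy as np

    def composite(bands: dict[int, np.ndarray], combo: tuple[int, int, int]) -> np.ndarray:
        """Stack three bands (chosen by sensor band number) into an RGB array."""
        r, g, b = (bands[n] for n in combo)
        return np.dstack([r, g, b])          # shape: (rows, cols, 3)

    # e.g., the Landsat 8/9 "543" combination: NIR, red, green in R, G, B.
    # rgb = composite(landsat_bands, (5, 4, 3))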

Pansharpening

Many sensors, including virtually all current-generation commercial data at about 1 m or sharper spatial resolution, have a spatially sharp but spectrally coarse panchromatic (pan) band and a set of spatially coarser but spectrally sharper multispectral bands. The nominal spatial resolution of the sensor will be for the pan band alone, and the multispectral bands’ pixels will be (typically) some multiple of 2 larger on an edge. For example, Landsat 8 and 9 have 15 m pan bands and 30 m multispectral bands (2×, linearly). The Pléiades and WorldView constellations have roughly 50 cm pan bands and 2 m multispectral bands (4×). SkySat, unusually, produces imagery (with some preprocessing) at 57 cm pan, 75 cm multispectral (~1.3×).

For visualization purposes, we combine panchromatic and visible data into a single image. As an intuitive model of this process, imagine overlaying a translucent, sharp black-and-white image (the pan band) onto a blurry color image (the RGB bands) of the same scene. You can actually do this quite literally and get a semi-acceptable result, or work harder to get a better result. “Real” automated pansharpening algorithms range from the very basic to the extremely sophisticated.
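
For intuition, here is a sketch of one of the most basic “real” algorithms, a Brovey-style ratio. It assumes the multispectral bands have already been resampled onto the pan band’s grid and that everything is float-valued; production pansharpening is considerably more careful than this:

    import numpy as np

    def brovey_pansharpen(r, g, b, pan, eps=1e-6):
        """Very basic Brovey-style pansharpening on co-registered float arrays."""
        intensity = (r + g + b) / 3.0           # coarse brightness from the color bands
        ratio = pan / (intensity + eps)         # how much brighter/sharper pan is locally
        return r * ratio, g * ratio, b * ratio  # redistribute the color at pan resolution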

The point to remember is that most satellite imagery with good spatial resolution is pansharpened, and this creates some artifacts. In particular, when you are zoomed all the way in to 100% (pixel-for-pixel screen resolution), you have actually overzoomed all the color or multispectral information. Any pansharpening algorithm can only estimate a likely distribution of color. It’s like superresolution with neural networks – it may be statistically likely to be correct, it may be perfect in some cases, it may help you interpret what’s there, but it is necessarily a process of inventing information. And that entails risks.

Georeferencing and orthorectification

Much of this applies outside optical as well – move?

A raw satellite image of land is an angled view of a rough surface. (Even nominally nadir-pointing satellites acquire imagery that is off-nadir toward its edges.) If you imagine riding on a satellite and looking off to, say, the west, you will see the eastern sides of hills and buildings at flatter angles than you see the western sides – if you can see them at all. To turn a raw image into something that is projected orthographically, like a map, you have to use a terrain model – a 3D map of the planet’s surface. Then you can use information about where the satellite was and the angle its sensor was pointing, and for each pixel in the output image, you can project it out to see at what latitude and longitude it must have intersected the ground. Then you move all the pixels to their coordinates in some convenient projection, and you’ve essentially taken the image out of perspective and made it orthographic.

Except:

  • Earth’s surface is rough at every scale, and even “porous” or multiply defined in the sense that there are features like leafless trees that make it hard to define where the optical surface actually is at any given scale.
  • There is no perfectly accurate, precise, global, completely up-to-date terrain model of the Earth, let alone at a reasonable price. SRTM is pretty good but it’s only about 30 m, stops short of the Arctic, and is 20+ years out of date: there are entire lakes, highway cuts, and reclaimed islands that don’t exist in it.
  • Satellites typically only know where they’re pointing to within the equivalent of about 10 pixels (which, to be fair, is usually an extremely small fraction of a degree), so the pointing data can only narrow things down, not actually tell you where you are.
  • Continental drift means that a continent can move by easily 1 px over the lifetime of a high-end commercial satellite; a major earthquake can discontinuously distort a small region by several m.
  • To properly pin down an image (i.e., to check the reported pointing angle), you need to know the exact 3D location of 3 visible points within it, and realistically more like 10.
  • All these errors can combine.
  • No matter what, you can’t recover occluded features, i.e. things you can’t see in the original data. If you want a high-res satellite image of something like a canyon, you realistically need half a dozen images at very specific angles, which is extremely hard.

We could go on! Georeferencing and orthorectification is a difficult problem. It’s easier for lower-resolution satellites, because a given angular error comes out to fewer pixels. Also, survey-mode satellites like Landsat and Sentinel-2, which are nadir-pointing anyway, put a lot of effort into doing this well. Two Landsat scenes will almost always coregister to well within a pixel. Sentinel-2 is a little less reliable, especially toward the poles. Commercial imagery is often displaced by far more than you would think. One way to see this is to step back in Google Earth Pro’s history tool, especially somewhere relatively remote and rugged.

Here’s a farm in Nepal: 28.553, 84.2415. Just step back in time and watch it jump around underneath the pin. If you really want to be scared, watch the cliff to its north. This is why imagery analysts who understand imagery pipelines rarely use a whole lot of significant digits in their coordinates! You don’t really know where anything on Earth is, in absolute terms, to within more than a few meters at best if all you have to go on is a satellite image.

Atmospheric correction

Over long distances, even in clear weather, the atmosphere scatters and absorbs light. This is why distant hills are low-contrast and blueish (blue light is scattered more). What a satellite actually measures is called top-of-atmosphere radiance, or TOA. This is a measurement of nothing more than the amount of energy received per second, per pixel, per band. It can be measured pretty objectively. However, it’s often not what you want. For one thing, it’s too blue. For another, the amount of blueness and related effects will vary semi-randomly with atmospheric conditions (humidity, maybe dust storms or wildfire smoke, etc.) and predictably with season (sun distance and angle).

Therefore, a reasonable desire is to basically normalize the sun and remove the effects of the atmosphere. What we’re trying to get to here is called surface reflectance (SR). The main issue is that we don’t know the true state of the atmosphere at the moment the image was acquired. The best we can do is to model it and subtract it out. This is one of the central problems in remote sensing, and you could earn a PhD by improving one of the major models (https://en.wikipedia.org/wiki/Atmospheric_radiative_transfer_codes#Table_of_models) by a few percent.

The good news is there’s a brutally simple method that works pretty well most of the time. Dark object subtraction means assuming that the darkest pixel in the image should be pure black. Therefore, if you subtract out however much blue (and green, and so on) signal is present in the darkest pixel, you will have canceled out all the haze. It’s annoying how well this works considering how basic it is. It’s roughly equivalent to the automatic contrast adjustment tool in an image editor like Photoshop, or, to be a little more exact, like using the eyedropper in the Levels tool to set the black point to the darkest pixel.
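
A minimal numpy sketch of dark object subtraction, operating per band on a (bands, rows, cols) array of TOA values. Using a very low percentile instead of the literal minimum is a small defensive tweak (an assumption on my part, not a requirement of the method) so that one dead pixel can’t define “dark”:

    import numpy as np

    def dark_object_subtraction(bands: np.ndarray, percentile: float = 0.1) -> np.ndarray:
        """Subtract each band's (nearly) darkest value as a crude haze estimate."""
        dark = np.nanpercentile(bands, percentile, axis=(1, 2), keepdims=True)
        return np.clip(bands - dark, 0.0, None)   # don't let values go negative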

Correction to reflectance may or may not attempt to correct for terrain effects (i.e., relighting the scene). Different pipelines have different conventions for how far to correct or what to call different kinds of correction.

Atmospheric correction is usually not key for OSINT purposes, but any time you find yourself taking exact measurements of pixel values, you should at least know whether you’re working in TOA or in SR, and if SR, you should have a sense of what the pipeline was.

Common optical sensor types

This section is a stub. Please start it!

  1. Pushbroom
  2. Whiskbroom
  3. Full-frame

Thermal

This section is a stub. Please start it!

Synthetic aperture radar

Synthetic aperture radar, or SAR, creates images with radio waves in wavelengths around 1 cm to 1 m.

As a very first approximation, a SAR image is comparable to an optical image that shows objects that reflect radio waves instead of those that reflect visible light.

SAR is not just a regular camera that uses radar instead of light

Beyond the fact that both modalities create images, SAR works on completely different principles from standard optical imaging, and understanding it requires understanding those principles.

This page will only lightly outline how SAR works; for the math, please refer to SERVIR’s SAR Handbook (forest-oriented but with solid fundamentals), the NOAA/NESDIS Synthetic Aperture Radar Marine User’s Manual, or another good text. Here we will only point out some key ideas. If you want to get full value out of SAR, you should expect to invest at least a few hours in learning how it actually works. There’s a reason SAR experts tend to be a bit snobbish about it: it’s complex, subtle, and highly rewarding.

SAR is active

Optical sensors are almost all passive: they use energy that objects are already reflecting (usually from the sun) or producing (for example, in the thermal infrared). In contrast, SAR is active: it sends out a pulse of radar energy, roughly analogous to the flash on a camera.

SAR resolves space with time, not with focus

SAR’s resolution is based on the timing of returning signals. It does not pass the energy it senses through a focusing lens or mirror the way an optical sensor does. This leads to properties that are highly unintuitive if you think of it as merely “optical but in a different frequency” – for example, it does not lose resolution with distance, and there is no exact equivalent of perspective.

Speckle

Like a laser beam, a SAR signal interferes with itself. At a given moment and a given point, its waves may be canceling out or adding up. This means that a SAR image is intrinsically grainy or stippled-looking. This is not the same as sensor noise, because the effect is physically real and not a problem of errors in measurement. It can be mitigated by downsampling, averaging images from different “pings”, or applying despeckling filters. (A simple local median works reasonably well, but there’s a range of sophistication all the way up to sensor-specific filters based on physical models, extra inputs, fancy machine learning, etc.)
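
The simple end of that range looks like this: a local median over a small window, here via scipy. Named filters like Lee and Frost try harder to preserve edges; this sketch just trades a little spatial resolution for a lot less speckle:

    import numpy as np
    from scipy.ndimage import median_filter

    def despeckle(backscatter: np.ndarray, size: int = 5) -> np.ndarray:
        """Crude speckle reduction: local median over a size x size window."""
        return median_filter(backscatter, size=size)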

Retroreflection and multiple reflection

One consequence of SAR being active sensing is that it sees very bright returns from concave right angles made out of metal, which act as corner reflectors. (Notice how road signs and markers seem to glow disproportionately in headlights – it’s because those are retroreflectors in the optical range.) Highly developed cities, for example, are very retroreflective to radar. This shows up especially where the angle of the sensor’s view aligns to a street grid, when it’s called the cardinal effect. (See, for example, this academic paper, where they propose using retroreflection specifically to classify urban landcover. In general, there are very few radio-frequency corner reflectors in nature, and retroreflection is a good sign that you’re looking at a building, vehicle, etc.)

Where the reflection is separated enough from the first reflecting surface that you can see both independently, we use the term multiple reflection (or mirroring or ghosting). This most often happens where tall buildings or bridges are next to or over water. A radio wave may hit the water, then a bridge, then return to the sensor; another may hit a bridge, then the water, and return to the sensor, and so on, and you’ll see images of multiple bridges.

Layover and shadowing

Layover (a.k.a. relief displacement) is an effect that makes objects at higher elevations appear closer to the sensor. This happens because the radio waves from the top of a vertical object arrive back at the sensor (which is above and to the side of the object) before the radio waves from its base. This is most obvious with truly vertical objects like radio towers and skyscrapers, but surfaces that have any vertical component (hills, for example) will show some degree of layover. Ultimately, layover comes from the difference between slant range, which is what the sensor actually measures – distance from the sensor – and ground range, which is what we tend to intuitively want or expect when we look at a map-like image.

The painfully counterintuitive aspect, if you’re looking at a SAR image as if it were an ordinary optical image, is that layover goes in the opposite direction – buildings, for example, lean toward the sensor. For example, if you take a normal photo of a tall building from the south, it will cover the ground to its north. This feels normal because cameras, telescopes, etc., work on the same basic principle as the eye. But if you collect a SAR image of the same building from the south, it will cover the ground to its south. (Also, it won’t actually mask that ground, it will just add its signal in.)

Shadowing is the lack of data returned from surfaces facing away from the sensor. The shadowed side of terrain is stretched out as part of layover.

SAR imagery can be terrain corrected. Basically, this is a process that uses (1) the satellite’s position and the characteristics of its instrument and (2) a DEM or other model of the terrain it was looking at, and uses these to warp the SAR imagery into map coordinates and account for shadowing. Whether this is worthwhile will depend on the quality of the terrain correction algorithm and the data you can give it, and on what you need to analyze.

In general, be cautious with terrain correction, because it can never fully correct for all effects (e.g., the BRDF of different landcovers), and it can magnify small problems in input data. Sometimes it’s better to have a strange-looking image that you know how to interpret than a “normalized” one with subtle errors.

Clouds and many other materials are generally transparent to SAR

SAR frequencies are typically chosen to cut through weather. While this is a massive advantage of SAR over optical (the average place on Earth is cloudy roughly half the time), it’s also not absolute. Heavy rain, for example, can show up as ghostly features in some bands, so be on the lookout for it. If you see something you can’t interpret that might be weather-related, check the weather for the place at the time of image acquisition!

More generally – beyond the specific case of water vapor in air – SAR interacts with materials differently than light does. For example, it reflects more off liquid water, so you can’t see into shallows with SAR the way you can with optical. On the other hand, it interacts less with certain very dry materials, so it can cut through loose sand, dead vegetation, and so on. (For example, SAR is used to map ancient river systems under the Sahara because it can image bedrock under loose, dry sand.) The details of SAR signal interaction depend on wavelength, angle, and other factors; if you’re doing more than casual interpretation of data from a given sensor, it’s a good idea to look it up and familiarize yourself.

Polarimetry and interferometry

Thus far we have only considered backscatter images: maps of the intensity of reflected radio energy. But a good deal of SAR’s value is beyond this kind of data. As well as recording how much energy is in reflected radio waves, SAR sensors characterize the radio waves themselves.

Let’s use Sentinel-1 as an example for polarimetry. S1 sends radio waves in the vertical polarization, abbreviated V, and records them in both vertical and horizontal, or H, polarizations. In practice, this means that when you download an S1 frame in the usual way, you see two images, labeled VV (where the sensor transmitted V and measured V) and VH (where it transmitted V and measured H). The ratio of the two bands therefore tells you (in a general, statistical way, within the constraints of speckling) how much the surface at a given pixel tends to return a radio signal at that frequency and angle in the same polarization.

Why do we care? Because direct reflection and corner reflectors tend to return waves at the same polarization (for Sentinel-1, always VV), while volumes that scatter waves return proportionally more cross-polarized (VH) waves. The second category is mainly vegetation and soil, while the first is corner reflectors, metal, and so on – proportionally more artificial surfaces. You can literally get a PhD in the nuances of SAR polarimetry, but at the most basic level, it tells you something about surface properties that no other sensor would.
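
At its simplest, that comparison is a per-pixel ratio, usually expressed in decibels. This sketch assumes you already have calibrated VV and VH backscatter as linear-power arrays; as always with SAR, interpret it over regions rather than single pixels because of speckle:

    import numpy as np

    def pol_ratio_db(vv: np.ndarray, vh: np.ndarray, eps: float = 1e-10) -> np.ndarray:
        """VV/VH ratio in dB: higher suggests direct or corner reflection, lower suggests volume scattering."""
        return 10.0 * np.log10((vv + eps) / (vh + eps))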

Interferometry with SAR, or inSAR, compares wave phase between observations. The phase of a wave is where it is in its cycle when received. Using sound as an example, measuring the phase of a sound at a given moment means not just its volume and pitch but that the sound wave is, say, 23% of the way into its high pressure half, or exactly at the lowest-pressure point.

Suppose we make a SAR image of an area and record not only the amplitude but also the phase of the signal at every pixel. Now, after some time, the satellite’s orbit repeats, and at exactly the same moment in this new orbit (and therefore at exactly the same point in space relative to Earth), we take the same image again. There’s been some change over time that might represent, say, the soil drying out, a road being built, or a tree falling over. But the change in phase over relatively large, coherent regions can be interpreted as the surface getting nearer or farther away by (potentially) very small fractions of a wavelength – on the order of cm. This is an idealized version of inSAR.
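
The core conversion is small: because the signal travels out and back, a phase change of 2π corresponds to half a wavelength of line-of-sight displacement, so d = Δφ · λ / (4π). A worked example with a roughly C-band wavelength (Sentinel-1’s is about 5.5 cm):

    import math

    wavelength_m = 0.055            # ~C-band
    delta_phase_rad = math.pi / 2   # a quarter-cycle of measured phase change

    displacement_m = delta_phase_rad * wavelength_m / (4 * math.pi)
    print(f"{displacement_m * 1000:.1f} mm toward/away from the sensor")   # ~6.9 mm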

Geologists use this to map earthquakes, but you can also use it for drought (because dry land sags), tunneling, dam and building subsidence, underground explosion monitoring, and so on – in theory, anything that changes the distance between the satellite and the surface. You can even use decoherence (the breakdown of continuity between observations, which makes inSAR hard) for damage detection.

When inSAR works, it’s like magic. You can pick up extremely subtle effects over large areas. It does have limits, like that you can only measure displacement towards or away from the satellite(s), which for SAR is always at least somewhat to the side, which is not necessarily in the direction you actually care about (say, up/down). And as you would expect, it tends to require a lot of very good data (because, for example, satellite orbits are never absolutely perfect repeats), expertise, and minutes to days of fine-tuning.

LIDAR

This section is a stub. Please start it!

  1. 3D (survey-style) lidar
  2. 2D (transect-style) lidar

Image delivery

Processing levels

In theory, data processing levels are standard across the industry. In practice, different providers tend to make up their own definitions as necessary, and you should refer to source-specific documentation. But typically, the commonly seen processing levels are:

Level 0: Unprocessed data, more or less as downlinked to the ground station. Generally not sold or publicly released.

Level 1: Basic data in sensor units, for example TOA radiance. Often has a letter suffix with source-specific meaning, e.g., T to indicate a terrain-corrected version.

Level 2: Derived data in geophysical units, for example surface reflectance. Has been through high-level processing (e.g., atmospheric correction) that contains estimation or modeling.

As a rule of thumb, use level 1 if the imagery itself is the focus and you want to analyze the data in a custom way; use level 2 if you just want something that works out of the box to achieve some further goal. But again, the practical meaning of the levels depends on the dataset, so check to make sure you’re getting what you want.

Formats and projections

Image data generally comes in formats optimized for large payloads and good metadata. These include NetCDF, NITF, JPEG2000, and GeoTIFF. GDAL, which is included in QGIS, can read virtually any reasonable format. If you have a choice, GeoTIFF is usually the best.
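
As a minimal example of what “reading a GeoTIFF with GDAL” looks like in practice (the file name is hypothetical; any GeoTIFF will do):

    from osgeo import gdal

    ds = gdal.Open("scene_B04.tif")                          # hypothetical file
    print(ds.RasterXSize, ds.RasterYSize, ds.RasterCount)    # dimensions and band count
    print(ds.GetProjection())                                # the projection, as WKT
    print(ds.GetGeoTransform())                              # pixel-to-map-coordinate mapping

    band = ds.GetRasterBand(1)      # GDAL band indices start at 1
    array = band.ReadAsArray()      # pixel values as a numpy array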

A geographic projection is basically an invertible function (a reversible, one-to-one relationship) from a sphere to a plane. (If you know enough about geodesy to be saying “the spheroid, actually” right now, go read something more appropriate to your level of expertise 😉.) In other words, for a longitude and a latitude on Earth, a given projection gives you a corresponding x and y that you use to store and display the data.

Typical georeferencing metadata says: (1) here is the projection of the data in this file, and (2) here is where the data in this file lies on the abstract 2D plane defined by that projection.

You may also encounter data that is not projected in any strictly defined way. This might be as simple as a photo taken with a phone out a plane window. In theory you could define a projection for it if you knew parameters like the 3D GPS location of the phone, the angle it was pointed at, its camera’s field of view, and the small distortions introduced by its lens. But in practice it’s usually easier to find known points in the image and “tie down” or georeference the image based on those points. Given at least 3 but ideally more known points, you(r software) can warp the image into some standard projection. It’s deriving an arbitrary projection from pixel space to geographical coordinates by running a regression on the pixel-to-location pairs you provide. These known points are called ground control points, or GCPs. Some data, like Sentinel-1 SAR, is provided unprojected but with GCPs. This leaves more work for the user, but also more flexibility if you want to adjust the GCPs.
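
The simplest version of that regression is a six-parameter affine fit from pixel coordinates to map coordinates; real tools typically offer higher-order polynomials or thin plate splines on top of this. A sketch with made-up GCPs (a north-up grid of 10 m pixels):

    import numpy as np

    def fit_affine(pixel_xy: np.ndarray, geo_xy: np.ndarray) -> np.ndarray:
        """Least-squares affine transform from pixel (col, row) to map (x, y).

        Inputs are matched (n, 2) arrays of ground control points, n >= 3.
        Returns a 2x3 matrix A such that geo ~= A @ [col, row, 1].
        """
        design = np.hstack([pixel_xy, np.ones((len(pixel_xy), 1))])   # (n, 3)
        coeffs, *_ = np.linalg.lstsq(design, geo_xy, rcond=None)      # (3, 2)
        return coeffs.T

    px = np.array([[0, 0], [100, 0], [0, 100], [100, 100]])
    geo = np.array([[500000, 4000000], [501000, 4000000],
                    [500000, 3999000], [501000, 3999000]])
    print(fit_affine(px, geo))   # ~[[10, 0, 500000], [0, -10, 4000000]]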

There are several standard ways to represent projections, notably WKT, PROJ strings, and EPSG codes. We’ll give EPSG codes here.

Probably the most common projection you will see for raw data is Universal Transverse Mercator, or UTM. It’s actually a family of projections with the same formula but different parameters, each adapted to a different meridional slice of Earth’s surface. These UTM zones are named with numbers and north/south hemispheres: Paris is in UTM zone 31N, Geneva is in 32N, and Sydney is in 56S. (If you’ve used the MGRS grid system, this should sound familiar, but it’s not identical.) Within a zone, UTM is very close to equal-area and conformal, which are the most important properties for a projection if you want to do analytical work. Equal-area means 1 km² is the same number of pixels at any point in the projection, and conformal means that 1 km is the same number of pixels in every direction from any given point within the projection. (On a non-conformal map, circles appear as ovals, squares are rectangles, etc. This is a massive pain in the ass.) UTM is EPSG:32XYY, where X is 6 for N and 7 for S, and YY is the zone number, so for example 13S is EPSG:32713.
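
The EPSG numbering rule above fits in a one-line helper:

    def utm_epsg(zone: int, northern: bool) -> int:
        """EPSG code for a WGS84 UTM zone: 326xx in the north, 327xx in the south."""
        return (32600 if northern else 32700) + zone

    print(utm_epsg(31, True))    # 32631 (Paris)
    print(utm_epsg(56, False))   # 32756 (Sydney)
    print(utm_epsg(13, False))   # 32713 (the 13S example above)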

For display on standard web maps, people often use web Mercator, a.k.a. spherical Mercator, which is not equal-area at the global scale, but is conformal. This is why web maps make Greenland far too big, but it remains approximately the right shape. For local analysis, web Mercator behaves much like UTM in practice and can be a decent choice if you understand the issues with scale across large areas. EPSG:3857.

The other projection you’ll see the most is equirectangular or plate carrée, which uses longitude and latitude directly as x and y coordinates on a plane. It is neither equal-area nor conformal, and basically only exists because the math is easy. It’s often used by people who should know better. Its non-conformality means that any time you’re working near the poles, everything is squeezed, and you’re either overzooming one dimension, losing data in the other, or both. If you just want to scatterplot some points as quickly as possible, equirectangular is fine, but avoid it when doing anything with imagery. EPSG:4326. (Note that this is the EPSG of WGS84, the geodetic standard that defines things like the prime meridian. Many, many other projections refer to WGS84 in their definitions. But using WGS84 as a projection itself, instead of as an ingredient in a projection, is the equirectangular projection.)

The details of projections are notoriously tricky; it’s hard to work with them in a strictly correct and optimal way at all times. It’s the kind of topic that attracts pedantry and flamewars, unfortunately. Here’s some advice, none of it ironclad:

  1. Most imagery data, if it’s projected at all, is already in an appropriate projection as it arrives from the data provider. If you can, leave it as-is. Every reprojection involves resampling the data, which generally loses information.
  2. You should rarely have to explicitly think about projections. The whole point of a projection is to let you think in terms of pixels and/or meters; if you find yourself fighting your projection, something is wrong. Make sure you’re taking full advantage of your tools’ ability to handle these things automatically.
  3. If you’re working on a multi-source project, choose a suitable projection at the start and project all data into it once, when you import it.
  4. Most pain around projections comes from accidentally mixing projections. Don’t do that.
  5. The local UTM is usually a reasonable choice.

Bundles

Imagery is most often supplied in bundles, which are directories with image data files, usually separated by band or polarization (at least at level 1), and text (XML, json, etc.) metadata files. Some analysis tools will have plugins that will open specific types of bundles as single objects, automatically applying calibration data found in the metadata and so forth. In other situations you might open the image file and have to parse the metadata with your own code or by hand. If you’re getting to know a new imagery source, going through and understanding the purpose of everything delivered in a bundle is a great way to start.

DN and PN

Image formats generally store integers, since they losslessly compress better and are often easier to work with than floating point numbers. However, this presents a problem if, for example, the units being represented are reflectance, which ranges from 0 to 1. If we round every reflectance value to either 0 or 1, we’re delivering 1-bit data that’s probably close to totally useless. To address this, we might scale up to, say, 0 through 100 and say that instead of recording reflectance fraction, we’re recording reflectance percentage – fraction × 100. That still leaves us with less than 7 bits of radiometric resolution, though. Really, we’d like to be able to scale our values into an arbitrary range, maybe 0 through 65,535 to make full use of a 16-bit image, and send it with some metadata that tells how to get it back into some absolute or physically meaningful unit. You could even change the scaling factor per scene to optimize for bright v. dark, for example. And this is what providers generally do. The values actually stored in the image format are called digital numbers, or DN, and the values after scaling (typically with a multiplicative and an additive coefficient) are physical numbers, or PN.
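
In code, the DN-to-PN step is just a gain and offset, with the coefficients read from the bundle’s metadata (the parameter names here are generic placeholders; every provider calls them something different):

    import numpy as np

    def dn_to_pn(dn: np.ndarray, gain: float, offset: float, nodata: int | None = 0) -> np.ndarray:
        """Convert stored digital numbers to physical values via a linear rescale."""
        pn = dn.astype("float64") * gain + offset
        if nodata is not None:
            pn[dn == nodata] = np.nan   # keep fill pixels out of later statistics
        return pn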

Not all providers do this. For example, Sentinel-2 level 1C data has a globally constant scaling factor, which means different bands have a defined relationship even if you read raw, unscaled pixels out of them, which is great. Still, metadata-driven scaling is the most common approach. Basically, don’t assume that pixels actually mean anything with an absolute definition, especially compared to pixels from another band or scene, unless you know that they’re PN.

For most OSINT-relevant analysis, working in DN is a venial sin at worst and often justifiable. But it is useful to know what it means and to recognize situations where you should convert to PN. Any tool designed to work with remote sensing data will at least have some affordance for DN to PN scaling, and, again, may be able to parse the parameters out of a bundle (or in-image-file metadata) and apply them transparently so you never have to think about it.