Categorical scales
Introduction
So far, the scales we have discussed have only dealt with continuous, quantitative data. Discrete, and particularly categorical data (aka qualitative data), must be mapped using categorical scales (aka qualitative scales).
Categorical data generally cannot be expressed as a numerical measurement. Categorical data are typically discrete and segregated into well-defined groups. Examples are countries, blood types, political parties, etc.
Ordinal data are categorical data whose categories have an intrinsic order. Examples are ordered adjectives like "small", "medium", "large", scores used to create a ranking, Likert scales, letter grades for coursework, etc.
Ordinal scales
Ordinal scales have a discrete domain and a discrete range. Scale domains in D3 are JavaScript arrays (or other iterables), and their values are read in the order in which they are given.
const color = d3.scaleOrdinal(["a", "b", "c"], ["red", "green", "blue"]);
console.log(color("a")); // logs: "red"
console.log(color("b")); // logs: "green"
console.log(color("c")); // logs: "blue"
The default domain is an empty array.
An ordinal scale returns undefined if no range has been set (the default range is an empty array).
If an input value that is not in the domain is passed to the scale, it is implicitly appended to the domain (provided that unknown has not been explicitly set to a value).
Range values are assigned in the order in which domain values are added.
Once all range values have been used, the scale starts again with the first range value.
const color = d3.scaleOrdinal(["a", "b", "c"], ["red", "green", "blue"]);
console.log(color("p")); // logs: "red"
console.log(color("p")); // logs: "red"
console.log(color("q")); // logs: "green"
console.log(color("r")); // logs: "blue"
console.log(color.domain()); // logs: [ "a", "b", "c", "p", "q", "r" ]
const color = d3.scaleOrdinal(["red", "green", "blue"]);
console.log(color("a")); // logs: "red"
console.log(color("b")); // logs: "green"
console.log(color("c")); // logs: "blue"
console.log(color(0)); // logs: "red"
console.log(color(1)); // logs: "green"
console.log(color(2)); // logs: "blue"
console.log(color.domain()); // logs: [ "a", "b", "c", 0, 1, 2 ]
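The implicit-domain and wrap-around rules shown in the examples above can be mimicked in plain JavaScript. The following is a minimal sketch, not D3's implementation: `makeOrdinal` and the `IMPLICIT` sentinel are hypothetical names introduced for illustration, and the sketch assumes that domain values can be used as `Map` keys.

```javascript
// Hypothetical stand-in for "no explicit unknown value was set".
const IMPLICIT = Symbol("implicit");

// Hypothetical helper (not part of D3) that mimics the lookup rules of an ordinal scale.
function makeOrdinal(domain, range, unknown = IMPLICIT) {
  // Remember the position of each domain value; duplicates keep their first position.
  const index = new Map();
  for (const value of domain) {
    if (!index.has(value)) index.set(value, index.size);
  }

  function scale(value) {
    if (!index.has(value)) {
      // An explicit unknown value is returned as-is; the domain stays untouched.
      if (unknown !== IMPLICIT) return unknown;
      // Otherwise the new value is appended to the domain.
      index.set(value, index.size);
    }
    if (range.length === 0) return undefined; // empty range: nothing to map to
    // Wrap around once all range values have been used.
    return range[index.get(value) % range.length];
  }

  scale.domain = () => Array.from(index.keys());
  return scale;
}

const color = makeOrdinal(["a", "b", "c"], ["red", "green", "blue"]);
const results = ["a", "p", "p", "q"].map((value) => color(value));
console.log(results);        // logs: [ "red", "red", "red", "green" ]
console.log(color.domain()); // logs: [ "a", "b", "c", "p", "q" ]

const strict = makeOrdinal(["a"], ["red"], "grey");
console.log(strict("z"));     // logs: "grey"
console.log(strict.domain()); // logs: [ "a" ]
```

The modulo on the stored position is what makes the fourth distinct input reuse the first range value.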
D3 provides predefined categorical color schemes:
const colorScale = d3.scaleOrdinal(d3.schemeCategory10);
const nurseryRhyme = "She sells seashells by the seashore and the shells she sells are seashells, I'm sure.";
const words = nurseryRhyme.split(' ');
const container = d3.select("#container");
for (let i = 0; i < words.length; i++) {
container
.append("span")
.style("background", () => colorScale(i))
.text(words[i]);
}
Result:
Another example:
const persons = [
"M16.5 …",
"M18.5 …",
"M21.292 …",
"M16.5 …",
];
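const colorScale = d3.scaleOrdinal()
.domain(d3.range(persons.length)) // one domain entry per person index, matching the colorScale(i) lookup below
.range(d3.schemeCategory10.slice(0, persons.length)); // sliced range: one color per person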
const svg = d3.select("#svgContainer");
svg.selectAll("path")
.data(persons)
.join("path")
.attr("d", d => d)
.style("fill", (d,i) => colorScale(i))
.attr("transform", (d,i) => `translate(${i*35},0)`);
console.log(colorScale.range()); // logs: [ "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728" ]
Result:
SVG graphics from iconmonstr.
Band scales
Band scales are like ordinal scales in that the domain is discrete, but the range is continuous and numeric. Band scales are typically used for bar charts with a categorical dimension.
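To make the relationship between range, padding, step and bandwidth concrete, here is a minimal plain-JavaScript sketch of the arithmetic. It is an illustration under stated assumptions, not D3's code: `bandLayout` is a hypothetical helper, the range is assumed to be ascending, rounding is off, and padding values are fractions of the step.

```javascript
// Hypothetical helper (not part of D3): computes band positions by hand.
function bandLayout(domain, [start, stop], paddingInner = 0, paddingOuter = paddingInner, align = 0.5) {
  const n = domain.length;
  // Distance between the starts of two adjacent bands.
  const step = (stop - start) / Math.max(1, n - paddingInner + 2 * paddingOuter);
  // Width of a single band: the step minus the inner padding.
  const bandwidth = step * (1 - paddingInner);
  // Leftover space is distributed before and after the bands according to align.
  const offset = start + (stop - start - step * (n - paddingInner)) * align;
  const positions = new Map(domain.map((d, i) => [d, offset + step * i]));
  return { step, bandwidth, positions };
}

const layout = bandLayout(["a", "b", "c", "d"], [0, 100], 0.1);
console.log(layout.bandwidth.toFixed(2));          // logs: "21.95"
console.log(layout.positions.get("a").toFixed(2)); // logs: "2.44"

const noPadding = bandLayout(["a", "b", "c", "d"], [0, 100], 0);
console.log(noPadding.bandwidth);          // logs: 25
console.log(noPadding.positions.get("a")); // logs: 0
```

With four categories and no padding, each band takes exactly a quarter of the range. Any padding shrinks the bands and shifts the first band away from the start of the range.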
The next example is a bar chart in which the height of the bars is not yet connected to data. It illustrates only the band scale, which defines the bar width and the bar positions along the x-axis.
// Bars (with padding) must fit in a rectangle with dimensions:
const width = 100,
height = 100;
const barsHeight = 90; // make all bars same height for now
const svg = d3.select("svg")
.attr("width", "100%")
.attr("height", "100%")
.attr("viewBox", `0 0 ${width} ${height + 20}`)
.attr("style", "width: 300px; height: auto;");
// rect in which the bars fit (for illustration):
svg.append("rect")
.attr("x", 0)
.attr("y", 0)
.attr("width", width)
.attr("height", height)
.attr("style", "fill: #ddd; stroke: none;");
const data = ["a", "b", "c", "d"];
const x = d3.scaleBand()
.domain(data)
.range([0, width])
.padding(0.1); // padding(value), with value in the interval [0, 1]:
// sets the padding between the bars and before/after the first and last bar.
// If the padding had been zero,
// the bandwidth (width of the bars) would have been 25,
// and bar "a" would have started at 0.
console.log(x.bandwidth()); // logs: 21.951219512195124
console.log(x("a")); // logs: 2.439024390243901
// Draw the bars:
svg.append("g")
.attr("fill", "red")
.selectAll()
.data(data)
.join("rect")
.attr("x", (d) => x(d))
.attr("y", height - barsHeight)
.attr("height", barsHeight)
.attr("width", x.bandwidth());
// Add the x-axis.
svg.append("g")
.attr("transform", `translate(0,${height})`)
.call(d3.axisBottom(x).tickSizeOuter(0));
Result:
In the next example we connect data to the bar height. The bar chart shows the frequency of letters in the Dutch language (discrete domain, continuous and numeric range).
const width = 928, // width of the SVG
height = 500, // height of the SVG
margin = {
top: 40,
right: 10,
bottom: 40,
left: 50,
};
const svg = d3.select("svg")
.attr("width", "100%")
.attr("height", "100%")
.attr("viewBox", `0 0 ${width} ${height}`)
.attr("style", "width: 100%; height: auto; border: 1px solid #ddd;");
const data = [{"letter": "a", "frequency": 7.49}, {"letter": "b", "frequency": 1.58}, {"letter": "c", "frequency": 1.24}, {"letter": "d", "frequency": 5.93}, {"letter": "e", "frequency": 18.91}, {"letter": "f", "frequency": 0.81}, {"letter": "g", "frequency": 3.40}, {"letter": "h", "frequency": 2.38}, {"letter": "i", "frequency": 6.5}, {"letter": "j", "frequency": 1.46}, {"letter": "k", "frequency": 2.25}, {"letter": "l", "frequency": 3.57}, {"letter": "m", "frequency": 2.21}, {"letter": "n", "frequency": 10.03}, {"letter": "o", "frequency": 6.06}, {"letter": "p", "frequency": 1.57}, {"letter": "q", "frequency": 0.009}, {"letter": "r", "frequency": 6.41}, {"letter": "s", "frequency": 3.73}, {"letter": "t", "frequency": 6.79}, {"letter": "u", "frequency": 1.99}, {"letter": "v", "frequency": 2.85}, {"letter": "w", "frequency": 1.52}, {"letter": "x", "frequency": 0.036}, {"letter": "y", "frequency": 0.035}, {"letter": "z", frequency: 1.39}];
data.sort((a, b) => b.frequency - a.frequency);
const scaleX = d3.scaleBand()
.domain(data.map(d => d.letter)) // Array of letters, sorted by descending frequency
.range([margin.left, width - margin.right])
.padding(0.1)
.round(false);
// Range of the y-scale needs to be flipped
// because the y-axis must go from bottom to top,
// but SVG y-coordinates go from top to bottom.
const scaleY = d3.scaleLinear()
.domain([0, d3.max(data, (d) => d.frequency)])
.range([height - margin.bottom, margin.top]);
// Add bars:
svg.append("g")
.attr("fill", "steelblue")
.attr("stroke", "none")
.selectAll()
.data(data)
.join("rect")
.attr("x", (d) => scaleX(d.letter))
.attr("y", (d) => scaleY(d.frequency))
.attr("height", (d) => scaleY(0) - scaleY(d.frequency)) // scaleY(0) = height - margin.bottom
.attr("width", scaleX.bandwidth())
.append("title") // see 1)
.text(d => `${d.letter}: ${d.frequency}%`);
// Add x-axis:
svg.append("g")
.attr("transform", `translate(0,${scaleY(0)})`) // scaleY(0) = height - margin.bottom
.attr("style", "fill:none; font-size:16px; font-family:sans-serif;")
.call(d3.axisBottom(scaleX).tickSizeOuter(0)); // see 2)
// Add y-axis:
svg.append("g")
.attr("transform", `translate(${margin.left},0)`)
.attr("style", "fill:none; font-size:16px; font-family:sans-serif;")
.call(d3.axisLeft(scaleY).tickFormat((y) => `${y}%`))
.call(g => g.select(".domain").remove()) // see 2)
.call(g => g.append("text")
.attr("x", -margin.left/2)
.attr("y", margin.top - 16)
.attr("fill", "currentColor")
.attr("text-anchor", "start")
.text("↑ Frequency")
);
Result:
Data source: Wikipedia - Letter frequency
The above example is an adapted version of this Observable bar chart example.
1) See MDN Web Docs: <title>, the SVG accessible name element.
2) Removes the start and end ticks of the x-axis and removes the domain line of the y-axis. See the next chapter, axes: styling the ticks.
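The flipped y-range used in the bar chart above is easiest to see with a few numbers. The following is a small sketch in plain JavaScript: `linear` is a hypothetical stand-in for a linear scale (not D3's implementation), and the dimensions are the ones from the example.

```javascript
// Hypothetical helper (not part of D3): linear interpolation from a domain to a range.
function linear([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const height = 500;
const margin = { top: 40, bottom: 40 };
const maxFrequency = 18.91; // letter "e" in the Dutch data

// Flipped range: data value 0 maps to the bottom, the maximum maps to the top.
const y = linear([0, maxFrequency], [height - margin.bottom, margin.top]);

const baseline = y(0);              // bottom edge of every bar
const barTop = y(maxFrequency);     // top edge of the tallest bar
const barHeight = baseline - barTop;

console.log(baseline);  // logs: 460
console.log(barTop);    // logs: 40
console.log(barHeight); // logs: 420
```

Because SVG y-coordinates grow downward, a bar starts at the scaled data value and extends down to the scaled zero, which is why the height is computed as the difference between the two.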
Point scales
Point scales are basically band scales with paddingInner fixed to 1 (and consequently the bandwidth fixed to zero).
Point scales are typically used for scatterplots with a categorical dimension.
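With the inner padding fixed to 1, the band arithmetic reduces to evenly spaced points. The sketch below illustrates this in plain JavaScript. `pointLayout` is a hypothetical helper, not D3's code, and it assumes an ascending range, no rounding, and an outer padding given as a fraction of the step.

```javascript
// Hypothetical helper (not part of D3): computes point positions by hand.
function pointLayout(domain, [start, stop], padding = 0, align = 0.5) {
  const n = domain.length;
  // Distance between two adjacent points.
  const step = (stop - start) / Math.max(1, n - 1 + 2 * padding);
  // Leftover space is distributed before the first and after the last point.
  const offset = start + (stop - start - step * (n - 1)) * align;
  return {
    step,
    bandwidth: 0, // a point has no width
    positions: domain.map((d, i) => offset + step * i),
  };
}

const points = pointLayout(["a", "b", "c"], [0, 100]);
console.log(points.positions); // logs: [ 0, 50, 100 ]

const padded = pointLayout(["a", "b", "c"], [0, 100], 0.5);
console.log(padded.positions.map((p) => p.toFixed(2))); // logs: [ "16.67", "50.00", "83.33" ]
```

Without padding, the first and last points sit on the edges of the range. Padding moves them inward by the same amount on both sides.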
The next example shows a chart of letter frequencies in English, Dutch and French.
const width = 928, // width of the SVG
height = 600, // height of the SVG
margin = {
top: 40,
right: 10,
bottom: 40,
left: 50,
};
const svg = d3.select("#letter-frequency-point-scale")
.attr("width", "100%")
.attr("height", "100%")
.attr("viewBox", `0 0 ${width} ${height}`)
.attr("style", "width: 100%; height: auto;");
const dataEnglish = [{"letter":"a", "frequency":8.167}, {"letter":"b", "frequency":1.492}, {"letter":"c", "frequency":2.782}, {"letter":"d", "frequency":4.253}, {"letter":"e", "frequency":12.702}, {"letter":"f", "frequency":2.228}, {"letter":"g", "frequency":2.015}, {"letter":"h", "frequency":6.094}, {"letter":"i", "frequency":6.966}, {"letter":"j", "frequency":0.253}, {"letter":"k", "frequency":1.772}, {"letter":"l", "frequency":4.025}, {"letter":"m", "frequency":2.406}, {"letter":"n", "frequency":6.749}, {"letter":"o", "frequency":7.507}, {"letter":"p", "frequency":1.929}, {"letter":"q", "frequency":0.095}, {"letter":"r", "frequency":5.987}, {"letter":"s", "frequency":6.327}, {"letter":"t", "frequency":9.056}, {"letter":"u", "frequency":2.758}, {"letter":"v", "frequency":0.978}, {"letter":"w", "frequency":2.360}, {"letter":"x", "frequency":0.250}, {"letter":"y", "frequency":1.974}, {"letter":"z", "frequency":0.074}];
const dataDutch = [{"letter": "a", "frequency": 7.49}, {"letter": "b", "frequency": 1.58}, {"letter": "c", "frequency": 1.24}, {"letter": "d", "frequency": 5.93}, {"letter": "e", "frequency": 18.91}, {"letter": "f", "frequency": 0.81}, {"letter": "g", "frequency": 3.40}, {"letter": "h", "frequency": 2.38}, {"letter": "i", "frequency": 6.5}, {"letter": "j", "frequency": 1.46}, {"letter": "k", "frequency": 2.25}, {"letter": "l", "frequency": 3.57}, {"letter": "m", "frequency": 2.21}, {"letter": "n", "frequency": 10.03}, {"letter": "o", "frequency": 6.06}, {"letter": "p", "frequency": 1.57}, {"letter": "q", "frequency": 0.009}, {"letter": "r", "frequency": 6.41}, {"letter": "s", "frequency": 3.73}, {"letter": "t", "frequency": 6.79}, {"letter": "u", "frequency": 1.99}, {"letter": "v", "frequency": 2.85}, {"letter": "w", "frequency": 1.52}, {"letter": "x", "frequency": 0.036}, {"letter": "y", "frequency": 0.035}, {"letter": "z", frequency: 1.39}];
const dataFrench = [{"letter": "a", "frequency": 7.636}, {"letter": "b", "frequency": 0.901}, {"letter": "c", "frequency": 3.260}, {"letter": "d", "frequency": 3.669}, {"letter": "e", "frequency": 14.715}, {"letter": "f", "frequency": 1.066}, {"letter": "g", "frequency": 0.866}, {"letter": "h", "frequency": 0.937}, {"letter": "i", "frequency": 7.529}, {"letter": "j", "frequency": 0.813}, {"letter": "k", "frequency": 0.074}, {"letter": "l", "frequency": 5.456}, {"letter": "m", "frequency": 2.968}, {"letter": "n", "frequency": 7.095}, {"letter": "o", "frequency": 5.796}, {"letter": "p", "frequency": 2.521}, {"letter": "q", "frequency": 1.362}, {"letter": "r", "frequency": 6.693}, {"letter": "s", "frequency": 7.948}, {"letter": "t", "frequency": 7.244}, {"letter": "u", "frequency": 6.311}, {"letter": "v", "frequency": 1.838}, {"letter": "w", "frequency": 0.049}, {"letter": "x", "frequency": 0.427}, {"letter": "y", "frequency": 0.708}, {"letter": "z", frequency: 0.326}];
const letters = dataDutch.map((d) => d.letter); // "a" to "z", in alphabetical order
const x = d3.scalePoint()
.domain(letters)
.range([margin.left, width - margin.right])
.padding(0.3)
.round(false);
const y = d3.scaleLinear()
.domain([0, Math.max(
d3.max(dataEnglish, (d) => d.frequency),
d3.max(dataDutch, (d) => d.frequency),
d3.max(dataFrench, (d) => d.frequency)
)])
.range([height - margin.bottom, margin.top]);
// English points
svg.append("g")
.attr("fill", "red")
.attr("fill-opacity", 0.5)
.attr("stroke", "red")
.selectAll()
.data(dataEnglish)
.join("circle")
.attr("cx", (d) => x(d.letter))
.attr("cy", (d) => y(d.frequency))
.attr("r", 6)
.append("title")
.text(d => `${d.letter} in English: ${d.frequency}%`);
// Dutch points
svg.append("g")
.attr("fill", "orange")
.attr("fill-opacity", 0.5)
.attr("stroke", "orange")
.selectAll()
.data(dataDutch)
.join("circle")
.attr("cx", (d) => x(d.letter))
.attr("cy", (d) => y(d.frequency))
.attr("r", 6)
.append("title")
.text(d => `${d.letter} in Dutch: ${d.frequency}%`);
// French points
svg.append("g")
.attr("fill", "blue")
.attr("fill-opacity", 0.5)
.attr("stroke", "blue")
.selectAll()
.data(dataFrench)
.join("circle")
.attr("cx", (d) => x(d.letter))
.attr("cy", (d) => y(d.frequency))
.attr("r", 6)
.append("title")
.text(d => `${d.letter} in French: ${d.frequency}%`);
svg.append("g")
.attr("transform", `translate(0, ${height - margin.bottom})`)
.attr("style", "fill:none; font-size:16px; font-family:sans-serif;")
.call(d3.axisBottom(x).tickSizeOuter(0));
svg.append("g")
.attr("transform", `translate(${margin.left}, 0)`)
.attr("style", "fill:none; font-size:16px; font-family:sans-serif;")
.call(d3.axisLeft(y).tickFormat((y) => `${y}%`))
.call(g => g.append("text")
.attr("x", -margin.left)
.attr("y", margin.top - 20)
.attr("fill", "currentColor")
.attr("text-anchor", "start")
.text("↑ Frequency")
);
Result:
Data source: Wikipedia - Letter frequency