Distinct List Of Values
Minimum MongoDB Version: 4.2
Scenario
You want to query a collection of persons where each document contains data on one or more languages spoken by the person. The query result should be an alphabetically sorted list of unique languages that a developer can subsequently use to populate a list of values in a user interface's "drop-down" widget.
This example is the equivalent of a SELECT DISTINCT statement in SQL.
Sample Data Population
Drop any old version of the database (if it exists) and then populate a new persons
collection with 9 documents:
db = db.getSiblingDB("book-distinct-values");
db.dropDatabase();
// Insert records into the persons collection
db.persons.insertMany([
{
"firstname": "Elise",
"lastname": "Smith",
"vocation": "ENGINEER",
"language": "English",
},
{
"firstname": "Olive",
"lastname": "Ranieri",
"vocation": "ENGINEER",
"language": ["Italian", "English"],
},
{
"firstname": "Toni",
"lastname": "Jones",
"vocation": "POLITICIAN",
"language": ["English", "Welsh"],
},
{
"firstname": "Bert",
"lastname": "Gooding",
"vocation": "FLORIST",
"language": "English",
},
{
"firstname": "Sophie",
"lastname": "Celements",
"vocation": "ENGINEER",
"language": ["Gaelic", "English"],
},
{
"firstname": "Carl",
"lastname": "Simmons",
"vocation": "ENGINEER",
"language": "English",
},
{
"firstname": "Diego",
"lastname": "Lopez",
"vocation": "CHEF",
"language": "Spanish",
},
{
"firstname": "Helmut",
"lastname": "Schneider",
"vocation": "NURSE",
"language": "German",
},
{
"firstname": "Valerie",
"lastname": "Dubois",
"vocation": "SCIENTIST",
"language": "French",
},
]);
Aggregation Pipeline
Define a pipeline ready to perform the aggregation:
var pipeline = [
// Unpack each language field which may be an array or a single value
{"$unwind": {
"path": "$language",
}},
// Group by language
{"$group": {
"_id": "$language",
}},
// Sort languages alphabetically
{"$sort": {
"_id": 1,
}},
// Change _id field's name to 'language'
{"$set": {
"language": "$_id",
"_id": "$$REMOVE",
}},
];
Execution
Execute the aggregation using the defined pipeline and also view its explain plan:
db.persons.aggregate(pipeline);
db.persons.explain("executionStats").aggregate(pipeline);
Expected Results
Seven unique language names should be returned sorted in alphabetical order, as shown below:
[
{language: 'English'},
{language: 'French'},
{language: 'Gaelic'},
{language: 'German'},
{language: 'Italian'},
{language: 'Spanish'},
{language: 'Welsh'}
]
Observations
-
Unwinding Non-Arrays. In some of the example's documents, the
language
field is an array, whilst in others, the field is a simple string value. The$unwind
stage can seamlessly deal with both field types and does not throw an error if it encounters a non-array value. Instead, if the field is not an array, the stage outputs a single record using the field's string value in the same way it would if the field was an array containing just one element. If you are sure the field in every document will only ever be a simple field rather than an array, you can omit this first stage ($unwind
) from the pipeline. -
Group ID Provides Unique Values. By grouping on a single field and not accumulating other fields such as total or count, the output of a
$group
stage is just every unique group's ID, which in this case is every unique language. -
Unset Alternative. For the pipeline to be consistent with earlier examples in this book, it could have included an additional
$unset
stage to exclude the_id
field. However, partly to show another way, the example pipeline used here marks the_id
field for exclusion in the$set
stage by being assigned the$$REMOVE
variable.