MongoDB: Join and Format Array of Data
Aggregation pipeline
If you have an array of id and you want to retrieve data from another table and filter the properties to return you will use the aggregate feature of mongo and create a pipeline like so:
$match- to retrieve the document you're interested in and the list of id you want to proceed$lookup- to join the other table$groupor$project- format your data
Let's make this simple with a real life example.
Imagine you have a Library, this library contains a list of unavailable Books.
In your Library collection you store an array of Id representing all the unavailable books for this library.
Our objective: retrieve all unavaible books (name and id) for a specific library.
Library {
id: string,
name: string,
unavailable_books: string[]
}
Book {
id: string,
title: string,
author: string
}
// Mongo playground link to play with it :
// https://mongoplayground.net/p/2TIGfemo8g-
{
$match: {
id: libraryId
}
}
{
$lookup: {
from: 'books', // The collection to join
localField: 'unavailable_books', // The property containing your ids
foreignField: 'id',
as: 'unavailableBooks',
},
}
With this query you retrieve all the properties of both collections.
The format step
Now that we have all the data. We can use multiple approaches.
$group
You can create a $group step an use $push to create and array containing only the properties you want:
//Mongo playground link: https://mongoplayground.net/p/oxE7ZQohjSE
{
$match: {
id: libraryId
}
},
{
$lookup: {
from: "books",
localField: "unavailable_books",
foreignField: "id",
as: "unavailableBooks"
}
},
{
$group: {
_id: "$id",
unavailable: {
$push: {
id: "$unavailableBooks.id",
title: "$unavailableBooks.title"
}
}
}
}
// Result
[
{
"_id": 1,
"unavailable": [
{
"id": [
1,
3,
4
],
"title": [
"Hamlet",
"1984",
"Don Quichotte"
]
}
]
}
]
But it return the book id and name in two separate array which is not what we want.
$unavailableBooks it return the same result but "unavailable" contains an object with empty array: Mongo Playground Link$project
Using a $project makes things easy because you can use $cond
$cond cannot be used in a $group step$cond work exactly like the if/else condition:
$cond: {
"if": { condition },
"then": //Do something,
"else": // Do something else
}
Now we can say: If unavailableBooks is empty, return [] otherwise push my data in "unavailable"
By using the $map operator instead of the $push :
$push cannot be used in a $project step// MongoPlayground with final result
// https://mongoplayground.net/p/GZgHMe4mZVr
{
$match: {
id: 1
}
},
{
$lookup: {
from: "books",
localField: "unavailable_books",
foreignField: "id",
as: "unavailableBooks"
}
},
{
$project: {
_id: "$id",
name: "$name",
unavailable: {
$cond: {
"if": {
$eq: [
"unavailableBooks",
[]
]
},
"then": [],
"else": {
$map: {
input: "$unavailableBooks",
as: "d",
in: {
firstName: "$$d.id",
lastName: "$$d.title"
}
}
}
}
}
}
}
// Result
[
{
"_id": 1,
"name": "Library 1",
"unavailable": [
{
"firstName": 1,
"lastName": "Hamlet"
},
{
"firstName": 3,
"lastName": "1984"
},
{
"firstName": 4,
"lastName": "Don Quichotte"
}
]
}
]
Conclusion
This is what I've found for now. It wasn't easy to get my head around this.
I feel like using the $project step to do so is strange as this step is supposed to filter wich field must be return but I didn't found a solution that work with the $group step yet.
If you know any other method to get the same result from a $group let me know!