MongoDB: Join and Format Array of Data

Aggregation pipeline

·

3 min read

If you have an array of id and you want to retrieve data from another table and filter the properties to return you will use the aggregate feature of mongo and create a pipeline like so:

  1. $match - to retrieve the document you're interested in and the list of id you want to proceed

  2. $lookup - to join the other table

  3. $group or $project - format your data

Let's make this simple with a real life example.

Imagine you have a Library, this library contains a list of unavailable Books.

In your Library collection you store an array of Id representing all the unavailable books for this library.

Our objective: retrieve all unavaible books (name and id) for a specific library.

Library {
    id: string,
    name: string,
    unavailable_books: string[]
}

Book {
    id: string,
    title: string,
    author: string
}
// Mongo playground link to play with it : 
// https://mongoplayground.net/p/2TIGfemo8g-
{
    $match: {
        id: libraryId
    }
}
{
    $lookup: {
        from: 'books', // The collection to join
        localField: 'unavailable_books', // The property containing your ids
        foreignField: 'id',
        as: 'unavailableBooks',
    },
}

With this query you retrieve all the properties of both collections.

The format step

Now that we have all the data. We can use multiple approaches.

$group

You can create a $group step an use $push to create and array containing only the properties you want:

//Mongo playground link: https://mongoplayground.net/p/oxE7ZQohjSE
{
    $match: {
      id: libraryId
    }
  },
  {
    $lookup: {
      from: "books",
      localField: "unavailable_books",
      foreignField: "id",
      as: "unavailableBooks"
    }
  },
  {
    $group: {
      _id: "$id",
      unavailable: {
        $push: {
          id: "$unavailableBooks.id",
          title: "$unavailableBooks.title"
        }
      }
    }
  }
// Result
[
  {
    "_id": 1,
    "unavailable": [
      {
        "id": [
          1,
          3,
          4
        ],
        "title": [
          "Hamlet",
          "1984",
          "Don Quichotte"
        ]
      }
    ]
  }
]

But it return the book id and name in two separate array which is not what we want.

💡
If you don't have any data in your $unavailableBooks it return the same result but "unavailable" contains an object with empty array: Mongo Playground Link

$project

Using a $project makes things easy because you can use $cond

💡
$cond cannot be used in a $group step

$cond work exactly like the if/else condition:

$cond: {
    "if": { condition },
    "then": //Do something,
    "else": // Do something else
}

Now we can say: If unavailableBooks is empty, return [] otherwise push my data in "unavailable"

By using the $map operator instead of the $push :

💡
$push cannot be used in a $project step
// MongoPlayground with final result
// https://mongoplayground.net/p/GZgHMe4mZVr
{
    $match: {
      id: 1
    }
  },
  {
    $lookup: {
      from: "books",
      localField: "unavailable_books",
      foreignField: "id",
      as: "unavailableBooks"
    }
  },
  {
    $project: {
      _id: "$id",
      name: "$name",
      unavailable: {
        $cond: {
          "if": {
            $eq: [
              "unavailableBooks",
              []
            ]
          },
          "then": [],
          "else": {
            $map: {
              input: "$unavailableBooks",
              as: "d",
              in: {
                firstName: "$$d.id",
                lastName: "$$d.title"
              }
            }
          }
        }
      }
    }
  }
// Result 
[
  {
    "_id": 1,
    "name": "Library 1",
    "unavailable": [
      {
        "firstName": 1,
        "lastName": "Hamlet"
      },
      {
        "firstName": 3,
        "lastName": "1984"
      },
      {
        "firstName": 4,
        "lastName": "Don Quichotte"
      }
    ]
  }
]

Conclusion

This is what I've found for now. It wasn't easy to get my head around this.

I feel like using the $project step to do so is strange as this step is supposed to filter wich field must be return but I didn't found a solution that work with the $group step yet.

If you know any other method to get the same result from a $group let me know!