MongoDB内嵌数组文档条件查询

MongoDB的查询一项简单高效,但要对内嵌的数组文档查询并且按条件返回匹配的项目则需要费点小事。
假设MongoDB中有个集合的内容为:

Blog
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
{
"_id" : ObjectId("58a3b24f934f070f5e34699b"),
"className" : "mongo.service.Blog",
"author" : "test1",
"post" : "This page displays a list of Java EE applications and stand-alone application modules that have been installed to this domain.",
"comments" : [
{
"author" : "a1",
"date" : ISODate("2017-02-15T01:43:43.092Z"),
"content" : "nice"
},
{
"author" : "a2",
"date" : ISODate("2017-02-15T01:43:43.092Z"),
"content" : "i dont give sh*t"
},
{
"author" : "a3",
"date" : ISODate("2017-02-15T01:43:43.092Z"),
"content" : "dope"
}
]
}
{
"_id" : ObjectId("58a3b24f934f070f5e34699c"),
"className" : "mongo.service.Blog",
"author" : "test1",
"post" : "The deployment has been successfully installed.",
"comments" : [
{
"author" : "a4",
"date" : ISODate("2017-02-15T01:43:43.109Z"),
"content" : "gg"
},
{
"author" : "a5",
"date" : ISODate("2017-02-15T01:43:43.109Z"),
"content" : "good job"
}
]
}

comments是个内嵌的数组文档,现在假设我要查找comments里author等于a3的评论,如使用find的话

1
db.Blog.find({"comments.author":"a3"});

这种查询会把匹配这个条件的整个文档返回出来,包括那些author并不是a3的评论,因为查询匹配父文档
这样的返回结果显然是不符合要求,并且在内嵌文档数量很多的情况下很浪费资源。
所以我们只能使用聚合查询解决这种问题。

Aggregate
1
2
3
4
5
6
db.Blog.aggregate(
{"$match":{"comments.author" : "a3"}},
{"$project":{"_id":1,"comments":"$comments"}},
{"$unwind":"$comments"},
{"$match":{"comments.author" : "a3"}}
)

  • 第一个$match可有可无,假设文档中有大量的数据,可以先通过条件定位到主文档
  • $project投射哪些字段,这里提取了id和comments(提取了文档中的$comments这个字段并且改名为comments)
  • $unwind将数组拆成一条条独立的文档
  • $match最后的这个条件尝试匹配这些独立的文档。

所以最终的返回结果。

1
2
3
4
5
6
7
8
{
"_id" : ObjectId("58a3b24f934f070f5e34699b"),
"comments" : {
"author" : "a3",
"date" : ISODate("2017-02-15T01:43:43.092+0000"),
"content" : "dope"
}
}

匹配的数据虽然由comments数组变为对象但仍然是一个内嵌的文档,我们需要转换成一个独立的主文档,这样在某些情况下更加方便,比如java领域对象中的序列化,所以需要将上面Aggregate改改。

Aggregate
1
2
3
4
5
6
7
8
9
10
11
12
db.Blog.aggregate(
{"$match":{"comments.author" : "a3"}},
{"$unwind":"$comments"},
{"$match":{"comments.author" : "a3"}},
{"$project":{
"_id":0,
"author":"$comments.author",
"date":"$comments.date",
"content":"$comments.content"
}
}
)

  • 最后一步增加了投射$project,将主文档的_id隐藏,匹配的内嵌文档中每个字段投射出来,返回结果变为。
1
2
3
4
5
{
"author" : "a3",
"date" : ISODate("2017-02-15T01:43:43.092+0000"),
"content" : "dope"
}

将数据整成想要的样子完全是为了在开发中能方便的进行对象序列化,以MongoDB ORM框架Morphia为例,上面集合对应的JavaBean为

Blog
1
2
3
4
5
6
7
8
9
10
11
12
13
/**
* @author akalxs@gmail.com
*/
@Data
@Entity("Blog")
public class Blog {
@Id
private ObjectId _id;
private String author;
private String post;
@Embedded
private List<Comment> comments;
}

Comment
1
2
3
4
5
6
7
8
9
/**
* @author akalxs@gmail.com
*/
@Data
public class Comment {
private String author;
private Date date;
private String content;
}

测试代码

Test
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
@Test
public void getCommentByAuthor() {
Datastore ds = getDatastore();
Query q = ds.createQuery(Blog.class).field("comments.author").equal("a3");
ds.createAggregation(Blog.class)
.match(q)
.unwind("comments")
.match(q)
.project(
Projection.projection("author", "comments.author"),
Projection.projection("date", "comments.date"),
Projection.projection("content", "comments.content")
)
.aggregate(Comment.class)
.forEachRemaining(
comment -> System.out.println("评论:" + comment)
);
}

输出结果已转成我们想要的对象Comment

1
评论:Comment(author=a3, date=Wed Feb 15 09:43:43 CST 2017, content=dope)