In the world of database optimization, indexes are like a double-edged sword. While they can dramatically improve query performance, having too many indexes can actually harm your database’s overall efficiency. This guide will walk you through the process of identifying and removing unnecessary indexes in MongoDB, using a real-world example to illustrate the concepts.

The Problem: Index Overhead

Creating indexes for every query might seem like a good idea at first, but it can lead to several issues:

  • Write Performance Degradation: Each write operation (insert/update/delete) requires corresponding updates to all affected indexes. This adds an O(log n) operation per index to every write.
  • Storage Amplification: Indexes consume additional storage space proportional to the indexed data size plus metadata, typically 4-8 bytes per document entry per index.
  • Memory Pressure: Indexes must be loaded into RAM for optimal performance, consuming valuable memory resources from the working set.
  • Background Maintenance: Index maintenance processes consume CPU resources during rebalancing, compaction, and stats recalculation.
  • Oplog Overhead: In replica sets, index updates add entries to the oplog, increasing replication lag potential.

A Real-World Example

Let’s examine a practical scenario using a meetings collection in MongoDB. This collection stores information about different school courses, with documents structured like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
{
    _id        : { type: String, required: true }, // indexed by default
    title      : { type: String, required: true, trim: true },
    project_id : {
      type     : String,
      ref      : 'Project',
      required : true,
      index    : true,
    },
    meeting_date : {
      type     : Number,
      required : true,
      index    : true,
    },
    venue_id   : {
      type     : String,
      required : true,
      index    : true,
    },
    status : {
      type    : Number,
      enum    : Object.values(meetingConstants.STATUS),
      default : meetingConstants.STATUS.SCHEDULED,
    },
    creator_id : {
      type     : String,
      ref      : 'User',
      required : true,
      index    : true,
    },
    creator_name : {
      type     : String,
      required : true,
      trim     : true,
    },
    is_active : {
      type    : Boolean,
      default : true,
    }
  }
  
  
  // Compound index
  meetingSchema.index(
  { creator_id: 1, meeting_date: 1 },
  {
    name                    : 'creator_id_1_meeting_date_1'
  }
);
  

Current Index Structure

The collection currently has indexes on every field:

  • _id (default index)
  • { project_id: 1 }
  • { meeting_date: 1 }
  • { venue_id: 1 }
  • { creator_id: 1 }
  • Compound index { creator_id: 1, meeting_date: 1 }

Step 1: Evaluating Index Usage

The first step in optimizing your indexes is to understand how they’re being used. MongoDB provides the $indexStats aggregation stage to help you analyze index usage:

1
db.courses.aggregate([{ $indexStats: {} }])

This command returns detailed statistics about each index, including:

  • Index name
  • Key structure
  • Number of operations
  • Last access time
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
[
  {
    name: '_id_',
    key: {
      _id: 1
    },
    ,
    accesses: {
      ops: 9773,
      since: 2024-08-27T07: 59: 39.448Z
    },
    spec: {
      v: 2,
      key: {
        _id: 1
      },
      name: '_id_',
      ns: 'pp_dev.meetings'
    }
  },
  {
    name: 'creator_id_1',
    key: {
      creator_id: 1
    },
    ,
    accesses: {
      ops: 1203,
      since: 2024-08-27T07: 59: 39.448Z
    },
    spec: {
      v: 2,
      key: {
        creator_id: 1
      },
      name: 'creator_id_1',
      ns: 'pp_dev.meetings',
      background: true
    }
  },
  {
    name: 'meeting_date_1',
    key: {
      meeting_date: 1
    },
    ,
    accesses: {
      ops: 73,
      since: 2024-08-27T07: 59: 39.448Z
    },
    spec: {
      v: 2,
      key: {
        meeting_date: 1
      },
      name: 'meeting_date_1',
      ns: 'pp_dev.meetings',
      background: true
    }
  },
  {
    name: 'creator_id_1_meeting_date_1',
    key: {
      creator_id:   1,
      meeting_date: 1
    },
    ,
    accesses: {
      ops: 2432,
      since: 2024-08-27T07: 59: 39.448Z
    },
    spec: {
      v: 2,
      key: {
	      creator_id:   1,
	      meeting_date: 1
      },
      name: 'creator_id_1_meeting_date_1',
      ns: 'pp_dev.meetings',
      background: true
    }
  },
  {
    name: 'project_id_1',
    key: {
      project_id: 1
    },
    ,
    accesses: {
      ops: 210,
      since: 2024-08-27T07: 59: 39.448Z
    },
    spec: {
      v: 2,
      key: {
        project_id: 1
      },
      name: 'project_id_1',
      ns: 'pp_dev.meetings',
      background: true
    }
  },
  {
    name: 'venue_id_1',
    key: {
      venue_id: 1
    },
    ,
    accesses: {
      ops: 0,
      since: 2024-08-27T07: 59: 39.448Z
    },
    spec: {
      v: 2,
      key: {
        venue_id: 1
      },
      name: 'venue_id_1',
      ns: 'pp_dev.meetings',
      background: true
    }
  }
]

Analyzing the Results

From the statistics, we can identify several issues:

  1. The venue_id_1 index has never been used (0 operations)
  2. The creator_id_1 and meeting_date_1 indexes are redundant because they’re covered by the compound index { creator_id: 1, meeting_date: 1 }
  3. project_id_1 show very low usage compared to others

Analyzing Explain Plans

To fully understand index utilization, we should analyze query execution plans:

1
db.meetings.find({ creator_id: "user123" }).explain("executionStats")

Sample execution stats:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
{
  "queryPlanner": {
    "plannerVersion": 1,
    "namespace": "pp_dev.meetings",
    "indexFilterSet": false,
    "parsedQuery": { "creator_id": { "$eq": "user123" } },
    "winningPlan": {
      "stage": "FETCH",
      "inputStage": {
        "stage": "IXSCAN",
        "keyPattern": { "creator_id": 1 },
        "indexName": "creator_id_1",
        "isMultiKey": false,
        "direction": "forward",
        "indexBounds": { "creator_id": [ "[\"user123\", \"user123\"]" ] }
      }
    },
    "rejectedPlans": [
      {
        "stage": "FETCH",
        "inputStage": {
          "stage": "IXSCAN",
          "keyPattern": { "creator_id": 1, "meeting_date": 1 },
          "indexName": "creator_id_1_meeting_date_1",
          "isMultiKey": false,
          "direction": "forward",
          "indexBounds": {
            "creator_id": [ "[\"user123\", \"user123\"]" ],
            "meeting_date": [ "[MinKey, MaxKey]" ]
          }
        }
      }
    ]
  },
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 47,
    "executionTimeMillis": 2,
    "totalKeysExamined": 47,
    "totalDocsExamined": 47,
    "executionStages": {
      "stage": "FETCH",
      "nReturned": 47,
      "executionTimeMillisEstimate": 0,
      "works": 48,
      "advanced": 47,
      "needTime": 0,
      "needYield": 0,
      "saveState": 0,
      "restoreState": 0,
      "isEOF": 1,
      "docsExamined": 47,
      "alreadyHasObj": 0,
      "inputStage": {
        "stage": "IXSCAN",
        "nReturned": 47,
        "executionTimeMillisEstimate": 0,
        "works": 48,
        "advanced": 47,
        "needTime": 0,
        "needYield": 0,
        "saveState": 0,
        "restoreState": 0,
        "isEOF": 1,
        "keyPattern": { "creator_id": 1 },
        "indexName": "creator_id_1",
        "isMultiKey": false,
        "direction": "forward",
        "indexBounds": { "creator_id": [ "[\"user123\", \"user123\"]" ] },
        "keysExamined": 47,
        "seeks": 1,
        "dupsTested": 0,
        "dupsDropped": 0
      }
    }
  }
}

Step 2: Safely Testing Index Removal

Before permanently removing indexes, it’s wise to test the impact of their removal. MongoDB provides the hideIndex() method for this purpose:

1
2
3
db.courses.hideIndex("project_id_1")
db.courses.hideIndex("creator_id_1")
db.courses.hideIndex("meeting_date_1")

This allows you to:

  • Monitor query performance
  • Identify any unexpected impacts
  • Safely revert if needed

Ongoing Index Usage Monitoring

Implement regular index usage monitoring:

1
2
3
4
5
6
7
8
9
// Schedule weekly index usage analysis
function analyzeIndexUsage() {
  db.meetings.aggregate([{ $indexStats: {} }]).forEach(function(index) {
    print(index.name + ": " + index.accesses.ops + " operations");
    if (index.accesses.ops < 100) {
      print("CANDIDATE FOR REMOVAL: " + index.name);
    }
  });
}

Write vs. Read Performance Analysis

Calculate the index impact ratio:

1
Index Impact Ratio = (write operations × index count) / (read operations × index hit rate)

A high ratio indicates excessive index overhead relative to benefits.

Storage Efficiency Analysis

Periodically assess index storage efficiency:

1
2
3
4
const stats = db.meetings.stats();
const indexSizeTotal = Object.values(stats.indexSizes).reduce((a, b) => a + b, 0);
const indexRatio = indexSizeTotal / stats.size;
console.log("Index to data ratio: " + indexRatio);

A ratio over 0.5 often indicates index bloat.

Step 3: Removing Unnecessary Indexes

After confirming that the hidden indexes aren’t needed, you can permanently remove them using dropIndexes():

1
db.courses.dropIndexes(["project_id_1", "creator_id_1", "meeting_date_1"])

The Optimized Index Structure

After optimization, the collection maintains only the most valuable indexes:

  • _id (default index)
  • Compound index { creator_id: 1, meeting_date: 1 }

Best Practices for Index Management

  1. Regular Monitoring: Use $indexStats regularly to track index usage
  2. Compound Indexes: Prefer compound indexes over single-field indexes when possible
  3. Testing: Always test index changes in a staging environment first
  4. Documentation: Keep track of why each index exists
  5. Performance Metrics: Monitor query performance before and after index changes

Conclusion

Proper index management is crucial for maintaining optimal MongoDB performance. Optimizing MongoDB indexes requires balancing read performance against write overhead. By regularly evaluating and removing unnecessary indexes, you can:

  • Improve write performance
  • Reduce storage requirements
  • Maintain efficient query execution
  • Optimize resource utilization

Remember that index optimization is an ongoing process. Regular monitoring and maintenance will help keep your database running at peak efficiency.