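Optimizing MongoDB Indexes and Bulk Operations

MongoDB performance depends heavily on how indexes and bulk operations are used. Choosing the right index type, such as a single-field, compound, or unique index, speeds up queries. Bulk operations make large data updates far more efficient, and the initializeUnorderedBulkOp() and bulkWrite() methods are the main tools for them. Understanding how the storage engines differ, for example WiredTiger's document-level concurrency control versus MMAP's collection-level locking, is essential when choosing an engine. Familiarity with the Java and Python drivers also makes it easier to work with a MongoDB database.

MongoDB Index Management and Bulk Operations in Detail

MongoDB offers several index types for optimizing query performance and supports bulk operations for more efficient data processing. This article covers MongoDB's index management and bulk operation techniques.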

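Index Types and Their Uses

MongoDB supports several index types, including single-field, compound, and unique indexes. Choosing the right one is key to query performance.

Compound index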

db.people.createIndex({name: 1, age: -1})

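This example creates a compound index that orders entries by name ascending, then by age descending. Field order in a compound index matters: together with the sort directions, it determines which sort operations the index can support.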
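To make the effect of field order and direction concrete, here is a minimal Python sketch of the rule for sorts. It assumes a simplified model in which a sort is supported when its fields form a prefix of the index keys and its directions either all match or are all reversed. The function name `index_supports_sort` is hypothetical, and the sketch ignores the case where equality filters on leading fields allow other sorts.

```python
def index_supports_sort(index_keys, sort_spec):
    """Return True if the sort can be served by walking the index in one direction.

    Both arguments are lists of (field, direction) pairs, where direction
    is 1 (ascending) or -1 (descending).
    """
    if not sort_spec or len(sort_spec) > len(index_keys):
        return False
    prefix = index_keys[:len(sort_spec)]
    # The sort fields must be a prefix of the index fields, in the same order.
    if [field for field, _ in prefix] != [field for field, _ in sort_spec]:
        return False
    # Either every direction matches (forward scan) ...
    forward = all(d == s for (_, d), (_, s) in zip(prefix, sort_spec))
    # ... or every direction is inverted (backward scan).
    backward = all(d == -s for (_, d), (_, s) in zip(prefix, sort_spec))
    return forward or backward


# Mirrors db.people.createIndex({name: 1, age: -1})
people_index = [("name", 1), ("age", -1)]

print(index_supports_sort(people_index, [("name", 1), ("age", -1)]))  # True
print(index_supports_sort(people_index, [("name", 1), ("age", 1)]))   # False
```

Under this model, sorting by name ascending and age ascending cannot use the index for ordering, which is why the directions chosen at creation time should follow the sorts the application actually runs.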

Unique index

db.collection.createIndex( { "user_id": 1 }, { unique: true } )

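A unique index guarantees that the values of the indexed field are unique within the collection. If the collection already contains duplicate values, creating the unique index fails.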
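Because index creation fails on existing duplicates, it can help to look for them first. The sketch below is a plain-Python illustration of that check over documents already loaded as dictionaries; the function name `find_duplicate_values` and the sample data are hypothetical. It treats a missing field as `None`, on the assumption that a non-sparse unique index stores missing fields as null.

```python
from collections import Counter


def find_duplicate_values(documents, field):
    """Return {value: count} for every value of `field` that occurs more than once."""
    # doc.get() maps a missing field to None, so several documents
    # without the field also count as duplicates of each other.
    counts = Counter(doc.get(field) for doc in documents)
    return {value: n for value, n in counts.items() if n > 1}


users = [
    {"user_id": 1, "name": "ann"},
    {"user_id": 2, "name": "bob"},
    {"user_id": 1, "name": "ann-again"},
]

print(find_duplicate_values(users, "user_id"))  # {1: 2}
```

An empty result suggests the unique index can be created; any entry in the result identifies values that need to be cleaned up first.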

Single-field index

db.people.createIndex({name: 1})

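A single-field index orders entries by one field. MongoDB can traverse the index in either direction, so the sort direction does not matter for this type of index.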

Index Management Operations

Creating an index

Use the createIndex method to create an index.

Dropping an index

db.people.dropIndex("nameIndex")

or

db.people.dropIndex({name: 1})

An index can be dropped either by its name or by its index specification document.
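Dropping by name only works if the name is known. A name such as "nameIndex" would exist only if it was given explicitly when the index was created; otherwise MongoDB generates a default name by joining each field and its direction with underscores. The helper below is a small sketch of that naming rule, and `default_index_name` is a hypothetical name, not a driver function.

```python
def default_index_name(keys):
    """Build the default index name from (field, direction) pairs."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


print(default_index_name([("name", 1)]))               # name_1
print(default_index_name([("name", 1), ("age", -1)]))  # name_1_age_-1
```

So an index created with {name: 1} and no explicit name would be dropped with dropIndex("name_1").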

Listing indexes

db.people.getIndexes()

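This command returns information about every index on the collection.

Explanation:

db.people.createIndex({name: 1, age: -1}): creates a compound index ordered by name ascending, then by age descending.
db.collection.createIndex( { "user_id": 1 }, { unique: true } ): creates a unique index on the user_id field, ensuring that its values are unique within the collection.
db.people.dropIndex("nameIndex") and db.people.dropIndex({name: 1}): both forms drop an index; the first identifies it by name, the second by its specification.
db.people.getIndexes(): lists all indexes on the collection, returned as an array of documents describing each index.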

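Optimizing Bulk Operations

For large-scale data updates, bulk operations can improve performance significantly, because many writes are sent to the server in one request instead of one round trip each.

Bulk updates with `initializeUnorderedBulkOp()` (MongoDB 2.6 to 3.2)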

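var bulk = db.test.initializeUnorderedBulkOp(),
    counter = 0;

// Select documents whose salary and dob are still stored as strings (BSON type 2).
db.test.find({
    "salary": { "$exists": true, "$type": 2 },
    "dob": { "$exists": true, "$type": 2 }
}).snapshot().forEach(function(doc){
    // Convert the string values to a number and a date.
    var newSalary = parseInt(doc.salary, 10),
        newDob = new ISODate(doc.dob);

    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "salary": newSalary, "dob": newDob }
    });

    counter++;
    // Send the queued updates every 1000 operations, then start a new batch.
    if (counter % 1000 == 0) {
        bulk.execute();
        bulk = db.test.initializeUnorderedBulkOp();
    }
});

// Flush the operations left over from the last, incomplete batch.
if (counter % 1000 != 0) {
    bulk.execute();
}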

Bulk updates with `bulkWrite()` (MongoDB 3.2 and later)

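// Select documents whose salary and dob are still stored as strings (BSON type 2).
var cursor = db.test.find({
        "salary": { "$exists": true, "$type": 2 },
        "dob": { "$exists": true, "$type": 2 }
    }),
    bulkUpdateOps = [];

// Note: cursor.snapshot() is not available from MongoDB 4.0 onward;
// on those versions, iterate the cursor without it.
cursor.snapshot().forEach(function(doc){
    // Convert the string values to a number and a date.
    var newSalary = parseInt(doc.salary, 10),
        newDob = new ISODate(doc.dob);

    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "salary": newSalary, "dob": newDob } }
        }
    });

    // Send a full batch of 1000 operations and start collecting the next one.
    if (bulkUpdateOps.length === 1000) {
        db.test.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});

// Flush the operations left over from the last, incomplete batch.
if (bulkUpdateOps.length > 0) {
    db.test.bulkWrite(bulkUpdateOps);
}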

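Explanation:

var bulk = db.test.initializeUnorderedBulkOp(): initializes an unordered bulk operation object, used for bulk updates on MongoDB 2.6 to 3.2.
bulk.find({ "_id": doc._id }).updateOne({...}): queues an update in the bulk operation.
bulk.execute(): executes the queued operations; it is called once every 1000 operations.
db.test.bulkWrite(bulkUpdateOps): on MongoDB 3.2 and later, bulkWrite() performs the bulk write, again once every 1000 operations, with a final call for any remaining operations.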
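The batching pattern in both shell scripts can be isolated from MongoDB itself. The following Python sketch shows only the chunking and the final flush, using plain dictionaries in place of a real collection. The names `chunked` and `to_update_op` and the sample documents are hypothetical, and the operation dictionaries merely imitate the shape passed to bulkWrite() in the shell.

```python
from datetime import datetime


def chunked(items, size):
    """Yield lists of at most `size` items, including the final partial batch."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush whatever is left over
        yield batch


def to_update_op(doc):
    """Convert string fields to typed values, shaped like a bulkWrite updateOne entry."""
    return {
        "updateOne": {
            "filter": {"_id": doc["_id"]},
            "update": {"$set": {
                "salary": int(doc["salary"]),
                "dob": datetime.fromisoformat(doc["dob"]),
            }},
        }
    }


# 2500 sample documents with string-typed salary and dob fields.
docs = [{"_id": i, "salary": "1200", "dob": "1990-05-17"} for i in range(2500)]
batches = list(chunked((to_update_op(d) for d in docs), 1000))

print([len(b) for b in batches])  # [1000, 1000, 500]
```

The last batch of 500 is the part that is lost if the final flush is forgotten, which is why both scripts need a call after the loop.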

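MongoDB Storage Engines and Drivers

MongoDB offers several storage engines and drivers to suit different use cases and performance needs. This article covers the WiredTiger, MMAP, and In-memory storage engines, along with the Java and Python drivers.

WiredTiger Storage Engine

WiredTiger has been MongoDB's default storage engine since version 3.2. It supports LSM trees for storing indexes, which suit workloads with many random inserts because they offer faster write performance. Note that MongoDB's default configuration of WiredTiger uses B-trees.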

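Key features of WiredTiger

No in-place updates: updating a document writes a new version of the document and removes the old one.
Document-level concurrency control: WiredTiger assumes that two write operations will not touch the same document. If they do conflict, one of them is rolled back and retried later.
Snappy and zlib compression: Snappy is the default. Its compression ratio is lower, but so is its CPU usage.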
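The conflict handling described above is a form of optimistic concurrency control. The sketch below illustrates the general idea with a versioned in-memory store and a retry loop. It is not WiredTiger's implementation; `VersionedStore`, `update_with_retry`, and the simulated competing writer are all hypothetical.

```python
class VersionedStore:
    """Toy document store in which every write bumps a per-document version."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, value)

    def read(self, doc_id):
        return self._docs.get(doc_id, (0, None))

    def write_if_unchanged(self, doc_id, expected_version, value):
        current_version, _ = self.read(doc_id)
        if current_version != expected_version:
            return False  # another writer got there first: conflict
        self._docs[doc_id] = (current_version + 1, value)
        return True


def update_with_retry(store, doc_id, transform, max_attempts=5):
    """Apply transform optimistically; on conflict, re-read and try again."""
    for attempt in range(1, max_attempts + 1):
        version, value = store.read(doc_id)
        if store.write_if_unchanged(doc_id, version, transform(value)):
            return attempt
    raise RuntimeError("too many write conflicts")


store = VersionedStore()
store.write_if_unchanged("a", 0, 10)

calls = {"n": 0}

def add_one_with_interference(value):
    calls["n"] += 1
    if calls["n"] == 1:
        # Simulate a competing writer that commits between our read and write.
        version, current = store.read("a")
        store.write_if_unchanged("a", version, current + 100)
    return value + 1

attempts = update_with_retry(store, "a", add_one_with_interference)
print(attempts, store.read("a"))  # 2 (3, 111)
```

The first attempt is discarded because the version changed underneath it, and the second attempt succeeds against the fresh value. No lock is held while the update is being computed.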

Using the WiredTiger engine

mongod --storageEngine wiredTiger --dbpath <newWiredTigerDBPath>

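Notes:

newWiredTigerDBPath must not contain data from another storage engine. To migrate data, export it first and then import it into the new storage engine.
Data migration steps:

mongodump --out <exportDataDestination>
mongod --storageEngine wiredTiger --dbpath <newWiredTigerDBPath>
mongorestore <exportDataDestination>


#### Explanation:
This section shows how to enable the WiredTiger storage engine and migrate data to it. The key steps are exporting the existing data, starting mongod with WiredTiger on a new database path, and then importing the data. Following them in this order preserves data integrity and compatibility.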

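### MMAP Storage Engine

MMAP is another pluggable storage engine. It uses the `mmap()` system call to map files into virtual memory, which optimizes read calls. Its drawback is that it cannot process two write operations on the same collection at the same time, because it locks at the collection level.

#### Key features of MMAP

*   Uses `mmap()` to map files into virtual memory.
*   Optimizes read performance.
*   Collection-level locking: only one write operation can run against a given collection at a time.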
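As an illustration of what mapping a file into virtual memory means, the sketch below uses Python's standard `mmap` module, which wraps the same system call. It only demonstrates the mechanism on a temporary file; the file layout and the name `read_via_mmap` are hypothetical and unrelated to MongoDB's actual data file format.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Read a byte range from a file through a read-only memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


# Write a small file that stands in for a data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header|document-1|document-2")
    data_path = tmp.name

first_doc = read_via_mmap(data_path, 7, 10)
os.remove(data_path)

print(first_doc)  # b'document-1'
```

Reads become slices of memory rather than explicit read calls, and the operating system decides which pages stay cached, which is the source of the read optimization mentioned above.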

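### In-memory Storage Engine

The In-memory storage engine keeps all data in memory (RAM) for faster reads and access.

### Other Storage Engines

*   **mongo-rocks**: a key-value storage engine integrated with Facebook's RocksDB.
*   **Fusion-io**: a storage engine created by SanDisk that bypasses the operating system's file system layer and writes directly to the storage device.
*   **TokuMX**: a storage engine from Percona that uses fractal tree indexes.

### Java Driver

MongoDB's Java driver provides a set of methods for interacting with a MongoDB database.

#### Example: querying a collection with the Java driver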

```java
import org.bson.Document;
import com.mongodb.BasicDBObject;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;

public class QueryExample {
    public static void main(String[] args) {
        // Connect to a local mongod (legacy 3.x driver API).
        MongoClient mongoClient = new MongoClient(new ServerAddress("localhost", 27017));
        try {
            MongoDatabase db = mongoClient.getDatabase("testdb");
            MongoCollection<Document> collection = db.getCollection("testcollection");

            // Build the filter: documents whose name equals "dev".
            BasicDBObject searchQuery = new BasicDBObject();
            searchQuery.put("name", "dev");

            // Iterate over the matches and print each one as JSON.
            MongoCursor<Document> cursor = collection.find(searchQuery).iterator();
            try {
                while (cursor.hasNext()) {
                    System.out.println(cursor.next().toJson());
                }
            } finally {
                cursor.close();
            }
        } finally {
            mongoClient.close();
        }
    }
}
```

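#### Explanation:

This example shows how to connect to a MongoDB database with the Java driver and query a collection for documents that match a condition. The main steps are creating the MongoClient object, obtaining the database and collection objects, building the query, running it, and iterating over the results.

Python Driver (PyMongo)

PyMongo is MongoDB's Python driver and offers a convenient way to interact with a MongoDB database.

Example: connecting to MongoDB with PyMongo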

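from pymongo import MongoClient

# Connect to a local mongod and select the database and collection.
uri = "mongodb://localhost:27017/"
client = MongoClient(uri)
db = client['test_db']
collection = db['test_collection']

# insert_one() replaces the deprecated save(), which was removed in PyMongo 4.
collection.insert_one({"hello": "world"})
print(collection.find_one())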

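#### Explanation:

This example shows how to connect to a MongoDB database with PyMongo and run a simple insert and query. The key steps are creating the MongoClient object, obtaining the database and collection objects, inserting a document, and reading it back.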

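MongoDB Sharding and Replica Set Configuration

MongoDB is a powerful NoSQL database that supports two important features, sharding and replica sets, which provide horizontal scaling and high availability.

Sharding Configuration

Sharding distributes data across multiple servers to improve database performance and scalability.

Setting up a sharded environment

A sharded environment needs three main components:

Config servers: store the sharded cluster's metadata.
Replica sets: store the actual data.
Mongos: acts as the query router and routes each request to the correct shard.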
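To show what routing a request to the correct shard involves, here is a minimal Python sketch of range-based routing on a shard key. It assumes chunks are contiguous key ranges with an inclusive lower bound and an exclusive upper bound. The class `ChunkRouter`, the split points, and the shard names are hypothetical; a real mongos reads this mapping from the config servers' metadata.

```python
import bisect


class ChunkRouter:
    """Map a shard key value to the shard that owns its key range."""

    def __init__(self, split_points, shards):
        # n sorted split points divide the key space into n + 1 chunks.
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one shard per chunk")
        self.split_points = list(split_points)
        self.shards = list(shards)

    def shard_for(self, key):
        # bisect_right places a key equal to a split point in the upper chunk,
        # which gives ranges of the form [lower, upper).
        return self.shards[bisect.bisect_right(self.split_points, key)]


# Chunks: (-inf, 100) -> rs0, [100, 200) -> rs1, [200, +inf) -> rs2
router = ChunkRouter([100, 200], ["rs0", "rs1", "rs2"])

print(router.shard_for(50))   # rs0
print(router.shard_for(100))  # rs1
print(router.shard_for(250))  # rs2
```

A query that includes the shard key can be sent to a single shard this way, while a query without it has to be broadcast to every shard.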

Config server configuration

Add the following settings to the mongod.conf file:

sharding:
  clusterRole: configsvr
replication:
  replSetName: <setname>

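Start the config server: mongod --config <config_file>

Replica set configuration

To create the replica sets, see the replica set configuration section.

Mongos configuration

Add the following settings to the mongos.conf file: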

sharding:
  configDB: <configReplSetName>/cfg1.example.net:27017

Start Mongos: mongos --config <config_file>

Configuring shards

Connect to Mongos and configure sharding with the following commands:

sh.addShard("<replica_set_name>/<host>:<port>")
sh.enableSharding("<database>")
sh.shardCollection("<database>.<collection>", { <key> : <direction> })
sh.status()

Replica Set Configuration

A replica set is a group of mongod instances that maintain the same data set, which improves database availability and data safety.
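The availability benefit depends on how many members can fail while a majority remains to elect a primary. The short Python sketch below works through that arithmetic; the function names are hypothetical, and it assumes every member is a voting member.

```python
def majority(voting_members):
    """Votes needed to elect a primary."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """Members that can be lost while a majority is still reachable."""
    return voting_members - majority(voting_members)


for size in (3, 4, 5):
    print(size, majority(size), fault_tolerance(size))
```

A three-member set needs two votes and tolerates one failure. A fourth member raises the majority to three without increasing fault tolerance, which is why odd member counts are the usual choice.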

Basic configuration (three nodes)

1. **Create the directories**:
   ```bash
   mkdir /srv/mongodb/data/rs0-0
   mkdir /srv/mongodb/data/rs0-1
   mkdir /srv/mongodb/data/rs0-2
   ```

2. **Start the `mongod` instances**:
   ```bash
   mongod --port 27017 --dbpath /srv/mongodb/data/rs0-0 --replSet rs0
   mongod --port 27018 --dbpath /srv/mongodb/data/rs0-1 --replSet rs0
   mongod --port 27019 --dbpath /srv/mongodb/data/rs0-2 --replSet rs0
   ```

3. **Configure the replica set**:
   ```bash
   mongo --port 27017
   ```
   ```javascript
   rs.initiate()
   rs.add("<hostname>:27018")
   rs.add("<hostname>:27019")
   ```

4. **Test the configuration**:
   ```javascript
   rs.status()
   ```

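Checking replica set status

Check the status of the replica set with the following command:

rs.status()

MongoDB replica set status in detail

Command: `rs.status()`

This command checks the status of the replica set.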

{
  "set" : "ReplicaName",
  "date" : ISODate("2016-09-26T07:36:04.935Z"),
  "myState" : 1,
  "term" : NumberLong(-1),
  "heartbeatIntervalMillis" : NumberLong(2000),
  "members" : [
    {
      "_id" : 0,
      "name" : "<IP>:<PORT>",
      "health" : 1,
      "state" : 1,
      "stateStr" : "PRIMARY",
      ...
    },
    {
      "_id" : 1,
      "name" : "<IP>:<PORT>",
      "health" : 1,
      ...
    }
  ],
  "ok" : 1
}

This output gives the detailed state of the replica set, including each member's health and role (primary or secondary).
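When this output is checked by a script rather than by eye, the fields of interest are usually `stateStr` and `health`. The Python sketch below summarizes a status document that has already been converted to a plain dictionary. The function `summarize_members` and the sample host names are hypothetical, and the sample only imitates the shape of the output shown above.

```python
def summarize_members(status):
    """Return the primary's name and the names of members reporting health != 1."""
    summary = {"primary": None, "unhealthy": []}
    for member in status.get("members", []):
        if member.get("stateStr") == "PRIMARY":
            summary["primary"] = member["name"]
        if member.get("health") != 1:
            summary["unhealthy"].append(member["name"])
    return summary


# Sample data shaped like the rs.status() output; host names are made up.
sample_status = {
    "set": "ReplicaName",
    "myState": 1,
    "members": [
        {"_id": 0, "name": "host-a:27017", "health": 1, "state": 1, "stateStr": "PRIMARY"},
        {"_id": 1, "name": "host-b:27018", "health": 1, "state": 2, "stateStr": "SECONDARY"},
        {"_id": 2, "name": "host-c:27019", "health": 0, "state": 8, "stateStr": "(not reachable/healthy)"},
    ],
    "ok": 1,
}

print(summarize_members(sample_status))
# {'primary': 'host-a:27017', 'unhealthy': ['host-c:27019']}
```

A missing primary or a non-empty unhealthy list is the kind of condition a monitoring check would alert on.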

玄貓 BlackCat

A technology enthusiast who shares experiences and insights on software development, cloud technologies, and AI applications.
