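Practical Cloud Service Mocking and Test Databases

In cloud service development, code quality and stability depend on effective unit and integration tests. This article looks at how to mock cloud services such as GCS and AWS S3 in Python with mock, pytest, and the Moto library, and how to use a test database for broader coverage. These techniques isolate external dependencies, which keeps test results stable and repeatable and speeds up development.

Mocking Cloud Services in Practice

When unit testing code that talks to a cloud service, the main technical challenge is mocking the external service effectively. This section covers mock-based tests for Google Cloud Storage (GCS) and Amazon Web Services (AWS) S3, along with related best practices.

Mocking GCS

Implementing delete_temp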

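from google.cloud import storage

def delete_temp(bucket_name, prefix):
    """Delete every blob in the bucket whose name starts with prefix."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    # list_blobs returns an iterator over the matching blobs.
    blobs = bucket.list_blobs(prefix=prefix)
    for blob in blobs:
        blob.delete()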

Testing delete_temp

from unittest import mock

from google.cloud.storage import Blob

import cloud_examples

# Patch the storage module as imported by cloud_examples, so no real client is created.
@mock.patch('cloud_examples.storage', autospec=True)
def test_delete_temp(storage):
    blob = mock.Mock(spec=Blob)
    blob.delete.return_value = None
    mock_bucket = storage.Client.return_value.get_bucket.return_value
    mock_bucket.list_blobs.return_value = [blob, blob]

    cloud_examples.delete_temp("fake_bucket", "fake_prefix")
    # The same mock appears twice in the listing, so delete() runs twice.
    assert blob.delete.call_count == 2

    client_mock = storage.Client.return_value
    client_mock.get_bucket.assert_called_with("fake_bucket")
    # delete_temp passes prefix as a keyword argument, so assert it the same way.
    client_mock.get_bucket.return_value.list_blobs.assert_called_with(prefix="fake_prefix")

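Code walkthrough:

The @mock.patch decorator replaces the cloud_examples.storage module with a mock.
A mock blob object is created, and its delete method is set to return None.
The list_blobs method of mock_bucket is set to return a list containing the same blob object twice.
The test verifies that delete was called the expected number of times.
The test checks that get_bucket and list_blobs were called with the expected arguments.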
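The argument checks are sensitive to how a call is spelled: a mock records positional and keyword arguments separately. The sketch below is not the article's code; it uses a hypothetical variant of delete_temp that receives the client as a parameter, so it runs with only the standard library, and it shows that a positional assertion does not match a keyword call.

```python
from unittest import mock


def delete_temp(client, bucket_name, prefix):
    """Hypothetical variant of delete_temp that takes the client as a parameter."""
    bucket = client.get_bucket(bucket_name)
    for blob in bucket.list_blobs(prefix=prefix):
        blob.delete()


def run_check():
    """Run delete_temp against plain mocks and return each blob's delete count."""
    client = mock.Mock()
    blobs = [mock.Mock(), mock.Mock()]
    client.get_bucket.return_value.list_blobs.return_value = blobs

    delete_temp(client, "fake_bucket", "fake_prefix")

    client.get_bucket.assert_called_once_with("fake_bucket")
    # The call used prefix=..., so the assertion must use the keyword too.
    client.get_bucket.return_value.list_blobs.assert_called_once_with(
        prefix="fake_prefix"
    )
    return [b.delete.call_count for b in blobs]


def positional_assert_fails():
    """Return True if asserting a keyword call positionally raises."""
    bucket = mock.Mock()
    bucket.list_blobs(prefix="fake_prefix")
    try:
        bucket.list_blobs.assert_called_with("fake_prefix")
    except AssertionError:
        return True
    return False
```

Passing the client in is a design choice rather than a requirement; it simply removes the need to patch a module attribute.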

Mocking AWS S3

Implementing delete_temp_aws

import boto3

def delete_temp_aws(bucket_name, prefix):
    s3 = boto3.client('s3', region_name='us-east-1')
    objects = s3.list_objects_v2(Bucket=bucket_name, Prefix=prefix)
    # 'Contents' is absent from the response when no key matches the prefix.
    object_keys = [{'Key': item['Key']} for item in objects.get('Contents', [])]
    if not object_keys:
        return
    s3.delete_objects(Bucket=bucket_name, Delete={'Objects': object_keys})

Testing delete_temp_aws with Moto

# Assumes delete_temp_aws lives in cloud_examples next to delete_temp.
from cloud_examples import delete_temp_aws

def test_delete_temp_aws(s3):
    s3.create_bucket(Bucket="fake_bucket")
    s3.put_object(Bucket="fake_bucket", Key="fake_prefix/something", Body=b'Some info')

    delete_temp_aws("fake_bucket", "fake_prefix")
    obj_response = s3.list_objects_v2(Bucket="fake_bucket", Prefix="fake_prefix")
    assert obj_response['KeyCount'] == 0

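Code walkthrough:

The s3 fixture, built on Moto and defined later in this article, provides a mocked S3 environment.
A bucket is created in the mocked environment and one object is uploaded to it.
delete_temp_aws is called to delete the objects under the given prefix.
The test verifies that the deletion succeeded by checking that the number of objects under the prefix is 0.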
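A single list_objects_v2 call returns at most 1,000 keys and omits the Contents key when nothing matches, so a more defensive deletion routine might page through the results. The following is a minimal sketch under assumptions: delete_prefix and StubS3 are hypothetical names, and StubS3 is an in-memory stand-in that imitates only the response fields used here, not a replacement for Moto.

```python
def delete_prefix(s3, bucket_name, prefix):
    """Delete every object under prefix, one page at a time; return the count."""
    deleted = 0
    kwargs = {"Bucket": bucket_name, "Prefix": prefix}
    while True:
        page = s3.list_objects_v2(**kwargs)
        # "Contents" is missing when the page has no keys.
        keys = [{"Key": item["Key"]} for item in page.get("Contents", [])]
        if keys:
            s3.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            return deleted
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class StubS3:
    """Hypothetical in-memory stand-in for the two client calls used above."""

    def __init__(self, keys, page_size=2):
        self.keys = sorted(keys)
        self.page_size = page_size

    def list_objects_v2(self, Bucket, Prefix, ContinuationToken=None):
        # The token is the last key of the previous page, so deletions
        # between pages do not shift the listing.
        matching = [
            k for k in self.keys
            if k.startswith(Prefix)
            and (ContinuationToken is None or k > ContinuationToken)
        ]
        chunk = matching[:self.page_size]
        response = {
            "KeyCount": len(chunk),
            "IsTruncated": len(matching) > len(chunk),
        }
        if chunk:
            response["Contents"] = [{"Key": k} for k in chunk]
        if response["IsTruncated"]:
            response["NextContinuationToken"] = chunk[-1]
        return response

    def delete_objects(self, Bucket, Delete):
        doomed = {obj["Key"] for obj in Delete["Objects"]}
        self.keys = [k for k in self.keys if k not in doomed]
```

Because delete_prefix takes the client as an argument, the same function could be exercised with a Moto-backed boto3 client in a real test.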

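Best Practices for Mocking Cloud Services

Choose a suitable mocking tool: pick the tool that best fits each cloud service, for example mock.patch for GCS and Moto for AWS S3.
Build an isolated test environment: keep the test environment separate from the real cloud environment so production data is never modified or deleted by accident.
Verify the key operations: focus assertions on the operations that interact with the cloud service.
Keep tests self-contained: as far as possible, a test case should not depend on external state or resources.
Use fixtures where appropriate: the fixture mechanism in frameworks such as pytest simplifies test setup.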

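Mocking Cloud Services with Moto

Mocking is a common and effective approach for unit tests that involve cloud services. Moto is a Python library for mocking AWS services: it intercepts calls to the AWS API and provides a virtual environment to test against.

Setting Up an AWS Credentials Fixture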

import os

import pytest

@pytest.fixture(scope="function")
def aws_credentials():
    # Fake value only; Moto intercepts the calls, so nothing is sent to AWS.
    os.environ['AWS_ACCESS_KEY_ID'] = 'testing'
    # ...

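This fixture sets AWS credentials in environment variables. The values are fake, but they must be present because the Boto3 client needs credentials to sign its requests. Using dummy values also keeps the tests from picking up real credentials configured on the machine.

Creating an S3 Client Fixture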
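Assigning to os.environ directly leaves the fake values in place after the test. One alternative, sketched here with the standard library only, is to patch the environment and let it restore itself. The set of variables below is an assumption chosen for illustration and is not a reconstruction of what the fixture above elides; fake_aws_env is a hypothetical helper name.

```python
import os
from contextlib import contextmanager
from unittest import mock

# Placeholder values; which variables to set is an assumption for illustration.
FAKE_AWS_ENV = {
    "AWS_ACCESS_KEY_ID": "testing",
    "AWS_SECRET_ACCESS_KEY": "testing",
    "AWS_DEFAULT_REGION": "us-east-1",
}


@contextmanager
def fake_aws_env():
    """Set fake AWS variables, then restore os.environ on exit."""
    with mock.patch.dict(os.environ, FAKE_AWS_ENV):
        yield
```

Inside a pytest fixture, the same effect can be had by yielding from within the `with mock.patch.dict(...)` block.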

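import boto3
import pytest
from moto import mock_s3  # Moto 4 and earlier; Moto 5 replaces this with mock_aws

@pytest.fixture(scope="function")
def s3(aws_credentials):
    # Create the client inside the context so its calls go to Moto's in-memory S3.
    with mock_s3():
        yield boto3.client('s3', region_name='us-east-1')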

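Code walkthrough:

mock_s3(): a context manager provided by Moto that mocks the S3 service.
boto3.client('s3', region_name='us-east-1'): creates an S3 client inside the mocked environment.
yield: hands the S3 client to the test function. When the test function finishes, the context manager cleans up the mocked environment automatically.

Test Example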

def test_delete_temp_aws(s3):
    s3.create_bucket(Bucket="fake_bucket")
    s3.put_object(Bucket="fake_bucket", Key="fake_prefix/something", Body=b'Some info')
    obj_response = s3.list_objects_v2(Bucket="fake_bucket", Prefix="fake_prefix")
    assert len(obj_response['Contents']) == 1

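Code walkthrough:

s3.create_bucket(Bucket="fake_bucket"): creates a bucket named fake_bucket in the mocked S3 environment.
s3.put_object(Bucket="fake_bucket", Key="fake_prefix/something", Body=b'Some info'): creates an object named fake_prefix/something in fake_bucket with the content Some info.
s3.list_objects_v2(Bucket="fake_bucket", Prefix="fake_prefix"): lists the objects in fake_bucket whose keys start with fake_prefix.
assert len(obj_response['Contents']) == 1: asserts that exactly one object is listed.

Why State Management Matters

scope="function" gives every test function a clean mocked environment and prevents state from leaking between tests.
yield, rather than return, keeps the mocked environment active for the duration of the test function.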
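The difference between yield and return inside a with block can be traced without pytest or Moto. In this sketch, fake_mock_env is a hypothetical stand-in for a mocking context manager, and the two trace functions drive the "fixtures" by hand, roughly the way a test runner would.

```python
from contextlib import contextmanager

events = []


@contextmanager
def fake_mock_env():
    """Hypothetical stand-in for a mocking context such as Moto's."""
    events.append("mock on")
    try:
        yield
    finally:
        events.append("mock off")


def fixture_with_return():
    with fake_mock_env():
        return "client"  # leaving the function exits the with block at once


def fixture_with_yield():
    with fake_mock_env():
        yield "client"  # the with block stays open while the caller works


def trace_return():
    events.clear()
    fixture_with_return()
    events.append("test runs")
    return list(events)


def trace_yield():
    events.clear()
    gen = fixture_with_yield()
    next(gen)            # setup: run up to the yield
    events.append("test runs")
    next(gen, None)      # teardown: resume past the yield, closing the context
    return list(events)
```

With return, the mock is already switched off by the time the test body runs; with yield, the test body runs between setup and teardown.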

Verifying That the Mock Works

If you comment out the code that creates the bucket and the object and run the test again:

def test_delete_temp_aws(s3):
    # s3.create_bucket(Bucket="fake_bucket")
    # s3.put_object(Bucket="fake_bucket", Key="fake_prefix/something", Body=b'Some info')
    obj_response = s3.list_objects_v2(Bucket="fake_bucket", Prefix="fake_prefix")
    assert len(obj_response['Contents']) == 1

the test fails with this error:

botocore.errorfactory.NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist

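This confirms that the mocked environment is active and that it correctly reports a bucket that does not exist.

Testing with a Test Database

When testing code that performs database operations, using a test database is common practice. It keeps tests from affecting the production database and also lowers the cost of testing.

Requirements for a Test Database

Ideally, the test database uses the same database type as production, so that test results are consistent with production behavior.
In some cases a different database type, such as SQLite, reduces test overhead, but you need to confirm that the operations under test behave the same way.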
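One concrete behavioral difference is worth checking before swapping database types: SQLite does not enforce the declared length of a VARCHAR column, whereas PostgreSQL rejects a value that is too long. The sketch below uses the standard-library sqlite3 module and a hypothetical one-column version of the patient_data table.

```python
import sqlite3


def sqlite_accepts_oversized_value():
    """Insert a 50-character string into a VARCHAR(10) column in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient_data ("
        "id INTEGER PRIMARY KEY, treatment_name VARCHAR(10))"
    )
    # SQLite treats VARCHAR(10) as TEXT affinity and ignores the length.
    conn.execute(
        "INSERT INTO patient_data (treatment_name) VALUES (?)", ("x" * 50,)
    )
    (length,) = conn.execute(
        "SELECT length(treatment_name) FROM patient_data"
    ).fetchone()
    conn.close()
    return length
```

A test that relies on a length violation being raised would therefore pass against PostgreSQL and silently behave differently against SQLite.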

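Example: Testing the ETL Process of a Medical Data Management System

In a medical data management system, the ETL (extract, transform, load) process involves several database operations. To test it, we created a test database and populated it with the data the process needs.

ETL Process Description

Extract data from the Patient Data table.
Parse the Treatment Info field and match it against the Treatment and Delivery Mechanism lookup tables.
Write the match results to the Match Results table.
Move records that need review to the Review table, and move successfully matched records to the Ingest table.

Test Focus

Verify that intermediate tables such as Match Results are populated correctly.
Confirm that data moves correctly from one step to the next.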
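The steps and test focus above can be sketched end to end with an in-memory SQLite database. Everything here is an assumption made for illustration: the schema, the column names, the substring-based matching rule, and the function names are hypothetical and are not the system's actual implementation.

```python
import sqlite3


def build_db():
    """Create a hypothetical schema with one known treatment and mechanism."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient_data (id INTEGER PRIMARY KEY, treatment_info TEXT);
        CREATE TABLE treatment (name TEXT PRIMARY KEY);
        CREATE TABLE delivery_mechanism (name TEXT PRIMARY KEY);
        CREATE TABLE match_results (
            patient_id INTEGER, treatment TEXT, delivery TEXT, ok INTEGER
        );
        CREATE TABLE ingest (patient_id INTEGER);
        CREATE TABLE review (patient_id INTEGER);
        INSERT INTO treatment VALUES ('Drug A');
        INSERT INTO delivery_mechanism VALUES ('tablet');
    """)
    return conn


def run_etl(conn):
    """Match each patient row against the lookup tables and route the result."""
    treatments = [r[0] for r in conn.execute("SELECT name FROM treatment")]
    mechanisms = [r[0] for r in conn.execute("SELECT name FROM delivery_mechanism")]
    rows = conn.execute("SELECT id, treatment_info FROM patient_data").fetchall()
    for patient_id, info in rows:
        lowered = info.lower()
        treatment = next((t for t in treatments if t.lower() in lowered), None)
        delivery = next((m for m in mechanisms if m.lower() in lowered), None)
        ok = int(treatment is not None and delivery is not None)
        conn.execute(
            "INSERT INTO match_results VALUES (?, ?, ?, ?)",
            (patient_id, treatment, delivery, ok),
        )
        # Matched rows go to ingest; everything else goes to review.
        target = "ingest" if ok else "review"
        conn.execute(f"INSERT INTO {target} VALUES (?)", (patient_id,))


def demo():
    conn = build_db()
    conn.executemany(
        "INSERT INTO patient_data VALUES (?, ?)",
        [(1, "Drug A tablet 0.25mg"), (2, "Unknown syrup")],
    )
    run_etl(conn)
    ingest = [r[0] for r in conn.execute("SELECT patient_id FROM ingest")]
    review = [r[0] for r in conn.execute("SELECT patient_id FROM review")]
    matches = conn.execute(
        "SELECT patient_id, ok FROM match_results ORDER BY patient_id"
    ).fetchall()
    conn.close()
    return ingest, review, matches
```

Checking match_results as well as ingest and review mirrors the test focus: the intermediate table is asserted on directly, not only the final destinations.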

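Illustrating the ETL Process with PlantUML

@startuml
skinparam backgroundColor #FEFEFE
skinparam defaultTextAlignment center
skinparam rectangleBackgroundColor #F5F5F5
skinparam rectangleBorderColor #333333
skinparam arrowColor #333333

title ETL process

rectangle "Patient Data" as patient
rectangle "Parse Treatment Info" as parse
rectangle "Match against Treatment and\nDelivery Mechanism lookup tables" as match
rectangle "Match Results" as results
rectangle "Ingest" as ingest
rectangle "Review" as review

patient --> parse
parse --> match
match --> results
results --> ingest : OK=True
results --> review : OK=False

@enduml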

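The diagram shows the main steps of the ETL process and the direction in which data flows.

Diagram walkthrough:

Patient Data table: the source table, containing the medical data to be processed.
Treatment and Delivery Mechanism lookup tables: the reference tables used for matching.
Match Results table: the intermediate table that stores match results.
Ingest table: stores successfully matched records.
Review table: stores records that need review.

With a test database and carefully designed test cases, we can verify that the ETL process is correct and gain confidence in the system's stability and reliability in production.

Unit Testing with a Test Database

When unit testing an ETL (Extract, Transform, Load) process, a test database is a common and effective tool. It lets the tests exercise a realistic data-processing flow and verify that each ETL step is correct.

Creating and Managing the Test Database

Reliable, consistent tests depend on how the test database is created and managed. pytest fixtures can handle both creating and destroying it.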

import pytest

# setup_test_db and teardown_test_db are project-specific helpers, not shown here.
@pytest.fixture(scope="session")
def test_db():
    engine = setup_test_db()
    yield engine
    teardown_test_db(engine)

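In this example, the test_db fixture creates the test database and destroys it after the tests finish. The code after yield runs as teardown once the session ends, which is where the database is torn down.

Code walkthrough:

@pytest.fixture(scope="session"): defines a fixture named test_db whose scope is the whole test session.
engine = setup_test_db(): calls setup_test_db to create the test database.
yield engine: hands the database engine to the tests.
teardown_test_db(engine): after the tests finish, calls teardown_test_db to destroy the test database.

A Custom Command-Line Option for Keeping the Test Database

Sometimes the test database needs to be kept so that a failing test can be debugged. A custom command-line option makes this possible.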

# conftest.py
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--persist-db", action="store_true", default=False,
        help="Do not teardown the test db at the end of the session",
    )

@pytest.fixture(scope="session")
def test_db(request):
    engine = setup_test_db()
    yield engine
    # Skip teardown when --persist-db is given so the database can be inspected.
    if not request.config.getoption("--persist-db"):
        teardown_test_db(engine)

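Code walkthrough:

pytest_addoption: adds the custom command-line option --persist-db.
request.config.getoption("--persist-db"): checks whether --persist-db was passed. If it was, the test database is not destroyed.

Managing Tables in the Test Database

Some tables in the test database are static and do not need to be cleaned up after each test, while others must be cleaned up every time. Fixtures with different scopes can manage both kinds.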
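The article does not show setup_test_db or teardown_test_db. As a rough sketch of the same create, yield, and conditionally tear down pattern, the code below substitutes a temporary SQLite file for the real database and a plain context manager for the pytest fixture, with a persist argument standing in for --persist-db. All three function bodies are assumptions for illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


def setup_test_db():
    """Hypothetical setup: create an empty SQLite file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    sqlite3.connect(path).close()
    return path


def teardown_test_db(path):
    """Hypothetical teardown: remove the database file."""
    os.remove(path)


@contextmanager
def temp_test_db(persist=False):
    """Yield a test database path; keep the file afterwards if persist is set."""
    path = setup_test_db()
    try:
        yield path
    finally:
        if not persist:
            teardown_test_db(path)
```

When the database is kept, it helps to print or log its location so it can be opened after the run.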

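import pytest
from sqlalchemy import create_engine, text

# creds, host, and create_tables are assumed to be defined elsewhere in the project.
@pytest.fixture(scope="session")
def test_conn(test_db):
    test_engine = create_engine(f"postgresql://{creds}@{host}/test_db")
    test_conn = test_engine.connect()
    create_tables(test_conn)  # static lookup tables, created once per session
    yield test_conn
    test_conn.close()
    test_engine.dispose()

@pytest.fixture(scope="function")
def patient_table(test_conn):
    # SQLAlchemy 2.0 requires raw SQL strings to be wrapped in text().
    test_conn.execute(text("""
        CREATE TABLE patient_data (
            id SERIAL PRIMARY KEY,
            treatment_name VARCHAR(255)
        )
    """))
    yield
    test_conn.execute(text("DROP TABLE patient_data"))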

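Code walkthrough:

test_conn fixture: creates the static tables, such as treatment and delivery_mechanism, and keeps the connection open for the whole test session.
patient_table fixture: creates the patient_data table and drops it after each test.

Using the Test Database in Tests

Test cases can use the fixtures above to insert test data and verify that the ETL process is correct.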
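Dropping and recreating a table is one way to reset per-test state. A different technique, named plainly here as an alternative and not the approach used above, is to leave the tables in place and roll back each test's changes. The sketch uses sqlite3 so it runs on its own; the helper names are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


def open_session_connection():
    """Create static and per-test tables once, as a session-scoped step would."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE treatment (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE patient_data ("
        "id INTEGER PRIMARY KEY, treatment_name VARCHAR(255))"
    )
    conn.execute("INSERT INTO treatment VALUES ('Drug A')")
    conn.commit()  # static data is committed and survives later rollbacks
    return conn


@contextmanager
def per_test_rows(conn):
    """Undo whatever the test wrote, as a function-scoped step would."""
    try:
        yield conn
    finally:
        conn.rollback()


def demo():
    conn = open_session_connection()
    with per_test_rows(conn):
        conn.execute("INSERT INTO patient_data VALUES (1, 'Drug A tablet 0.25mg')")
        during = conn.execute("SELECT COUNT(*) FROM patient_data").fetchone()[0]
    after = conn.execute("SELECT COUNT(*) FROM patient_data").fetchone()[0]
    static = conn.execute("SELECT COUNT(*) FROM treatment").fetchone()[0]
    conn.close()
    return during, after, static
```

Rollback only works if the code under test does not commit on the shared connection; when it does, recreating the table as shown earlier is the safer choice.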

from sqlalchemy import text

def test_match_success(test_conn, patient_table):
    test_conn.execute(text("""
        INSERT INTO patient_data VALUES (1, 'Drug A tablet 0.25mg')
    """))
    # Run the ETL process and verify the results

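Code walkthrough:

test_match_success: a test case that uses the test_conn and patient_table fixtures.
INSERT INTO patient_data: inserts test data into the patient_data table.

Debugging Failing Tests

When a test fails, there are several ways to debug it, such as keeping the test database with the custom --persist-db option, or using breakpoints and print statements to inspect the database state.

Summary:

--persist-db option: keeps the test database for debugging.
Breakpoints and print statements: used to inspect the database state while debugging.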
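When inspecting database state from a breakpoint or a print statement, a small helper that returns a table's rows as dictionaries can make the output easier to read. This is a sketch using sqlite3; dump_table is a hypothetical name, and the table name is interpolated directly, which is acceptable only because it comes from test code.

```python
import sqlite3


def dump_table(conn, table):
    """Return every row of table as a list of column-name/value dictionaries."""
    # The table name is trusted test input; never interpolate user input here.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

For example, calling print(dump_table(conn, "match_results")) just before a failing assertion shows what the ETL step actually wrote.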

玄貓 BlackCat

A technology enthusiast who shares notes on software development, cloud technologies, and AI applications.