How do I do TDD with Pandas and pytest?

```python
# test_cei.py
import pandas as pd

import cei  # module under test


def test_clean_table_cols() -> None:
    df = pd.DataFrame(
        {
            "full_valued": [1, 2, 3],
            "all_missing1": [None, None, None],
            "some_missing": [None, 2, 3],
            "all_missing2": [None, None, None],
        }
    )
    expected = pd.DataFrame({"full_valued": [1, 2, 3], "some_missing": [None, 2, 3]})

    result = cei.clean_table_cols(df)

    pd.testing.assert_frame_equal(result, expected)
```
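In TDD, the next step is to write just enough code to make this test pass. Here is a minimal sketch of `cei.clean_table_cols`. It assumes that "cleaning" means dropping every column whose values are all missing. That reading is inferred from the expected frame and is not stated in the question.

```python
# cei.py (hypothetical implementation inferred from the test's expected output)
import pandas as pd


def clean_table_cols(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame without the columns in which every value is missing."""
    # how="all" drops a column only when all of its values are NaN/None;
    # partially missing columns such as "some_missing" are kept.
    return df.dropna(axis="columns", how="all")
```

With this `cei.py` next to `test_cei.py`, running `pytest test_cei.py` should turn the test green. The refactor step can then proceed with the test as a safety net.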

2 Answers

User

Answer 1 · edited 2024-06-24 12:59:32

Yes, this code is effectively an integration test, and that is not necessarily a bad thing.

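Even if using pandas is a fixed design decision, there are still plenty of good reasons to abstract away from external libraries, and testing is one of them. Abstracting away from an external library lets you test the business logic independently of that library. In this case, abstracting away from pandas would turn the test above into a unit test. It would then test your code's interaction with the library rather than the library itself.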
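One way to apply this is to keep the rule for which columns survive in plain Python, so that its unit test needs no DataFrame, and to confine pandas to a one-line adapter. The sketch below is minimal. `is_missing`, `columns_to_keep` and the dict-of-lists input are hypothetical names and formats chosen for illustration. `is_missing` only recognizes `None` and float NaN, not `pd.NA` or `NaT`.

```python
import math
from typing import Any, Mapping, Sequence

import pandas as pd


def is_missing(value: Any) -> bool:
    # Simplifying assumption: only None and float NaN count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def columns_to_keep(columns: Mapping[str, Sequence[Any]]) -> list[str]:
    """Business rule: keep a column unless every value in it is missing."""
    return [
        name
        for name, values in columns.items()
        if not all(is_missing(v) for v in values)
    ]


def clean_table_cols(df: pd.DataFrame) -> pd.DataFrame:
    """Thin adapter: the only function here that touches a DataFrame."""
    return df[columns_to_keep(df.to_dict(orient="list"))]


def test_columns_to_keep() -> None:
    # A pure unit test of the rule: no DataFrame is constructed.
    data = {"a": [1, 2], "b": [None, None], "c": [None, 3.0]}
    assert columns_to_keep(data) == ["a", "c"]
```

The trade-off is that `test_columns_to_keep` no longer says anything about how pandas represents missing values. That concern moves to whatever tests cover `clean_table_cols`.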

To apply this pattern, I suggest taking a look at the ports and adapters architecture pattern.
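Here is a minimal sketch of how ports and adapters could look for this case. A hypothetical `TablePort` protocol defines the port. The use case `drop_empty_columns` talks only to that port. A pandas adapter implements the port for production, and an in-memory fake implements it for fast unit tests. All names are illustrative and come neither from the question nor from any library.

```python
from typing import Protocol

import pandas as pd


class TablePort(Protocol):
    """Port: the operations the business logic needs from 'a table'."""

    def column_names(self) -> list[str]: ...
    def is_all_missing(self, column: str) -> bool: ...
    def drop(self, columns: list[str]) -> None: ...


def drop_empty_columns(table: TablePort) -> None:
    """Use case: remove every column that contains no data at all."""
    empty = [c for c in table.column_names() if table.is_all_missing(c)]
    table.drop(empty)


class PandasTable:
    """Adapter: implements the port on top of a pandas DataFrame."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    def column_names(self) -> list[str]:
        return list(self.df.columns)

    def is_all_missing(self, column: str) -> bool:
        return bool(self.df[column].isna().all())

    def drop(self, columns: list[str]) -> None:
        self.df = self.df.drop(columns=columns)


class FakeTable:
    """Test double: an in-memory table with no pandas involved."""

    def __init__(self, data: dict[str, list]) -> None:
        self.data = data

    def column_names(self) -> list[str]:
        return list(self.data)

    def is_all_missing(self, column: str) -> bool:
        return all(v is None for v in self.data[column])

    def drop(self, columns: list[str]) -> None:
        for c in columns:
            del self.data[c]


def test_drop_empty_columns_unit() -> None:
    table = FakeTable({"a": [1], "b": [None]})
    drop_empty_columns(table)
    assert table.column_names() == ["a"]
```

The unit test runs the use case against `FakeTable`. A separate, smaller integration test would run the same use case against `PandasTable` to check that the adapter maps the port onto pandas correctly.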

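However, this does mean you are no longer testing the functionality pandas provides. If testing that functionality is still your specific intent, an integration test is not a bad solution.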
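If the goal really is to pin down pandas' behavior, the integration test can be extended with edge cases that depend on how pandas treats missing values. The sketch below assumes `clean_table_cols` is implemented with `DataFrame.dropna(axis="columns", how="all")`. That implementation is an assumption, and it is defined inline so the block runs on its own.

```python
import numpy as np
import pandas as pd


def clean_table_cols(df: pd.DataFrame) -> pd.DataFrame:
    # Assumed implementation, inlined so this file runs standalone.
    return df.dropna(axis="columns", how="all")


def test_nan_none_and_pd_na_all_count_as_missing() -> None:
    df = pd.DataFrame(
        {
            "keep": [1, 2],
            "nan": [np.nan, np.nan],
            "none": [None, None],
            "na": pd.array([pd.NA, pd.NA], dtype="Int64"),
        }
    )
    assert list(clean_table_cols(df).columns) == ["keep"]


def test_dtypes_of_kept_columns_are_preserved() -> None:
    df = pd.DataFrame({"ints": [1, 2], "empty": [None, None]})
    assert clean_table_cols(df)["ints"].dtype == "int64"


def test_frame_without_rows_loses_all_columns() -> None:
    # With no rows, "all values missing" is vacuously true for every column.
    df = pd.DataFrame({"a": pd.Series([], dtype="float64")})
    assert clean_table_cols(df).columns.empty
```

pytest collects these `test_*` functions without extra configuration. Because they go through pandas' own missing-value handling, they could also catch behavior changes across pandas upgrades, which is exactly the coverage a pure unit test gives up.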

User

Answer 2 · edited 2024-06-24 12:59:32

You may find tdda (test-driven data analysis) useful. Quoting the docs:

"The tdda package provides Python support for test-driven data analysis (see 1-page summary with references, or the blog). The tdda.referencetest library is used to support the creation of reference tests, based on either unittest or pytest. The tdda.constraints library is used to discover constraints from a (Pandas) DataFrame, write them out as JSON, and to verify that datasets meet the constraints in the constraints file. It also supports tables in a variety of relation databases. There is also a command-line utility for discovering and verifying constraints, and detecting failing records. The tdda.rexpy library is a tool for automatically inferring regular expressions from a column in a Pandas DataFrame or from a (Python) list of examples. There is also a command-line utility for Rexpy. Although the library is provided as a Python package, and can be called through its Python API, it also provides command-line tools."
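To make the idea behind `tdda.constraints` concrete (discover constraints from a DataFrame, store them as JSON, verify new data against them), here is a deliberately tiny hand-rolled sketch in plain pandas. It does not use or imitate tdda's actual API. The functions `discover` and `verify` and the JSON layout are hypothetical, and they cover only dtype, min/max and whether nulls are allowed.

```python
import json

import pandas as pd


def discover(df: pd.DataFrame) -> dict:
    """Record simple per-column constraints observed in a reference frame."""
    constraints = {}
    for name, col in df.items():
        c = {"dtype": str(col.dtype), "allow_nulls": bool(col.isna().any())}
        if pd.api.types.is_numeric_dtype(col):
            c["min"] = float(col.min())
            c["max"] = float(col.max())
        constraints[name] = c
    return constraints


def verify(df: pd.DataFrame, constraints: dict) -> list[str]:
    """Return human-readable failures; an empty list means the frame passes."""
    failures = []
    for name, c in constraints.items():
        if name not in df.columns:
            failures.append(f"{name}: column missing")
            continue
        col = df[name]
        if str(col.dtype) != c["dtype"]:
            failures.append(f"{name}: dtype {col.dtype} != {c['dtype']}")
        if not c["allow_nulls"] and col.isna().any():
            failures.append(f"{name}: unexpected nulls")
        if "min" in c and col.min() < c["min"]:
            failures.append(f"{name}: value below {c['min']}")
        if "max" in c and col.max() > c["max"]:
            failures.append(f"{name}: value above {c['max']}")
    return failures


reference = pd.DataFrame({"age": [21, 35, 48]})
saved = json.dumps(discover(reference))  # would normally be written to a file
print(verify(pd.DataFrame({"age": [30, 99]}), json.loads(saved)))
```

A real constraints library does much more, for example inferring allowed value sets and string lengths and reporting the failing records. The sketch only shows the discover-then-verify workflow that the quote describes.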

See also Nick Radcliffe's PyData talk on Test-Driven Data Analysis.
