{"componentChunkName":"component---src-templates-acg-portal-new-template-tsx","path":"/wmo9zna4x","result":{"data":{"markdownRemark":{"html":"<h2 id=\"简介\"><a href=\"#%E7%AE%80%E4%BB%8B\" aria-label=\"简介 permalink\" class=\"anchor\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Introduction</h2>\n<p>Whitespace Normalizer - replaces the different kinds of whitespace characters in text with a standard space</p>\n<h3 id=\"功能描述\"><a href=\"#%E5%8A%9F%E8%83%BD%E6%8F%8F%E8%BF%B0\" aria-label=\"功能描述 permalink\" class=\"anchor\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Features</h3>\n<ul>\n<li>Whitespace detection: automatically recognizes the various Unicode whitespace characters</li>\n<li>Normalization: replaces every whitespace character with a standard space</li>\n</ul>\n<h2 id=\"算子参数\"><a href=\"#%E7%AE%97%E5%AD%90%E5%8F%82%E6%95%B0\" aria-label=\"算子参数 permalink\" class=\"anchor\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 
1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Operator Parameters</h2>\n<h3 id=\"输入\"><a href=\"#%E8%BE%93%E5%85%A5\" aria-label=\"输入 permalink\" class=\"anchor\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Input</h3>\n<table>\n<thead>\n<tr>\n<th>Input</th>\n<th>Description</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>text</td>\n<td>Text column to process; its elements must be strings</td>\n</tr>\n</tbody>\n</table>\n<h3 id=\"输出\"><a href=\"#%E8%BE%93%E5%87%BA\" aria-label=\"输出 permalink\" class=\"anchor\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Output</h3>\n<table>\n<thead>\n<tr>\n<th>Output</th>\n<th>Description</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>normalized_text</td>\n<td>Text column after normalization</td>\n</tr>\n</tbody>\n</table>\n<h2 id=\"调用示例\"><a href=\"#%E8%B0%83%E7%94%A8%E7%A4%BA%E4%BE%8B\" aria-label=\"调用示例 permalink\" class=\"anchor\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 
1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Usage Example</h2>\n\n    <div class=\"code-block-wrapper\">\n        <div class=\"code-block\">\n            <div class=\"code-block-header\">\n                <span class=\"code-block-name\">Plain Text</span>\n                <button class=\"code-copy-btn\" data-tooltip-text=\"\">\n                    <svg xmlns=\"http://www.w3.org/2000/svg\" width=\"16\" height=\"16\" viewBox=\"0 0 16 16\" fill=\"none\"> <path fill-rule=\"evenodd\" clip-rule=\"evenodd\" d=\"M5.57894 3.45614C5.57894 3.38832 5.63392 3.33333 5.70175 3.33333H12.5439C12.6117 3.33333 12.6667 3.38832 12.6667 3.45614V10.2982C12.6667 10.3661 12.6117 10.4211 12.5439 10.4211H11.7544V5.70175C11.7544 4.89754 11.1025 4.24561 10.2982 4.24561H5.57894V3.45614ZM4.24561 4.24561V3.45614C4.24561 2.65194 4.89754 2 5.70175 2H12.5439C13.3481 2 14 2.65194 14 3.45614V10.2982C14 11.1025 13.3481 11.7544 12.5439 11.7544H11.7544V12.5439C11.7544 13.3481 11.1025 14 10.2982 14H3.45614C2.65194 14 2 13.3481 2 12.5439V5.70175C2 4.89754 2.65194 4.24561 3.45614 4.24561H4.24561ZM3.33333 5.70175C3.33333 5.63392 3.38832 5.57894 3.45614 5.57894H10.2982C10.3661 5.57894 10.4211 5.63392 10.4211 5.70175V12.5439C10.4211 12.6117 10.3661 12.6667 10.2982 12.6667H3.45614C3.38832 12.6667 3.33333 12.6117 3.33333 12.5439V5.70175Z\" fill=\"currentColor\"></path> </svg>\n                    Copy\n                </button>\n            </div>\n            <div class=\"code-block-content\">\n                <pre class=\"language-text\"><code><span class=\"line-number\">1</span>from __future__ import annotations\n<span class=\"line-number\">2</span>\n<span class=\"line-number\">3</span>import os\n<span class=\"line-number\">4</span>import daft\n<span class=\"line-number\">5</span>from daft import col\n<span class=\"line-number\">6</span>\n<span 
class=\"line-number\">7</span>from daft.aihc.common.udf import aihc_udf\n<span class=\"line-number\">8</span>from daft.aihc.functions.text.whitespace_normalizer import WhitespaceNormalizer\n<span class=\"line-number\">9</span>\n<span class=\"line-number\">10</span>if __name__ == &quot;__main__&quot;:\n<span class=\"line-number\">11</span>    if os.getenv(&quot;DAFT_RUNNER&quot;, &quot;native&quot;) == &quot;ray&quot;:\n<span class=\"line-number\">12</span>        import ray\n<span class=\"line-number\">13</span>        ray.init(dashboard_host=&quot;0.0.0.0&quot;, ignore_reinit_error=True)\n<span class=\"line-number\">14</span>        daft.set_runner_ray()\n<span class=\"line-number\">15</span>    daft.set_execution_config(actor_udf_ready_timeout=6000, min_cpu_per_task=0)\n<span class=\"line-number\">16</span>\n<span class=\"line-number\">17</span>    samples = {\n<span class=\"line-number\">18</span>        &quot;text&quot;: [\n<span class=\"line-number\">19</span>            &quot;Hello\\u00a0World&quot;,  # U+00A0 no-break space\n<span class=\"line-number\">20</span>            &quot;中文\\u3000全角空格\\u3000测试&quot;,  # U+3000 ideographic (full-width) space\n<span class=\"line-number\">21</span>            &quot;Tab\\t分隔\\t文本&quot;,  # horizontal tab\n<span class=\"line-number\">22</span>            &quot;多   个   普   通   空   格&quot;,  # runs of ordinary spaces\n<span class=\"line-number\">23</span>        ]\n<span class=\"line-number\">24</span>    }\n<span class=\"line-number\">25</span>\n<span class=\"line-number\">26</span>    ds = daft.from_pydict(samples)\n<span class=\"line-number\">27</span>    ds = ds.with_column(\n<span class=\"line-number\">28</span>        &quot;normalized_text&quot;,\n<span class=\"line-number\">29</span>        aihc_udf(\n<span class=\"line-number\">30</span>            WhitespaceNormalizer,\n<span class=\"line-number\">31</span>            construct_args={},\n<span class=\"line-number\">32</span>        )(col(&quot;text&quot;)),\n<span class=\"line-number\">33</span>    )\n<span class=\"line-number\">34</span>    
ds.show()</code></pre>\n            </div>\n        </div>\n    </div>\n  ","fields":{"slug":"wmo9zna4x","title":"空白字符标准化器","date":"2026-04-23","extractedHeadings":[]},"headings":[{"value":"简介","depth":2},{"value":"功能描述","depth":3},{"value":"算子参数","depth":2},{"value":"输入","depth":3},{"value":"输出","depth":3},{"value":"调用示例","depth":2}]}},"pageContext":{"isCreatedByStatefulCreatePages":false,"slug":"wmo9zna4x","prev":{"id":"Umn1dxohu","name":"工作流","path":"Umn1dxohu","filePath":"操作指南/工作流/工作流模板示例.md","seo":null,"parentIds":["ilib2qygp","Em66d2kok"],"parents":[{"id":"ilib2qygp","documentId":"bfa43a8b-968a-41a1-8c9d-906507eeaed9","name":"操作指南","repoName":"AIHC","filePath":"操作指南","disabled":false,"path":"ilib2qygp","lastMergeTime":null,"isApiDoc":null,"httpMethod":null,"seo":null,"sourceOrgName":null,"sourceRepoName":null,"sourceDocumentId":null},{"id":"Em66d2kok","documentId":"a6a38f6a-0da4-4a3a-8aa4-5b64c05390cf","name":"工作流","repoName":"AIHC","filePath":"操作指南/工作流","disabled":false,"path":"Em66d2kok","lastMergeTime":null,"isApiDoc":null,"httpMethod":null,"seo":null,"sourceOrgName":null,"sourceRepoName":null,"sourceDocumentId":null}]},"next":{"id":"wmoa03wob","name":"中文简繁体转换器","path":"wmoa03wob","filePath":"操作指南/AI数据处理/算子列表/文本/中文简繁体转换器.md","seo":null,"parentIds":["ilib2qygp","Ymo88m8hi","Imob3m6so","fmob4536z"],"parents":[{"id":"ilib2qygp","documentId":"bfa43a8b-968a-41a1-8c9d-906507eeaed9","name":"操作指南","repoName":"AIHC","filePath":"操作指南","disabled":false,"path":"ilib2qygp","lastMergeTime":null,"isApiDoc":null,"httpMethod":null,"seo":null,"sourceOrgName":null,"sourceRepoName":null,"sourceDocumentId":null},{"id":"Ymo88m8hi","documentId":"c8cb5e38-f8c5-40f4-a424-b0c7895f0c0a","name":"AI数据处理","repoName":"AIHC","filePath":"操作指南/AI数据处理","disabled":false,"path":"Ymo88m8hi","lastMergeTime":"2026-04-21 
14:23:10","isApiDoc":null,"httpMethod":null,"seo":null,"sourceOrgName":null,"sourceRepoName":null,"sourceDocumentId":""},{"id":"Imob3m6so","documentId":"fe548e34-6659-4ff5-86f6-eee2c43aec90","name":"算子列表","repoName":"AIHC","filePath":"操作指南/AI数据处理/算子列表","disabled":false,"path":"Imob3m6so","lastMergeTime":null,"isApiDoc":null,"httpMethod":null,"seo":null,"sourceOrgName":null,"sourceRepoName":null,"sourceDocumentId":""},{"id":"fmob4536z","documentId":"cd84d7f5-def9-4cfd-a487-db445fbdf371","name":"文本","repoName":"AIHC","filePath":"操作指南/AI数据处理/算子列表/文本","disabled":false,"path":"fmob4536z","lastMergeTime":null,"isApiDoc":null,"httpMethod":null,"seo":null,"sourceOrgName":null,"sourceRepoName":null,"sourceDocumentId":""}]},"parents":[{"id":"ilib2qygp","documentId":"bfa43a8b-968a-41a1-8c9d-906507eeaed9","name":"操作指南","repoName":"AIHC","filePath":"操作指南","disabled":false,"path":"ilib2qygp","lastMergeTime":null,"isApiDoc":null,"httpMethod":null,"seo":null,"sourceOrgName":null,"sourceRepoName":null,"sourceDocumentId":null},{"id":"Ymo88m8hi","documentId":"c8cb5e38-f8c5-40f4-a424-b0c7895f0c0a","name":"AI数据处理","repoName":"AIHC","filePath":"操作指南/AI数据处理","disabled":false,"path":"Ymo88m8hi","lastMergeTime":"2026-04-21 14:23:10","isApiDoc":null,"httpMethod":null,"seo":null,"sourceOrgName":null,"sourceRepoName":null,"sourceDocumentId":""},{"id":"Imob3m6so","documentId":"fe548e34-6659-4ff5-86f6-eee2c43aec90","name":"算子列表","repoName":"AIHC","filePath":"操作指南/AI数据处理/算子列表","disabled":false,"path":"Imob3m6so","lastMergeTime":null,"isApiDoc":null,"httpMethod":null,"seo":null,"sourceOrgName":null,"sourceRepoName":null,"sourceDocumentId":""},{"id":"fmob4536z","documentId":"cd84d7f5-def9-4cfd-a487-db445fbdf371","name":"文本","repoName":"AIHC","filePath":"操作指南/AI数据处理/算子列表/文本","disabled":false,"path":"fmob4536z","lastMergeTime":null,"isApiDoc":null,"httpMethod":null,"seo":null,"sourceOrgName":null,"sourceRepoName":null,"sourceDocumentId":""}],"specificSeo":null}}}