How to filter out rows that do not start with a digit (CSV, PySpark)? Edited: rows containing only numbers

Posted 2024-09-24 02:22:56


CSV File

In column A of df, some rows do not start with a digit and I want to remove them. I tried the code below, but neither attempt works.

import re
df = sqlContext.read.csv("/FileStore/tables/mtmedical_V6-16623.csv", header='true', inferSchema="true")

df.show()

import pyspark.sql.functions as f
w=df.filter(df['_c0'].isdigit()) #error1
w=df.filter(df['_c0'].startswith(('1','2','3','4','5','6','7','8','9'))) #error2
w.show()

Errors:

'Column' object is not callable #no1
py4j.Py4JException: Method startsWith([class java.util.ArrayList]) does not exist #no2

Here is the table. You can see that in the "_c0" column, the row below row 7 does not start with a digit. How can I delete such rows?

+--------------------+--------------------+--------------------+--------------------+--------------------+-------------------------------------------------------+--------------------+--------------------+
|                 _c0|         description|   medical_specialty|                 age|              gender|sample_name (What has been done to patient = Treatment)|       transcription|            keywords|
+--------------------+--------------------+--------------------+--------------------+--------------------+-------------------------------------------------------+--------------------+--------------------+
|                   1| A 23-year-old wh...| Allergy / Immuno...|                  23|              female|                                     Allergic Rhinitis |SUBJECTIVE:,  Thi...|allergy / immunol...|
|                   2| Consult for lapa...|          Bariatrics|                null|                male|                                    Laparoscopic Gas...|PAST MEDICAL HIST...|bariatrics, lapar...|
|                   3| Consult for lapa...|          Bariatrics|                  42|                male|                                    Laparoscopic Gas...|"HISTORY OF PRESE...| at his highest h...|
|                   4| 2-D M-Mode. Dopp...| Cardiovascular /...|                null|                null|                                    2-D Echocardiogr...|2-D M-MODE: , ,1....|cardiovascular / ...|
|                   5|  2-D Echocardiogram| Cardiovascular /...|                null|                male|                                    2-D Echocardiogr...|1.  The left vent...|cardiovascular / ...|
|                   6| Morbid obesity. ...|          Bariatrics|                  30|                male|                                    Laparoscopic Gas...|PREOPERATIVE DIAG...|bariatrics, gastr...|
|                   7| Liposuction of t...|                null|                null|                null|                                                   null|                null|                null|
|", Bariatrics,31,...|       1.  Deformity| right breast rec...|2.  Excess soft t...| anterior abdomen...|                                   3.  Lipodystrophy...|POSTOPERATIVE DIA...|       1.  Deformity|
|                   8|  2-D Echocardiogram| Cardiovascular /...|                null|                male|                                    2-D Echocardiogr...|2-D ECHOCARDIOGRA...|cardiovascular / ...|
