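As you can see in the age and gender columns below, some cells contain data that does not belong there; their values should be either null or a number. Why are the cells spilling into each other, and how can I clean these columns?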
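As far as I can tell, the root of the problem is the description column: some of its cells appear empty or contain non-breaking spaces even though they actually hold data, so when I read the file, the contents of description end up in the age and gender columns.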
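To see how this kind of misalignment can happen, here is a small sketch using Python's standard `csv` module. The sample text is made up for illustration (it is not taken from the real file), and it assumes the file uses standard CSV quoting: fields wrapped in double quotes, line breaks allowed inside them, and literal quotes written as `""`, which the `"HISTORY OF PRESE...` and `""...` fragments in the output below suggest. A reader that splits on line breaks before looking at quotes turns one record into two, and the second half lands in the first column with nulls after it.

```python
import csv
import io

# Made-up sample (not from the real file): one header and ONE record whose
# quoted transcription field contains a line break and a "" escaped quote.
SAMPLE = (
    "description,age,gender,transcription\r\n"
    '"Consult for lap band",42,male,'
    '"HISTORY OF PRESENT ILLNESS:\nat his highest he weighed ""a lot"""\r\n'
)


def naive_rows(text):
    """Split on line breaks first, then on commas, ignoring quotes entirely."""
    return [line.split(",") for line in text.splitlines()]


def quote_aware_rows(text):
    """Let the csv module parse quoted fields, including embedded newlines."""
    return list(csv.reader(io.StringIO(text, newline="")))


for label, rows in (("naive", naive_rows(SAMPLE)), ("quote-aware", quote_aware_rows(SAMPLE))):
    print(label, len(rows), "rows")
    for row in rows:
        print("   ", row)
```

The naive version produces an extra one-field row, much like the `Liposuction of t...` line followed by nulls in the output below.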
df = sqlContext.read.csv("/FileStore/tables/mtmedical_V6-16623.csv", header=True)
df.show(150)
Output:
+--------------------+--------------------+--------------------+--------------------+-------------------------------------------------------+--------------------+--------------------+
| description| medical_specialty| age| gender|sample_name (What has been done to patient = Treatment)| transcription| keywords|
+--------------------+--------------------+--------------------+--------------------+-------------------------------------------------------+--------------------+--------------------+
| A 23-year-old wh...| Allergy / Immuno...| 23| female| Allergic Rhinitis |SUBJECTIVE:, Thi...|allergy / immunol...|
| Consult for lapa...| Bariatrics| null| male| Laparoscopic Gas...|PAST MEDICAL HIST...|bariatrics, lapar...|
| Consult for lapa...| Bariatrics| 42| male| Laparoscopic Gas...|"HISTORY OF PRESE...| at his highest h...|
| 2-D M-Mode. Dopp...| Cardiovascular /...| null| null| 2-D Echocardiogr...|2-D M-MODE: , ,1....|cardiovascular / ...|
| 2-D Echocardiogram| Cardiovascular /...| null| male| 2-D Echocardiogr...|1. The left vent...|cardiovascular / ...|
| Morbid obesity. ...| Bariatrics| 30| male| Laparoscopic Gas...|PREOPERATIVE DIAG...|bariatrics, gastr...|
| Liposuction of t...| null| null| null| null| null| null|
|", Bariatrics,31,...| 1. Deformity| right breast rec...|2. Excess soft t...| anterior abdomen...|3. Lipodystrophy...|POSTOPERATIVE DIA...|
| 2-D Echocardiogram| Cardiovascular /...| null| male| 2-D Echocardiogr...|2-D ECHOCARDIOGRA...|cardiovascular / ...|
| Suction-assisted...| Bariatrics| null| male| Lipectomy - Abdo...|PREOPERATIVE DIAG...|bariatrics, lipod...|
| Echocardiogram a...| Cardiovascular /...| null| null| 2-D Echocardiogr...|DESCRIPTION:,1. ...|cardiovascular / ...|
| Morbid obesity. ...| Bariatrics| 50| male| Laparoscopic Gas...|PREOPERATIVE DIAG...|bariatrics, morbi...|
| Normal left vent...| Cardiovascular /...| null| male| 2-D Doppler |2-D STUDY,1. Mild...|cardiovascular / ...|
| Cerebral Angiogr...| Neurology| 31| male| Moyamoya Disease |"CC:, Confusion a...| she was found ""...|
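Another approach is to map over the DataFrame and drop the "bad rows". However, if you are going to receive several CSV files like this, that will not be a very scalable process.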
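A minimal sketch of the "drop the bad rows" idea. The validity rules are assumptions inferred from the sample output (age is missing or a whole number, gender is missing or `male`/`female`), and `is_valid_row` is a hypothetical helper. It works on anything that supports `row["age"]`, such as a `dict` or a `pyspark.sql.Row`, and it assumes the values are strings or `None`, which is what Spark produces for CSV columns unless `inferSchema` is enabled.

```python
import re

# Hypothetical validity rules, inferred from the sample output only.
AGE_PATTERN = re.compile(r"\d{1,3}")
ALLOWED_GENDERS = {"male", "female"}


def is_valid_row(row) -> bool:
    """Return True if the age and gender cells look correctly aligned.

    `row` is anything indexable by column name (dict, pyspark.sql.Row);
    values are assumed to be strings or None.
    """
    age = row["age"]
    gender = row["gender"]
    age_ok = age is None or AGE_PATTERN.fullmatch(age.strip()) is not None
    gender_ok = gender is None or gender.strip().lower() in ALLOWED_GENDERS
    return age_ok and gender_ok


print(is_valid_row({"age": "23", "gender": "female"}))
print(is_valid_row({"age": " right breast rec...", "gender": "2. Excess soft t..."}))
```

In the notebook, the predicate could be applied with something like `clean_df = sqlContext.createDataFrame(df.rdd.filter(is_valid_row), df.schema)`. This only discards misaligned records; their contents are lost rather than repaired.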
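The second approach is to clean the CSV file itself. It looks to me like the file contains stray tabs or spaces, which could be causing problems. Finally, you can try the following approach: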
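One way such a read could look, assuming the root cause is quoted fields that contain line breaks and `""`-escaped quotes (a sketch, not necessarily the code originally posted here). Spark's CSV reader has a `multiLine` option that lets a quoted field span several physical lines, and setting `escape` to `"` makes it treat a doubled quote inside a quoted field as a literal quote. `CSV_OPTIONS` and `read_medical_csv` are illustrative names.

```python
# Reader options for a CSV whose quoted fields may contain line breaks
# and doubled ("") quotes.
CSV_OPTIONS = {
    "header": "true",     # first line holds the column names
    "multiLine": "true",  # a quoted field may span several physical lines
    "escape": '"',        # inside quotes, "" stands for a literal "
}


def read_medical_csv(spark, path):
    """Read `path` as CSV using CSV_OPTIONS.

    `spark` is assumed to be a SparkSession (in Databricks, the notebook's
    predefined `spark`); a SQLContext also works, since it exposes `.read`.
    """
    return spark.read.options(**CSV_OPTIONS).csv(path)
```

Usage in the notebook: `df = read_medical_csv(sqlContext, "/FileStore/tables/mtmedical_V6-16623.csv")`, then `df.show(150)` as before.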
This takes care of text content that contains multiple line breaks, which may well be the problem here.
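For the second approach, cleaning the CSV file itself before Spark reads it, here is a minimal pure-Python sketch. `clean_csv_text` and `clean_csv_file` are hypothetical helpers, and UTF-8 is an assumed encoding. The text is parsed with the standard `csv` module, non-breaking spaces are replaced with ordinary spaces, and every run of whitespace (including line breaks inside fields) is collapsed to a single space, so each record ends up on one physical line. Collapsing whitespace slightly alters the transcriptions' formatting, the whole file is read into memory, and quotes inside fields are still written as `""`, so Spark would still need `escape` set to `"`.

```python
import csv
import io

NBSP = "\u00a0"  # non-breaking space


def clean_csv_text(raw: str) -> str:
    """Return `raw` CSV text rewritten so that every record fits on one line."""
    reader = csv.reader(io.StringIO(raw, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # NBSP -> normal space, then collapse newlines/tabs/space runs to one space.
        writer.writerow([" ".join(field.replace(NBSP, " ").split()) for field in row])
    return out.getvalue()


def clean_csv_file(src_path, dst_path, encoding="utf-8"):
    """Write a cleaned copy of `src_path` to `dst_path` (encoding is an assumption)."""
    with open(src_path, encoding=encoding, newline="") as src:
        raw = src.read()
    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.write(clean_csv_text(raw))
```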