<p>I am using Apache Spark 2.4.4 with Hadoop 2.7 or later.
Here is the code that finally worked for me:</p>
<pre><code>from pyspark import SparkContext, SparkConf, SQLContext
appName = "PySpark SQL Server Example - via JDBC"
master = "local"
# Put the Microsoft JDBC driver jar on the driver's classpath.
conf = SparkConf() \
    .setAppName(appName) \
    .setMaster(master) \
    .set("spark.driver.extraClassPath", "mssql-jdbc-7.4.1.jre8.jar")
sc = SparkContext.getOrCreate(conf=conf)
sqlContext = SQLContext(sc)
spark = sqlContext.sparkSession
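# Connection settings. hostname and port are defined but not used below:
# the URL names the server directly.
hostname = "localhost"
database = "HumanResources"
port = "1433"
table = "dbo.Employee"
user = "sa"
password = "Dedo9090"
# Read the table into a DataFrame over JDBC.
jdbcDF = spark.read.format("jdbc") \
    .option("url", f"jdbc:sqlserver://ILI-LAB-HRVOJE;databaseName={database}") \
    .option("dbtable", table) \
    .option("user", user) \
    .option("password", password) \
    .load()
# Fetch the first 50 rows to confirm the connection works.
jdbcDF.head(50)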
</code></pre>
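<p>The <code>hostname</code> and <code>port</code> variables in the code above never reach the connection URL. If you would rather build the URL from them, something like the following may work. It is only a sketch: <code>build_sqlserver_url</code> is a hypothetical helper name, and it assumes the usual <code>jdbc:sqlserver://host:port;databaseName=db</code> form accepted by the Microsoft JDBC driver.</p>

```python
def build_sqlserver_url(hostname: str, port: str, database: str) -> str:
    """Assemble a SQL Server JDBC URL from its parts (hypothetical helper)."""
    return f"jdbc:sqlserver://{hostname}:{port};databaseName={database}"


# Example values taken from the variables in the answer's code.
url = build_sqlserver_url("localhost", "1433", "HumanResources")
print(url)
```

<p>The resulting string can be passed to <code>.option("url", url)</code> in place of the hardcoded one.</p>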
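<p>If you still have trouble reaching SQL Server, check that TCP/IP is enabled as suggested <a href="https://stackoverflow.com/questions/9138172/enable-tcp-ip-remote-connections-to-sql-server-express-already-installed-databas">here</a>, and make sure the firewall is not blocking port 1433, the port MS SQL Server listens on.
In the end, the problem was not caused by unsupported characters in the password.</p>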
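<p>A quick way to tell a network problem from a Spark or driver problem is to test the port from plain Python first. This is a minimal sketch, not part of the original setup: <code>can_connect</code> is a hypothetical helper name, and the host and port in the example are assumptions to replace with your own.</p>

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Assumed values: a SQL Server instance on this machine, listening on 1433.
    print(can_connect("localhost", 1433))
```

<p>If this prints <code>False</code>, the TCP/IP setting or the firewall is the more likely culprit, since the JDBC driver is not involved at all.</p>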