<p><strong>Running PySpark in a Jupyter Notebook on Windows</strong></p>
<p>Java 8: <a href="https://www.guru99.com/install-java.html" rel="nofollow noreferrer">https://www.guru99.com/install-java.html</a></p>
<p>Anaconda: <a href="https://www.anaconda.com/distribution/" rel="nofollow noreferrer">https://www.anaconda.com/distribution/</a></p>
<p>PySpark in Jupyter: <a href="https://changhsinlee.com/install-pyspark-windows-jupyter/" rel="nofollow noreferrer">https://changhsinlee.com/install-pyspark-windows-jupyter/</a></p>
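<p>Once the three installs above are done, it can help to confirm from a notebook cell that the kernel actually sees them. The following is a minimal sketch, not part of the linked guides: the helper name <code>check_pyspark_prerequisites</code> is hypothetical, and it assumes Java is exposed through <code>JAVA_HOME</code> or <code>PATH</code> and that <code>findspark</code> and <code>pyspark</code> are installed into the same environment as the kernel.</p>

```python
import importlib.util
import os
import shutil


def check_pyspark_prerequisites():
    """Report which prerequisites are visible to this Python process.

    Hypothetical helper for illustration; it only inspects the environment
    and does not start Java or Spark.
    """
    return {
        # Java counts as found if JAVA_HOME is set or a java executable is on PATH.
        "java": bool(os.environ.get("JAVA_HOME")) or shutil.which("java") is not None,
        # Both packages must be importable from the kernel running the notebook.
        "findspark": importlib.util.find_spec("findspark") is not None,
        "pyspark": importlib.util.find_spec("pyspark") is not None,
        # findspark.init() consults SPARK_HOME when it is set; None means unset.
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
    }


for name, status in check_pyspark_prerequisites().items():
    print(f"{name}: {status}")
```

<p>A <code>False</code> entry usually means the notebook was started from a different environment than the one where the package was installed, or that the environment variables were set after Jupyter was launched.</p>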
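<pre><code># Make the local Spark installation importable from the notebook kernel
import findspark
findspark.init()

from pyspark.sql import SparkSession
# Not needed for this test; handy later for column expressions and explicit schemas
from pyspark.sql.functions import *
from pyspark.sql.types import *

# Start a Spark session, or reuse one that is already running
spark = SparkSession.builder.appName('test').getOrCreate()

# Build a small DataFrame to confirm the installation works
data = [(1, "siva", 100), (2, "siva2", 200), (3, "siva3", 300), (4, "siva4", 400), (5, "siva5", 500)]
schema = ['id', 'name', 'salary']
df = spark.createDataFrame(data, schema=schema)
df.show()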
</code></pre>