Invalid syntax when crawling data?

Posted 2024-10-06 11:31:13


When I run my code for crawling data from Twitter:

from tqdm import tqdm
from bs4 import BeautifulSoup as bs
import re, csv

def html2csv(fData, fHasil, full=True):
    urlPattern = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
    print('Loading Data: ', flush=True)
    Tweets, Username, waktu, replies, retweets, likes, Language, urlStatus = [], [], [], [], [], [], [], []
    soup = bs(open(fData, encoding='utf-8', errors='ignore', mode='r'), 'html.parser')
    data = soup.find_all('li', class_='stream-item')
    for i, t in tqdm(enumerate(data)):
        T = t.find_all('p', class_='TweetTextSize')[0]  # Loading tweet
        Tweets.append(bs(str(T), 'html.parser').text)
        U = t.find_all('span', class_='username')
        Username.append(bs(str(U[0]), 'html.parser').text)
        T = t.find_all('a', class_='tweet-timestamp')[0]  # Loading time
        waktu.append(bs(str(T), 'html.parser').text)
        RP = t.find_all('span', class_='ProfileTweet-actionCountForAria')[0]  # Loading replies, retweets & likes
        replies.append(int((bs(str(RP), "lxml").text.split()[0]).replace('.', '').replace(',', '')))
        RT = t.find_all('span', class_='ProfileTweet-actionCountForAria')[1]
        RT = int((bs(str(RT), "lxml").text.split()[0]).replace('.', '').replace(',', ''))
        retweets.append(RT)
        L = t.find_all('span', class_='ProfileTweet-actionCountForAria')[2]
        likes.append(int((bs(str(L), "lxml").text.split()[0]).replace('.', '').replace(',', '')))

        try:  # Loading language
            L = t.find_all('span', class_='tweet-language')
            Language.append(bs(str(L[0]), "lxml").text)
            except:
                Language.append('')

        url = str(t.find_all('small', class_='time')[0])
        try:
            url = re.findall(urlPattern, url)[0]
            except:
                try:
                    mulai, akhir = url.find('href="/') + len('href="/'), url.find('" title=')
                    url = 'https://twitter.com/' + url[mulai:akhir]
                    except:
                        url = ''
        urlStatus.append(url)

    print('Saving Data to "%s" ' % fHasil, flush=True)
    dfile = open(fHasil, 'w', encoding='utf-8', newline='')
    if full:
        dfile.write('Time, Username, Tweet, Replies, Retweets, Likes, Language, urlStatus\n')
        with dfile:
            writer = csv.writer(dfile)
            for i, t in enumerate(Tweets):
                writer.writerow([waktu[i], Username[i], t, replies[i], retweets[i], likes[i], Language[i], urlStatus[i]])
    else:
        with dfile:
            writer = csv.writer(dfile)
            for i, t in enumerate(Tweets):
                writer.writerow([Username[i], t])
    dfile.close()
    print('All Finished', flush=True)

I get this error:

File "<ipython-input-4-4a19b18dc90d>", line 27
    except:
         ^
SyntaxError: invalid syntax

1 Answer

Answered 2024-10-06 11:31:13

In Python, indentation is used to delimit blocks of code. This differs from many other languages, such as Java, JavaScript, and C, which use curly braces {} to delimit blocks. Because of this, Python programmers must pay close attention to when and how they indent their code, since whitespace is significant.

When Python encounters a problem with a program's indentation, it raises an exception named IndentationError or TabError. [1] A statement that lands in the wrong place because of its indentation, like your except:, instead surfaces as a plain SyntaxError, which is exactly what your traceback shows.
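As a quick illustration (a minimal sketch, not taken from the question's code), compiling a function whose body is not indented raises IndentationError, which is a subclass of SyntaxError:

```python
# Minimal sketch: a def whose body is not indented.
src = "def f():\nreturn 1"
try:
    compile(src, "<string>", "exec")
    err = None
except IndentationError as e:
    err = type(e).__name__
print(err)  # IndentationError
```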

In your case, this is the problem:

try:
    print(x)
    except:              # wrong indentation 
      print("An exception occurred")
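You can reproduce the question's error without running the whole crawler by handing that fragment to the built-in compile() (a minimal sketch; the string below is just the broken pattern above):

```python
# The mis-indented except: is rejected by the parser as a SyntaxError,
# the same class of error shown in the question's traceback.
bad = (
    "try:\n"
    "    print(x)\n"
    "    except:\n"
    "        print('An exception occurred')\n"
)
try:
    compile(bad, "<string>", "exec")
    raised = False
except SyntaxError:
    raised = True
print(raised)  # True
```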

You can fix it simply like this:

try:
  print(x)
except:         # correct, try and catch stay at the same level
  print("An exception occurred")
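Applied to the first failing block in the question, the same pattern looks like this (a self-contained sketch; the dicts are hypothetical stand-ins for the parsed tweet objects, not the question's actual data):

```python
# Stand-in for the question's language-loading loop: dicts play the
# role of the parsed tweets; the second one is missing the key.
Language = []
tweets = [{'tweet-language': 'English'}, {}]
for t in tweets:
    try:
        Language.append(t['tweet-language'])
    except KeyError:  # except aligned with try, at the same level
        Language.append('')
print(Language)  # ['English', '']
```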

Hope this helps. Good luck!
