Multithreading with Python and pymongo

Posted 2024-09-30 10:31:19


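Hi, I'm trying to write a program that classifies tweets about a company, already saved in MongoDB, as positive or negative and, once they are classified, updates an integer with the result.

I have code that makes this possible, but I would like to multithread the program. I have no experience with this in Python and have been trying to follow tutorials with no luck: the program just starts and exits without running any of the code.

If anyone could help me with this, I would greatly appreciate it. The program and the intended multithreading code are below.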

from textblob.classifiers import NaiveBayesClassifier
import pymongo
from datetime import datetime  # strptime() and today() are called on the datetime class below
from threading import Thread

train = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('This is my best work.', 'pos'),
    ("What an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('He is my sworn enemy!', 'neg'),
    ('My boss is horrible.', 'neg'),
    (':)', 'pos'),
    (':(', 'neg'),
    ('gr8', 'pos'),
    ('gr8t', 'pos'),
    ('lol', 'pos'),
    ('bff', 'neg'),
]

test = [
    'The beer was good.',
    'I do not enjoy my job',
    "I ain't feeling dandy today.",
    "I feel amazing!",
    'Gary is a friend of mine.',
    "I can't believe I'm doing this.",
]

filterKeywords = ['IBM', 'Microsoft', 'Facebook', 'Yahoo', 'Apple', 'Google', 'Amazon', 'EBay', 'Diageo',
                  'General Motors', 'General Electric', 'Telefonica', 'Rolls Royce', 'Walmart', 'HSBC', 'BP',
                  'Investec', 'WWE', 'Time Warner', 'Santander Group']

# Create pos/neg counter variables for each company using dicts
vars = {}
for word in filterKeywords:
    vars[word + "SentimentOverall"] = 0


# Initialising the classifier
cl = NaiveBayesClassifier(train)


class TrainingClassification():
    def __init__(self):
        #creating the mongodb connection
        try:
            conn = pymongo.MongoClient('localhost', 27017)
            print("Connected successfully!!!")
            global db
            db = conn.TwitterDB
        except pymongo.errors.ConnectionFailure as e:
            print("Could not connect to MongoDB: %s" % e)

        thread1 = Thread(target=self.apple_thread, args=())
        thread1.start()
        thread1.join()
        print("thread finished...exiting")

    def apple_thread(self):
        appleSentimentText = []
        for record in db.Apple.find():
            if record.get('created_at'):
                created_at = record.get('created_at')
                dt = datetime.strptime(created_at, '%a %b %d %H:%M:%S +0000 %Y')
                if record.get('text') and dt > datetime.today():
                    appleSentimentText.append(record.get("text"))
        for targetText in appleSentimentText:
            classificationApple = cl.classify(targetText)
            if classificationApple == "pos":
                vars["AppleSentimentOverall"] = vars["AppleSentimentOverall"] + 1
            elif classificationApple == "neg":
                vars["AppleSentimentOverall"] = vars["AppleSentimentOverall"] - 1

1 Answer
User
Answer #1 · Posted 2024-09-30 10:31:19

The main problem with your code is:

thread1.start()
thread1.join()

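When you call join() on a thread, the currently running thread (in your case, the main thread) waits until that thread (here, thread1) finishes. So your code will not actually be any faster: it starts a single thread and then waits for it. It will even be slightly slower because of the cost of creating the thread.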
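The cost of calling join() right after start() can be seen in a small timing sketch. It assumes Python 3, and `fake_io_task` is a hypothetical stand-in for an I/O-bound job such as a MongoDB query, not part of the original program:

```python
import time
from threading import Thread


def fake_io_task():
    """Hypothetical stand-in for an I/O-bound job such as a MongoDB query."""
    time.sleep(0.2)


start = time.perf_counter()
for _ in range(2):
    worker = Thread(target=fake_io_task)
    worker.start()
    worker.join()  # the main thread blocks here until this worker is done
elapsed = time.perf_counter() - start

print("start() followed directly by join() took %.2f s" % elapsed)
```

Because each worker is joined before the next one is even created, the two 0.2 s tasks run one after the other and the total comes out at roughly the sum of both.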

Here is the correct way to do multithreading:

thread1.start()
thread2.start()
thread1.join()
thread2.join()

In this code, thread1 and thread2 will both run in parallel.
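A self-contained sketch of running one thread per company might look like the following. It assumes Python 3; `classify_stub`, `count_sentiment`, the `tweets` sample data, and the `totals` dict are hypothetical stand-ins for the TextBlob classifier, the MongoDB collections, and the `vars` dict in the question:

```python
from threading import Thread

# Hypothetical sample data; in the question this would come from db.<Company>.find().
tweets = {
    "Apple": ["I love this phone", "I hate the new update", "love it"],
    "Google": ["I hate ads", "search is bad, I hate it"],
}

# One counter per company, mirroring the `vars` dict in the question.
totals = {company + "SentimentOverall": 0 for company in tweets}


def classify_stub(text):
    """Toy classifier (an assumption, not TextBlob): 'pos' if the text contains 'love'."""
    return "pos" if "love" in text else "neg"


def count_sentiment(company):
    """Add 1 per positive tweet and subtract 1 per negative tweet for one company."""
    key = company + "SentimentOverall"
    for text in tweets[company]:
        if classify_stub(text) == "pos":
            totals[key] += 1
        else:
            totals[key] -= 1


# Create every thread first ...
threads = [Thread(target=count_sentiment, args=(company,)) for company in tweets]
# ... start them all so they can run concurrently ...
for t in threads:
    t.start()
# ... and only then wait for each one to finish.
for t in threads:
    t.join()

print(totals)
```

Each thread writes only to its own key in `totals`, so this sketch does not need a lock. If several threads updated the same counter, a `threading.Lock` around the update would be the safer choice.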

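Important: note that in Python this is a "simulated" parallelism. Because Python's core is not thread-safe (mainly because of the way it performs garbage collection), it uses a GIL (Global Interpreter Lock), so only one thread in a process executes Python code at a time and all threads effectively share a single core. If you need true parallelism (for example, if your two threads are CPU-bound rather than I/O-bound), have a look at the multiprocessing module.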
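For the CPU-bound case, a rough sketch with the standard-library `multiprocessing.Pool` could look like this. It assumes Python 3; `classify_stub`, `score_company`, and the sample tweets are hypothetical and only illustrate the shape of the code:

```python
from multiprocessing import Pool


def classify_stub(text):
    """Toy classifier (an assumption, not TextBlob): 'pos' if the text contains 'love'."""
    return "pos" if "love" in text else "neg"


def score_company(item):
    """Worker: take a (company, texts) pair and return (company, net sentiment score)."""
    company, texts = item
    score = sum(1 if classify_stub(t) == "pos" else -1 for t in texts)
    return company, score


if __name__ == "__main__":
    # Hypothetical sample data standing in for the MongoDB collections.
    tweets = {
        "Apple": ["I love this phone", "I hate the new update", "love it"],
        "Google": ["I hate ads", "search is bad, I hate it"],
    }
    # Each (company, texts) pair is handed to a separate worker process.
    with Pool(processes=2) as pool:
        results = dict(pool.map(score_company, list(tweets.items())))
    print(results)
```

The workers return their scores instead of modifying a shared dict, because each process has its own copy of module-level variables and changes made in a child process are not visible in the parent.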
