Answer to the question "Word cloud using TabPy"

Word cloud using TabPy


Anonymous, 1 day ago

Skills: Python, MySQL, Java

I worked on a project in R that did something very similar. Here is a video showing a proof of concept (no audio): <a href="https://www.screencast.com/t/xa0yemiDPl" rel="nofollow noreferrer">https://www.screencast.com/t/xa0yemiDPl</a> It shows the end state: using Tableau to interactively explore, in a word cloud, the wine descriptions for a selected country. The main components are:
<ul>
<li>Tableau connects to the data you want to analyze, plus a placeholder dataset that contains the number of records you want the Python/R code to return. (Tableau's call to Python/R expects back the same number of records it sent for processing. That is a problem if you send text data and process it into more records, as happens in the word cloud example.)</li>
<li>The Python/R code connects to your data and returns the words and their frequency counts in a single vector, each word joined to its count by a delimiter (which is what Tableau needs for the word cloud).</li>
<li>Tableau calculated fields split that single vector back into words and frequencies.</li>
<li>Parameter actions select the parameter values that are passed to the Python/R code.</li>
</ul>
High-level overview <img src="https://www.websequencediagrams.com/cgi-bin/cdraw?lz=cGFydGljaXBhbnQgIlJvdy1Ib2xkZXJcbiBEYXRhc2V0IiBhcyBoABAFCgAgDVdpbmUAGA13aW5lZGF0YQAZDlRhYmxlYXUAQgV0AAYGADoOQW5hbHl0aWMgRXh0ZW5zaW9uXG4oUiwgUHl0aG9uLCBldGMpAIEABWV4dHN2YwBPD2V4dCAAPQVzaXMAgS4FXG4oZm9yIHdvcmQgY2xvdWQANwZ3b3JkAAoFAIEdBQoAgUsGIC0-AIEMCDogY29ubmVjdCByb3ctABgHZGF0YSAKbm90ZSByaWdodCBvZgCCAwggIGNvbnRhaW5zIGEgc2luZ2xlIGNvbHVtbiAtIHJvd251bQogIHNlcnZlcyBhcyBhIHBsYWNlAHEHCiAgZm9yIHRoZSBYIHJvd3MgSSB3YW50IHRvIHJldHVybgplbmQgbm90ZQphY3RpdmF0ZQCCNQkAgmAIAIEpFXdpbmUAgTMFAIEoDwCDFAkgAHEFAIFVBXRvIGFuYWx5emUgCiAgKG5vdACBDQV0ZXh0AIF2BSkAewoKIACDfQcAgiYNdXNlIHRoaXMAawYgICAgAIEcESAgIACDbwggLT4Agz0HOiB0byBjYWxsAIF9BQCAfwV0aWMgZQCDdwogKACDdwhSLC4uLikATw4AhAMHICAgICAgIACEEgcgLT4Ag1gOOiBwcm9jZXMAgRkGIGludG86XG4gV29yZCBhbmQgRnJlcXVlbmN5IFxuAIIKBmFibGUATQkAhCUNIC0AgTILAIMcBgBEBSwAOwsAMRMAgRIIAIRcDAA3BwCELQd2ZWN0b3IvAIQ1BlxuIGZvcm1hdCBhcwCBIwV-AIEbCQCCWQVkZQCBbxQAglQQAIVJCXB1bGwAgWkGdmFsdWUgZnJvbQBNCgCCNgkAhFEQAEsGAGEPAIM5FwBeDgCCSAoAD2JkaXNwbGF5AIdeC1xuICh1c2luZyBuZXcAg1MUZmllbGRzAIRGBgCBRDUAgXkXCgo&s=modern-blue" alt="overview"/>
Tableau calculated field [R Words+Freq]:
<pre><code>Script_Str('
print("STARTING NEW SCRIPT RUN")
print(Sys.time())
print(.arg2) # grouping
print(.arg1) # selected country

# TEST VARIABLE (non-prod)
.MaxSourceDataRecords = 1000 # -1 to disable

# TABLEAU PARAMETER VARIABLES
.country = "' + [Country Parameter] + '"
.wordsToReturn = ' + str([Return Top N Words]) + '
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

# VARIABLES DERIVED FROM TABLEAU PARAMETER VALUES
.countryUseAll = (.country == "All")
print(.countryUseAll)
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#setwd("C:/Users/jbelliveau/....FILL IN HERE...")
.fileIn = "' + [Source Data Path] + '" # quoted like .country so R receives a string
#.fileOut = "winemag-with-DTM.csv"

#install.packages("wordcloud")
#install.packages("RColorBrewer") # not needed if installed wordcloud package
library(tm)
library(wordcloud)
library(RColorBrewer) # color package (maps or wordclouds)

wineAll = read.csv(.fileIn, stringsAsFactors=FALSE)
# TODO separately... polarity

# use all the data or just the parameter selected
print(.countryUseAll)
if ( .countryUseAll ) {
  wine = wineAll # keep every country
}else{
  wine = wineAll[c(wineAll$country == .country),] # filter down to parameter passed from Tableau
}

# limited data for speed (NOT FOR PRODUCTION)
if( .MaxSourceDataRecords > 0 ){
  print("limiting the number of records to use from input data")
  wine = head(wine, .MaxSourceDataRecords)
}

corpus = Corpus(VectorSource(wine$description))
corpus = tm_map(corpus, content_transformer(tolower)) # content_transformer keeps the corpus structure intact
#corpus = tm_map(corpus, PlainTextDocument) # https://stackoverflow.com/questions/32523544/how-to-remove-error-in-term-document-matrix-in-r/36161902
corpus = tm_map(corpus, removePunctuation)
corpus = tm_map(corpus, removeWords, stopwords("english"))
#length(corpus)

dtm = DocumentTermMatrix(corpus)

#?sample
mysample = dtm # no sampling (used head() on the data read for speed/simplicity in this example)
#mysample <- dtm[sample(1:nrow(dtm), 5000, replace=FALSE),]
#nrow(mysample)

wineSample = as.data.frame(as.matrix(mysample))

# column names (the words)
# use colnames to get a vector of the words
#colnames(wineSample)

# freq of words
# colSums to get the frequency of the words
#wineWordFreq = colSums(wineSample)

# structure in a way Tableau will like it
wordCloudData = data.frame(words=colnames(wineSample), freq=colSums(wineSample))
str(wordCloudData)

# sort by word freq
wordCloudDataSorted = wordCloudData[order(-wordCloudData$freq),]

# join together by ~ for processing once Tableau gets it
wordAndFreq = paste(wordCloudDataSorted[, 1], wordCloudDataSorted[, 2], sep = "~")

#write.table(wordCloudData, .fileOut, sep=",",row.names=FALSE) # if needed for performance refactors

topWords = head(wordAndFreq, .wordsToReturn)
#print(topWords)
return( topWords )
',
Max([Country Parameter]),
MAX([RowNum]) // for testing the grouping being sent to R
)
</code></pre>
Tableau calculated field for the word value, [Word]:
<pre><code>// grab the first token to the left of ~
Left([R Words+Freq], Find([R Words+Freq],"~") - 1)
</code></pre>
Tableau calculated field for the frequency value:
<pre><code>INT(REPLACE([R Words+Freq],[Word]+"~",""))
</code></pre>
If you are not familiar with Tableau, you may want to work with a Tableau analyst at your company who can help you create the calculated fields and configure Tableau to connect to TabPy.
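The question asks about TabPy rather than R, so here is a minimal Python sketch of the same word-and-frequency step, written as a plain function that could be pasted into a TabPy `SCRIPT_STR` script. It is an illustration under assumptions, not the author's code: the function name `word_freq_vector`, the `(country, description)` record shape and the tiny `STOPWORDS` set are hypothetical, and it does not try to reproduce every default of R's `tm` package. Like the R script, it applies the country parameter (with `"All"` meaning no filter), lowercases, strips punctuation, drops stopwords, counts words, sorts by descending frequency and joins each word to its count with `~`. It then pads or truncates the result to the number of rows Tableau sent, because Tableau expects as many values back as it passed in.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list; R's tm stopwords("english")
# is much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "with", "is", "this", "in", "on", "to"}

# Translation table that deletes ASCII punctuation (roughly tm's removePunctuation).
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def word_freq_vector(records, country, top_n, n_rows):
    """Return exactly n_rows strings shaped like "word~freq".

    records: iterable of (country, description) pairs (assumed shape).
    country: value of the country parameter; "All" disables the filter.
    top_n:   how many of the most frequent words to keep.
    n_rows:  how many values Tableau sent, and therefore expects back.
    """
    if country != "All":
        records = [r for r in records if r[0] == country]

    counts = Counter()
    for _, description in records:
        tokens = description.lower().translate(_STRIP_PUNCT).split()
        counts.update(t for t in tokens if t not in STOPWORDS)

    # Most frequent first; ties broken alphabetically so the output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    pairs = [f"{word}~{freq}" for word, freq in ranked[:top_n]]

    # Pad with empty strings (or truncate) so the length matches n_rows.
    return (pairs + [""] * n_rows)[:n_rows]
```

In a TabPy script, Tableau's arguments arrive as `_arg1`, `_arg2`, and so on, and the script must `return` a list, so the last line of such a script might be `return word_freq_vector(records, "France", 50, len(_arg1))`, where `len(_arg1)` is the number of placeholder rows in the partition. How `records` gets loaded (for example from the CSV the R script reads) is left open. Padding with empty strings is just one simple way to satisfy the same-length rule; the answer's approach instead sizes a placeholder dataset to the number of records wanted.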

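To make it clear what the two Tableau calculated fields for the word and the frequency do with each returned value, here is a small Python equivalent using a hypothetical helper name, `split_word_freq`. Tableau's `FIND` is 1-based while Python's `str.find` is 0-based, which is why the slice needs no `- 1`. Treating values without a `~` (such as empty padding rows) as `(None, None)` is an assumption of this sketch, not a description of what Tableau returns for them.

```python
def split_word_freq(value):
    """Split one "word~freq" string into (word, freq).

    Mirrors the two Tableau calculated fields:
      [Word]: LEFT(value, FIND(value, "~") - 1)
      freq:   INT(REPLACE(value, [Word] + "~", ""))
    Values without "~" (e.g. empty padding rows) return (None, None).
    """
    idx = value.find("~")  # 0-based here; Tableau's FIND is 1-based
    if idx == -1:
        return None, None
    word = value[:idx]  # everything to the left of the first "~"
    freq = int(value.replace(word + "~", ""))  # like REPLACE, removes every match
    return word, freq
```

The approach relies on the words themselves never containing `~`, which holds for the R script above because punctuation is stripped before the words are joined.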