Parsing the HTML does not output the data I need (FedEx tracking information)

Posted 2024-10-03 02:47:04


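I am trying to write a script that fetches tracking information from the FedEx website.

I figured that if I go to the URL 'https://www.fedex.com/fedextrack/?tracknumbers=' with the tracking number appended to the end, it would take me to the tracking page that contains the information I need.

I tried requesting that URL with the tracking number and parsing the HTML from the response.

This is what I tried: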

import urllib.request

url_prefix = 'https://www.fedex.com/fedextrack/?tracknumbers='
tracking_number = '570573906561'
url = url_prefix + tracking_number
# Fetch the tracking page and print the raw HTML
sock = urllib.request.urlopen(url)
htmlSource = sock.read().decode('utf-8', errors='replace')
sock.close()
print(htmlSource)

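This code outputs: http://freetexthost.com/iy1ma2q1fm

I thought I could search the output for text and find the delivery status and date, but they are not in this output.

When I view the page in Chrome, the delivery date and time are shown, so if I run this in the Chrome console: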

^{pr2}$

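it returns the output I want (the delivery date).

Why doesn't my Python script print the actual tracking data, or the classes that hold it, in the HTML output?

I searched for this problem and tried several different ways of parsing (Mechanize, BeautifulSoup, html2text), but all of them gave me the same output, with no actual data about the shipment.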


3 answers

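FedEx's IE site returns the web page inside an IFrame served from another site, and you cannot read information across sites through an IFrame. So do the following instead. You can transmit the following XML to: https://ws.fedex.com:443/web-services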

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:v10="http://fedex.com/ws/track/v10">
  <soapenv:Header/>
  <soapenv:Body>
    <v10:TrackRequest>
      <v10:WebAuthenticationDetail>
        <v10:ParentCredential>
          <v10:Key>productionkey</v10:Key>
          <v10:Password>productionpassword</v10:Password>
        </v10:ParentCredential>
        <v10:UserCredential>
          <v10:Key>productionkey</v10:Key>
          <v10:Password>productionpassword</v10:Password>
        </v10:UserCredential>
      </v10:WebAuthenticationDetail>
      <v10:ClientDetail>
        <v10:AccountNumber>accountnumber</v10:AccountNumber>
        <v10:MeterNumber>meternumber</v10:MeterNumber>
        <v10:IntegratorId/>
        <v10:Localization>
          <v10:LanguageCode>EN</v10:LanguageCode>
          <v10:LocaleCode>us</v10:LocaleCode>
        </v10:Localization>
      </v10:ClientDetail>
      <v10:TransactionDetail>
        <v10:CustomerTransactionId>Ground Track By Number</v10:CustomerTransactionId>
        <v10:Localization>
          <v10:LanguageCode>EN</v10:LanguageCode>
          <v10:LocaleCode>us</v10:LocaleCode>
        </v10:Localization>
      </v10:TransactionDetail>
      <v10:Version>
        <v10:ServiceId>trck</v10:ServiceId>
        <v10:Major>10</v10:Major>
        <v10:Intermediate>0</v10:Intermediate>
        <v10:Minor>0</v10:Minor>
      </v10:Version>
      <v10:SelectionDetails>
        <v10:CarrierCode>FDXG</v10:CarrierCode>
        <v10:PackageIdentifier>
          <v10:Type>TRACKING_NUMBER_OR_DOORTAG</v10:Type>
          <v10:Value>$WAYBILL$</v10:Value>
        </v10:PackageIdentifier>
      </v10:SelectionDetails>
      <v10:ProcessingOptions>INCLUDE_DETAILED_SCANS</v10:ProcessingOptions>
    </v10:TrackRequest>
  </soapenv:Body>
</soapenv:Envelope>

Use the following VBA Code to transmit and it will return the tracking info:

Public Function ReturnXMLResponse(ByVal XML_Method As Variant, _
                              ByVal XML_Track_URL As Variant, _
                              ByVal XML_Request As Variant, _
                     Optional ByVal WaybillNum As String = "", _
                     Optional ByVal CarrierName As String = "", _
                     Optional ByVal TotalWaybills As Long = 0, _
                     Optional ByVal XML_Chunks As Long = 1) As String

' Passed expressions to this function have to be Variant, as some arguments
' may be passed as Null which would result in a type conversion failure.

' If True Then Exit Function
ReturnXMLResponse = "Test" ' default if not supported or not tracked by request
If UCase(XML_Track_URL) <> "NOT SUPPORTED" And UCase(XML_Track_URL) <> "NOT TRACKED BY REQUEST" Then
    If (WaybillNum <> "") And (CarrierName <> "") Then
        TrackingCounter = TrackingCounter + (1 / XML_Chunks)
        SBText = "Tracking: " & CarrierName & ":" & WaybillNum
        If TotalWaybills <> 0 Then SBText = SBText & " (" & CLng(TrackingCounter) & "/" & TotalWaybills & ") [" & (TrackingCounter / TotalWaybills) * 100 & "%]"
        SBText = SBText & "."
        Application.SysCmd acSysCmdSetStatus, SBText
    End If
    Set XMLHTTP = CreateObject("Microsoft.xmlhttp")
    If (WaybillNum <> "") And (CarrierName <> "") Then
        SBText = SBText & "."
        Application.SysCmd acSysCmdSetStatus, SBText
    End If
    XMLHTTP.Open XML_Method, XML_Track_URL, False
    If (WaybillNum <> "") And (CarrierName <> "") Then
        SBText = SBText & "."
        Application.SysCmd acSysCmdSetStatus, SBText
    End If
    XMLHTTP.Send XML_Request ' okay to send blank string, if not needed
    If (WaybillNum <> "") And (CarrierName <> "") Then
        SBText = SBText & "."
        Application.SysCmd acSysCmdSetStatus, SBText
    End If
    ReturnXMLResponse = Cstr(XMLHttp.ResponseText)
End If
If ReturnXMLResponse = "" Then ReturnXMLResponse = "Nothing"
End Function

Basically, the key line is XMLHTTP.Send XML_Request: it sends XML_Request, which is the SOAP envelope above, and the service then returns the valid XML in XMLHTTP.ResponseText.
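Since the question is about Python, the same transmission could be sketched roughly as follows. This is only an illustration under assumptions, not a tested FedEx client: `build_track_request`, `send_track_request` and the shortened `ENVELOPE_TEMPLATE` are hypothetical names, the template stands in for the full envelope above, and the `text/xml` content type is a guess at what the service expects.

```python
import urllib.request
from xml.sax.saxutils import escape

# URL quoted in the answer above
FEDEX_WS_URL = "https://ws.fedex.com:443/web-services"

# Hypothetical, heavily shortened stand-in for the full SOAP envelope above;
# only the $WAYBILL$ placeholder matters for this sketch.
ENVELOPE_TEMPLATE = (
    "<v10:PackageIdentifier>"
    "<v10:Type>TRACKING_NUMBER_OR_DOORTAG</v10:Type>"
    "<v10:Value>$WAYBILL$</v10:Value>"
    "</v10:PackageIdentifier>"
)


def build_track_request(waybill, template=ENVELOPE_TEMPLATE, url=FEDEX_WS_URL):
    """Fill in the tracking number and wrap the XML in a POST request."""
    # Escape the value so it cannot break the surrounding XML
    body = template.replace("$WAYBILL$", escape(str(waybill)))
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),  # passing data makes this a POST
        # Assumption: the service accepts a plain text/xml content type
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


def send_track_request(waybill):
    """Send the request and return the raw XML response as text."""
    request = build_track_request(waybill)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Building the request separately from sending it makes it possible to inspect the body before anything goes over the network. A real call would also need valid credentials in the envelope.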

This is what I ended up with, thanks to @Blender:

import requests
import json

# Days per month in a non-leap year
daysdict = {1:31, 2:28, 3:31, 4:30, 5:31, 6:30, 7:31, 8:31, 9:30, 10:31, 11:30, 12:31}

def days_in_month(month):
    return daysdict[month]




def build_output(tracking_number):

    data = requests.post('https://www.fedex.com/trackingCal/track', data={
        'data': json.dumps({
            'TrackPackagesRequest': {
                'appType': 'wtrk',
                'uniqueKey': '',
                'processingParameters': {
                    'anonymousTransaction': True,
                    'clientId': 'WTRK',
                    'returnDetailedErrors': True,
                    'returnLocalizedDateTime': False
                },
                'trackingInfoList': [{
                    'trackNumberInfo': {
                        'trackingNumber': tracking_number,
                        'trackingQualifier': '',
                        'trackingCarrier': ''
                    }
                }]
            }
        }),
        'action': 'trackpackages',
        'locale': 'en_US',
        'format': 'json',
        'version': 99
    }).json()

    return data

# finds delivery date info

ship_arrival_key = 'displayActDeliveryDateTime'
ship_time_key = 'displayShipDt'



def track(tracking_number):

    data = build_output(tracking_number)
    # narrow the dictionaries and lists down to the objects needed (ship day, arrival)
    narrow = {}
    for key, value in data.items():
        narrow = value
    # narrow further into the packageList list
    package_list = narrow.get('packageList') or [{}]
    # first package: look up the ship date and the arrival date
    package = package_list[0]
    ship_time_value = package.get(ship_time_key)
    ship_arrival_value = package.get(ship_arrival_key)
    # only report success when both values are present
    exists = ship_time_value is not None and ship_arrival_value is not None

    return ship_time_value, ship_arrival_value, exists


def print_results(tracking_number):
    to_fro = track(tracking_number)
    if not to_fro[2]:
        return
    try:
        # NOTE: the indexing assumes date strings shaped like 'M/DD/...'
        # (one-digit month at index 0, two-digit day at indexes 2 and 3)
        ship_day = int(to_fro[0][2] + to_fro[0][3])
        arrival_day = int(to_fro[1][2] + to_fro[1][3])
        if to_fro[0][0] != to_fro[1][0]:
            # shipped and arrived in different months
            ship_days = days_in_month(int(to_fro[0][0])) - ship_day + arrival_day
        else:
            ship_days = arrival_day - ship_day
    except IndexError:
        print('Invalid Tracking Number')
        return
    print('_____________________')
    print('Shipped: ' + to_fro[0])
    print('Arrived: ' + to_fro[1])
    print('_____________________')
    print('\nShipping took:' + '     ' + str(ship_days))

def raw_results(tracking_number):
    to_fro = track(tracking_number)
    ship_days = None  # returned unchanged when no dates are available
    if to_fro[2]:
        try:
            # same 'M/DD/...' indexing assumption as in print_results
            ship_day = int(to_fro[0][2] + to_fro[0][3])
            arrival_day = int(to_fro[1][2] + to_fro[1][3])
            if to_fro[0][0] != to_fro[1][0]:
                ship_days = days_in_month(int(to_fro[0][0])) - ship_day + arrival_day
            else:
                ship_days = arrival_day - ship_day
        except IndexError:
            print('Invalid Tracking Number')

    return ship_days



#print_results(499552080632881)
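The day arithmetic above depends on character positions inside the date strings. As a possible alternative, here is a small sketch that lets `datetime` do the calendar math. It assumes the display strings start with a date like `9/28/2013`, optionally followed by a time; that format is a guess, and `DATE_FORMAT`, `parse_date` and `shipping_days` are hypothetical names introduced only for this example.

```python
from datetime import datetime

# Assumption: dates look like '9/28/2013', optionally followed by a time
DATE_FORMAT = "%m/%d/%Y"


def parse_date(text):
    """Parse the leading date token of a display string into a date."""
    return datetime.strptime(text.split()[0], DATE_FORMAT).date()


def shipping_days(shipped, arrived):
    """Return the number of calendar days between ship date and arrival."""
    return (parse_date(arrived) - parse_date(shipped)).days


print(shipping_days("9/28/2013", "10/02/2013 11:44 am"))  # prints 4
```

Subtracting two dates handles month lengths, leap years and year boundaries without a lookup table. If the real strings use another layout, only `DATE_FORMAT` should need to change.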

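Like many other sites, this one does not work properly without JavaScript. The page sends an HTTP POST request to a URL, and that URL returns the tracking data as a JSON-encoded object.

You need to emulate that request with Python: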

import requests
import json

tracking_number = '570573906561'

data = requests.post('https://www.fedex.com/trackingCal/track', data={
    'data': json.dumps({
        'TrackPackagesRequest': {
            'appType': 'wtrk',
            'uniqueKey': '',
            'processingParameters': {
                'anonymousTransaction': True,
                'clientId': 'WTRK',
                'returnDetailedErrors': True,
                'returnLocalizedDateTime': False
            },
            'trackingInfoList': [{
                'trackNumberInfo': {
                    'trackingNumber': tracking_number,
                    'trackingQualifier': '',
                    'trackingCarrier': ''
                }
            }]
        }
    }),
    'action': 'trackpackages',
    'locale': 'en_US',
    'format': 'json',
    'version': 99
}).json()

Then work with the resulting object:

^{pr2}$
