Hello everyone.
I used HTTPMOD to extract data from a web page, and I get consecutively numbered readings.
Internals:
BUSY 0
DEF https://reiseauskunft.bahn.de/bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes& 300
FUUID 5fbe4274-f33f-3c55-22f9-0d007f3db25a0338
Interval 300
MainURL https://reiseauskunft.bahn.de/bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes&
ModuleVersion 4.0.12 - 24.10.2020
NAME Florensstrasse
NOTIFYDEV global
NR 251
NTFY_ORDER 50-Florensstrasse
STATE Bus 723 Eller Mitte S-Bahnhof, Düsseldorf 20:21 (20:22)
TYPE HTTPMOD
value
CompiledRegexes:
HTTPCookieHash:
DB4-pb-bibe-history;:
Name DB4-pb-bibe-history
Options expires=Sunday, 13-Dec-2020 00:00:01 GMT; Domain=.bahn.de; Path=/; Version=1
Path
Value Location1%3DA%3D1%40O%3DFlorensstra%DFe%2C%20D%FCsseldorf%40X%3D6741838%40Y%3D51204508%40U%3D80%40L%3D663633%40%26
HttpUtils:
NAME
addr https://reiseauskunft.bahn.de:443
auth 0
code 200
compress 1
conn
data
displayurl https://reiseauskunft.bahn.de/bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes&
header Cookie: DB4-pb-bibe-history=Location1%3DA%3D1%40O%3DFlorensstra%DFe%2C%20D%FCsseldorf%40X%3D6741838%40Y%3D51204508%40U%3D80%40L%3D663633%40%26
host reiseauskunft.bahn.de
httpheader HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
Date: Thu, 03 Dec 2020 19:21:54 GMT
Server: Apache
Set-Cookie: DB4-pb-bibe-history=Location1%3DA%3D1%40O%3DFlorensstra%DFe%2C%20D%FCsseldorf%40X%3D6741838%40Y%3D51204508%40U%3D80%40L%3D663633%40%26; expires=Sunday, 13-Dec-2020 00:00:01 GMT; Domain=.bahn.de; Path=/; Version=1
Connection: Close
httpversion 1.0
hu_blocking 0
hu_filecount 1
hu_port 443
hu_portSfx
ignoreredirects 1
loglevel 4
path /bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes&
protocol https
redirects 0
timeout 2
url https://reiseauskunft.bahn.de/bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes&
sslargs:
OLDREADINGS:
QUEUE:
READINGS:
2020-12-03 20:21:55 departure_1_delay 20:22
2020-12-03 20:21:55 departure_1_destination Eller Mitte S-Bahnhof, Düsseldorf
2020-12-03 20:21:55 departure_1_product Bus 723
2020-12-03 20:21:55 departure_1_time 20:21
2020-12-03 20:21:55 departure_2_delay 20:42
2020-12-03 20:21:55 departure_2_destination Plange Mühle, Düsseldorf
2020-12-03 20:21:55 departure_2_product Bus 723
2020-12-03 20:21:55 departure_2_time 20:42
2020-12-03 20:21:55 departure_3_delay 21:02
2020-12-03 20:21:55 departure_3_destination Eller Mitte S-Bahnhof, Düsseldorf
2020-12-03 20:21:55 departure_3_product Bus 723
2020-12-03 20:21:55 departure_3_time 21:01
2020-12-03 20:21:55 departure_4_delay 21:22
2020-12-03 20:21:55 departure_4_destination Eller Mitte S-Bahnhof, Düsseldorf
2020-12-03 20:21:55 departure_4_product Bus 723
2020-12-03 20:21:55 departure_4_time 21:21
2020-12-03 20:21:55 departure_5_delay 21:22
2020-12-03 20:21:55 departure_5_destination Plange Mühle, Düsseldorf
2020-12-03 20:21:55 departure_5_product Bus 723
2020-12-03 20:21:55 departure_5_time 21:22
REQUEST:
context reading
data
header
ignoreredirects 0
num 0
retryCount 0
type update
url https://reiseauskunft.bahn.de/bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes&
defptr:
readingBase:
departure_1_delay reading
departure_1_destination reading
departure_1_product reading
departure_1_time reading
departure_2_delay reading
departure_2_destination reading
departure_2_product reading
departure_2_time reading
departure_3_delay reading
departure_3_destination reading
departure_3_product reading
departure_3_time reading
departure_4_delay reading
departure_4_destination reading
departure_4_product reading
departure_4_time reading
departure_5_delay reading
departure_5_destination reading
departure_5_product reading
departure_5_time reading
reading reading
readingNum:
departure_1_delay 01
departure_1_destination 01
departure_1_product 01
departure_1_time 01
departure_2_delay 01
departure_2_destination 01
departure_2_product 01
departure_2_time 01
departure_3_delay 01
departure_3_destination 01
departure_3_product 01
departure_3_time 01
departure_4_delay 01
departure_4_destination 01
departure_4_product 01
departure_4_time 01
departure_5_delay 01
departure_5_destination 01
departure_5_product 01
departure_5_time 01
reading 01
readingOutdated:
requestReadings:
get1:
departure_1_delay reading 01-4
departure_1_destination reading 01-2
departure_1_product reading 01-1
departure_1_time reading 01-3
departure_2_delay reading 01-8
departure_2_destination reading 01-6
departure_2_product reading 01-5
departure_2_time reading 01-7
departure_3_delay reading 01-12
departure_3_destination reading 01-10
departure_3_product reading 01-9
departure_3_time reading 01-11
departure_4_delay reading 01-16
departure_4_destination reading 01-14
departure_4_product reading 01-13
departure_4_time reading 01-15
departure_5_delay reading 01-20
departure_5_destination reading 01-18
departure_5_product reading 01-17
departure_5_time reading 01-19
reading reading 01
update:
departure_1_delay reading 01-4
departure_1_destination reading 01-2
departure_1_product reading 01-1
departure_1_time reading 01-3
departure_2_delay reading 01-8
departure_2_destination reading 01-6
departure_2_product reading 01-5
departure_2_time reading 01-7
departure_3_delay reading 01-12
departure_3_destination reading 01-10
departure_3_product reading 01-9
departure_3_time reading 01-11
departure_4_delay reading 01-16
departure_4_destination reading 01-14
departure_4_product reading 01-13
departure_4_time reading 01-15
departure_5_delay reading 01-20
departure_5_destination reading 01-18
departure_5_product reading 01-17
departure_5_time reading 01-19
reading reading 01
Attributes:
get1Name Update
reading01-10Name departure_3_destination
reading01-11Name departure_3_time
reading01-12Name departure_3_delay
reading01-13Name departure_4_product
reading01-14Name departure_4_destination
reading01-15Name departure_4_time
reading01-16Name departure_4_delay
reading01-17Name departure_5_product
reading01-18Name departure_5_destination
reading01-19Name departure_5_time
reading01-1Name departure_1_product
reading01-20Name departure_5_delay
reading01-2Name departure_1_destination
reading01-3Name departure_1_time
reading01-4Name departure_1_delay
reading01-5Name departure_2_product
reading01-6Name departure_2_destination
reading01-7Name departure_2_time
reading01-8Name departure_2_delay
reading01-9Name departure_3_product
reading01Name reading
reading01OExpr {$val =~ s/<br\/><span class="delay.*">//g; $val =~ s/<\/span>.*//g; $val =~ s/.* \;.*//g; $val =~ s/, <span.*//g; $val =~ s/\(\;/\(/g; $val =~ s/\)\;/\)/g; $val =~ s/ü\;/ü/g; $val =~ s/ö\;/ö/g; $val =~ s/ä\;/ä/g; $val =~ s/ß\;/ß/g; $val;}
reading01RegOpt gm
reading01Regex <span class="bold">(.*)<\/span>\s<\/a>[\w\W]&[gl]t;&[gl]t;\s(.*)\s<br \/>[\w\W]<span class="bold">(\d\d:\d\d)<\/span>(.*)<\/div>
room Nahverkehr
stateFormat departure_1_product departure_1_destination departure_1_time (departure_1_delay)
userattr get1Name reading01-10Name reading01-11Name reading01-12Name reading01-13Name reading01-14Name reading01-15Name reading01-16Name reading01-17Name reading01-18Name reading01-19Name reading01-1Name reading01-20Name reading01-2Name reading01-3Name reading01-4Name reading01-5Name reading01-6Name reading01-7Name reading01-8Name reading01-9Name reading01Name reading01OExpr reading01RegOpt reading01Regex
I would like to combine these into a single reading. My attempts with reading01RecombineExpr only combined the first block, i.e. reading01-1 through reading01-4. I would like all 20 readings combined. Can anyone help?
Regards
Roman
How did you write recombineExpr?
What do you get with the example from the CommandRef:
reading01RecombineExpr join ",", @matchlist
I'm not sure whether reading01-NNName and recombineExpr can be used at the same time. The purpose of recombineExpr is to combine the matches anyway.
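To illustrate what joining a match list means, here is a minimal sketch in Python rather than Perl. The page fragment and the pattern are made up for illustration and are much simpler than the real bahn.de markup; the only thing taken from the thread is the idea that every capture group of every match ends up in one flat list, which is then joined with commas.

```python
import re

# Hypothetical, simplified page fragment (NOT the real bahn.de markup).
page = (
    '<div><b>Bus 723</b> to Eller Mitte at <i>20:21</i></div>\n'
    '<div><b>Bus 723</b> to Plange Mühle at <i>20:42</i></div>\n'
)

# Three capture groups per departure: product, destination, time.
pattern = re.compile(r'<b>(.*?)</b> to (.*?) at <i>(\d\d:\d\d)</i>')

# Flatten the groups of all matches into one list, similar in spirit
# to the @matchlist used in the recombine expression.
matchlist = [group for m in pattern.finditer(page) for group in m.groups()]

# Equivalent of: join ",", @matchlist
combined = ",".join(matchlist)
print(combined)
```

With two matches and three groups each, the flat list has six entries, just as five departures with four groups each give the 20 numbered readings in the list output.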
Yes, that's right. With -reading01RecombineExpr join ",", @matchlist- only the reading named "reading" is updated, and only with the first 4 results of the regex.
Addendum: on the first update after RecombineExpr is set, the readings are combined correctly. After that, only the first 4 readings are merged. I had overlooked that until now.
Unfortunately this feature only works sporadically (see the list output). Sometimes it works, sometimes it doesn't. Maybe the developer can take another look at it.
Internals:
BUSY 0
DEF https://reiseauskunft.bahn.de/bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes& 300
FUUID 5fbe4274-f33f-3c55-22f9-0d007f3db25a0338
Interval 300
MainURL https://reiseauskunft.bahn.de/bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes&
ModuleVersion 4.0.12 - 24.10.2020
NAME Florensstrasse
NOTIFYDEV global
NR 251
NTFY_ORDER 50-Florensstrasse
STATE Bus 723 Plange Mühle, Düsseldorf 09:34 (09:34)
TYPE HTTPMOD
value
CompiledRegexes:
HTTPCookieHash:
DB4-pb-bibe-history;:
Name DB4-pb-bibe-history
Options expires=Monday, 14-Dec-2020 00:00:01 GMT; Domain=.bahn.de; Path=/; Version=1
Path
Value Location1%3DA%3D1%40O%3DFlorensstra%DFe%2C%20D%FCsseldorf%40X%3D6741838%40Y%3D51204508%40U%3D80%40L%3D663633%40%26
HttpUtils:
NAME
addr https://reiseauskunft.bahn.de:443
auth 0
code 200
compress 1
conn
data
displayurl https://reiseauskunft.bahn.de/bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes&
header Cookie: DB4-pb-bibe-history=Location1%3DA%3D1%40O%3DFlorensstra%DFe%2C%20D%FCsseldorf%40X%3D6741838%40Y%3D51204508%40U%3D80%40L%3D663633%40%26
host reiseauskunft.bahn.de
httpheader HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
Date: Fri, 04 Dec 2020 08:21:55 GMT
Server: Apache
Set-Cookie: DB4-pb-bibe-history=Location1%3DA%3D1%40O%3DFlorensstra%DFe%2C%20D%FCsseldorf%40X%3D6741838%40Y%3D51204508%40U%3D80%40L%3D663633%40%26; expires=Monday, 14-Dec-2020 00:00:01 GMT; Domain=.bahn.de; Path=/; Version=1
Connection: Close
httpversion 1.0
hu_blocking 0
hu_filecount 1
hu_port 443
hu_portSfx
ignoreredirects 1
loglevel 4
path /bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes&
protocol https
redirects 0
timeout 2
url https://reiseauskunft.bahn.de/bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes&
sslargs:
OLDREADINGS:
QUEUE:
READINGS:
2020-12-04 09:16:55 departure_1_delay 09:34
2020-12-04 09:16:55 departure_1_destination Plange Mühle, Düsseldorf
2020-12-04 09:16:55 departure_1_product Bus 723
2020-12-04 09:16:55 departure_1_time 09:34
2020-12-04 09:16:55 departure_2_delay 09:41
2020-12-04 09:16:55 departure_2_destination Eller Mitte S-Bahnhof, Düsseldorf
2020-12-04 09:16:55 departure_2_product Bus 723
2020-12-04 09:16:55 departure_2_time 09:40
2020-12-04 09:16:55 departure_3_delay 10:05
2020-12-04 09:16:55 departure_3_destination Plange Mühle, Düsseldorf
2020-12-04 09:16:55 departure_3_product Bus 723
2020-12-04 09:16:55 departure_3_time 10:04
2020-12-04 09:16:55 departure_4_delay 10:11
2020-12-04 09:16:55 departure_4_destination Eller Mitte S-Bahnhof, Düsseldorf
2020-12-04 09:16:55 departure_4_product Bus 723
2020-12-04 09:16:55 departure_4_time 10:10
2020-12-04 09:16:55 departure_5_delay 10:35
2020-12-04 09:16:55 departure_5_destination Plange Mühle, Düsseldorf
2020-12-04 09:16:55 departure_5_product Bus 723
2020-12-04 09:16:55 departure_5_time 10:34
2020-12-04 09:21:55 reading Bus 723,Plange Mühle, Düsseldorf,09:34,10:35
REQUEST:
context reading
data
header
ignoreredirects 0
num 0
retryCount 0
type update
url https://reiseauskunft.bahn.de/bin/bhftafel.exe/dox?si=663633&bt=dep&p=1111111111&max=5&rt=1&use_realtime_filter=1&start=yes&
defptr:
readingBase:
departure_1_delay reading
departure_1_destination reading
departure_1_product reading
departure_1_time reading
departure_2_delay reading
departure_2_destination reading
departure_2_product reading
departure_2_time reading
departure_3_delay reading
departure_3_destination reading
departure_3_product reading
departure_3_time reading
departure_4_delay reading
departure_4_destination reading
departure_4_product reading
departure_4_time reading
departure_5_delay reading
departure_5_destination reading
departure_5_product reading
departure_5_time reading
reading reading
readingNum:
departure_1_delay 01
departure_1_destination 01
departure_1_product 01
departure_1_time 01
departure_2_delay 01
departure_2_destination 01
departure_2_product 01
departure_2_time 01
departure_3_delay 01
departure_3_destination 01
departure_3_product 01
departure_3_time 01
departure_4_delay 01
departure_4_destination 01
departure_4_product 01
departure_4_time 01
departure_5_delay 01
departure_5_destination 01
departure_5_product 01
departure_5_time 01
reading 01
readingOutdated:
requestReadings:
get1:
departure_1_delay reading 01-4
departure_1_destination reading 01-2
departure_1_product reading 01-1
departure_1_time reading 01-3
departure_2_delay reading 01-8
departure_2_destination reading 01-6
departure_2_product reading 01-5
departure_2_time reading 01-7
departure_3_delay reading 01-12
departure_3_destination reading 01-10
departure_3_product reading 01-9
departure_3_time reading 01-11
departure_4_delay reading 01-16
departure_4_destination reading 01-14
departure_4_product reading 01-13
departure_4_time reading 01-15
departure_5_delay reading 01-20
departure_5_destination reading 01-18
departure_5_product reading 01-17
departure_5_time reading 01-19
reading reading 01
update:
departure_1_delay reading 01-4
departure_1_destination reading 01-2
departure_1_product reading 01-1
departure_1_time reading 01-3
departure_2_delay reading 01-8
departure_2_destination reading 01-6
departure_2_product reading 01-5
departure_2_time reading 01-7
departure_3_delay reading 01-12
departure_3_destination reading 01-10
departure_3_product reading 01-9
departure_3_time reading 01-11
departure_4_delay reading 01-16
departure_4_destination reading 01-14
departure_4_product reading 01-13
departure_4_time reading 01-15
departure_5_delay reading 01-20
departure_5_destination reading 01-18
departure_5_product reading 01-17
departure_5_time reading 01-19
reading reading 01
Attributes:
get1Name Update
reading01-10Name departure_3_destination
reading01-11Name departure_3_time
reading01-12Name departure_3_delay
reading01-13Name departure_4_product
reading01-14Name departure_4_destination
reading01-15Name departure_4_time
reading01-16Name departure_4_delay
reading01-17Name departure_5_product
reading01-18Name departure_5_destination
reading01-19Name departure_5_time
reading01-1Name departure_1_product
reading01-20Name departure_5_delay
reading01-2Name departure_1_destination
reading01-3Name departure_1_time
reading01-4Name departure_1_delay
reading01-5Name departure_2_product
reading01-6Name departure_2_destination
reading01-7Name departure_2_time
reading01-8Name departure_2_delay
reading01-9Name departure_3_product
reading01Name reading
reading01OExpr {$val =~ s/<br\/><span class="delay.*">//g; $val =~ s/<\/span>.*//g; $val =~ s/.* \;.*//g; $val =~ s/, <span.*//g; $val =~ s/\(\;/\(/g; $val =~ s/\)\;/\)/g; $val =~ s/ü\;/ü/g; $val =~ s/ö\;/ö/g; $val =~ s/ä\;/ä/g; $val =~ s/ß\;/ß/g; $val;}
reading01RecombineExpr join ",", @matchlist
reading01RegOpt gm
reading01Regex <span class="bold">(.*)<\/span>\s<\/a>[\w\W]&[gl]t;&[gl]t;\s(.*)\s<br \/>[\w\W]<span class="bold">(\d\d:\d\d)<\/span>(.*)<\/div>
room Nahverkehr
stateFormat departure_1_product departure_1_destination departure_1_time (departure_1_delay)
userattr get1Name reading01-10Name reading01-11Name reading01-12Name reading01-13Name reading01-14Name reading01-15Name reading01-16Name reading01-17Name reading01-18Name reading01-19Name reading01-1Name reading01-20Name reading01-2Name reading01-3Name reading01-4Name reading01-5Name reading01-6Name reading01-7Name reading01-8Name reading01-9Name reading01Name reading01OExpr reading01RecombineExpr reading01RegOpt reading01Regex
Regards
Roman
Your "departure" readings are from 2020-12-04 09:16:55.
Your "reading" reading is from 2020-12-04 09:21:55.
That means they do not come from the same request. What have you changed in the meantime? Only recombineExpr?
For me it has always worked so far. I get all connections combined in the "reading" reading. Doesn't this rather depend on the content of the web page, because of your OExpr? The way you wrote it, everything after a "delay" is dropped, for example.
It is as you suspected: only the recombined reading is updated. I noticed that the reading shows only the first three values and the last value. The values in between are missing. Strange behavior. I have run an update in the meantime, but nothing has changed.
Regards, Roman
The problem is not readingRecombineExpr but your OExpr: it filters out some of the results.
At the moment the page looks like the picture, with delays.
If I leave out reading01OExpr, I get:
Quote:
Bus 723,Plange Mühle, Düsseldorf,10:20,
10:20,Bus 723,Eller Mitte S-Bahnhof, Düsseldorf,10:57,
10:58,Bus 723,Plange Mühle, Düsseldorf,11:20,
11:21,Bus 723,Eller Mitte S-Bahnhof, Düsseldorf,11:57,,Bus 723,Plange Mühle, Düsseldorf,12:20,
With reading01OExpr:
Quote:
Bus 723,Plange Mühle, Düsseldorf,10:20,11:21
Once more:
Quote from: amenomade on 05 December 2020, 00:17:12
Doesn't this rather depend on the content of the web page, because of your OExpr? The way you wrote it, everything after a "delay" is dropped, for example.
Try this:
reading01OExpr {$val =~ s/<br\/><span class="delay.*?">//g; $val =~ s/<\/span>//g; $val =~ s/ü\;/ü/g; $val =~ s/ö\;/ö/g; $val =~ s/ä\;/ä/g; $val =~ s/ß\;/ß/g; $val;}
Many thanks. I'm not at home at the moment and can't try it until Monday. If I understood correctly, it seems to hinge on an empty delay. Have a nice weekend, Roman
It is down to your greedy regex quantifier .*
From
Quote: Bus 723,Plange Mühle, Düsseldorf,17:20,<br/><span class="delayOnTime">17:20</span>,Bus 723,Eller Mitte S-Bahnhof, Düsseldorf,17:57,<br/><span class="delayOnTime">17:58</span>,Bus 723,Plange Mühle, Düsseldorf,18:20,<br/><span class="delayOnTime">18:21</span>,Bus 723,Eller Mitte S-Bahnhof, Düsseldorf,18:57,,Bus 723,Plange Mühle, Düsseldorf,19:20,
the substitution
Quote: $val =~ s/<br\/><span class="delay.*">//g
removes, in one go, everything from the first <br/><span class="delay up to the last "> in the string (marked in blue in the original post).
With .*?, on the other hand, it removes only up to the next "> (underlined in the original post), and it does so several times because you use the "g" modifier.
Further reading, for example: https://riptutorial.com/de/regex/topic/429/gierige-und-faule-quantifizierer
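The difference can be reproduced outside FHEM. The following is a minimal sketch in Python (not the Perl that HTTPMOD actually runs; greedy and lazy quantifiers behave the same way in both for this case). It uses the raw value quoted above and applies only the two substitutions under discussion, once in the greedy form from the original OExpr and once in the lazy form from the suggested one. The function names are made up for this example.

```python
import re

# Raw recombined value as quoted in the thread, before any OExpr clean-up.
raw = (
    'Bus 723,Plange Mühle, Düsseldorf,17:20,<br/><span class="delayOnTime">17:20</span>,'
    'Bus 723,Eller Mitte S-Bahnhof, Düsseldorf,17:57,<br/><span class="delayOnTime">17:58</span>,'
    'Bus 723,Plange Mühle, Düsseldorf,18:20,<br/><span class="delayOnTime">18:21</span>,'
    'Bus 723,Eller Mitte S-Bahnhof, Düsseldorf,18:57,,'
    'Bus 723,Plange Mühle, Düsseldorf,19:20,'
)

def clean_greedy(val):
    # Greedy .* runs from the first delay span to the LAST '">' in the string.
    val = re.sub(r'<br/><span class="delay.*">', '', val)
    # Greedy .* then drops everything after the first remaining </span>.
    val = re.sub(r'</span>.*', '', val)
    return val

def clean_lazy(val):
    # Lazy .*? stops at the NEXT '">', so each opening tag is removed on its own.
    val = re.sub(r'<br/><span class="delay.*?">', '', val)
    # Only the closing tags are removed, nothing after them.
    val = re.sub(r'</span>', '', val)
    return val

print(clean_greedy(raw))  # Bus 723,Plange Mühle, Düsseldorf,17:20,18:21
print(clean_lazy(raw))    # all five departures, tags stripped
```

The greedy result has the same shape as the truncated reading reported earlier in the thread: the first three values followed by the last delay.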
Thank you for the explanation. I adjusted reading01OExpr slightly once more, since doubled semicolons are involved. I also added another ?. As a limiter against the regex's greediness it works wonders. Thanks for your effort and explanations.
{$val =~ s/<br\/><span class="delay.*?">//g;; $val =~ s/<\/span>.*?//g;; $val =~ s/.* \;.*//g;; $val =~ s/, <span.*//g;; $val =~ s/\(\;/\(/g;; $val =~ s/\)\;/\)/g;; $val =~ s/ü\;/ü/g;; $val =~ s/ö\;/ö/g;; $val =~ s/ä\;/ä/g;; $val =~ s/ß\;/ß/g;; $val;;}
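A side note on the additional ? in the expression above, as a hedged sketch rather than a statement about HTTPMOD itself: a lazy .*? at the very end of a pattern can always match the empty string, so a pattern such as </span>.*? should remove the closing tag only, just like </span> on its own. The Python snippet below shows this on a made-up sample string; the same regex semantics apply in Perl.

```python
import re

# Made-up sample string, used only to show the quantifier behaviour.
sample = '17:20</span>,Bus 723,17:58</span>,Bus 723'

# A trailing lazy quantifier matches as little as possible, i.e. nothing.
with_trailing_lazy = re.sub(r'</span>.*?', '', sample)
plain = re.sub(r'</span>', '', sample)

# A trailing greedy quantifier, by contrast, eats the rest of the line.
with_trailing_greedy = re.sub(r'</span>.*', '', sample)

print(with_trailing_lazy)    # 17:20,Bus 723,17:58,Bus 723
print(with_trailing_greedy)  # 17:20
```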