THZ Tecalor (LWZ Stiebel Eltron) module support and code improvement.

Begonnen von immi, 02 Februar 2015, 11:42:16

Vorheriges Thema - Nächstes Thema

andre.k

ZitatKönnte das eine Erklärung sein für das Problem, das ich vor einiger Zeit beschrieben hatte?:

Ja, könnte sein. In meiner Umgebung habe ich zwei Situationen, die diese Fehlermeldung verursachen:

a) THZ hat die Anfrage richtig empfangen, ist aber beschäftigt und kann nicht antworten. Hier hilft die Erhöhung des Timeouts und dann wäre nonblocking eine Lösung
b) THZ hat die Anfrage nicht erhalten, dann antwortet die THZ auch nicht und es hilft auch keine Erhöhung des Timeouts. Hier würde aber eine (sofortige) Wiederholung der Anfrage helfen, das ist aktuell aber nicht implementiert. (noch nicht, ich teste das gerade)

Die Schwierigkeit besteht darin die beiden Fälle zu erkennen bzw. zu unterscheiden, wenn man sie unterschiedlich behandeln will.

In beiden Fällen hat man im Logfile
Mythz THZ_ReadAnswer got no answer from DevIo_SimpleRead. Maybe too slow?


willybauss

Super. Momentan bin ich ohnehin eher im Garten beschäftigt (Terrasse bauen). Wenn ich dann Richtung Winter wieder Zeit habe ist das womöglich schon abschließend implementiert. Klingt nach 'nem Plan  :)
FHEM auf Raspberry Pi B und 2B; THZ (THZ-303SOL), CUL_HM, TCM-EnOcean, SamsungTV, JSONMETER, SYSMON, OBIS, STATISTICS

peter456

Zitat von: andre.k am 22 August 2017, 14:04:10
Hallo an alle,

mich würde mal interessieren, wer noch das Nonblocking-Feature nutzen würde und dann seine Erfahrungen mitteilen könnte? Hat außer mir noch jemand eine THZ mit der alten Firmware 2.xx? Ich vermute nur die alten Modelle der THZ haben das Problem mit den sporadisch auftretenden Timeouts beim Warten auf Antwort. Vielleicht liegt es auch an der alten Hardware (serielles Inteface mit 9600 bps). Wie auch immer. Über Feedback wäre ich dankbar.

Andre

Hallo,
habe heute das Nonblocking-Feature bei meiner LWZ303i mit der alten Firmware 2.06 ausprobiert.
Hier ein Auszug aus der Log-Datei vor der Änderung:
2017.08.23 22:26:12 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
2017.08.23 22:26:13 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step0  -- F4 ->
2017.08.24 00:01:19 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- F4 ->
2017.08.24 00:06:19 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
2017.08.24 00:06:20 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step0  -- F4 ->
2017.08.24 00:16:21 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step2 -- FB ->
2017.08.24 00:16:21 3: THZ_Get: Error msg2:   THZ_Get_Com: error found at step0 15 NAK!! -- F4 -> 15
2017.08.24 00:16:23 3: Mythz/RAW: 02
2017.08.24 00:26:23 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- F4 ->
2017.08.24 00:31:23 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step2 -- FB ->
2017.08.24 00:31:23 3: THZ_Get: Error msg2:   THZ_Get_Com: error found at step0 15 NAK!! -- F4 -> 15
2017.08.24 00:31:23 3: Mythz/RAW: 02
2017.08.24 01:06:26 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step2 -- FB ->
2017.08.24 01:06:26 3: THZ_Get: Error msg2:   THZ_Get_Com: error found at step0 15 NAK!! -- F4 -> 15
2017.08.24 01:06:26 3: Mythz/RAW: 02
2017.08.24 04:21:40 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
2017.08.24 04:21:41 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step0  -- F4 ->

Nach dieser Änderung sieht das so aus:
2017.08.26 12:58:34 3: [Mythz] THZ_GetRefresh(sGlobal) BlockingCall started
2017.08.26 12:58:34 3: Opening Mythz device /dev/serial/by-id/usb-067b_2303-if00-port0
2017.08.26 12:58:34 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 12:58:34 3: Mythz device opened
2017.08.26 12:58:42 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
2017.08.26 12:58:43 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 12:58:43 1: /dev/serial/by-id/usb-067b_2303-if00-port0 reappeared (Mythz)
2017.08.26 12:58:55 3: [Mythz] THZ_GetRefresh(sHC1) BlockingCall started
2017.08.26 12:58:55 3: Opening Mythz device /dev/serial/by-id/usb-067b_2303-if00-port0
2017.08.26 12:58:55 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 12:58:55 3: Mythz device opened
2017.08.26 12:58:56 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 12:58:56 1: /dev/serial/by-id/usb-067b_2303-if00-port0 reappeared (Mythz)
2017.08.26 13:03:34 3: [Mythz] THZ_GetRefresh(sGlobal) BlockingCall started
2017.08.26 13:03:34 3: Opening Mythz device /dev/serial/by-id/usb-067b_2303-if00-port0
2017.08.26 13:03:35 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 13:03:35 3: Mythz device opened
2017.08.26 13:03:35 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 13:03:35 1: /dev/serial/by-id/usb-067b_2303-if00-port0 reappeared (Mythz)
2017.08.26 13:03:56 3: [Mythz] THZ_GetRefresh(sHC1) BlockingCall started
2017.08.26 13:03:56 3: Opening Mythz device /dev/serial/by-id/usb-067b_2303-if00-port0
2017.08.26 13:03:56 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 13:03:56 3: Mythz device opened
2017.08.26 13:03:56 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 13:03:56 1: /dev/serial/by-id/usb-067b_2303-if00-port0 reappeared (Mythz)
2017.08.26 13:08:35 3: [Mythz] THZ_GetRefresh(sGlobal) BlockingCall started
2017.08.26 13:08:35 3: Opening Mythz device /dev/serial/by-id/usb-067b_2303-if00-port0
2017.08.26 13:08:35 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 13:08:35 3: Mythz device opened
2017.08.26 13:08:36 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 13:08:36 1: /dev/serial/by-id/usb-067b_2303-if00-port0 reappeared (Mythz)
2017.08.26 13:08:56 3: [Mythz] THZ_GetRefresh(sHC1) BlockingCall started
2017.08.26 13:08:56 3: Opening Mythz device /dev/serial/by-id/usb-067b_2303-if00-port0
2017.08.26 13:08:56 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 13:08:56 3: Mythz device opened
2017.08.26 13:08:57 3: Setting Mythz serial parameters to 9600,8,N,1
2017.08.26 13:08:57 1: /dev/serial/by-id/usb-067b_2303-if00-port0 reappeared (Mythz)

Wie kann ich diese Ergebnisse bewerten?
Es gibt auf jeden Fall viel mehr Einträge im LOG.

andre.k

Hallo Peter,

interessant wäre die Anzahl der folgenden Fehlermeldungen pro Tag einmal mit und einmal ohne nonblocking:

THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found

Die Anzahl gleicher Logeinträge pro Tag kannst du z.B. so ermitteln:

grep InterfaceNotRespondig /opt/fhem/log/fhem-2017-08.log |uniq -c -w10

welchen Timout (Attribut simpleReadTimeout ) hast Du bei dir gesetzt?

Andre




peter456

Zitat von: andre.k am 28 August 2017, 10:49:02
Hallo Peter,

interessant wäre die Anzahl der folgenden Fehlermeldungen pro Tag einmal mit und einmal ohne nonblocking:

THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found

Die Anzahl gleicher Logeinträge pro Tag kannst du z.B. so ermitteln:

grep InterfaceNotRespondig /opt/fhem/log/fhem-2017-08.log |uniq -c -w10

welchen Timout (Attribut simpleReadTimeout ) hast Du bei dir gesetzt?

Andre

Hallo Andre,

hier ist das Ergebnis der Fehlerhäufigkeit:


     56 2017.08.10 06:43:09 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- D1 ->
     11 2017.08.11 00:39:25 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- F4 ->
     24 2017.08.12 00:58:26 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- F4 ->
     35 2017.08.13 01:12:05 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
     47 2017.08.14 10:18:35 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
     23 2017.08.15 10:38:34 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
     15 2017.08.16 00:10:21 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- F4 ->
     31 2017.08.17 01:44:09 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
     30 2017.08.18 00:46:05 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
     31 2017.08.19 02:22:56 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- F4 ->
     11 2017.08.20 11:50:15 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
     19 2017.08.21 00:41:15 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
     32 2017.08.22 00:03:20 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
     12 2017.08.23 00:14:27 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- F4 ->
     19 2017.08.24 00:01:19 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- F4 ->
     24 2017.08.25 02:23:14 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- F4 ->
     26 2017.08.26 00:44:52 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- FB ->
     18 2017.08.27 00:29:44 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- F4 ->
     12 2017.08.28 03:20:52 3: THZ_Get: Error msg2:  THZ_ReadAnswer: InterfaceNotRespondig. Maybe too slow THZ_Get_Com: error found at step1  -- F4 ->

Wenn ich das richtig sehe, dann waren es gestern (ganzer Tag nach Umstellung) 18 Fehler.
Vor der Umstellung wares es zwischen 11 und 56 Fehlern pro Tag. 
Timout habe ich wie du auf 8 eingestellt.

Kann man irgendwie die Anzahl der LOG-Eintrage reduzieren oder komplett abschalten?

immi

Zitat von: peter456 am 28 August 2017, 20:30:14
Timout habe ich wie du auf 8 eingestellt.
Kann man irgendwie die Anzahl der LOG-Eintrage reduzieren oder komplett abschalten?
Hi peter
Yes, it would be very easy to ignore the error in log; but we are trying to remove all errors.
for me it would be interesting to know if the remaining 11 errors are caused by an even slower  heatpump or by the heatpump just ignoring the command.

I would appreciate if you increase the simpleReadTimeout to 10.
It would help to know the difference between no nonblocking, nonblocking &  simpleReadTimeout 8  and nonblocking &  simpleReadTimeout 10
immi
it would be grat if you make some statistics for several days each.

peter456

Hallo.
Habe heute wieder nach der Anzahl der Fehlermeldungen in den letzten 2 Tagen nachgeschaut.
Es waren 16 und 18 Fehler pro Tag.
Den Timout habe ich jetzt von 8 auf 10 verändert und ich werde in einigen Tagen neue Ergebnisse dazu liefern.

peter456

Hallo.
Habe heute wieder nach der Anzahl der Fehlermeldungen in den letzten 3 Tagen mit Timeout 10 nachgeschaut.
Es waren 21, 16 und 13 Fehler pro Tag.

immi

Peter
thanks for sharing.
It looks like hte new feature is not help you :(
maybe Andre has another idea.
immi

peter456

Hallo Andre,

wie sehen deine Ergebnisse bezüglich Timeout und Nonblocking?
Hat es bei deiner Anlage was gebracht?

Gruß Peter

andre.k

#625
Bei mir hat die Erhöhung des Timeouts auf 8 sec. eine Reduzierung der täglichen Fehler von ca. 60 auf weniger als 20 gebracht. Die restlichen Fehler scheinen aber anderer Natur zu sein. Hier geht  vermutlich die Anfrage "verloren". Das könnte man durch eine Wiederholungsschleife in der Funktion THZ_GetCommunication reparieren. Damit verhindert man zwar nicht den Fehler, aber er wird abgefangen und es geht kein Reading verloren.

Der Code könnte so aussehen


sub THZ_Get_Comunication($$) {
  my ($hash, $cmdHex) = @_;
  my ($err, $msg) =("", " ");
  Log3 $hash->{NAME}, 5, "THZ_Get_Comunication: Check if port is open. State = '($hash->{STATE})'";
  if (!(($hash->{STATE}) eq "opened"))  { return("closed connection", "");}
 
  my $cnt = 0;
  my $cntmax = 5;
  my $success = 0;
  while (!$success and ($cnt ++ < $cntmax)) {
  Log3 $hash->{NAME}, 4, "THZ_Get_Comunication: loop $cnt";
  #select(undef, undef, undef, 0.001);
  THZ_Write($hash,  "02"); # step0 --> STX start of text
  ($err, $msg) = THZ_ReadAnswer($hash);

#Expectedanswer1    is  "10"  DLE data link escape

  if ($msg ne "10")    {$err .= " THZ_Get_Com: error found at step0 $msg"; $err .=" NAK!!" if ($msg eq "15"); select(undef, undef, undef, 0.1); return($err, $msg) ;}
  else  {
THZ_Write($hash,  $cmdHex); # step1 --> send request   SOH start of heading -- Null -- ?? -- DLE data link escape -- EOT End of Text
($err, $msg) = THZ_ReadAnswer($hash);
  }
 
  if ((defined($err)))  { $err .=  " THZ_Get_Com: error found at step1 "; next ;}

# Expectedanswer2     is "1002", DLE data link escape -- STX start of text   

  if ($msg eq "10") { ($err, $msg) = THZ_ReadAnswer($hash);}
  elsif ($msg eq "15") { $err .=  " THZ_Get_Com: error found at step1  NAK!! "; select(undef, undef, undef, 0.1); return($err, $msg) ;}
  if ($msg eq "1002" || $msg eq "02") {
THZ_Write($hash,  "10");     # step2 --> DLE data link escape  // ack datatranfer
($err, $msg) = THZ_ReadAnswer($hash); # Expectedanswer3 // read from the heatpump
THZ_Write($hash,  "10"); 
  }
  if (defined($err))  { $err .= " THZ_Get_Com: error found at step2"; next;}
  else {$success = 1;}
  }
  ($err, $msg) = THZ_decode($msg) if (!defined($err)); #clean up and remove footer and header
  return($err, $msg) ;
}



hi immi,

if you have time, would you give it a try?

VG
Andre

immi

Hi Andre
Most of the people do not need it; I almost never get an error, but just in case I lose a reading, I get it 60 seconds later.
If you have lots of errors, it means the heatpump is busy; it makes no sense to push the heatpump even more.
If your childrens are crying, let them calm down and repeat the order later  :D

I would prefer a different strategy:
only in case of nonblocking we provide a repeat functionality, 5 seconds later.
In function THZ_GetNBDone we already check  ($msg ne "[ERROR]").
In case of an error we execute the followings 4 lines.

$timedelay=5;
my %par = (  hash => $hash, command => $cmdhash );
RemoveInternalTimer(\%par);
InternalTimer(gettimeofday() + ($timedelay) , "THZ_GetRefresh", \%par, 0);   

The code in red should delete existing timers for the given register; it has to be careully checked.

What do you think?

p.s.
I am on a trip; I will be back next monday.
immi

andre.k

Hi immi,

ZitatMost of the people do not need it;
Are you shure? A retry handling in case of communication errors could help everyone. Not only in my case.
ZitatI almost never get an error, but just in case I lose a reading, I get it 60 seconds later.
But could also be the next day (interval is allowed to set up to 86400 sec)
ZitatIf you have lots of errors, it means the heatpump is busy; it makes no sense to push the heatpump even more.
That is not completely right. A "timeout rrror" in ReadAnwer only means that no response was received from the THZ. This can also have other causes (packet lost, no sync). In these cases, a repetition of the communication is useful after a certain waiting period has elapsed. My tests showed that after 10 sec a retry was always successful.
ZitatIf your childrens are crying, let them calm down and repeat the order later  :D
agree :)
ZitatI would prefer a different strategy:
only in case of nonblocking we provide a repeat functionality, 5 seconds later.
In function THZ_GetNBDone we already check  ($msg ne "[ERROR]").
In case of an error we execute the followings 4 lines.

$timedelay=5;
my %par = (  hash => $hash, command => $cmdhash );
RemoveInternalTimer(\%par);
InternalTimer(gettimeofday() + ($timedelay) , "THZ_GetRefresh", \%par, 0);   

The code in red should delete existing timers for the given register; it has to be careully checked.

What do you think?
Good idea for the nonblocking part.
I would prefer a more generaly approach for the ErrorHandling, so that also errors, which are recognized by THZ_decode (CRC error, unknown error,...) are handled with a retry of communication.

Besides, I'm no longer convinced that Blocking.pm is the best solution. I have no idea how to prevent the continuous open / close of the IO device. I had already after some days in nonblocking mode a memory problem (maybe memory leak). I could not fork any subprocesses.

In my opinion, it would certainly be the better approach to organize reading operations through the FHEM main loop. This also corresponds more to the normal FHEM design (many other FHEM modules work in this way). The waiting times would then no longer be in the THZ module but would be returned cooperatively to the FHEM main program. You could then wait 10 seconds for a response from THZ without blocking FHEM. This would be very positive for the overall performance of FHEM.

I know this would be a major change, but maybe you can think about it.

Andre


immi

Zitat von: andre.k am 08 September 2017, 11:48:54
Besides, I'm no longer convinced that Blocking.pm is the best solution. I have no idea how to prevent the continuous open / close of the IO device. I had already after some days in nonblocking mode a memory problem (maybe memory leak). I could not fork any subprocesses.
Hi Andre
Also not sure Blocking.pm is the best solution;  It was nevertheless fun trying.
Many of the issues you named can be fixed; but I really have  no time now;

Apart from blocking.pm, there is also a very elegant solution from boris: SubProcess.pm
You move all comunication tasks to a parallel process always running and hopefully never dying (look at SubProcess.pm from boris)


Zitat
In my opinion, it would certainly be the better approach to organize reading operations through the FHEM main loop.
This approach was my starting point with thz some years ago; but never got it running.. OK It was my fisrt perl script, I am sure I can do better now :)
The main problem with this approach is that you never know when you get the fhem-loop back to THZ_Read  (10 other modules could block fhem loop for too long)
e.g.
I send a 10
THZ_Read waits that the global-fhem-loop gives me a 02 for an undefined time; if your heatpump is fast, this is a problem.

immi

andre.k

Hi immi,

I've already looked at the SubProcess.pm. This requires some changes in the THZ module. All DeviceIO operations must be relocated to the child process. The return of THZ's answers to the parent process is then controlled by the global select loop. Synchronous requests to the THZ would then also have to run over the child process etc.
This is quite complex.
Since I do not have so much time, I do not want to put time into a solution that you do not like ;)
Therefore, the question is which way you would go:
1) Blocking.pm
2) SubProcess.pm
3) Global select loop (without child process)

For me runs currently 1) with known drawbacks and 3) without any problems.

Andre