# umh-support
d
I had three questions about populating the database which I couldn't figure out from the docs:
1) Am I correct to understand that `factoryinsight` is deprecated, and we're expected to use `umh_v2` exclusively?
2) Does populating the 8 tables from `umh_v2` still work via the topic name as follows?
* Via the `_analytics` schema: asset, work_order, shift, state, product, product_type
* Via the `_historian` schema: asset, tag, tag_string
3) Can the predefined database schema be extended with additional tables if my use case requires a more complicated data model? For example, a work order consists of multiple operations?
I'm referring to https://umh.docs.umh.app/docs/datamodel/. Is there any other documentation that I'm missing?
j
creating a thread to better organize
Will take a closer look later, but here are some changes in the bridges: https://learn.umh.app/course/new-bridges-and-automatic-helm-upgrade-in-umh/
1) factoryinsight is deprecated and only for legacy enterprise customers, indeed.
2) You can read how this works in the bridges article I linked. You can furthermore see there how the database schema is set up: https://github.com/united-manufacturing-hub/united-manufacturing-hub/blob/e030a22ebf1bf94c3935167c21787130627632f1/deployment/united-manufacturing-hub/templates/bridges/kafka_to_postgres/historian/configmap.yaml#L406-L443
3) I recommend "duplicating" an entire Benthos bridge and then adjusting it with new table names, so that _analytics stays UMH analytics and _historian stays UMH historian, but you can easily do something like _DenisAnalytics which is like 90% _analytics but also has custom changes.
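To make the "duplicate and adjust" idea concrete, here is a minimal sketch of what such a copied Kafka-to-Postgres bridge could look like, assuming a hypothetical `_DenisAnalytics` key filter and a made-up `operation` table; credentials, columns and the consumer group name are placeholders, not the actual UMH configuration.

```yaml
# Hypothetical duplicate of the kafka-to-postgres bridge for a custom
# "_DenisAnalytics" schema. All names, columns and credentials are placeholders.
input:
  kafka_franz:
    seed_brokers:
      - united-manufacturing-hub-kafka.united-manufacturing-hub.svc.cluster.local:9092
    regexp_topics: true
    topics:
      - umh.v1.*
    consumer_group: kafka_to_postgresql_denisanalytics_bridge

pipeline:
  processors:
    # Drop everything that is not addressed to the custom schema.
    - mapping: |
        root = if meta("kafka_key").or("").contains("_DenisAnalytics") { this } else { deleted() }

output:
  sql_insert:
    driver: postgres
    dsn: postgres://user:password@united-manufacturing-hub.united-manufacturing-hub.svc.cluster.local:5432/umh_v2
    table: operation            # custom table added on top of the standard schema
    columns: [work_order_id, operation_no, payload]
    args_mapping: |
      root = [ this.work_order_id, this.operation_no, content().string() ]
```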
d
Aah.. nice. This is something I will look further into. Much of the data we use today doesn't need to be time-series data. For some of the machines we save around 70 values on each cycle. This is not used for live data but more for historical data, to look back on specific machine states if needed. This is why I merge the data into a single JSON in Benthos, to get one timestamp for all values. This however breaks when saving it to TimescaleDB, since each object will get its own timestamp. I would like a more traditional schema for this data, where one table would be common for all machines with tags like cycle_running, production_rate_current, production_rate_planned, etc., and another table for machine-specific tags, since we have a lot of similar machine groups. A data bridge for this would be a huge step forward in saving space and time: just publish the data to e.g. _machinetype1 and it will push it to the 2 tables.
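To illustrate the "one wide row per cycle" idea (this is not an existing UMH component), a sketch of a bridge output writing into a hypothetical `machine_cycle` table; the `asset` and `timestamp_ms` fields are assumed to be present in the merged JSON.

```yaml
# Hypothetical wide-row output: one table row per machine cycle instead of
# one row per tag. Table name, columns and payload fields are illustrative.
output:
  sql_insert:
    driver: postgres
    dsn: postgres://user:password@united-manufacturing-hub.united-manufacturing-hub.svc.cluster.local:5432/umh_v2
    table: machine_cycle
    columns: [asset, timestamp_ms, cycle_running, production_rate_current, production_rate_planned]
    args_mapping: |
      root = [
        this.asset,            # assumed to be part of the merged JSON
        this.timestamp_ms,     # one timestamp for all values of the cycle
        this.cycle_running,
        this.production_rate_current,
        this.production_rate_planned
      ]
```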
What I also will look into is having something like a _debug or _analysis bridge that saves data for a predefined time: some easy setting to enable _debug for 30 days, push multiple machine tags to that topic to get the full history of each sensor, automatically delete data older than 30 days, and have a button or something to stop the debug and delete all data for that asset. This would be really good for troubleshooting machines and for evaluating new machine data. For example, if you have 4-5 sensors on a machine to calculate when it is running etc., it is quite nice to log all data for a while to see that your calculations are correct, without saving a bunch of scrap data to the database.
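The 30-day cleanup part of that idea could plausibly be handled on the TimescaleDB side with a retention policy; the sketch below bundles the DDL into the bridge via `init_statement`. The `debug_data` table, its columns and the RFC3339 `timestamp` field are assumptions for illustration only.

```yaml
# Hypothetical _debug bridge output: init_statement creates a hypertable and
# attaches a 30-day retention policy so old debug data is dropped automatically.
output:
  sql_insert:
    driver: postgres
    dsn: postgres://user:password@united-manufacturing-hub.united-manufacturing-hub.svc.cluster.local:5432/umh_v2
    init_statement: |
      CREATE TABLE IF NOT EXISTS debug_data (
        time  TIMESTAMPTZ NOT NULL,
        asset TEXT NOT NULL,
        name  TEXT NOT NULL,
        value DOUBLE PRECISION
      );
      SELECT create_hypertable('debug_data', 'time', if_not_exists => TRUE);
      SELECT add_retention_policy('debug_data', INTERVAL '30 days', if_not_exists => TRUE);
    table: debug_data
    columns: [time, asset, name, value]
    args_mapping: |
      # assumes the payload carries an RFC3339 timestamp string
      root = [ this.timestamp, this.asset, this.name, this.value ]
```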
d
If `_analytics` schema use is no longer recommended (as per the blog article), how will asset, work_order, shift, state, product, product_type be created and populated?
For 3) can you elaborate what you mean with "duplicating"? Do you mean editing the file https://github.com/united-manufacturing-hub/united-manufacturing-hub/blob/e030a22ebf1bf94c3935167c21787130627632f1/deployment/united-manufacturing-hub/templates/bridges/kafka_to_postgres/historian/configmap.yaml#L406-L443 by adding a case for topic names that contain `_DenisAnalytics` and defining the appropriate SQL code to create and populate the tables?
j
You can still use the old bridges and manually activate them if you want. They are simply not activated by default anymore. For 3): in the management console, you could simply copy and paste the existing bridges and create your own new ones. Use custom data flow components for this.
d
I assume the old _analytics bridge is written in Go instead of Benthos. I checked the GitHub repository but failed to find it. Could you point me to the file(s)? In essence, I'd like to reuse most of that logic, including the OEE calculation, but adapt the database schema slightly to my use case
l
(I also noticed that this is used in the latest webinar, as a custom bridge? I'm also not clear about this part at the moment. Would be great if we had some examples of how we can reproduce this) https://cdn.discordapp.com/attachments/1313876222429630525/1314561266131406888/CleanShot_2024-12-06_at_12.53.432x.png?ex=67543815&is=6752e695&hm=49ab1843c20b7a278de0fa1a8a8dfea91b602fc93de40c0d816e38f05e411f1c&
Hi Jeremy, how can I reactivate that bridge?
@Denis and I are working on a PoC to demonstrate the value to a metal fabricator. For this we need to create a table that is not work orders, but operations, which is a subset of work orders. I was hoping that we could use 90% of analytics, with an additional table? How should we approach this?
j
@Mateusz @krgrsebastian do you remember what you did there?
yes, this one
I checked with @Ferdinand and by default kafka-to-postgresql should still be running
l
Thanks @Jermuk appreciate any help here. If we get this to work we'll gladly share our learnings 🙌
m
@Luke van Enkhuizen We just created a new bridge based on the _historian sample as stated in the blog article, and in it deleted "Step 6" (see picture). The blog article says: "2. Create a new DFC for MQTT to Kafka: Check out the mqtt_to_kafka_historian sample and adjust it for the _analytics schema." https://learn.umh.app/course/new-bridges-and-automatic-helm-upgrade-in-umh/#for-users-of-the-analytics-schema https://cdn.discordapp.com/attachments/1313876222429630525/1314584063377539093/image.png?ex=67544d50&is=6752fbd0&hm=c5a73bd41acd4466e118ad341fb413145479fffca44d9e4c3f4039bfd13f84f2&
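Roughly, the MQTT-to-Kafka side of that setup could look like the sketch below; the broker address, the assumption of a four-level hierarchy before `_analytics`, and the fixed split at level 3 are all simplifications, so treat this as an illustration rather than the actual sample.

```yaml
# Simplified sketch of an MQTT-to-Kafka bridge adjusted for _analytics topics.
# Broker addresses and the topic pattern are placeholders.
input:
  mqtt:
    urls:
      - tcp://united-manufacturing-hub-mqtt.united-manufacturing-hub.svc.cluster.local:1883
    topics:
      - umh/v1/+/+/+/+/_analytics/#   # assumes a 4-level hierarchy; adjust to yours
    client_id: mqtt_to_kafka_analytics_bridge

pipeline:
  processors:
    # Split the MQTT topic into a Kafka topic and key at a fixed level of 3.
    - mapping: |
        let parts = meta("mqtt_topic").split("/")
        meta kafka_topic = $parts.slice(0, 3).join(".")
        meta kafka_key   = $parts.slice(3).join(".")

output:
  kafka_franz:
    seed_brokers:
      - united-manufacturing-hub-kafka.united-manufacturing-hub.svc.cluster.local:9092
    topic: ${! meta("kafka_topic") }
    key: ${! meta("kafka_key") }
```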
l
j
the benthos bridges even include the table generation, and potentially the migrations 🙂 we specifically designed it so that the entire logic is in a single place, can be versioned, easily duplicated and modified.
so steps 1-3 are together
the rest is correct 🙂
j
at one point it would get a little bit messy tbh to have so many tables and so much logic in a single config file / Benthos container. Technically possible, and our current path though!
d
@Luke van Enkhuizen yeah, but the way I currently see it is that we’ll need to create a bridge for every table
And just filter the topic by name, e.g. umh/v1/myEnterprise/…/originId/_MES/work_order: filter by topic name contains “_MES” and “work_order”
It could perhaps be done in one file using lots of switches, but like @Jermuk says it would get messy. I’ll think about it tomorrow when the Glühwein wears off 😝
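For completeness, a sketch of the single-file "lots of switches" variant, assuming the MES table name ends up in the Kafka key as discussed above; table names, columns and credentials are illustrative only.

```yaml
# Hypothetical single-file variant: one bridge, a switch output routing each
# MES table to its own SQL insert. This gets messy quickly, as noted above.
output:
  switch:
    cases:
      - check: meta("kafka_key").or("").contains("_MES.work_order")
        output:
          sql_insert:
            driver: postgres
            dsn: postgres://user:password@united-manufacturing-hub.united-manufacturing-hub.svc.cluster.local:5432/umh_v2
            table: work_order
            columns: [order_id, status, payload]
            args_mapping: root = [ this.order_id, this.status, content().string() ]
      - check: meta("kafka_key").or("").contains("_MES.operation")
        output:
          sql_insert:
            driver: postgres
            dsn: postgres://user:password@united-manufacturing-hub.united-manufacturing-hub.svc.cluster.local:5432/umh_v2
            table: operation
            columns: [work_order_id, operation_no, payload]
            args_mapping: root = [ this.work_order_id, this.operation_no, content().string() ]
```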
l
Prost!
Really impressive work. Definitely going to dig into this in the next few days
d
A question about the bridge: why not use MQTT directly to SQL and skip Kafka in between? At least when creating a custom bridge and you know that you will only use MQTT?
j
Because there is a high likelihood that messages will go missing, that was the main reason to add Kafka. See also https://learn.umh.app/blog/tools-techniques-for-scalable-data-processing-in-industrial-iot/
d
Yes, that part I know. However, it is not quite as bad as described in the article. You can still use QoS 2 and have message guarantees, and also retain. There is also LWT you could use. The case when a microservice crashes is harder to handle if messages are sent during the restart, but for normal machine data this would not be a problem. I mean, if you are already publishing the data to MQTT, what is the point of sending it to Kafka before pushing it to the database? It is just an extra step. For the UMH standard bridge I understand that with the bridge the user could choose to publish to MQTT or Kafka and everything will work, but when creating a custom bridge I see no value in pushing the data through another step. Or am I missing something else?
Nice. As far as I know the logs are a bit laggy atm. I had the same problem when starting up a new cluster: I had 15-20 error messages and I cleaned all messages in Kubernetes, but the messages were still visible in the console. It caches the messages somewhere if I remember correctly. There is a plan for improving this.
d
I'm now configuring the bridge from Kafka to TimescaleDB. I think my YAML code is fine, but I'm getting an error related to versioning. Has anyone had this one before?
[8:58:46 PM] - Action confirmed. Preparing to execute action.
[8:58:47 PM] - Editing data flow component.
[8:58:47 PM] - [Step 1/6] Retrieving configuration to get current settings.
[8:58:47 PM] - [Step 1/6] Parsing configuration to extract deployment parameters.
[8:58:47 PM] - [Step 3/4] Adding new data flow component to configuration.
[8:58:47 PM] - Error: failed to execute action: [Step 3/6] Failed to add new data flow component to configuration: refusing to deploy new version, not based on the currently deployed version (BasedOnUUID: 4392ae33-5ecd-42b9-953b-3cd541c3ebcf, DeployedVersion: a0b0cb21-2999-4fc1-9aea-f43c9ae876cd)
j
For example, the Tag Browser in the Management Console, incl. Message Tracing, would not be possible. Furthermore, QoS 2 still has message loss: it only guarantees the delivery to the broker, not to the receiving application. If the consumer is offline during that time, then the standard does not guarantee it anymore and it's entirely up to the implementation.
It will ignore old errors in the logs automatically after a certain duration.
Did you use multiple browser tabs? This error appears when you open a DFC for editing but in the meantime the underlying DFC or protocol converter was changed in a different tab, or someone else (in multi-user setups) changed it. This is done to prevent accidental code overwrites, so you need to be on the latest code change before you can edit. Simply re-open the DFC again.
d
So the tag browser uses Kafka? Yes, if the consumer is offline you could lose messages, but I mean: if you are pushing the messages from a machine using MQTT and using a bridge, then the bridge is the consumer, so if the consumer is offline it does not get sent to Kafka either. If you don't mind losing the tag browser functionality, I see no point in bridging it to Kafka before SQL if your machines all connect directly to MQTT. I guess the best would be to only use Kafka all the time, but that is not possible since MQTT is often the only protocol supported for many devices.
d
Confirming that this was indeed the problem
@Jermuk can you help me understand what the "merge point" is, and why it's supposed to be 3 or 5? It seems to play a role in creating the Redpanda topic name https://cdn.discordapp.com/attachments/1313876222429630525/1315619291919421491/image.png?ex=67581172&is=6756bff2&hm=f7d9792423125947a1d578d1fe3c7bb898290fe7499a7e5befda100ea87f9021&
j
The point where an MQTT topic gets broken into a Kafka topic and a Kafka key (as by default Kafka does not like having a lot of topics). We already did a successful PoC where we configured Kafka in such a way that many topics are actually possible, but it may take some time until it reaches nightly.
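To make the merge point concrete, here is a small illustrative Bloblang sketch (not the actual bridge code) of how a topic could be split at merge point 3 versus 5:

```yaml
# Illustrative only: splitting an MQTT topic at the merge point.
# merge point 3:  umh/v1/Belgium/Antwerp/Kalmthout/Heide/Begonia/_historian/temperature
#   -> Kafka topic: umh.v1.Belgium
#   -> Kafka key:   Antwerp.Kalmthout.Heide.Begonia._historian.temperature
# merge point 5 would instead give:
#   -> Kafka topic: umh.v1.Belgium.Antwerp.Kalmthout
#   -> Kafka key:   Heide.Begonia._historian.temperature
pipeline:
  processors:
    - mapping: |
        let parts = meta("mqtt_topic").split("/")
        meta kafka_topic = $parts.slice(0, 3).join(".")   # first 3 levels -> topic
        meta kafka_key   = $parts.slice(3).join(".")      # remainder      -> key
```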
d
Got it. ~~If I'm creating a custom `_MES` data contract, is there any reason why I can't just give the Redpanda topic the same name as the table in TimescaleDB, i.e. `asset`, `work_order`, `operation`, etc.? I don't think I need the feature from the current `_historian` data contract that populates an asset table based on the topic name, because we just query the assets from an external system via HTTP~~ My question doesn't make much sense... I'll take some time to think this through
l
Hi @Jermuk! @Denis and @Diederik and I had a close look at it, but we couldn't figure out how to get the _analytics working via Benthos yet. We would love to create our own _analytics2. Would you be so kind as to help us out and share a copy of the YAML config to mimic this ourselves? We promise to share our insights if we get it to work 😎
We just need a good sample to get the broad lines and then we can finetune it ourselves. Thanks in advance!
j
I quickly asked @Mateusz and he said they are using the Benthos bridges between MQTT and Kafka exactly like in the tutorial, and to insert it into Postgres they use the old kafka-to-postgresql.
i hope that is helpful 🙂
d
I'm creating a new custom Data Flow Component to transfer data from Redpanda to TimescaleDB by editing the kafka_to_postgresql_historian_bridge sample. I have a working Kafka topic `umh.v1.Belgium` where I receive messages with the Kafka key `Antwerp.Kalmthout.Heide.Begonia._mes.work_order`. This is an example message:
{
  "actual_end": null,
  "actual_start": null,
  "archived": false,
  "article_id": 12345,
  "created_at": 1733761490051,
  "created_by": "system",
  "external_order_id": "EXT-6925",
  "order_id": 403074,
  "planned_end": "2024-12-10T16:24:50.051Z",
  "planned_quantity": 100,
  "planned_start": "2024-12-09T16:24:50.051Z",
  "status": "pending"
}
The problem is that the
input_received
count under Metrics in the management console stays at zero, even though new messages are successfully being written to the kafka topic. I don't understand why the input node doesn't receive data. I checked that: - I'm using the correct consumer
kafka_to_postgresql_historian_bridge
- the topic name is
umh.v1.Belgium
which should match the wildard - the consumer group
kafka_to_postgresql_historian_bridge
is listed in the list of consumers for that topic in Redpanda Here is the input YAML:
input:
  kafka_franz:
    seed_brokers:
      - united-manufacturing-hub-kafka.united-manufacturing-hub.svc.cluster.local:9092
    regexp_topics: true
    topics:
      - umh.v1.*
    consumer_group: kafka_to_postgresql_historian_bridge
    auto_replay_nacks: false
I added a file with the complete bridge YAML to this message: https://cdn.discordapp.com/attachments/1313876222429630525/1315719705318658108/question.yaml?ex=67586ef6&is=67571d76&hm=4fb03d290f7e79df6a76ee59637c9188d89c840378bce5c4307bf19cb1a69845&
By the way, in the output YAML I had to hardcode the password in the database URI, otherwise Postgres throws an error:
dsn: postgres://kafkatopostgresqlv2:{{ include "united-manufacturing-hub.postgresql.kafkatopostgresqlv2.password" . }}@united-manufacturing-hub.united-manufacturing-hub.svc.cluster.local:5432/umh_v2
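If hardcoding is a concern: since the Helm `include` expression above apparently isn't rendered inside a custom DFC, one alternative could be Benthos' environment variable interpolation, assuming the password is exposed to the DFC container as an environment variable (the `POSTGRES_PASSWORD` name and the secret wiring are assumptions, not something UMH sets up by default):

```yaml
# Hypothetical alternative to hardcoding: Benthos resolves ${VAR} from the
# container environment. POSTGRES_PASSWORD would have to be provided to the
# DFC pod (e.g. via a Kubernetes secret); that wiring is an assumption.
output:
  sql_insert:
    driver: postgres
    dsn: postgres://kafkatopostgresqlv2:${POSTGRES_PASSWORD}@united-manufacturing-hub.united-manufacturing-hub.svc.cluster.local:5432/umh_v2
    table: work_order
    columns: [order_id, status]
    args_mapping: root = [ this.order_id, this.status ]
```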
f
The input message in your case is invalid: you defined created_at as a string but gave it an integer. As for input_received, it might only increase when a message is committed as processed back to Kafka.
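If changing the producer isn't an option, one alternative (a sketch, assuming the schema really does expect `created_at` as a string) would be to coerce the field in a small Bloblang step before the insert:

```yaml
# Illustrative only: coerce created_at to a string so it matches a schema
# that declares it as a string, instead of changing the producer.
pipeline:
  processors:
    - mapping: |
        root = this
        root.created_at = this.created_at.string()
```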
d
Oops, that was it.. thanks! I'm bookmarking your tool 😉
We now have the following architecture for our bridges:
1) One mqtt_to_kafka_mes_bridge that transfers and combines messages from several `umh/v1/<ISA-95 hierarchy>/<asset>/_mes/<MES table>` MQTT topics into a single Kafka topic `umh.v1.<ISA-95 hierarchy>.mes`, where the Kafka key is equal to `<MES table>`.
2) An individual kafka_to_timescale_mes_bridge for each `<MES table>` that transfers Kafka messages from the `umh.v1.<ISA-95 hierarchy>.mes` topic into a database table based on the value of the `kafka_key`.
Here's an example of the Kafka messages in the Redpanda UI (see image). Each bridge works individually when tested with an "inject" node in Node-RED. However, whenever both bridges are activated, and thus both read from the same Kafka topic, randomly only one bridge manages to get its data into the database. It's as if it's not possible to have multiple bridges listening to the same Kafka topic if all messages have the same timestamp. Could this be the case? The solution would be to either split the single `umh.v1.<ISA-95 hierarchy>.mes` topic into many topics like `umh.v1.<ISA-95 hierarchy>.mes.work_order` etc. (doesn't work: see below), or alternatively combine all the individual MES bridges into one large MES bridge and filter by kafka_key. This is possible, but it's messier. Therefore I'd like to ask if my assumption about this being a problem is correct. https://cdn.discordapp.com/attachments/1313876222429630525/1316784501061324800/image.png?ex=675c4ea1&is=675afd21&hm=d049f35a80e8780485a2c842da748073c0312dbc62bbc551f4361ec3c9aa8ce4&
I just tested the first solution (where we split the `umh.v1.<ISA-95 hierarchy>.mes` topic into many topics like `umh.v1.<ISA-95 hierarchy>.mes.work_order` etc.) and the same problem occurs: the first bridge (1) works flawlessly, but the kafka_to_timescale bridges don't run. It's as if they are not being activated and are ignoring new messages in the topics they listen to. Any tips on how to debug such a bridge? I already posted the code here: https://discord.com/channels/700613971941785680/1316366172907638835/1316779471692628019
j
> Each bridge works individually when tested with an "inject" node in Node-RED. However, whenever both bridges are activated, and thus both read from the same Kafka topic, randomly only one bridge manages to get its data into the database.
Did you check the name of the consumer group? This must be different 🙂
If two consumers are in the same group, they will load balance the messages.
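In config terms that just means each bridge's Kafka input needs its own `consumer_group`; a minimal sketch with made-up group names:

```yaml
# Bridge A: its own consumer group, so it receives every message on the topic.
input:
  kafka_franz:
    seed_brokers:
      - united-manufacturing-hub-kafka.united-manufacturing-hub.svc.cluster.local:9092
    topics: [ umh.v1.Belgium ]
    consumer_group: kafka_to_timescale_mes_work_order_bridge

# Bridge B (a separate DFC / config file) must use a *different* group name,
# e.g. consumer_group: kafka_to_timescale_mes_operation_bridge. Otherwise the
# two bridges form one group and Kafka load-balances the messages between them.
```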
d
Ahh 99% that’s the issue: we use the same consumer group. Will check tomorrow
Confirming that this was indeed the problem
j
❤️
d
Is it problematic to create the SQL schema outside of the bridge(s)? We have a file that instantiates the schema like in the attached SQL file https://cdn.discordapp.com/attachments/1313876222429630525/1317158147621650453/message.txt?ex=675daa9e&is=675c591e&hm=c3fbdfa371bd2a6580e1b51a77f5911f30e2f403511be5c6c0d8d8d30762c7e4&
j
No, not from a technical perspective. It might be helpful though to have "everything together" and be able to reproduce the entire stack within the single YAML.
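For reference, a sketch of how the schema creation could live inside the bridge itself via `sql_insert`'s `init_statement`; the `work_order`/`operation` tables here are illustrative stand-ins for whatever the attached SQL file actually defines.

```yaml
# Illustrative only: DDL bundled into the bridge so the whole stack lives in
# one YAML. The tables are examples, not the schema from the attached file.
output:
  sql_insert:
    driver: postgres
    dsn: postgres://user:password@united-manufacturing-hub.united-manufacturing-hub.svc.cluster.local:5432/umh_v2
    init_statement: |
      CREATE TABLE IF NOT EXISTS work_order (
        order_id BIGINT PRIMARY KEY,
        status   TEXT NOT NULL
      );
      CREATE TABLE IF NOT EXISTS operation (
        work_order_id BIGINT REFERENCES work_order(order_id),
        operation_no  INT NOT NULL,
        status        TEXT,
        PRIMARY KEY (work_order_id, operation_no)
      );
    table: operation
    columns: [work_order_id, operation_no, status]
    args_mapping: root = [ this.work_order_id, this.operation_no, this.status ]
```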