Flush to Data

We mix Water, Poo, and Data. The Flush to Data podcast invites conversations about the data, models, and science used in wastewater engineering. Your hosts are Jörg Rieckermann (Eawag, Dübendorf, Switzerland) and Kris Villez (Oak Ridge National Laboratory, TN, USA). The contributions of Kris Villez to this podcast are a reflection of personal opinion only and are not related to any project, study, or opinion at the Oak Ridge National Laboratory or the U.S. Department of Energy. Credits: In our trailer and intro we use the beautiful music from the "Obliterator" Amiga game by the incredible David Whittaker, https://en.wikipedia.org/wiki/David_Whittaker_(video_game_composer). We also use Creative Commons artwork distributed under a CC BY-ND 2.0 license. Check out our homepage for more info: https://flush2data.gitlab.io

Episode 01 - Mechanistic Machine Learning

July 10, 2020 • Kris Villez and Jörg Rieckermann

This is the first episode of the Flush to Data podcast. We start with a discussion on mechanistic modelling and machine learning and venture into models for emulation, uncertainty quantification, and data quality. Bonus material includes a discussion on aspects of current scientific practice, including the lack of hypothesis testing, the evaluation of novelty, and the challenges with a generalist approach.

Hosts: Jörg Rieckermann and Kris Villez
Guest: Juan Pablo Carbajal

Links:
* Juan Pablo's web page: https://sites.google.com/site/juanpicarbajal/
* Article relating Gaussian processes and Kalman filter: www.jstor.org/stable/2984861
* BBC podcast on Gauss: https://www.bbc.co.uk/programmes/b09gbnfj
* Using Lake Zurich as a heat sink: Unfortunately, we could not track down the original source, despite considerable effort. If any listener knows how to access it, we would be grateful for a note. The best we could find was documentation of related projects by Eawag: https://thermdis.eawag.ch/ and [1]. These show that ecological consequences have indeed been assessed in detail.
* Goodhart's law: https://en.wikipedia.org/wiki/Goodhart's_law
* An invitation to reproducible computational research: https://doi.org/10.1093/biostatistics/kxq028
* Science in the age of selfies: https://doi.org/10.1073/pnas.1609793113

References:
[1] Wüest, A. (2012). Potential zur Wärmeenergienutzung aus dem Zürichsee. Machbarkeit. Wärmeentzug (Heizen) und Einleitung von Kühlwasser. Kastanienbaum: Eawag. DORA-Link

Episode guide:
[0:00:00] Who is Juan Pablo Carbajal?
[0:03:10] Mechanistic modelling versus artificial intelligence
[0:07:08] Who is Juan Pablo Carbajal? (ctd.)
[0:09:26] Cross-fertilization between robotics and wastewater engineering
[0:15:05] Emulation: using models to approximate other models
[0:21:22] Incorporating common sense and prior knowledge into data-driven models
[0:31:31] Equivalence between Gaussian processes and Kalman filter
[0:33:50] Utility of emulation
[0:40:15] Utility of quantified uncertainty
[0:44:50] Intermezzo
[0:49:04] What can models say about data quality?
[1:02:15] How to communicate about data quality?
[1:10:10] Preparing engineers for the future
[1:15:23] Thank you and goodbye!

Bonus material:
[1:16:40] Interpretable machine learning models
[1:22:33] Hypothesis testing
[1:26:14] Critical assessment of novelty
[1:30:50] Barriers to the generalist approach
[1:35:48] Thank you and goodbye!

[still requires editing]
Jörg: 0:01
Hello everybody, this is the first episode of Flush to Data, the podcast on wastewater and data science. I'm happy to welcome my co-host, Kris, and our guest today, Juan Pablo Carbajal. He's sitting in his home office in [unclear], Switzerland, as we all are in this time of the corona crisis. Juan Pablo walked into my office five years ago, four years ago, because we were collaborating on a wastewater project on sewers. We were applying mechanistic machine learning techniques, and that is what we would like to talk about today. Hi Juan Pablo, how are you?

Juan Pablo Carbajal: 0:56
Hello, hello. Thank you for the invitation. This is fun.

Jörg: 1:02
So I'll introduce you a little bit with the things I know. You're originally from Argentina, then did your PhD at the University of Zurich on robotics and [unclear], right? And then you went on to Belgium as a postdoc and then tried to settle down in Switzerland, and that's how we originally met. I had never had it in a job interview that somebody walks in with a paper in his hand and says, "Hey, guys, this is the method you are looking for," [unclear] this kernel stuff. Yes, but where to start? Maybe you

Juan Pablo Carbajal: 1:48
We can arrive at that point slowly, yes. So, do you want to go into the technical stuff, or do you want to start with, you know, more personal things?

Kris: 2:03
Is there something more that Jörg missed or forgot?

Juan Pablo Carbajal: 2:08
So, I don't think he mentioned that I'm a physicist, or, you know, my degree is in what at that time was called industrial physics. I think in the northern countries it's called engineering physics or something like that. The idea is that you study and learn the things a physicist learns, but at the end you try to bring this to the applied world, that is, to industrial environments. So that's my background. And I was always very much inspired by complex systems modelling, so things that you cannot handle with linear [unclear]. I started my road in electromagnetics, which then went on to control and optimization of huge ovens in industrial environments, and finally I said: okay, I think I need a PhD. The topic was again robotics, AI, you know, things that were relevant to me at that point. I wanted to learn more about that. Essentially I always had this question: okay, we have to make so much effort to build models of the devices we have to deal with. For example, let's say you have these huge ovens, which are like a carousel, 30 meters in diameter, with many, many components, you know, they have millions of things inside, and I have to build a model for it. So you sit with the experts, with the operators, and with a lot of human work you try to build, the classical way, models of each component and then try to put them all together. And I was, you know, hearing machine learning and robotics mentioned, and I said: okay, wouldn't it be cool if we could get some support so we don't have to do this work so heavily, to help us build these models?
And this is how I got interested in the topic, automatic modelling if you want. And so I went deep into it, only to realize that we actually cannot do it. So that was my big discovery, when it was too late to quit my PhD: to realize that machine learning actually cannot do that, that only humans can so far.

Kris: 4:31
Could you elaborate a bit?

Jörg: 4:34
Like for automatic system identification, you mean, or...

Juan Pablo Carbajal: 4:38
So again, system identification already has the knowledge the human has, because you have to somehow decide on a model structure; you have to decide what you're going to identify. But what we do when we build these models is that we collect all our knowledge, our intuitions, or the impressions we have about the system. We can also include data, which maybe comes later on, some measurements of the system. But we build these models based on something a lot more abstract than data, way above it in the level of abstraction. We usually postulate mechanisms and all these things: here we have [unclear], here we have some transport process. So we work at a very abstract level when we build these models. And then with data we identify the parameters of those models; that's what the data gives us. We summarize data in those parameters. But this ability that the experts have to focus on the mechanisms that are most relevant to make a model of a system, that's what I was always aiming at: we should get help from the machines to do that. And with experience, I realized I was completely confused. Maybe I was led astray by the hype, because that's what we were hearing: we will understand, we will discover. And what it is essentially, if I put it bluntly, is that the methods we have basically summarize data and nothing else. They do not produce, at least [unclear], a sort of abstraction.

Jörg: 6:10
Yes, but I think what we're looking for is evidence, right, and understanding. And I, as a wastewater and environmental engineer, if I could start my career again, that's what I have thought for the last 20 years, I would study physics. Because in physics you are always asking: is what I see real? You know, I think you're really close to the likelihood function, so to speak. In comparison to math, the parameters mostly have some physical meaning, right? It's almost in a Newtonian sense: you discover reality through models, which you throw against data, and then there is this hypothesis testing stuff. But we can come back to that later. Can you say in two or three sentences why you started to study physics and not, say, mechanical engineering or informatics?

Juan Pablo Carbajal: 7:17
Well, I was between biology, astronomy and physics, and I started university not completely sure which one, so I took courses in all of them. [unclear] But essentially I wanted to be a generalist. I didn't want to be indoctrinated in one field. I wanted to have the tools that allow me to move across fields, because that's what I like. If I get too much into one thing, then over time it starts to get boring to me, so I really need to change. And of physics I understood, and I think I was not wrong, that it would give me this very general background, which I would then be able to apply to whatever I find interesting. It has worked so far. It's not easy to find jobs as a generalist, but so far it has worked out. So that was the main motive; the final decision was basically based on that. Yeah, you know.

Jörg: 8:29
Move across fields you did. Oh boy, you ended up with wastewater.

Juan Pablo Carbajal: 8:33
Not ended. I was with artists, biologists, electromagnetics people, electrical engineers and computer scientists. I was really everywhere. And I don't want to say I'm an expert in all of these; that sometimes sounds very pedantic. I know a little about each field, and my contribution is always basically the same thing. I go there, I listen to what they have to say, I try to identify what the problems are there, and then I just use the tools and my approach to try to contribute to that. I learn a lot in the process, which I also like a lot. I learned a lot from all these things, but I would never say I'm, you know, somebody who could give a talk on biology. I wouldn't dare. I could tell you what I did with biologists, but I would not dare to say I know something in this field.

Kris: 9:26
And so do you find that the things you learned about robotics or about electromagnetic systems transfer? Is there something that wastewater engineers could learn from that? Or do you feel that the machine learning tools apply, but other than that it's actually quite different?

Juan Pablo Carbajal: 9:48
So if you're talking about, let's say, wastewater, I think there you have a world of uncontrolled systems. Basically, you can control your process, but what comes into the process, nobody knows. And I think in that scenario, yes, there are a lot of commonalities with the challenges that real-world robotics, not industrial robotics, has. So maybe to make the difference: the classical robot is a machine attached to the ground in a factory and basically needs to do always the same thing. You need to have some safety, and it will perform a set of tasks; it could be drilling, it could be assembling. But the environment is engineered by us. We build the environment of this machine, so the environment is for us a completely known thing, and that allows us to evaluate the performance of this machine very precisely, and what we want to get out of that production line. When you want robots that, for example, interact with humans, or something like the self-driving cars, which maybe you heard in the news are starting now, the environment is completely uncontrolled. That means data will have very little structure, or, if you see a structure, it may be temporary: the structure of the data may change from time to time, very quickly and in an uncontrolled manner. So you don't know what you are going to get. You never know when you will have a representative data set. And so in that sense, I think wastewater has a lot of commonalities with robotics, and with the real challenges in machine learning, if you want to exploit data. [unclear] The modelling tools, I think, are still rudimentary.
But that comes attached to the fact that it's hard to build knowledge in an environment that is unstructured. In physics, in particle physics or [unclear], essentially most of it was built in a laboratory, in which we removed everything that we assumed was not necessary to understand the process. And then we were able to get the details, to model those properly isolated processes, and from there we build. But this is something that is very hard to do in wastewater. You cannot take this kind of approach, remove everything and keep only one thing, because actually what you want to know is what happens when everything is together. So I understand it will take a lot of time and a lot of effort, and the effort should be there to try to build these models, although they will be complicated in the end, where you isolate processes and put them together, and stuff like that. So, yeah, I found it in the end a very challenging field, and super interesting.

Kris: 12:49
Could we take this a bit further still? So when cars came along, the environment changed. Basically, roads changed. The materials that we use for roads changed. The code of conduct when you drive a car changed. We added markings. We added stoplights. So there's all that technology that came along because we had the car. If we now say there's a self-driving car, there's something in me that thinks: well, probably the world will change. Maybe we will have Internet of Things beacons that will tell the car where it is and have the same function as lane markers have for the human driver. There will be these additions to the environment that we will add. And again, I'm asking myself: what should we be doing in wastewater with respect to changing the environment, such that it becomes a more controlled environment? [unclear]

Juan Pablo Carbajal: 13:58
This concept you're mentioning has a name in real-world AI, or let's say embodied AI, which is scaffolding. It's a process in which you build the world for a machine, or for a model, to work better than it would in an unstructured world. And it means you have some sort of control over the environment, which may or may not be a reasonable assumption. And the other concept is stigmergy, that is, when you invent something and you put it into the environment, that thing itself will change the environment. And I think this aspect is one of the main arguments when we ask what would be the difficulty of applying, you know, statistical learning and classical methods in this field: it is this kind of stigmergic effect, in which as soon as you put this thing in, it will change the environment itself, and then it will defeat itself, [unclear].

Kris: 14:56
Which is the notion of complexity, right?

Juan Pablo Carbajal: 14:59
Yeah. Open system. You have an open system. You put something in, everything else changes.

Kris: 15:06
So, Jörg, back to the plan?

Jörg: 15:10
Yes, yes. It's an interesting discussion. With Juan Pablo, we could also go on talking about open science, reproducible research and all sorts of things up to climate change, and maybe we'll touch on that. But what for me was a bit eye-opening was his skills in treating data and models, and it helped us with our work on mechanistic emulation. And to introduce that a little bit, I think going back to the early 2000s, when I started at Eawag, my professor mentioned that all the talented students wanted to do web design, you know, the Internet hype: "I am not getting good students now, and this engineering work is so boring. You take your cookbook approach. If you want to design a wastewater treatment plant, you go from step 1 to 10 and then you have it." And people in practice were not interested in models at all. And then he said, "Come on, let's make it interesting. Let's bring uncertainty into the game: design parameters, how reliable are the safety factors? Let's get rid of safety factors. Let's design for probabilities," you know, reliability methods. And at some point, if you would like to estimate parameters, you can drum up any objective function and throw your data at it. But we came to the point that our models were really not up to the task, especially the heavy differential-equation mechanistic models we mentioned in the beginning, of running a million or five million times, which might be required for Markov chain Monte Carlo. So one remedy would be to make models of models, emulators. And maybe, Juan Pablo, you can give us your take on emulators, how you experienced them in your previous work, and what is in emulators for the wastewater field.

Juan Pablo Carbajal: 17:19
So emulation is a very intuitive concept for anybody who has done actual science, who has been involved in the process of discovery. You have some ideas about what you would observe if you set up an experiment. Then you do the experiment, and then you try to fit, let's say, your expectations to the data you have observed. And if you realize you cannot actually describe what you measured, you say: well, my expectations were maybe a little wrong, so I need to propose something new, and you go back to the drawing board. This process can also be applied when we have a simulator. So now we assume that the reality is a simulator, and we approach it with expectations of what the output of the simulator will be. We run some simulations, and then we try to fit our expectations to that data, and the process goes the same. If you are lucky, your expectation, or your understanding of what you believed you would find in this simulator, is confirmed. And then you have, as you say, a model of a model, in the sense that you have somehow reduced your whole model into this input-output data that you produced. In particular, the nice thing about this surrogate model is what you mentioned: because you are reducing the whole question to some input-output data, you may get rid of many intermediate computations your simulator is doing, and get in the end something that's much easier to evaluate than the simulator itself. I always like to give this example: imagine you build a spring, like in a spring-mass system, right? You build the spring out of atoms, you know, metal atoms. You make crystalline structures with iron atoms, and then you collect, you know, many Avogadro numbers of atoms, and you build a macroscopic spring. And of course, you think: well, I can only think

Jörg: 19:33
In atoms. Kris, we as wastewater people, we try, right? [unclear]

Juan Pablo Carbajal: 19:40
I hope you can picture in your mind that you can build this spring with millions and millions of little spheres connected by little springs themselves, right? And I say: well, I will now simulate this ensemble of molecules, of particles, to see what the relation is between the force and the deformation of the spring when I compress it. So you will simulate all these atoms and molecules to see, when you put some pressure on the boundaries of the spring, what force you get as a response, or the other way around: if I apply some force, what is the equilibrium volume of the system? And this will take very long, because you will be simulating all the quantum dynamics, or whatever you have put in the molecular simulation. But what you are going to find, if you plot the force against the deformation you observed, is really some sort of Hooke's law. So it will be maybe a line, or a line with some [unclear]. You will have run what is sometimes called an ab initio simulation of the mechanical properties of the device. But you could instead say: well, I will take that data and fit a model based on Hooke's law for elastic materials. And now you get a much simpler model than the simulator we just described. This model basically asks you: give me your force, and I will tell you my deformation. And it knows nothing about molecules and the interactions between molecules. That is a model of a model, which in these contexts is also called coarse-graining: you start reducing the details of your model. These are common techniques in materials science and chemistry.
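
[Editor's sketch, not part of the conversation] The spring example can be illustrated with a few lines of Python. This is only a toy under stated assumptions: `simulate_spring` is a hypothetical stand-in for the expensive molecular simulation (a chain of micro-springs in series), and `fit_hooke_emulator` is a hypothetical helper that fits a Hooke's-law line through the origin to a handful of simulator runs. Neither name comes from the episode.

```python
import random

def simulate_spring(force, stiffnesses):
    """Toy 'detailed' simulator (assumption): deformation of micro-springs in series."""
    return sum(force / k for k in stiffnesses)

def fit_hooke_emulator(forces, deformations):
    """Least-squares fit of deformation = compliance * force (line through the origin)."""
    numerator = sum(f * d for f, d in zip(forces, deformations))
    denominator = sum(f * f for f in forces)
    compliance = numerator / denominator
    # The emulator only maps force to deformation; it knows nothing about micro-springs.
    return lambda force: compliance * force

random.seed(0)
# Hypothetical microscopic detail: 10,000 micro-springs with random stiffness.
micro_springs = [random.uniform(50.0, 150.0) for _ in range(10_000)]

# A few "expensive" simulator runs serve as training data for the emulator.
training_forces = [0.5, 1.0, 1.5, 2.0]
training_deformations = [simulate_spring(f, micro_springs) for f in training_forces]
emulator = fit_hooke_emulator(training_forces, training_deformations)

# Compare emulator and simulator at a force that was not in the training set.
new_force = 2.5
print(emulator(new_force), simulate_spring(new_force, micro_springs))
```

Because this toy simulator is exactly linear in the force, the fitted line reproduces it; a real molecular simulation would only follow Hooke's law approximately and within a limited range of forces.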

Jörg: 21:24
Yes, and I think our idea originally was not to just take some neural network or black-box model or, let's say, machine learning toolbox and make a replica of our urban drainage simulator, which translates rainfall into a runoff wave or water levels in a tank, but to also encode some mechanisms into it. Can you briefly say what the advantages would be, and tell our listeners whether you would still prefer Gaussian processes over, say, some long short-term memory stuff? Or maybe what has changed since our initial collaboration in 2015?

Juan Pablo Carbajal: 22:11
I think you have heard me say this many times, but I never choose a method before I understand what I need to do. For me, the method is something that comes when I understand what I need to do. I don't say: okay, I'm going to solve all my problems with Gaussian processes, because they may not be the tool that is suited for a certain problem; maybe neural networks are better for the task. So I'm not married to a method. But what I see is that for the many problems we have faced together, in many cases I decided that a Gaussian process, for example, was the way to go, because the problems we had were suited for that method. But it's not always the case. Sometimes this is not the way to go. For example, trying to do predictions on tweets using Gaussian processes is, I think, quite a waste of time.

Jörg: 23:05
Can you say that again? I didn't copy that.

Juan Pablo Carbajal: 23:07
So let's say you pick a person, right, let's say Donald Trump, and then you want to predict the content of his next tweet: what will it be about? This is a problem so complex in nature, and for which we have so little mechanistic knowledge. Well, maybe in the case of Trump it is not so complicated. [unclear] But for some human, let's say, this is very complex, and we have almost no knowledge. There are few things we can hang on to before we look at the data, basically the history of the tweets and the context in which the tweets were generated. So in such a situation you need this kind of universal machine that can fit everything and will then find any sort of correlations in your data, like a neural network or this kind of approach. And also you have an enormous amount of data there. Okay, it depends also on the person; if you try to do that with me, you will find that you are not in a big-data environment. But one could assume that human behaviour produces a lot of this data, and then you will need to process millions and millions of data entries to get your thing. But I think the caveat is that, if we want to approach that problem, it will be very hard to agree on what our prior knowledge is. What do we know about these things, as models, right? We may say: okay, he's a conservative, or he is left-handed, you know, things like that. But how do we put those things into models, right, into mathematical models we can use for prediction? I think that will be a challenging task, to say the least. So in such situations, in which there is no prior knowledge, I'm very happy using a completely tabula rasa method, if you want.

Jörg: 25:05
So... oh, sorry, Kris. Yes, go ahead.

Kris: 25:10
And so, in machine learning, one of the more common priors that people use is a smoothness prior, which has a lot to do with the computational efficiency that comes with it. Could you give an example of another prior that maybe goes beyond just assuming smoothness?

Juan Pablo Carbajal: 25:31
Okay, I think one thing that is valuable to think about, and it depends on the background, of course, is what type of prior information you have. And I can distinguish two big branches here. One is prior information on data, and this is usually reflected as a prior distribution in your problem. So you know more or less: if I do this experiment, I will get a value around this, and then you can start with this prior distribution, right? This is prior information you have on the data that you will observe, either the direct measurements or the parameters you will obtain. Then you have another kind, structural prior information, which is more related to what you were saying: okay, I expect to observe smooth functions here, or they should be more or less modelled by this differential equation. Again, as I said before, it is at a more abstract level, in which your knowledge is not about the data you will observe but actually about the mechanisms, or the data-generating process.

Kris: 26:40
[unclear] So, if there's time later we can talk about that, the second group would be more causal in nature?

Juan Pablo Carbajal: 26:47
[unclear] So for the data type, that kind of prior information is not very difficult. You can handle it in most statistical learning methods; they will allow you to impose these things. It will usually appear as a constraint, even if it's not probabilistic learning. You can impose constraints, you know, this is positive, this must be within these boundaries, which in the end you can translate into a prior distribution. So that's certainly not what is hard. The twist is with the other type of prior information. Take a simple case: let's take flow in a pipe, right? I know there is some sort of transport there; maybe I'm able to write down a couple of equations that represent the transport process. And I know something of this must be there. I know nothing about the data. I just want whatever you learn out of the data to be compatible with these processes; your predictor or your diagnoser should be compatible with these ideas about physics. This, of course, with a lot of effort, can be translated into data priors. But again, it's like a hack; it's something you will be able to do if you have enough time and enough data, to transform these sorts of priors, structural priors, as I would probably like to call them, into data priors. But that is not always the way you want to go, because sometimes it's a hard problem. And also, from a more mathematical point of view, structural priors are usually a lot stronger than data priors, meaning that your learning problem becomes a lot easier to solve than if you are just constraining the values of your parameters.

Kris: 28:49
I think some people would also call this common sense, especially looking back at the early beginnings of AI. But AI hasn't really solved that. So I agree that this is a very open question.

Juan Pablo Carbajal: 29:11
Especially for people in the context of general AI, it is absolutely open; we don't know anything about how to do this. Basically, how to represent expert knowledge is still a big issue. But again, here we're talking about things that are closer to math, let's say. These structural priors will be some sort of mathematical model, for which it's still open, I would say, but it is not as difficult as the general problem of representing this knowledge. It's more concrete; it's a subset of those problems, I think. And when I came to Kris, to Jörg, I saw what they were doing, and I was already fighting with this problem. Let's say I have a process for which I assume certain differential equations should hold, or certain principles, like energy should be conserved, momentum should be conserved, you know, things like that, and symmetries. These are symmetries I know are conserved. How do I tell my learning machinery, which is just looking at the data, that it should respect, even approximately, these symmetries I know are there? And so when I saw what they were doing, I saw that they had this very strong structural prior, and they could use it. And the methods that allow you to use strong structural priors are Gaussian processes, which are one-to-one related to Kalman filters. And that's how I basically came with this particular paper saying: this is what you need, this is your problem. And also what we discovered is that the type of priors you can put directly into this method are linear priors. So basically, your differential equation has to be a linear differential equation. And if you have nonlinearities, this is again an open problem. [unclear]

Jörg: 31:18
No, sorry, I didn't want to harp on the duality of the Kalman filter and the Gaussian process, but we can come back to that now. I think you said, if you see the Gaussian process as a Kalman filter, and then I was stopping you.

Juan Pablo Carbajal: 31:40
Yeah. So there is this duality, exactly. There is a sort of duality between a Kalman filter and a Gaussian process. It's a beautiful duality. [unclear] It is beautiful. This had long been known. [unclear] In the seventies, O'Hagan rediscovered it. But it was long known in the functional analysis community, and in approximation theory. So these are very old ideas. It was known, but it was so obscure and so much embedded in jargon that it wasn't transferred. You can even say it was only known by these people. And this one is very good: this paper is called statistical spline interpolation, or something like that, by O'Hagan, in the seventies. I can give you this reference later on. And it's been republished, not because of the paper itself, [unclear]. It is beautiful, like most of O'Hagan's papers, but there's nothing particularly special about the paper. What is special is the review process. There is a lot to learn from the review process, so it has been republished with all the discussion of the review process. It is like a textbook, because these were the guys who knew these things, and you see how they try to communicate this to a person who is not from their field. And you also see the complaints: O'Hagan explicitly complained and told them, look, guys, you're telling me that what I did was already known, but it was only known by you in your, you know, ivory tower; we couldn't access it. So for us it is the same as if it's not there. So it's really beautiful reading the review process.
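
[Editor's sketch, not part of the conversation] The duality mentioned here can be checked numerically in what is probably its simplest instance. The sketch below assumes a zero-mean Gaussian process with an exponential (Ornstein-Uhlenbeck) kernel and independent Gaussian measurement noise; this kernel choice, the data, and all function names are assumptions made for illustration, not taken from the episode or the linked article. Under these assumptions the process is Markov, so a scalar Kalman filter run over the same data should give the same posterior mean at the last time point as the batch Gaussian process formula.

```python
import math

def ou_kernel(t1, t2, variance, length):
    """Exponential (Ornstein-Uhlenbeck) covariance between two time points."""
    return variance * math.exp(-abs(t1 - t2) / length)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_mean_at_last_time(times, ys, variance, length, noise):
    """Batch GP regression: posterior mean at the last observation time."""
    n = len(times)
    K = [[ou_kernel(times[i], times[j], variance, length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, ys)
    k_star = [ou_kernel(times[-1], t, variance, length) for t in times]
    return sum(k * a for k, a in zip(k_star, alpha))

def kalman_filtered_mean(times, ys, variance, length, noise):
    """Scalar Kalman filter for the equivalent state-space model (times ascending)."""
    mean, var = 0.0, variance  # stationary prior of the OU process
    for i, y in enumerate(ys):
        if i > 0:  # predict across the gap between consecutive observations
            a = math.exp(-(times[i] - times[i - 1]) / length)
            mean = a * mean
            var = a * a * var + variance * (1.0 - a * a)
        gain = var / (var + noise)  # update with the noisy observation
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
    return mean

# Made-up observation times and values, for illustration only.
times = [0.0, 0.4, 1.1, 1.5, 2.3]
ys = [0.2, 0.5, -0.1, 0.3, 0.8]
gp = gp_mean_at_last_time(times, ys, variance=1.0, length=0.7, noise=0.1)
kf = kalman_filtered_mean(times, ys, variance=1.0, length=0.7, noise=0.1)
print(gp, kf)
```

The batch formula solves one linear system whose size grows with the number of observations, while the filter processes one observation at a time, which is part of why the equivalence is of practical interest. Matching the Gaussian process at earlier time points would additionally require a smoothing pass, which is omitted here.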

Kris: 33:29
Known, but in a different language.

Juan Pablo Carbajal: 33:31
Yeah, it's in a different manner. It was

Jörg: 33:33
a small community, right? And I think this is also why I really like to collaborate with physicists. You know, they give you a totally different perspective and bring you a method which you were looking for and which didn't exist in engineering, more or less. Right? So maybe one thing I would like to come back to is the application. We still haven't touched on what the use of the emulator is, right? Or what the use of a mechanistic emulator is, if you can just, like, run your model a million times, then fit some neural network to it, then do some cross-validation, and you have your fast replica of your engineering model. So we've been talking about inference, making sense of data with respect to the parameters of a certain model. But what else could you use emulators for?

Juan Pablo Carbajal: 34:29
So essentially you can use an emulator for anything you would use your simulator for. But that is, let's say, only part of the story. Why? Because your emulator will be specific. It will not be able to do everything your simulator does, because if it can, it is just your simulator, the complete thing. So if it can do everything your simulator can do, then it is your simulator. Of course, we are assuming here that your simulator is, from an implementation point of view, somehow optimal, that there is no software engineering problem where you are doing something in a particularly bad way and that is why it takes so long. So let's forget about that. When we say simulator, imagine this is the best simulator that can be coded for that problem with the knowledge we have. So if we have an emulator that can do everything that one can do, then it's just that one, because this is the minimal program you can have to produce all these things. But you can say: well, I'm not using my simulator for everything it can do. That is seldom the case. We use the simulator to investigate a particular question, and this question will have attached to it some observables we care about. For example, you will run your reactor model, be it a complex or a simple model, and then you will care about just the concentration of nitrogen after three hours or so, or even the curve of the nitrogen concentration until it reaches a certain value or until it stabilizes at a certain value, and things like that. So you have specific questions that do not use everything the simulator can do. We use just a little part of it, and this is the part for which you want to build your emulator. So your emulator, if you are able to build it, can be used for anything you were going to use your simulator for.
So if you were going to do an optimization, you will also be able to do it with the emulator. It will not give you more than what the simulator can do in terms of results. It will give you the same things faster, right? With less effort. And that allows you to ask more complicated questions, many times over. For example, if you want to do a sensitivity analysis or just propagate uncertainty through the model, just count the number of simulations you will have to do to propagate the uncertainty of, let's say, five parameters to the uncertainty of the output curve. It's basically brute-force sampling: we sample the parameters in the region that is interesting for us, then just run the simulation, look at the result, store it, and then go back again, sample new parameters, run the simulation, and so on. Then we will basically have a distribution of the output. That's what we call the posterior, if you want, or you have propagated your uncertainties to the output, right? This would be input uncertainties, and so it
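
A minimal sketch of the emulator idea described above, under loud assumptions: the "simulator" here is a hypothetical toy (first-order nitrogen removal integrated step by step), the only observable is the concentration after three hours, and the emulator is a simple piecewise-linear interpolant rather than a Gaussian process or neural network. All names and numbers are illustrative and not taken from the episode.

```python
def simulator(k, c0=30.0, t_end=3.0, dt=1e-3):
    """Toy stand-in for an expensive simulator: dC/dt = -k * C,
    integrated with explicit Euler. Returns the concentration at t_end (mg/L)."""
    c = c0
    for _ in range(round(t_end / dt)):
        c -= k * c * dt
    return c


def build_emulator(k_min, k_max, n_design):
    """Run the simulator at n_design values of k and return a piecewise-linear
    interpolant of one observable (the concentration after three hours)."""
    step = (k_max - k_min) / (n_design - 1)
    ks = [k_min + i * step for i in range(n_design)]
    ys = [simulator(k) for k in ks]

    def emulator(k):
        # The emulator is specific: it only answers inside its design range.
        if not k_min <= k <= k_max:
            raise ValueError("emulator is only valid inside its design range")
        i = min(int((k - k_min) / step), n_design - 2)
        w = (k - ks[i]) / step
        return (1 - w) * ys[i] + w * ys[i + 1]

    return emulator


emulator = build_emulator(k_min=0.1, k_max=1.0, n_design=10)

# Check the emulator against simulator runs that were not used to build it.
check_ks = [0.15, 0.33, 0.58, 0.77, 0.95]
max_abs_error = max(abs(emulator(k) - simulator(k)) for k in check_ks)
print(f"largest emulator error on held-out runs: {max_abs_error:.3f} mg/L")
```

The emulator answers one question only (one observable, one parameter range), which is what makes it cheap. Anything outside that range still needs the simulator.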

Jörg: 37:31
Can you say a bit about why this would be important for science? I still remember this problem of our colleagues who model lakes, you know, like some, I don't know, concentration distribution over the profile of a lake. And they have a really complex CFD model, with some turbulence in there. Can you explain to our listeners why it would be beneficial to run many simulations instead of just one or two?

Juan Pablo Carbajal: 37:59
I have an example from my current project, but it involves Russian roulette, so I will not use it. It is too gory. But this is not only important for science. It is important for anybody trying to make an informed, quantitative decision. When we choose a model to represent reality, we make a lot of assumptions. We have removed a lot of aspects of reality that we assume are not relevant and not necessary to represent. And even if you are right at that level, it is seldom the case that you know all the data your simulator needs to actually run. So to arrive at even a single output, you will still need to make some assumptions about the values of some parameters. You will have to find some parameters from somewhere, in a calibration or fitting process. And now you have run your simulation and you've got this output, right? And then you say: well, what if I had looked in other books, or asked other people? They would have come up with different values for these parameters. So if I run the simulation again, I see that I now have a different result, maybe slightly different. And now you want to take a decision: is this a good output or a bad output? Now you are aware that your simulator has some sort of imprecision in it, and to take this decision you will need to look at the spectrum of outputs the simulator can generate, right? Only then can you say: oh yeah, I'm safe 50% of the time, or 90% of the time, and I'm happy, or the opposite. Basically, any kind of decision based on models requires that we accept that our model is just that: an approximation of the real process. And this approximation means that the outputs are not one number or one curve, but an infinite number.
Of course, it's a distribution, right? And if you are going to take a decision about that output, you likely want to be aware of what the distribution looks like. All right, was that a concrete enough example?
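
The argument above can be sketched as a brute-force Monte Carlo propagation. This is only an illustration under stated assumptions: the model is a hypothetical first-order decay, the parameter ranges and the 10 mg/L limit are invented for the example, and the sampling uses Python's standard `random` module.

```python
import math
import random


def simulator(k, c0, t=3.0):
    """Toy model: first-order decay, concentration after t hours (mg/L)."""
    return c0 * math.exp(-k * t)


def propagate(n_runs, seed=1):
    """Sample the uncertain parameters, run the model, collect the outputs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        k = rng.uniform(0.3, 0.7)   # assumed plausible range of the rate, 1/h
        c0 = rng.gauss(30.0, 3.0)   # assumed initial concentration, mg/L
        outputs.append(simulator(k, c0))
    return sorted(outputs)


outputs = propagate(n_runs=10_000)
limit = 10.0  # hypothetical limit used for the decision, mg/L

# Summaries of the output distribution that a decision can be based on.
p_safe = sum(c <= limit for c in outputs) / len(outputs)
median = outputs[len(outputs) // 2]
low = outputs[int(0.05 * len(outputs))]
high = outputs[int(0.95 * len(outputs))]

print(f"median {median:.1f} mg/L, 90% interval [{low:.1f}, {high:.1f}] mg/L")
print(f"fraction of runs below the limit: {p_safe:.2f}")
```

The single number a decision maker may want ("safe or not") is then a statement about this distribution, for example the fraction of runs below the limit, rather than the result of one run.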

Jörg: 40:11
Yes, that was concrete. I think we've been discussing this in our group quite a lot of times. And in the engineering reality out there, it's something which is not there. It's also a question of whether you really want to go down that road, right? Typically, as engineers, we create value by solving real-world problems. Your river is polluted, you build a wastewater treatment plant, and you build it big enough so that it solves the problem. And the decision makers don't really want to go into the details. They don't want to be informed about probabilities of exceeding certain pollutant levels. An example is probably going to the doctor, right? You have some chest pains and pressure, you can't breathe very well, and then you go to the doctor and he says: well, you have a 60% probability that it is this illness and a 40% probability that it is that illness. You probably just want a yes/no, go/no-go type of expert opinion. So going into that much detail has probably not yet arrived in the practical community. But

Juan Pablo Carbajal: 41:34
It really depends on what you're used to. It is the result of an educational process, right? I've been discussing this with people from the drinking water community, who need to set the value of the cubic meter for the users. Here in Switzerland we have these cantons, these segmentations of the country, and then within each canton there are further segmentations, and they're called Gemeinden, or municipalities. These municipalities need to decide on the price of water, right? If you have the opportunity, I suggest you go and see how the decision is being made. It's not very different from going to church and asking your priest. It's just that the priest is the oldest guy with the most political power in the group. In a way they use, of course, the data they have from previous years, but it's always a little bit of gut estimation from the expert. And if you would like to audit this decision, or do any kind of quantitative analysis of why this is the price, you basically have to open this guy's brain and start looking at the neurons, because nobody but him knows how he arrived at the conclusion. So there is this thing: if you're sick, right, and you want to know, should I take the pill or not? And I agree that for a person who is not used to the language of probabilities, information in probabilities is useless, because he doesn't know what to do with it. So maybe you need to digest this in other terms. People put it in terms of odds, which are easier for the layman to understand. Like, you know, if you have these symptoms, three out of six times you will have the disease, or something like that. That is maybe easier to understand. It's just a language problem.
But, you know, some people also like to be told what to do. Either way, it is up to the experts to tell them: yes, do it, or no, don't do it. And the responsibility lies with the person who accepts that expert decision.

Jörg: 43:55
Yes. And I think, as a modeller in an engineering consultancy, you would be more than happy if your commercial software solver, at the press of a button, gave you a distribution of water levels, or of exceedance probabilities, right? And that is in a way what we do with these long-term simulations in sewer design, where we do not look at one storm or two storms but consider all the storms which have passed over the catchment in the last 20 years, you know? So we look at the empirical distribution. But what we still cannot do in practice is parameter uncertainty and model structure uncertainty. Yes, but we've been talking for a long time, and one topic we have not talked about is one of the issues that Kris and I are struggling with: the quality of data. But before we come to this, and maybe really short, I would like to have a breakout session to look into the neurons in Juan Pablo's head himself, and play a little game of the yes/no or what-do-you-prefer type. It's a bit experimental. I would bounce this back and forth with Kris, so each of us asks one question. I have, like, 20 on my list, so we go for 10, to better understand Juan Pablo. What would you prefer, given two choices? Beer or wine?

Juan Pablo Carbajal: 45:24
Quien bread carefully. Are Chocola Chocola

Jörg: 45:34
polo or shrinking between?

Kris: 45:42
Ah, I see train or trunk,

Juan Pablo Carbajal: 45:47
huh? Train

Jörg: 45:50
And linear regression versus long short-term memory?

Juan Pablo Carbajal: 45:55
Linear regression.

Kris: 45:57
supervised learning versus unsupervised

Jörg: 46:02
Supervise Our versus Horrible Pasquale

Juan Pablo Carbajal: 46:08
Oh, are are

Jörg: 46:12
here that here that world

Kris: 46:17
Python or Julia?

Juan Pablo Carbajal: 46:22
Ah, transition period. I stick to Python at the moment.

Jörg: 46:29
Long trousers or short trousers.

Juan Pablo Carbajal: 46:31
Short answer. Not in settlement.

Jörg: 46:37
Two more. Do you have one, Kris? No? I have one: Neurosis versus Dinosaur Jr.

Juan Pablo Carbajal: 46:48
Uh, no dinosaurs.

Jörg: 46:55
Jennifer Lopez versus Ozzy Osbourne? Jennifer Lopez. One more? Yes.

Kris: 47:07
Gauss or Maxwell?

Jörg: 47:11
Oh

Juan Pablo Carbajal: 47:12
Oh, man. Gauss.

Jörg: 47:14
Yes, there's nothing. Nothing beats Gauss. There's a super fantastic podcast by the BBC. Everybody, you have to listen to it. It's In Our Time, about Gauss. And they say this guy was personally, apparently, a bit unsure, a bit timid and shy. And so he did not make his notebooks public and was keeping many of the ideas he had to himself. And when he died, they went through his notebooks and said: wow! It took, you know, 30 years or so, and they said that if he had only talked to people about his ideas... you know, this set back mathematics a couple of decades. But

Juan Pablo Carbajal: 48:02
you know, also, if you put Gauss in today's world, he would not be the Gauss of those days, no?

Jörg: 48:14
Well, don't you think he would be like this Perelman guy, living in some hut in Siberia or like

Juan Pablo Carbajal: 48:21
being fed by his mother. And yeah, I think that would be the only way to get a Gauss today.

Kris: 48:28
And it depends on the career stage, whether he gets sucked into the norm of publish or perish or whether he can escape

Juan Pablo Carbajal: 48:39
Personally, I have completely dissociated academia from science. I know it may offend people, but for me, they are now like two different things. Yes, but

Jörg: 48:49
that's like, ah, I would, yeah, move this to a different date, because we could go on talking for three hours about this, and I share many of your concerns. But let's keep this for a different episode. So, last big chunk, or rather last short chunk, because we are almost at the end time-wise: what do data have to say about... sorry, what do models have to say about data quality?

Juan Pablo Carbajal: 49:18
Mm. So I don't know how to take your question the way you want, but I can take it from my side and say, let's talk about data size. Let's say you have a lot of data, but only a few points are good, so they have been analysed and are reliable, or there is missing data. You end up, maybe from a large dataset, with only a few points. So whatever you want to do with this data, it needs to be something that works well in the absence of data, you know. A methodology that can discover anything, that has the power to fit anything, needs a lot of points to discern one thing from another, intuitively. You can think of it as, for example, your method taking decisions all the time. You have this large set of hypotheses, of potential relations that exist in your data, and by looking at different points it will start discarding some of those and essentially end up with one or several that are good for you. But the more models you have, the more data you need to make all those decisions, right? And the model comes in here because the model removes a large set of those potential relations that may be in your data. So what happens is that if you are able to set a strongly structured prior, a model with some symmetries, then in the fitting process maybe only a small set of points will give you a very good model, even in the extrapolation sense, because this model does not need to take a lot of decisions to arrive at its conclusion. It can do so with only a few points. A typical example, I know it's out of the textbook, but it is exactly the illustration of this topic: fit a line through a set of points. The minimum amount is two points, right? With two points you can define a line. If you go to a quadratic, you need three points, right?
And if you keep increasing, let's say, the order, the wiggliness of your fit, the more points you will need to be able to say: okay, yes, this is the relation in your data. So the stronger your model is, the fewer data points it will need to arrive at some strong conclusion, while a more, let's say, tabula rasa method will not be able to do that. Usually it will overfit, or it will give you some predictions but with huge uncertainty. And in scientific applications the cost of one data point is usually very high. You may have projects with a huge budget from which the result will be maybe 10 points or something like that.
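
The textbook counting argument above (two points fix a line, three fix a quadratic) can be sketched in a few lines. The quadratic `truth` function and the chosen points are hypothetical, and Lagrange interpolation is used only as a convenient way to get the unique polynomial through a set of points.

```python
def lagrange_predict(points, x):
    """Evaluate at x the unique polynomial of degree len(points) - 1
    that passes through all the given (x_i, y_i) points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        weight = 1.0
        for j, (xj, _) in enumerate(points):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def truth(x):
    """Hypothetical true relation: a quadratic."""
    return x * x


two_points = [(0.0, truth(0.0)), (1.0, truth(1.0))]
three_points = two_points + [(2.0, truth(2.0))]

# Extrapolate to x = 4 with a line (2 points) and a quadratic (3 points).
line_guess = lagrange_predict(two_points, 4.0)
quadratic_guess = lagrange_predict(three_points, 4.0)
print(line_guess, quadratic_guess, truth(4.0))
```

Two points are enough only if the relation really is a line. For the quadratic, the two-point fit extrapolates badly, while three points pin the curve down. Each extra degree of freedom in the model costs at least one more data point.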

Jörg: 52:29
You mean, like CERN?

Juan Pablo Carbajal: 52:33
Okay, yes, that may be CERN. It is definitely a lot of data, but to reach that point there was a huge effort, a huge investment. So I would say that in science the number of data points is proportional to the investment you have. CERN has a huge investment and produces a lot of data. And I mean, of course, relevant data. Because you could, of course, track particles of dust in a room with a high-resolution camera, blah blah blah, and then you would have a lot of pixels that are completely useless. So I'm talking here about the actual data you can use, not the raw data. And so in scientific things, at the scale of one university, or a small group of researchers, or a single researcher, you will surely end up with a scarce set of data points, and you want to get the maximum out of that. So for us, and when I say us I mean scientists, one aspect of the models that is very important is the data-size error rate. That means: how many points do I need until this model tells me something with reasonable certainty? It is like the sample size or power calculations of statisticians, right? That's what

Jörg: 54:03
Can you repeat that again? Data-size error rate?

Juan Pablo Carbajal: 54:06
Yes, the data-size error rate. That means how the error, the extrapolation error of your model, or let's say the error on unseen data, the out-of-sample error of your model, is related to the number of samples you have. That's all, right? Usually, in most statistical learning methods, what you have is asymptotic learning. That means the result you have only holds for an infinite amount of data. So in that case you know you are going to learn, but they don't tell you how you are going to arrive at this, right? There are some inequalities that are used for this, but it is always a very expensive process to compute, you know, to get the exact results for the models. So your basic criterion is experience. You look at which models have been used successfully to fit small data sets, and what you will find is that those methods already put in a strongly structured prior, because essentially you are adding a lot of information. And again, if you don't fit your data, you have also learned something: your strongly structured prior is not the right one. Yes, I think... yeah.
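
One way to look at the data-size error rate described above is to compute the out-of-sample error of two models for increasing sample sizes. The sketch below is an assumption-laden toy: the true relation (a line through the origin), the bounded noise, and the two competing models (a one-parameter line versus a nearest-neighbour predictor standing in for a tabula rasa method) are all chosen for illustration.

```python
import random


def truth(x):
    """Hypothetical true relation: a line through the origin."""
    return 3.0 * x


def make_data(n, rng):
    """n training points on (0, 1] with bounded measurement noise."""
    xs = [(i + 1) / n for i in range(n)]
    ys = [truth(x) + rng.uniform(-0.1, 0.1) for x in xs]
    return xs, ys


def fit_structured(xs, ys):
    """Strong prior: y = a * x. Least-squares estimate of the single slope a."""
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: a * x


def fit_nearest_neighbour(xs, ys):
    """Weak prior: predict the y of the closest training point."""
    return lambda x: min(zip(xs, ys), key=lambda p: abs(p[0] - x))[1]


def out_of_sample_error(model):
    """Mean absolute error on unseen inputs in [1, 2], i.e. extrapolation."""
    test_xs = [1.0 + i / 20 for i in range(21)]
    return sum(abs(model(x) - truth(x)) for x in test_xs) / len(test_xs)


rng = random.Random(0)
errors = {}
for n in (2, 5, 20, 100):
    xs, ys = make_data(n, rng)
    errors[n] = (out_of_sample_error(fit_structured(xs, ys)),
                 out_of_sample_error(fit_nearest_neighbour(xs, ys)))
    print(n, errors[n])
```

In this toy setting the structured model extrapolates well from very few points, while the flexible one does not improve outside the training range no matter how many points it gets. If the structured prior were wrong, the same curve would show that too, which is the point made at the end of the answer.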

Jörg: 55:25
Yeah, I think what I was aiming at... and yes, of course, we're making this for wastewater professionals or scientists. And what Kris is especially struggling with, you know: 30 years ago, you did not have much information from your system. Now we are trying to put in ion-selective electrodes, which give you high-resolution time series. But we are sometimes very skeptical about how to interpret the data, because they're sensitive to other ions in the water, or to temperature effects, you know. So you get a nice time series, and we have the engineers, who do not have the methods to make sense of the data, and we have the data scientists, who take the data at face value, right? Yeah. So

Juan Pablo Carbajal: 56:18
So, I mean, if you have very indirect observations, or observations that do not only observe what you want, the process you want, but observe other processes at the same time, modelling plays a big role here. Because if you want to separate out the effect of the other things that you are not interested in, what you need to do is model those processes, those influences in your data. And of course, then you would say: well, I could go and measure those other processes by other means, or put in a sensor that is sensitive to a complementary set of processes, right? So you have a sensor that is sensitive to a set, let's call it A, of processes, right? There are many processes that change the signal of this sensor. And then you have another sensor that is sensitive to another set of processes, call it B. What you want is two sensors that have some intersection, and then with these two sensors you will be more or less able to filter out what you are looking for, the signal of interest. The smaller the intersection, the more precisely you will be able to get it. But that's not always possible, right? So you just have these things where you know something of the mechanistic nature: you know what reactions are there and what things can have an effect. So what you have at hand is your prior knowledge and your models. And yes, by doing that, and it's not an easy job, you could try to extract from those signals the actual signal that you're looking for. This is not uncommon. For example, in spectrography this is the approach. In spectrography you measure some sample and then you get this, basically, grass. The graph looks like grass, these lines look like grass, and these are the spikes, right? So you have a lot of noise, and spike after spike,
and you will try to identify elements by matching those peaks. But against what do you match, right? Okay, nowadays we can go and search millions and millions of known spectra and try to match them. That you can do today. But it is not necessarily the case that, in a given experiment, the components have already been identified. Let's go back 18 or 20 years into the past. What do you have? You have to make a guess at what can be in that sample, and then start ruling out those components, with their spikes, until you find what you're looking for. But the important point here is that you will somehow have to model what else is there, so you can assign those peaks to these other things and remove them, if you don't care about them at the moment. I don't know if there is a name for this type of data. Basically, the samples are such that there are no direct measurements of the process you care about: they measure not only your process but many others that are there. And so, for example, a physics approach would be to use some sort of mean-field approximation. That means: okay, there are so many processes besides the one I'm interested in, and all of them will appear as some effective signal, right? You will not try to isolate all the processes independently. You will put them all together in a bag and say: my signal will have this kind of background signal that lumps all these other processes, with some sort of statistical model of what else is there. And then, if you're lucky, you could do some sort of averaging, and this gives you your background noise removal and
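
The two-sensor idea above can be illustrated with a deliberately simplified sketch. It assumes that each sensor reading is a linear mix of a target signal and one interfering signal with known sensitivities. Real ion-selective electrodes respond non-linearly, and the sensitivities and concentrations below are hypothetical numbers, so this shows only the separation step, not a sensor model.

```python
def unmix(signal_1, signal_2, s):
    """Recover (target, interferent) from two sensor readings, assuming each
    reading is a linear mix: signal_i = s[i][0]*target + s[i][1]*interferent.
    The 2x2 system is solved with Cramer's rule."""
    (a, b), (c, d) = s
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("sensors respond identically; signals cannot be separated")
    target = (d * signal_1 - b * signal_2) / det
    interferent = (a * signal_2 - c * signal_1) / det
    return target, interferent


# Hypothetical sensitivities: sensor 1 sees mostly the target ion,
# sensor 2 sees mostly the interfering ion, with a small overlap.
sensitivities = ((1.0, 0.3),
                 (0.1, 1.0))

true_target, true_interferent = 5.0, 20.0
reading_1 = 1.0 * true_target + 0.3 * true_interferent
reading_2 = 0.1 * true_target + 1.0 * true_interferent

recovered_target, recovered_interferent = unmix(reading_1, reading_2, sensitivities)
print(recovered_target, recovered_interferent)
```

When the two sensors respond almost identically, the determinant approaches zero and the separation becomes ill-conditioned, which matches the remark that a smaller overlap between what the sensors see gives a more precise result.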

Kris: 1:00:04
One last challenge there is that I get the sense that this background is changing all the time, to the point that it's very difficult to separate changes in the background from changes in the things that you want to learn about, that you want to measure. And I think that's a particularly nasty problem, definitely. But it was big changes.

Juan Pablo Carbajal: 1:00:33
Yeah, depending on the scenario, it could actually be an advantage. We are talking here about observational studies. That means we just measure, we cannot directly perturb the plant. Because this contrast against the background is actually used to identify what you want to identify. So you identify the one you want to know because it's not changing when you do certain things to your system. And you can do the complement as well. You say: okay, look, if I throw water into my tank, I should observe that this doesn't change, right? So everything that changes is something else. Or you know that this particular intervention on my system will affect my process alone, or only a reduced set of processes. Okay, then you do it, and everything that doesn't change is what you want to remove. So you can do this, in a way. Sometimes in microscopy you do this, right? You light from the front and light from behind, just to find out exactly what you're looking at

Kris: 1:01:46
and this brings us back to causal reasoning, and also to experimental design, particularly in feedback systems, to online experimental design. Yeah, but it is flying a bit under the radar right now.

Jörg: 1:02:05
It is, totally. In my view, that's, uh, yes. So I make a note: experimental design, another episode. One thing I would like to ask, and that's probably going towards the end: what to do if people don't care about their sensors, about the data? Like the normal wastewater treatment plant Joe, you know, he couldn't care less if the sensor is drifting, right? That's my first question: how can we teach him, or motivate him, about the importance of data? And the second question, that's probably for the end... oh yeah, I'll bring that up later. Let's discuss that first. How can we make people aware of bad data? You had a really nice name for this, this GIGO, you know. What was your hook line, Kris?

Kris: 1:03:06
And you can't go without

Jörg: 1:03:09
you. You were, like, um, transforming this into

Kris: 1:03:15
Oh, yeah. Garbage in, value out.

Jörg: 1:03:18
Yes. Very nice.

Kris: 1:03:21
My aspirational goal. So

Jörg: 1:03:23
what's the social component of garbage in, value out? Do you eliminate garbage?

Juan Pablo Carbajal: 1:03:30
Well, yes. So you can eliminate, so you can go into a filtering process, or you can go into an enrichment process. With an enrichment process you try to keep your input, you don't lose mass, and make it better. Or you can say: well, just remove what is wrong and rescue what is good. If

Jörg: 1:03:50
only you knew what was wrong, right?

Juan Pablo Carbajal: 1:03:52
Yeah. Maybe I would like to map this onto a much more mundane question. At the moment I'm getting a lot of old cell phones, old smartphones that people would throw away, right? And what I find amazing is that all these cell phones work perfectly. The old ones are perfectly functioning hardware, right? But people throw them away, and I always go back to them and ask: why do you throw them away? And the first answer I get is that they don't work anymore. So they don't perceive any value in that machine anymore, because it doesn't work. Of course, the problem here is that the software is made to make this perfectly working hardware obsolete, right? And relating this to your question: we are not able to transfer the idea that the value is in the hardware, which costs energy and produces trash, not in the software, which is actually the cheaper part of it. The hardware is a little bit like this good data: it's not only what you get out of it, but the fact that the cost and the effort are embedded in the hardware. If I translate that back to your thing, it is a teaching process. We need to transfer a value: even if you think you gain nothing from getting two more decimal places in your measurement, or from making sure that your thing reaches a somehow accurate representation of reality, or even just a representation of reality, there is intrinsic value in that, and you should not throw it away just because it's not useful for you at the moment. Especially if there is no actual effort associated with doing it. So basically we need to convince these people: your workflow will not change. You will not have more problems than you had before. I know it's a new thing, but your life will be the same. You will just make your modellers happier, basically, right?
It is a big problem, do you think? You cannot convince people to do things because they don't see the value in it. But I also cannot convince people not to throw away their hardware, because there is still value in it. It is not the software. The hardware is what you bought. You know, you have the camera you were happy about two months ago, and it can still do everything it could do two months ago. It is just your software that is not working anymore.

Jörg: 1:06:30
Yes. I think that's a very nice example. I'm looking at the time a little bit. Kris, would you like to comment on data quality and how to make people aware of their online sensors?

Kris: 1:06:43
Um, well, I think of it as a spiral that can go in two directions. The obvious spiral that we see a lot is that the sensor gets installed and somehow, due to lack of training or lack of awareness, it's not maintained. People realize that it doesn't produce the right quality, and then they say: okay, this is useless, I'm not looking at it anymore. But due to the local environment the sensor remains installed. Nobody really removes the sensor. So you're collecting garbage. And the obvious solution to that is to remove the sensor, because if you're not looking at it and it's just producing noise, then by all means, for the modellers that come behind you or after you, just remove the sensor. But there is the potential for another spiral, where people are made aware of what they can do to improve signal quality. And maybe, as part of that, they look at the data, realize that there is information in there, and look at the data more because they now know that it's relevant to them, and so they may be paying more attention to it. So there is room for a better feedback cycle. But I think it has as much a technological aspect as a human and team-culture aspect to it, which I think is even

Juan Pablo Carbajal: 1:08:17
more... I very much like your second spiral, Kris.

Kris: 1:08:21
Oh, I see. I aspire to it. I aspire

Juan Pablo Carbajal: 1:08:23
to say, I think, spiral. Yeah. Let's say I face it a lot, basically, and I have come to realize that the responsibility of showing the value of this improved behaviour lies with the modeller. Because if you see them as purely egoistic actors, right, unless you show them why there will be added value in doing it like that, they will not do it, because it doesn't affect their work. It affects, as you say, the modellers who come behind or after you. So it's up to the modellers to somehow be able to communicate the extra gains. Like: if you give me nice data, you will get nice things back afterwards. That is somehow the egoistic approach. So it's not the one I mentioned with the cell phone, which is actually an altruistic one. But yeah, considering that many of the actors are purely egoistic, you need to show them what they are going to get back if they give you something. It's something that we need to use. And it's us, the modellers, the people who need the data, who have to make the effort to show them what could be better for them if they provide us with

Jörg: 1:9:45
Yes. And I think one important thing which Juan Pablo mentioned at the very beginning was that we should be aware of the data-generating process here, right? And if there's something which is fouling your sensor, you know, it's dampening the signal or causing a loss of sensitivity, this could be modeled, could be removed. But I think we are almost at the end. One last point we've also been discussing a couple of times is how much of this we should put into the engineering curriculum, right? Like, um, we once talked to a professor at ETH about whether we should have a data science brush-up or even a short element in a lecture. But it is so full with, like, the chemical background, you know, biological details, hydrodynamics. You know, how important is the data science aspect, the modeling, the likelihood function?

Juan Pablo Carbajal: 1:10:44
And you're asking a generalist, so I guess you can guess my answer, right? I would definitely sacrifice the details for more general tools that can cope with future problems. And now I'm in academia, actually, in a technical school. And I see a problem there, which I have noticed many times, when the contents of the degree programs are tuned, overfitted, to the current demands of the industry. Because this is making poor engineers: they can solve the problems we have today, but usually you want an engineer to solve the problems you will have tomorrow. So if you overfit these engineers to the knowledge and the problems we have today, you will have to wait a generation until you get the ones that can solve the problems of tomorrow. So I'd rather relax on the contents, especially the specialized contents, and expand the perspective: not try to make them scientists, but just give them a broader spectrum, in their career, of the things and methodologies that can be applied and the problems they could face. Not: "I want to work in the local utility, they don't do this, so I don't need this", right?

Jörg: 1:12:13
Yes, yes. The alternative is that each consultancy hires a statistician or a data specialist, who then helps the technicians or domain specialists to not overfit, probably.

Juan Pablo Carbajal: 1:12:35
No, it is hard. So you see, let me bring up this story, and you can remove it if it's too long. It's not really, but I think it's what I want to say. So, Kris, I was reading the, yeah, the environment magazine, and there was this model, or this project, to use the Lake of Zurich to cool down the city, right? And it was, you know, sold as a beautiful idea, you know, because the lake is a reservoir, blah, blah, blah, and from an engineering, thermal engineering, mechanical and dynamical point of view it was a fantastic solution. The experts on thermal engineering, the building commissioners, were very happy with this idea, and the idea was presented. It was actually in the magazine, I think in the previous issue. And people that maybe have a broader perspective immediately ask: well, what happens with the ecology of the Lake of Zurich if you start warming up the lake, right? How many degrees will cause a disaster for the ecology? And this is a little bit the problem of having experts for each problem: each expert looks in their little region, right, and makes this local optimization. And it is very hard to build teams in which people who are experts with different backgrounds communicate effectively to arrive at a more holistic view of the problem, right? So my belief, and I'm pretty convinced about this, is that teams of experts are actually not that good. It is better to have a few experts, maybe in the key topics, and then all the rest generalists. So you need this. It's like a gradient optimization

Jörg: 1:14:24
local minimum. Yes, that's

Juan Pablo Carbajal: 1:14:26
Yeah. So there you have these people who each pull in one direction. But the people who receive this pull don't have a particular entrainment with those guys. They, of course, get pulled in that direction, but they're still attached to the general approach, to the holistic view of the problem. And I was surprised, because I was expecting that at some point in this magazine, which is about the environment, basically, they would mention the problem of what will happen if you warm up the Lake of Zurich by, let's say, half a degree. Nothing was mentioned. I was surprised, like, what happened with the review process here? So I invite you to read it; I think it is the previous issue. If not, I will send it to you. Yeah, it's amazing. It's only

Jörg: 1:15:09
We can put it in the notes. It's in German, though, for all the international listeners, the ones who are listening in from Argentina. Sorry. Um, I think we should close, so thanks a lot for talking. I'm really enjoying the conversation; it's just that time's running out. The sun is setting here in Switzerland. Um, in the US, in Tennessee, it's

Kris: 1:15:42
shining bright.

Jörg: 1:15:45
Yes. Did we forget anything, Kris?

Kris: 1:15:47
Not this time. This has been great. If all episodes are like this, this should be fun.

Juan Pablo Carbajal: 1:15:56
We'll be... Maybe one more. About three

Kris: 1:16:01
more episodes with Pablo.

Jörg: 1:16:04
That was a lot of food for thought out there. And, like, if you have any suggestions, leave us a note or send us an email. You'll find us online; we have very peculiar names. At least we'll have, like, this.

Kris: 1:16:19
This means we need a Twitter account.

Jörg: 1:16:23
Yeah, like, look for us on Twitter or something. In the show notes, I think, there will be some way to get in touch. And yeah, think about your data-generating process, whether you use emulators or not, and, um, have fun out there. Goodbye.

Kris: 1:16:48
End of the Flush to Data podcast. Get some extra time now with a few minutes of bonus material from Kris, Jörg, and the guest.

Jörg: 1:16:56
Okay, that was good. And I learned a lot, from a lot of things, really, because I was reading today about ethical AI, you know, in my German magazine. That was really good. It was really a good article. Like, they say, well, yeah, it's just a tool, you know? There are no ethical knives per se, you know. There's no ethical knife; you have to make sure that it's manufactured in a way that it doesn't break and, like, stick in your finger. But there's no ethical knife. Yes, that was good. It was good.

Juan Pablo Carbajal: 1:17:31
I read a little bit, as far as I understand it. I would say it was a German article on interpretable AI, right? Yes, I think it was. Yes, I found it nice that this kind of discussion is again coming to the general public. I think it's very positive; it goes against the hype.

Jörg: 1:17:49
Kris, I don't know if you know this: there was, like, a motion by the EU. They want to have transparent algorithms, basically, and they want to shift the burden onto the manufacturer. So if you build a machine which discriminates, like this US prison software, you know, which makes predictions of what your sentence should be based on whatever they feed it, you know, it punishes Black guys, or gives them higher sentences, for whatever reason. And if your AI is trained on women, you know, whatever you feed it, because women are more with kids, you know, more in kitchens than men, this thing might discriminate against women for certain tasks. So they push the

Kris: 1:18:33
responsibility onto the manufacturer. And although the bias is, ah, inadvertent, right? The people that built the model did not have this intention.

Jörg: 1:18:45
No, no. And

Kris: 1:18:46
What the model is learning is the biases that exist in our society.

Juan Pablo Carbajal: 1:18:52
It is also the misuse, sometimes, of these systems. This happened with several decision support systems. When you read the article by those who designed the system, the caveats on misuse were clearly presented. And then when you see how it is used, it is used exactly against the caveats. Like, don't use this to make a selection. You know, it is an evaluation: you evaluate your decision, you are not going to produce the decision based on this evaluation. So they are inverting the arrow. You have a system where you make your decision and it tells you whether the decision you took was good or bad. And then what they do is the opposite: they start searching for decisions that are evaluated positively and apply those decisions. Like the impact factor: it's a good measure, but it's a horrible selection criterion. Yes, good scientists have a high impact factor, but having a high impact factor doesn't mean you're a good scientist. This inversion of the arrow you see many times. So that is the misuse of the models, as in the jail thing. But what I like is that the request, whether it was explicit or implicit, I don't remember now, in that article is basically that the models need to answer the question why. And unless these guys are aware of the latest results on causal learning, no statistical learning algorithm can answer that question. That means that, if they do this, what they're actually saying is that no method that exists today can be used. It's essentially that.

Jörg: 1:20:34
I don't know. That's

Juan Pablo Carbajal: 1:20:36
like, I think I

Jörg: 1:20:36
read one commentary by, ah, an MIT guy who says that if we go down that road, it will cause bias, because people are already overconfident with computers at the moment. And if you put this, like, quality stamp on it, you know, they will trust the methods even more, you know. And the quality stamp, as you say, you know, will not be 100%

Juan Pablo Carbajal: 1:21:01
waterproof. What they propose is not a quality stamp; they just raise the bar on what is an acceptable solution, because basically what they say is that your method should allow users to ask why it came to this decision. So, actually, it is allowing the user not to trust. That would be, no...

Kris: 1:21:21
allow model

Juan Pablo Carbajal: 1:21:22
criticism, or however. In a way, what they say is this: there is also this idea now, in many of the systems like the ones we're using now with encryption, of a no-trust system, right? That means the point is not that when you are using a service you should trust the company or the provider, that they will do good things with your data. The point is that you should not need trust. Of course you can trust, if you like the company. But the point is that you should be able to do this without any need to trust these people: you know that the method they are using doesn't allow misuse of this thing, right? So there is no trust needed. Of course, the promoter is Snowden, but I like the idea a lot. I think the same goes for this machine learning: we should not be asked to trust because "this is a neural approximator, we have so much data, we applied this". I should not need to trust; I just need to understand why this guy is doing what it's doing.

Jörg: 1:22:28
Yeah, that's very interesting. And it's inevitable, right? Yes. And another point I was missing is what physicists like to do, this hypothesis testing, right? Like, as engineers, we don't have hypotheses, you know. No paper of ours has a hypothesis. You know, there's no "I hypothesize that the bacteria are not growing", and then, oh, they're growing, so let's, like, say they're alive. Maybe, you know, but maybe they're just inflated by some cosmic force, you know. So, yeah, I'm a bit skeptical about engineering science, you know, like whether that one

Juan Pablo Carbajal: 1:23:11
For sure. You see, at least, especially with Argentinean engineers, but I noticed the same with ETH engineers, is that they're trained for confirmation. They are not trained to doubt; they are trained for confirmation. That means they're trained to just believe in the model. And then testing the model means basically confirming that the model works, which is exactly what science will not do. Basically, you would design an experiment not to show how it works, but actually to be as adversarial as possible to your model, to try to break it. But, in a way, this is a training issue, because you don't want that, every time an engineer gets a job to dimension a pipe, he is asking: okay, should I trust this equation, is it applicable? You may want to have some sort of switch, so that someone switches the critical thinking on and then switches it off and goes on to compute. Yes, but I don't know how to get this, people that are critical only at some point. My thinking is: if you are always critical, I think you can make an effort and try to switch it off. But if you're never critical, I don't see how you can switch it on suddenly. And so I'd rather have critical engineers who are very good at switching off the criticism, basically, and just go on and do it, who don't suffer paralysis by analysis, but who, when the time comes, can ask: okay, is this the equation I should be using in this situation? Am I doing the right thing?

Jörg: 1:25:01
I think it's probably more mundane, you know. The critical thinking is at the level of the Gemeinderat, you know, the municipality council, you know, if they commission, like, a huge pipe, where the engineer would say: no, guys, this doesn't make sense. You know, you don't spend 10 million on this problem, which you can solve with, I don't know, an alternative solution. You know, it's very close to generating value for the community. But on a scientific level, I still would like to read the engineering paper which puts up a null hypothesis, you know? Yes. And I still remember your experiences when you were co-editing this special issue. Yeah, what was it? The hydroinformatics conference, like the

Juan Pablo Carbajal: 1:25:49
It's this MDPI one, the hydrology one. Yes, machine learning applications for, yes, hydrology.

Jörg: 1:25:58
Yes. Now, where people come with algorithms, you know, "oh, my algorithm is better than this". And then you just apply some toolbox and you think it's science. You know, that it's valuable to publish, like, a comparison of algorithms.

Juan Pablo Carbajal: 1:26:14
I think it's one thing Kris was mentioning, and the other thing is that, of course, we learn; we have done something that we personally have not done before, and then we think, oh, I need to publish this, right? But that you learned something doesn't mean that the community doesn't know it, and Kris pointed this out once. And so, yeah, I remember my mentors in physics: when we once thought we had discovered something, it was, okay, this seems to be something new. This would immediately trigger the no-sleep, paper-reading process, in which you go to a library with your supervisor and sit there for a night, searching papers to see if this is not already known in the community, right? And nowadays it's much easier, because you don't need to go to libraries; you go to Google Scholar or any other paper index and start searching backward and forward in time as well. So one would think that today it should be so easy to verify whether what is a discovery for you is actually a discovery for the community. It is never done, of course. And you see why: it fits the academic behavior, right? In academia, you don't want to realize yourself that what you have done is not novel, right? You'd rather say it's novel by ignorance: you know, I think it's novel, but I actually don't know; let's see if it gets through the reviewers. And then you get your publication, and then you're building your niche. But yes, we should instead be self-critical and ask: is what I'm doing actually new? To be honest, most of the time what locks me into long analyses and few papers is that I start reading. I like to read a lot. And then you start finding out that the thing you thought was super clever already existed, to the extent that you were rediscovering something that was forgotten, or was just in another field or something. So you end up asking yourself: well, what is the value of this thing?
And usually, for me, if I manage to put myself into perspective, it will be: okay, what is the value for this community that doesn't know this thing, which this other community knew already, right? So it's not like I'm going to go for Science because it's completely new. But it is: okay, let's find the community that could use this thing that I rediscovered, which I'm able to put into their language and then tell to them. For example, this connection between Kalman and Gaussian processes, right?

Kris: 1:28:49
You know, we can predict a few rediscoveries, right? Yeah. In the next 5 to 10 years the machine learning community will only then discover feedback systems.

Juan Pablo Carbajal: 1:29:04
You mean that they're going, they're going

Kris: 1:29:07
to do experimental design. Exactly. So that's what we can predict for the next while, for sure.

Juan Pablo Carbajal: 1:29:23
Yeah, I mean, in a way

Jörg: 1:29:25
that's really nice.

Juan Pablo Carbajal: 1:29:26
Yeah, I don't think rediscovery is a problem, because there is also a change of perspective if a different community rediscovers something. They rediscovered it by trying something different from the other community. So

Kris: 1:29:39
an exceptional scientist might still find: okay, I discovered that it's not new, but now I can write it in a language that my people understand, and maybe connect, integrate, synthesize. I think there's still value, absolutely. Of course, you don't do that when you're a PhD student. That's not something you want to risk when you're doing a PhD, coming out with a paper that says "I only rediscovered this". You don't write it that way, but saying, okay, this thing has already been discovered, and we should think about it again... I'm not immune to this. That's a difficult proposition

Juan Pablo Carbajal: 1:30:20
I think it's a matter of context, because we are talking about a system that is a little bit perverse, in the sense that you are training scientists and, essentially, you are forcing them to behave in a non-scientific way, because otherwise they wouldn't, let's say,

Kris: 1:30:41
come out, through.

Juan Pablo Carbajal: 1:30:43
Yeah, but then, once you go through this process, I mean, this marks you. Like, if you've been successful applying a certain methodology, why would you change it afterwards? Afterwards you may think: yeah, the other thing is good, but, you know, this is what is bringing food to my table. This is what was giving me success, what was giving me fame, or, you know, putting me on top of my peers or something. So what would be the motivation to change that behavior? So people who do it, it's interesting. It's an interesting lesson, and I think nowadays, in our society, in our economy, asking for altruistic behavior is not, let's say, the most popular thing.

Kris: 1:31:26
I think a certain section of the audience will interpret it as a weakness.

Juan Pablo Carbajal: 1:31:33
Absolutely, absolutely. And I'm thinking

Kris: 1:31:35
about CFD, let's say. In CFD, for years you worked with mesh models, and now you discover this smoothed-particle hydrodynamics stuff and you switch. I can imagine that half of your peers think: what's going on? Why, you know, how does this person not trust the models and the tools that they built for 10 years? Why is this person switching? That requires an additional, a rather strong level of confidence to allow yourself to do that, which is what you do. But not everybody has the level of confidence to say: great, I'm switching gears, I'm doing new stuff, I'm approaching the same problem from a different angle. I don't

Jörg: 1:32:21
It also has to do with skills, you know. Not everybody can, like, move fluently between, I mean, thinking domains. You know, the generalist stuff. Yes.

Kris: 1:32:33
There are not that many around.

Jörg: 1:32:35
No, no, no. Less and less. It's like

Juan Pablo Carbajal: 1:32:38
I don't think there is anything special besides this way of thinking, that you don't want to be stuck in a particular field. But we are not trained for it. Many times in my career I've been told: you're doing the wrong thing, it's not intelligent. What I've been told is: it's not rational. I hate the word "rational". Okay, when I don't think like you, I'm not rational. Okay, I got the point. So it's just that we have motivations, right? My motivation was always an ideal of science that is not encumbered by politics and systems like the economy nowadays. But, in a way, if I want to do science, it somehow needs to be embedded into the system. So, yeah, there's a struggle there. It's a neurosis, if you want.

Kris: 1:33:35
In a way, science will continue with or without universities. It's a crisis, and academia right now, with the current carrots and sticks being implemented, I can see that academia is risking factoring itself out of the process completely, you know, by favoring incremental stuff, by favoring studies that could actually be done by companies.

Juan Pablo Carbajal: 1:34:15
Oh, I tweeted, let's say, I don't know if you guys read this article, and it struck me, this phrase, which I will just read because I don't know it by heart. Basically, it's an article from Donoho, on machine learning also. I have this quote in my emails also: if you publish software, you don't just publish the result; you need also to publish the whole thing. Otherwise it's just propaganda. And there is this other article, in which he says: "being busy needs to be visible, and deep thinking is not. Academia has largely become a small-idea factory." So, yeah, it's Geman, "Science in the Age of Selfies". We are, in a way, punishing, yeah, we're punishing the behavior that brought us here, essentially. The pedestal on which academia is placed today was not built with the behavior we are promoting nowadays. And that is not necessarily something bad, but

Jörg: 1:35:50
Thanks to everybody for listening to this episode. We're looking forward to having you on our show next time. Goodbye.