Cyber-Security & Anti-Money Laundering | Applied AI & H2O AI | Interview with Dr. Ashrith Barthur


Sanyam Bhutani: Hey, this is
Sanyam Bhutani and you’re listening to “Chai Time Data
Science”, a podcast for data science enthusiasts, where I
interview practitioners, researchers, and Kagglers about
their journey, experience, and talk all things about data
science. Hello, and welcome to another
episode of the “Chai Time Data Science” show. In this episode,
I interview the chief security scientist, Dr. Ashrith from
h2o.ai. And as you can guess, we talk all about cybersecurity and
AI, AI broadly speaking in this episode. Ashrith has a
background in cyber security and has done a lot of interesting
research in the field. And he’s also currently doing applied
research so to speak at h2o.ai which of course we talk all
about. We discuss cybersecurity, generally
speaking, and its applications in AI. This is, I know, a first on
this podcast series, so I’m really excited to be sharing
this with you. We also talk a lot about anti money laundering
and the applications that h2o is working on in the cybersecurity
domain. If you’d like to know more about all of these amazing
things, Ashrith will be doing a lot of webinars soon. Again,
please scroll down to the show notes if you’d like to check
them out. For now, here’s my interview with Ashrith, all
about cyber security, anti money laundering, artificial
intelligence and applied AI in this domain. Please enjoy the
show. Hi everyone. This is a unique
first on this series, where I will be talking all about cyber
security. I’m on the call with Dr. Ashrith. Thank you so
much for joining me on the “Chai Time Data Science” podcast. Dr. Ashrith Barthur: Thanks
Sanyam, maybe we’ll keep the doctor part for, you know,
people who save lives, and you can just call me Ashrith. That’d
be fantastic. Sanyam Bhutani: Hehehe. Dr. Ashrith Barthur: Thank you
for inviting me. Sanyam Bhutani: Now, I want to
start by talking about your background. I’m curious how did
you discover your passion for machine learning, you followed a
learning path in cybersecurity and research. Where did machine
learning start to come into the picture for you? Dr. Ashrith Barthur: So that
that’s, that’s actually a good question. Maybe it’ll help out,
you know, people who are seeking something similar. So, after
I finished my masters, I had this real itch for research, I
wanted to do a lot of research in the field of cybersecurity.
And it so happened that a lot of research that was actually being
done was very operational, you know, in the sense, someone’s
trying to hack you. So we just prevent you kind of thing. What
I wanted to do was much more analysis, which was, you know,
can I use algorithms? Can I use mathematical statistical models
to actually detect these things? And that’s how I ended up at
Purdue. You know, my advisor, Dr. William Cleveland,
is one of the really well known network security researchers in
the field, like, from a statistics point of view. And so what I did
was always very statistical. You know, we used, like, random
forests and we used SVMs and these kinds of things, but it was
never put under the umbrella of data science. And to be very
honest, you know, when I was
in school, the concept of data science had not evolved; it was just
analysis, you know, analytics. And even when I was
looking out for a job, it was like, oh, you know, an analyst’s
job would be great. This is how we were thinking about it. Sanyam Bhutani: Okay. Dr. Ashrith Barthur: But I got a
bit of exposure to the Bay Area. And then, you know, the
term data science came in, and yeah, that’s how I
probably moved towards this. Sanyam Bhutani: Okay. Is it
common in your field for an outsider, talking about
statistical analysis? Is that common in cybersecurity? Dr. Ashrith Barthur: So when I
started, it was definitely not that common. There were only a
few people who had actually published papers, you know,
using the approach that I had spoken about, before me, and
they were like the pioneers of you know, people who did it. It
is still not common because one of the things about cyber
security is it’s always very reactive, which means that
you’re always firefighting, you’re not thinking in the
future. So which essentially means that it’s more operational
than algorithmic. So it’s still not uncommon is what I would
say. I mean, it’s still not common is what I would say. Sanyam Bhutani: Okay, now,
before we talk about the intersection, can you tell us
more about your passion for cybersecurity, when did that
happen? And asking for a friend, can you retrieve someone’s
Facebook messages if they’ve been locked? Dr. Ashrith Barthur: So I think
these kinds of questions have been asked for quite some time. I
would like to say, I think these kinds of things
are not something that you’re supposed to do. But if people
are motivated enough, I think you can get these, you know, you
can do these things. So having said this, one of the things
that I was always very interested in is from from a
very generic point of view, I just wanted to see how things
you know, how all components come together, how things break
apart and you know, how like what are the weaknesses kind of
a thing, it was very it was not necessarily you know, from a
software point of view, it was from a larger computers point of
view. And then when you get into you know, like a field very
specific to trying to see vulnerabilities in like
software, then it becomes much more interesting and then that’s
where I kind of got into the idea of cyber security
itself, which, you know, eventually led to my interest in
actually being, like, an active penetration tester, you know, in
my masters, and after that, you know, going into much more
research in the field of cybersecurity, to try and
identify people who are doing these kinds of things, something
that I did much before you know, as a part of my masters program
and my internship and all those things, to try and see if
mathematical models, or statistical models for that
matter, can actually identify this behavior, or can we build
models that can identify this behavior, is how the
progression went. But coming back to your friend’s question,
I think all technologies are vulnerable. So you know if this
is something as an experiment that you would want to try. Sanyam Bhutani: We’ll be back
after a break, dear audience. Hehehe, kidding aside, now, I
want to talk about how you’re still following your passion at
h2o.ai. What problems are you working on? What does a day in
your life currently look like? Dr. Ashrith Barthur: Um, that’s
actually a pretty big question. Maybe I’ll break it apart into a
few things. So one of the primary I mean, the general
umbrella that I work in is in the area of identifying
malicious behavior. And I started off with the field of
cybersecurity, with focus heavily on network security, you
know, trying to identify malicious behavior. You know,
through network traffic. And what that eventually led us to
is, is to add many different kinds of behaviors into my
portfolio. So right now, I also look at electronic fraud. I also
look at money laundering as a part of this larger scope of
things that I research. And we also look at other kinds of
malicious, like other kinds of larger state-actor malicious
behavior as a part of this entire portfolio. And we
essentially build models for all of these things for you know,
different organizations. I’m sorry, I think I forgot the
second question. Sanyam Bhutani: What does a
day in your life currently look like? Dr. Ashrith Barthur: So in a
day, I would actually say, a big part of my work is still
building and testing models, and
also researching what are the different approaches that I can
use. But one of the things that
I am very passionate about, if you would want to
identify something much smaller, like a sliver, is to actually
see everything that I build, like a model that I build, be taken
to a point where it’s completely applied. Because mind you, when
I’m building a model, if I build a model, and it’s a fantastic
model, I’m only satisfying the data scientist in me or the
analyst in me. But if I take that model, and put it into a
solution and solve someone’s network security problem, like
let’s say an analyst who’s sitting on the other end, and
this analyst, he or she has a lot of false positives in their
network, you know information or the network attacks or alerts.
And if my model can solve that, if it can reduce the number of
false positives, give them much more accurate information, then
I’ve solved a real world problem. Sanyam Bhutani: Yes. Dr. Ashrith Barthur: I’ve
actually applied, you know, problem solving, kind of thing.
And getting things end to end, literally from, like, science
to applied solutions, is what I’m actually very passionate about
getting done. And that’s what I focus on in my
day to day activities. Sanyam Bhutani: Where does h2o
come into the picture? h2o, to the world, is the auto ML
company. Where does auto ML come into the picture, so to speak? Dr. Ashrith Barthur: So yes, um,
so I’m guessing you’re talking about auto ML when we say
AML, or is it... Sanyam Bhutani: Automatic.
Because you mentioned building models, and auto ML is supposed
to replace you. Dr. Ashrith Barthur: Okay, yeah,
of course. So so one of the big things that we always have to
focus on is the idea that eventually, uh, systems will
take over human beings. And essentially what you have to do
is you have to train the systems enough to understand how you’re
thinking because mind you with the number of attacks that
happens, with the number of alerts that are generated, with
the number of things that are becoming digital or electronic,
for that matter, and systems being interconnected, we will
not, regardless of what rate the population explodes at, we will not
have enough manpower to identify all this behavior. So,
which essentially means that a lot of things need to be handed
off to systems, which also includes intelligence. And
that intelligence aspect is the very kernel of how
h2o fits in or h2o’s auto ML for that matter fits in or even if
you want to be much more specific, you know, the whole
aspect of how driverless AI fits into the whole equation. It
gives us the ability to build models, it gives us the ability
to refine the models, tune the models, for different kinds of
variations and behaviors, be it changing data, be it, you know,
periodicities, or any of these things. And I think
that, and the fact that you can get a model from, you know,
conceptual to actual production very quickly, gives you
the ability to just come a bit closer to that
situation I was telling you about, where machines should
be doing a lot more work. Sanyam Bhutani: So you’re on the
side of robots in the robots versus the human race? Dr. Ashrith Barthur: On the
contrary, to be very honest, I like to do things with my
own hands, kind of thing, is
how I approach it. But, you know, the reality is that these
things are going to explode, you know, exponentially. And you are
better off handing it off to an intelligent system than trying
to make things work and losing out, you know, in the process. Sanyam Bhutani: I think it’s
similar to creativity in general. Before the call you
were making coffee, and coffee makers are now automated. You
don’t need to manually figure it out. That’s what maybe data
science will be in a few years from now. Dr. Ashrith Barthur: I would
agree. I would agree. But you know, just like how you know
that there are spaces where handmade coffee is slightly, you know,
better than coffee that is made by a machine. Sanyam Bhutani: All the very
niche expert has a very exclusive class. Dr. Ashrith Barthur: Completely,
I completely agree with you, although I do like machine
coffee, I’d have to tell you that. The same way, a data
scientist, for that matter, would be able to polish the model premade
by a system, or by an automated system, to just give it
that edge to, you know, be better. And that is how,
as a data
scientist, you could see your work the same way. Sanyam Bhutani: Okay, now coming
to an application that I think you’re recently interested in,
you’ve been working on anti money laundering. Why is it even
a thing in 2020? Everything’s digital, how is money being
laundered in 2020? Dr. Ashrith Barthur: So, this
is like a fantastic, you know, question that you put out,
and you know, it might even come across as weird: you’re working in
cybersecurity and then money laundering?
I’m sure that will probably be your next question.
So one of the things that I could see, while I was exploring
a lot of behavior, while I was
studying network behavior, was that there
were a lot of similarities in papers that were published about
fraud and money laundering and these kinds of things. So I kind
of started to explore a bit more about that. And then what we did
is we engaged with a few clients to work on these aspects
as well. And the thing is, almost always people want to
save up on what they have to pay the state. Of course,
these are not ethical, not moral, any of
those things, but they just want to try and evade the system as
much as possible. Which is one of the reasons money laundering
exists. In the old school, it was very similar. You know, where you
set up shell corporations in, like, different countries, then
just move the money around, and you’re done, right? In the
current day, because it’s digital, it’s fantastic.
Fantastic, I say it with a bit of
responsibility, in terms of how money laundering is happening.
You know, the way money laundering is
done using electronic means itself is just, you know,
so well done that it becomes
really, really difficult to identify. And the
crooks are also growing with it; you know, it’s
like a generational change they are undergoing as well. So that
still makes it an important aspect, you know,
something that needs to be focused on today. And yes, which
is why it is still relevant in 2020. Sanyam Bhutani: How do we end
up, how do we end up automating the process? Since you mentioned
it’s beautiful, in a sense, how people evade this, beautiful
not in the real fashion, but it needs a lot of human
expertise. How can we ensure that whatever models that we or
Driverless AI build are robust enough, are of the human
expert level? Dr. Ashrith Barthur: Fair
enough. Um, so, I would see that as,
you know, a much more technically
rooted question, right. So the idea is that what you’re trying
to do is you’re trying to identify behavior. Now, anything
out of the ordinary, be it like an attack on a network or be it
you know, you’re siphoning off money is is going to stand out.
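
One way to picture what "standing out" means in practice is to score an account's most recent activity against its own longer history. The sketch below is a minimal illustration in plain Python, not the features H2O actually builds; the function name `deviation_from_baseline`, the 7-day window, and the sample numbers are all assumptions made up for this example.

```python
from statistics import mean, stdev


def deviation_from_baseline(daily_totals, recent_days=7):
    """Score how far the latest window sits from the account's own long-run baseline.

    daily_totals: one number per day, oldest first (hypothetical input format).
    Returns the gap between the recent mean and the baseline mean,
    measured in baseline standard deviations.
    """
    if len(daily_totals) <= recent_days + 1:
        raise ValueError("need more history than the recent window")
    baseline = daily_totals[:-recent_days]  # the long-run history
    recent = daily_totals[-recent_days:]    # the window being judged
    spread = stdev(baseline)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return 0.0 if mean(recent) == mean(baseline) else float("inf")
    return (mean(recent) - mean(baseline)) / spread


# Thirty quiet days, followed by either a normal week or a week with two spikes.
quiet = [100, 110, 90, 105, 95] * 6
steady = quiet + [100, 105, 95, 110, 90, 100, 100]
spiking = quiet + [100, 105, 95, 900, 950, 100, 100]

print(deviation_from_baseline(steady))   # no deviation from the baseline
print(deviation_from_baseline(spiking))  # a large positive score
```

The same week of data means nothing on its own; it only becomes a signal once there is a baseline to compare it with, and the length of that baseline is a modeling choice.
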
Okay. But the important thing is: with reference to what? That is
the big question. Like, if you want something to pop out, or
you know, be seemingly visible compared to something else, the
thing that you’re comparing to needs to also establish, like, a
baseline for something to pop out. And the models that we
build, essentially, sorry, the features that we build
essentially do that. Sanyam Bhutani: Okay. Dr. Ashrith Barthur: Now these
guys, I think you had a question. Sanyam Bhutani: No, Sorry. Dr. Ashrith Barthur: Okay. Yeah.
Um, so the features that we build are so tuned to actually
pop out unique behaviors that you could normally not have
seen. I mean, for example, you know, if an investigator were to
look at, like, you know, behaviors in the last week, he
or she might not see anything interesting. But now, if you
were to look at, you know, a long term shift in how much
money or, you know, how many transactions a certain account
was doing, there could have been a steady shift, or there could have been a
spike, you know, that happened much before, a week before, which
the investigator might not have sights on, and that is
essentially what the model captures. The model is capable
of going long, far and wide and deep, whereas we humans are
limited. Not that we are not smart, but we are limited because there
is only a certain amount of information that we can process
at a given point in time. But that limitation does not exist
with these features or models or systems. And
essentially, when you add these features, the models have the
good capability to pick these things up. And that’s where the,
you know, driverless AI comes into the picture, you have these
fantastic features that are identified, you know, for use
cases, and the models in driverless AI are able to pick
up on these features. And then you know, your model is highly
likely to be, you know, a good predictor of what money
laundering is. And that’s essentially how we go about,
like building the model. Sanyam Bhutani: This is actually
pretty interesting because someone would assume that a
human expert needs to spot those errors, and you talk about the
model being robust enough to actually see through things that
a human might miss through the fine grains? Dr. Ashrith Barthur: Yes. And
it’s not, um, I would also, you know,
be honest enough to say that this is not necessarily in a
negative connotation as well, right? Because for us as humans, all
of us, like you and me, there is a limitation in the
amount of information that we can consume. So it’s obvious
that we’ll miss out on something. The only thing that
we’re saying is, hey, look, you know, for example, when
you’re fatigued, let’s say
you’re reading a book and you’re, like, tired, you don’t
necessarily grab all the story that’s coming out of the book.
You know, sometimes, which I always do, I go back about four
pages, and I start again, which is the same. But machines don’t
have that problem. Sanyam Bhutani: Yeah. Dr. Ashrith Barthur: Or maybe we
still haven’t figured out if machines or systems and
algorithms have fatigue; well, I’m sure we’ll figure that out
later. But they don’t have that, which is one of the reasons that,
if you can offer these things to a machine, you know, it helps
us. Sanyam Bhutani: Before we talk
about where machines are currently helpful, can you tell
us more about the data set curation process? Because
anomalies happen, maybe in a ratio of 1:100, maybe less, I’m
not sure. But how do we find the right data in the first place? I
couldn’t find any proper data sets on Kaggle. Maybe there were
one or two competitions. It’s not a common problem, so to
speak. Dr. Ashrith Barthur: Yes, it’s
not a common problem. And one of the reasons is because
it’s heavily guarded, you know, with security, because one of
the things is that when you’re looking at these kinds of
irresponsible behavior, you aren’t necessarily bringing in
your organizational risk team, you’re bringing in the state.
You’re bringing in many different agencies. So which
essentially means that you have to be very, very, very careful
when you’re handling this data set, because we’ve got PII,
personally identifiable information. Sanyam Bhutani: Okay. Dr. Ashrith Barthur: So,
you would hardly find, I don’t think you would find, any
data set available online, which is
how it should be. And so that essentially means that a lot of
this work actually happens, you know, iteratively, which means
that you look at the data, you learn, you build, you know, your
first iteration of a model, and then you look at the data again,
you know, try different kinds of features, try different kinds of
joins, and then iterate the model to be much better. And
that’s essentially how we have built the whole, you know, the
solution space for any of these malicious behaviors, using
driverless AI. Sanyam Bhutani: Can you speak to
where driverless AI is currently being used? What sectors is it
being currently deployed across maybe the model from driverless
or auto ML? Dr. Ashrith Barthur: So, um,
one of the things that has
happened is driverless AI has
given organizations that amazing ability to be able to build
models and deploy models without having as many data scientists
as they would have earlier needed. Sanyam Bhutani: I think we need
to first clarify, for people who don’t know, what is driverless AI
and how is it related to cybersecurity? Dr. Ashrith Barthur: Please. Um,
okay, so I think for the people who don’t
necessarily know what Driverless AI is, Driverless AI is this
tool that, you know, our company h2o.ai
makes, which basically is a completely automated machine learning
tool. And I think it’s very popularly called
a Kaggler, a Kaggler in a box. Sanyam Bhutani: Yeah. hehe. Dr. Ashrith Barthur: And what it
does is, you know, it has this amazing ability to tune the
model, build better features, and process iteratively and then
give you the best output. And that’s essentially how
driverless AI works. Now the way we adapted it for the field of
cybersecurity or money, fraud or any of these things, or malicious
behaviour for that matter, is that we tuned driverless AI, or
rather we configured driverless AI, with something called
recipes. The idea of recipes is to tell driverless AI that,
you know, there is a certain design of
features that it needs to look for, or to explore, when it’s
building the model, which is very specific to a use case. For
example, when we’re looking at malicious users, I’m going to
tell it to look at, you know, historical patterns, periodic
patterns, anomalies that stand out with a certain statistical
effect. Or, you know, unique interesting behavior that never
existed, maybe logs that are incomplete. Or, you know, in
terms of money laundering, transactions that
seem to go in circles. For example, in money
laundering, there are a lot of times when, you know, people move
the transactions in circles, because if money is in transit,
it means that they don’t necessarily have to pay tax on it,
which is what most of the people do as well. So any of
these behaviors is actually encoded, like in proper coding.
It’s not in a legal language, it’s actually coded,
and it’s fed to driverless AI. So yeah, which
gives driverless AI the ability to actually build models
and engineer features that are very much required for this kind
of use case, and that’s essentially how it fits. So
which is why I would say driverless AI is the kernel in
this case, you know, very typical business model. But once
it’s built the model, you know, we call back driverless AI
again only when you want the model to be built again.
Other than that, the
output of driverless AI, which is the actual model object,
gets deployed. Sanyam Bhutani: You mentioned
the configuration: is this a different version? Or is it
just a few switches that you toggle to put it into security
mode? Dr. Ashrith Barthur: Oh no. So
it’s actually very simple. When I say
configuration, it’s the recipes
feature. And the idea is, you write a custom
recipe for any of the use cases that you’re working in. It could
be any of the use cases that I’m working in,
which include, you know, the entire spread of malicious
behavior, from electronic fraud and cybersecurity to
transactional and money laundering. Or it could also be
for things like, you know, you want to identify loan default,
you want to identify, you know, whether you are going to have customer
churn, or any of these use cases. And driverless AI has the
capability to ingest any kind of recipe that you, the user, provide,
to adapt it for the use case that you are expecting a model
for. So you can think of it as, you know, a
much more unique customization for the use case that you want
to work on. Sanyam Bhutani: Okay, now, this
was an interesting tangent. Coming back to where
driverless is being used: any sectors where we’re currently using
it? Dr. Ashrith Barthur: Oh, yeah,
yeah, from what I can see, from what I know,
driverless has been adopted across the entire spectrum of
all the customers who are using it shortly shortly, of course,
was the, is an open source product as well. And it’s used
across the industry. It’s used in finance, insurance,
manufacturing, supply chain management, I think hardcore
security as well, pharmaceuticals, and I think,
yeah, these are the ones that probably crop up in my head
right away. So I would say that these are the groups that are using it,
and it’s gained the same amount of traction as,
you know, h2o’s widespread. Sanyam Bhutani: Any applications
of anti money laundering that come to mind in this broad
spectrum? Dr. Ashrith Barthur: Um, yes, of
course, um in the sense you’re saying in respect to driverless
AI, right? Sanyam Bhutani: Right. Dr. Ashrith Barthur: Yeah. Um,
so we’ve used multiple models for the AML use case itself,
which is anti money laundering. We use driverless AI
as the engine, the very engine to generate the model and to be
predictive enough for AML, and driverless AI with
the AML solution is something that we have deployed, you know, at
quite a few organizations to solve their
problems. Sanyam Bhutani: Any upcoming
sectors that you’re excited about, where we could
help with AML or even cyber security problems? Dr. Ashrith Barthur: Um, so one
of the sectors that I’m very excited about is IoT. IoT is a
vast space for malicious behavior, which I always
start with, and it’s got its, you know, influence in the
financial sector, you know, like remote payment systems, for
example, or autonomous payment systems. And it’s also in the
field of cybersecurity; IoT is, like, very important, you
know, because you have systems that are not necessarily, you
know, like, managed, but that are very critical in the
entire operations space. So I’m very excited to see,
you know, how that will come about and how we can work with
that from a modeling point of view. Sanyam Bhutani: To me this also
brings an interesting question. I’m visiting the US soon, so I know
the IRS flags any transactions, I think, above $10,000, and
it’s 50,000 INR for India, which is like a thousand dollars.
Does this problem vary from region to region?
This might be a bad example. But do you see any
challenges in shifting from region to region or policy
changes? Dr. Ashrith Barthur: So the
thing is, you have to understand where limits come from, right?
The limits, you know, come from the idea that, you know, a
certain country has a certain average income, average, you
know, per capita kind of thing. So, you know, that essentially
plays a part in setting up the limits. But
the kind of activities that they’re
looking for also plays a very important part in setting up
these limits. For example, in Europe, it’s much more
stringent; you know, in India, it’s probably stringent
as well, you know, because there might be a lot of activities
that are seeping through. In America, it’s much more
regulated. You know, much of the financial
sector has actually moved to, you know, an electronic
footprint, which means that trackability is easy, it’s not a
big deal. And this will vary region by region. And that,
in essence, is dependent on what, you know, the specific
agencies who guide these things think is a reasonable amount
that would be, you know, the threshold. And that’s
essentially what would drive the entire process. Sanyam Bhutani: US, like you
said, is properly regulated, maybe relatively properly
regulated even. How does the future look to you
in regions such as Asia, India, where technology is still up and
coming, and the internet is still sort of picking up? Dr. Ashrith Barthur: I mean,
See, the thing is, in Asia, all parts of Asia, right, of course,
it’s up and coming. There’s no question about it.
But you do have to understand that there is a fundamental
problem that exists, which is that regardless of what you do,
regardless of the fact that you use any kind of technology, if
you want the technology to be applicable, like let’s say you
want a large kind of electronic footprint, like the way Europe
operates and the way America operates, you would have to
deploy the same kind of electronic footprint across. You
know, China and India have some innovative banking
methods that have come through, you know, like SMS based banking
and all these things, and, you know, transactions that can be
monitored as well. I think India is coming up with the one unique ID,
you know, so that you’re able to unify, you
know, any kind of transaction, if that helps. So, what matters
is, because the numbers are so big in Asia, it’s not a
matter of will we be able to, it’s a matter of when will we be
able to, and it’s only because the technology has to be applied
everywhere. And that’s when I think there will be
much more, you know, I would say effective monitoring, and also
transparent monitoring, you know, because both sides can see
what’s happening. If I am falsely flagged, I think I’d
have the ability to, you know, question that. But on the other
side, if someone thinks that I’ve actually done
some kind of, you know, illegal transaction, then they’ll have
enough information to flag me for that as well. So
it’s a problem of numbers, not necessarily, you know, will we
be able to do it. So I’m just saying that. Sanyam Bhutani: You’re talking
about being flagged, and h2o already has MLI in our products.
How interpretable are these models that we just talked
about? Dr. Ashrith Barthur: Oh, yes.
Um, so one of the things that we, that we do when we’re
building these very unique, specific use cases or specific
solutions, is that we work very closely with the financial risk
groups of the financial institutions or the
organizations that we work with, from cyber security, fraud,
money laundering any of these things, right. We actually
engage with the risk team, the internal risk team,
because one of the things that they also have to do is, let’s
take money laundering for a quick example. Now, when a
transaction that seems like money laundering, as a matter of
fact, is visible in a bank, it needs to be reported to the
state. It’s a process that they have to follow,
which essentially means that this information is not
something that you are keeping for yourself, for your
knowledge, but you have to provide it for others as well.
And it must be equally informational for them. So which
is one of the reasons that we customize driverless AI
to build unique statistical features. So we do a
little bit less of feature combinations. We do a lot more
of statistical features, you know, going into the model,
because statistical features are naturally interpretable. For
example, if I tell you the average amount in a month for a
user, you know what it means. I mean, you know,
it’s an aggregation, everything divided by the number of
items, simple as that. So it’s very, you know, intuitive, like
easily interpretable. And that’s something that we strive for in
all of these use cases. And let me give you a simple
example, a quick, simple example, if you have the time
for cyber security as well. Let’s say you are the Risk
Officer working in association with the [] the chief
information security officer of an organization. And
essentially, what happens then, is that if your systems are
breached, you’re duty bound, you know, to inform the
state, as well as inform, you know, all the required agencies that
you’ve been hacked, you know, if there is a certain loss of data,
and these are the customers that you’ve lost, you also have to
inform your customers, you know, whose data seems to be lost,
which essentially means that you have to make all these
parties, probably not to the same level, but you have to make
all these parties understand what actually happened. And if
I’m using AI, it behooves me to actually be able to explain
these models through explainable features. And it can’t be just
feature combinations, it has to be explainable features,
which is why we adopt the same approach: we customize Driverless
AI, through recipes of course, you know, for these kinds of
features so that when we’re looking at malicious behavior
specifically, the thing is the models are super transparent,
you’re able to explain everything in one. Sanyam Bhutani: Now, broadly
speaking, or maybe even naively speaking, how do you convince
such regulated industries, like banks, to use AI, something that’s
maybe like hard to sell, so to speak? Dr. Ashrith Barthur: Fair
enough. That’s that’s a very fair question, I would say. So
this is the thing, right? AI, or ML for that matter, is this
fantastic tool that, you know, companies who’ve got good
resources have adopted and have been very successful with. Now,
these companies who have been
successful, you can identify them, you know, literally,
because these are companies who have large amounts of data. Now
AI, very different to, you know, other approaches of
making predictive models, seeks large amounts of
data to be predictive. Which essentially means that AI
is more observational than, you know, a concept called emergent,
you know, where you’re actually getting knowledge out of it. So
there is a slight difference. So it’s very experimental and
observational. I would say ML is more experimental and
observational, while, you know, other forms like old-school statistics
are more emergent. So what you have, the way you have to
convince, you know, regulators is the fact that you don’t
necessarily show them that large amounts of data will give you a
much better result. But you show them a process of transparency
in how your model is built in what your model actually is, it
could be a very simple, you know, CART model, but it has to
be transparent, like the investigator should, I mean,
the regulator should understand what it does, like
how does it make its decision. And the very fundamental aspect
in the model are the features. These features must be
understandable; that, I would say, is the golden rule. To
get a regulator to actually understand how AI makes a
difference is to make the features understandable because
then everything falls into place, and it’s intuitive for
them to understand. Sanyam Bhutani: I think that’s
also where AutoDoc comes into the picture. AutoDoc is already
integrated everywhere. So that gives out a fully
regulatory-friendly, so to speak, document for anyone that
wants to investigate or probe into these. Dr. Ashrith Barthur: Yeah, I
mean AutoDoc, I think surfing news on the features, I think, I
think that would be amazingly useful, you know, as a
consumption device for regulators because it keeps the
model very transparent. It’s able to, you know, take in all
the feature sets that we have put in and, you know, build a
story around how much of a bigger story you know, can build
and then essentially provide that as information to you know,
the regulators for consumption. Sanyam Bhutani: We were also
talking about being fatigued. As a data scientist, you’re not
the most excited about writing documentation; you want to move
on to the next model-building task! And I think that’s
where the automation is helpful. Dr. Ashrith Barthur: It is, it
is, I would agree with that. But the other aspect, this, this, of
course, is not something that, you know, people who build
models like, and that includes me as well, is, you know, to
document everything. It’s like this, right? You, you’ve done,
you’ve gone and done something really cool. And you’re like, I
don’t want to document this. I mean, that’s, that’s where it
ends. But that’s not how it is, right? What I’m doing, the model
that I build is not for my satisfaction. It’s for the
satisfaction of someone else who sought my help, which means that
our customer or client organizations who work with us,
it’s for their consumption, which means that we must be very
tidy in what we give them as information. Which
essentially means that because the customer came to me for
help, I must provide him or her with all the information that
they need to understand that what we have built is something that
they can trust. What we have built is something that they can
rely on. And what we have built is robust enough to solve that
problem. So, which is one of the reasons, although we don’t like
it, we have to force upon ourselves to get these things
done. I think that’s where AutoDoc kind of helps us to a large
extent, it fills up the larger space, you can probably browse
through it very quickly and predict a few things and that’s
probably all. Sanyam Bhutani: This has been an
amazing interview full of many great insights. I know a lot of
the audience is students, ML students. For those who are
excited about machine learning, who have a good grasp of it and
now want to apply it to a domain that they’re an expert in,
what best advice would you have for them? Dr. Ashrith Barthur: Um, so one
of the things I would say is try and get your hands on any kind
of data set that you can, you know, to, to, to familiarize one
with the domain and to familiarize with the data
science, with the data science aspect of it, which is one thing
that I do, you know, to get an idea of what you do, and when,
when you’re familiarizing yourself with the domain, try
and don’t focus on the the accuracy of the model, focus on
what is going into the model. So it gives you a much better, you
know, valuable result, I would say focus on that a bit more.
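A minimal sketch of the kind of statistical feature discussed in this interview, the average transaction amount per user per month, which is just the sum of the amounts divided by how many there are. The transaction records are made up and the helper name `monthly_average_amount` is hypothetical; this is an illustration only, not how Driverless AI or any H2O recipe computes its features.

```python
from collections import defaultdict
from statistics import mean

# Made-up example records: (user_id, date as "YYYY-MM-DD", amount).
transactions = [
    ("alice", "2020-01-03", 120.0),
    ("alice", "2020-01-17", 80.0),
    ("alice", "2020-02-02", 400.0),
    ("bob", "2020-01-09", 50.0),
]


def monthly_average_amount(rows):
    """Return {(user, "YYYY-MM"): average amount} for the given rows."""
    grouped = defaultdict(list)
    for user, date, amount in rows:
        month = date[:7]  # keep only the "YYYY-MM" part of the date
        grouped[(user, month)].append(amount)
    # The feature is a plain aggregation: sum of amounts / number of amounts.
    return {key: mean(amounts) for key, amounts in grouped.items()}


features = monthly_average_amount(transactions)
for (user, month), average in sorted(features.items()):
    print(user, month, average)
```

A feature like this can be read directly by an investigator or regulator ("this user averaged this much per transaction in this month"), which is the interpretability property described above.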
And the next thing that I actually feel very strongly about
is try and be a full-stack data scientist as much as possible.
Because building models is cool. Building a model is really
cool. But solving someone’s problem is much cooler, which
means that if you can build from model to a solution, and then an
application, that’s way cooler than building a model and saying
this is my shiny new object. So if you can follow these two
things, I mean, you know, guidelines, I think that would
be fantastic for people who are starting off. Sanyam Bhutani: Awesome. So
before we end the call, what would be the best platforms to
follow you, follow your work? Dr. Ashrith Barthur: I think I
wasn’t expecting this question. Because I don’t necessarily put
out information on a lot of platforms. But I think I rarely
put things out on Twitter. You know, when when there are some
good articles, sometimes probably on LinkedIn, but the h2o
blog is very much a good space to look through what I
write because, even though I write a few things, there
are people who push me to publish it, so that helps
me. Yeah, I think that should be good enough. Sanyam Bhutani: Perfect. Thank
you so much Ashrith for joining me on the podcast. Dr. Ashrith Barthur: Thanks
Sanyam. I really appreciate the time and the opportunity. I
would say to the audience, you know, if there are any
questions that they would have, please feel free to drop us a
line and we’d be more than happy to answer them. Thank you. Sanyam Bhutani: Leave them in
the comments we’ll try to review and leave a reply to your reply. Dr. Ashrith Barthur: I thought I
would never get an opportunity to say that but yes, leave them
in the comments. Sanyam Bhutani: Thank you so
much for listening to this episode. If you enjoyed the
show, please be sure to give it a review or feel free to shoot
me a message you can find all of the social media links in the
description. If you like the show, please subscribe and tune
in each week to “Chai Time Data Science”.
