A Journey Through Data
In October 2016, I had the pleasure of meeting Gauthier Vasseur at Stanford University. I found his lecture on Big Data and Marketing Strategies very interesting. It was obvious that he was not only a data expert but also someone who could transmit his passion for his craft to his students through his charming personality.
After the lecture, I reached out to Gauthier via a survey that he sent out to follow up on the event. I expressed my interest in interviewing him for our blog. He graciously agreed to be featured in this article. During our Skype interview, I had the opportunity to express my sincere appreciation for his lecture, and to tell him how much I enjoyed learning about his approach to data in today’s global market.
I hope you enjoy the interview as much as I did!
Madelyn Montilla: Who is Gauthier Vasseur?
Gauthier Vasseur: I help companies to be successful in their digitization process. What makes me who I am as an instructor is my background, which has been first defined by 10 years of experience in operations and finance. That’s where I had the opportunity to really understand the impact of data on business. This was followed by 10 years in high technology, software, and data management, which allowed me to really deep dive into the topic from a Silicon Valley perspective. As my passion for data grew, I started to teach, and that’s what I’ve been doing since 2002 in universities such as Stanford and at the Association for Finance Professionals. Working with them has given me tremendous experience on how to convey my operational knowledge around data to students and corporate employees that are looking to learn more about what data is.
Personally, I’m a believer of lifelong learning. I get passionate about topics, which don’t have an end, that are unlimited sources of learning and discovery. That’s why I also surf and do astrophotography.
Madelyn: How did you become involved with Big Data, and why is it so interesting?
Gauthier: The reason I became interested in data is that I realized it could change the way I was working. It could make me a much better professional. It could also make my employees much better professionals. Data is a means to an end; data for data’s sake is of little interest to me. Using data to do better things, faster, more sustainably, and more reliably is really what made me so excited about the topic.
Madelyn: Are you currently working on any special projects related to data?
Gauthier: I’m continuously working on three main types of projects:
- Helping startups to develop better data products for their customers.
- Teaching data and digital transformation to graduate students in universities such as Stanford and Sciences Po.
- Teaching corporate students and working with the Association for Financial Professionals to develop digital transformation curricula specifically designed for Finance and Treasury.
Madelyn: What’s your advice to students who are interested in becoming data professionals?
Gauthier: (laughing) Take my class. Go see your director or dean, and tell them that you want a data class. It’s a 20 to 30-hour curriculum. That would be the first step. I think that the key to getting started in becoming a data professional is not about technology and data per se. You have to have that thing inside you that pushes you to try to do everything better the next day, and as a result, you try to make things better every day. That continuous will to improve yourself, just to make yourself better, be a better employee, be a better person, is the main drive to say: “Hey, I can work faster, I can work better, I can do better things, but I need to master the tools that empower me to do this.” One of the core toolsets is mastering your data. So if you have that will, you will naturally become open to learning about data. Find this will inside yourself; that’s how it begins.
As for the class, there are a few technical and data-related things that you need to learn. They are not complex; they just need to be learned. There’s a learning curve to it. Be ready to iterate and constantly push yourself to go back to this practice. Data is like a sport: the more you practice, the better you become at it. If you make it part of your daily discipline, you will become not only good but an expert.
A Brief Introduction to Big Data
Madelyn: What is Big Data?
Gauthier: That could be a long debate. People used to define Big Data by its volume; its velocity, the speed at which it comes to you; and its variety, since there are so many different types. These are some of the defining elements of Big Data.
However, when you talk to people and say, “Hey, you’re doing Big Data! That’s cool!” they turn to you and say, “I don’t know, I’m just doing data.” I mean, Big Data is just data – big indeed – it is just a lot of data. The same concept applies to cars: a car is a car. We have fast cars and slow cars, but at the end of the day, it is still a car.
So, I would love to take the definition of Big Data more towards the challenge of the new type of analysis we need. Big Data is about bringing a vast variety of different data together to connect them in new ways and find new ideas or solutions. Volume and velocity can easily be harnessed by technology. What matters is really bringing together a lot of data and combining it in a smart way. But here again, the core is not the data itself; it is the question you ask that will lead to the data you need, and then the courage and the ability to go get that data. I think that this is really the true challenge of tomorrow’s data: it doesn’t rely on Big Data; it relies more on the holistic view of what data can do for you. Holistic data is the new Big Data, which means having an overview of a situation using data. That’s how I would define it.
Madelyn: What’s at the core of Big Data?
Gauthier: The ability to bring a lot of different data together. That’s at the core. Since the human brain cannot really harness and comprehend these massive, complex data sets, we also find at the core of Big Data the application of algorithms and machine-supported techniques to see through this data. That’s where the classic terms we hear more and more – artificial intelligence and machine learning – come into play. As funky as they seem, they are only tools, bigger tools, more powerful tools that we need to apply to more complex data sets.
Madelyn: Can you learn Big Data? How?
Gauthier: In my opinion, there’s no such thing as learning Big Data. Big Data is a multifaceted world, and you need to learn about the systems through which the data is going to flow. You need to learn about how data is going to be structured and how to acquire the data, how to make it look good, how to organize it, how to structure it; but then, you need to know how to position the people around the data, because people need to ask the right questions, draw the right conclusions, and execute upon that data. And finally, you need to organize everything into a process that’s going to be efficient and sustainable.
So can you learn Big Data? Can you learn data?
Well, actually it is a whole discipline that you need to learn, with four main facets, as I said: technology, data, people, and process. Now the beauty is that it is not that hard to learn. I’m not talking about data scientists and mathematicians, but professionals like you and me can harness these four facets and actually make them work for us. That’s the purpose of the class, that’s what I’m doing.
Madelyn: How can we get the right data?
Gauthier: Getting the right data starts with asking the right question. I mean, if you ask the wrong question, if you are not accurate enough in the question you ask, you will get the wrong data and you will do stupid stuff. Sometimes you are going to ask the right question, and you will still go after data that is not relevant to the answer, because the data is right under your nose and you say, “Well, that is the one that I am going to take.” It takes the right question, and it takes a certain amount of boldness to say, “No, for this answer we need to go get that data, no matter where it is.” And then it takes some creativity and some good technology knowledge, because you are usually going to need them to go get it.
How do you collect, transform, and sell data? This is a mix of data management, technology, process, and also people engagement that you need to put together to get the right data. But it is a never-ending quest. Getting the right data is just part of the whole ordeal of these data projects, because you have to get it once but you also have to get it again and again. Getting the right data once is useless because you need to run these analyses all the time to start seeing trends. It is about building processes and protocols that can stand the test of time.
Madelyn: What’s new in the data world, and what is expected to change?
Gauthier: Although algorithms, statistics, and machine learning have been around for several decades, what makes them look new is that people can now run them at full power thanks to the evolution of technology, processors, and memory.
Now, having said that, what I think is going to be new very soon is how we are going to manage the ethics of data. How are we going to put safeguards in place to prevent people from doing bad stuff? There’s a very fine line between marketing and Big Brother, and right now, the law just can’t keep up with this. What is probably going to change? Maybe the law will come down hard on the way people manage other people’s data. And also, maybe we will see the rise of leaders who are going to say, “I’m sorry, we do not do these things because they are unethical and they are a threat to people’s privacy.”
I think that’s something that might change besides the classic technological performance growth that everyone can anticipate.
Big Data and the Language Industry
Madelyn: Is there a difference between Machine Learning and Big Data?
Gauthier: Usually, Big Data as a whole includes approaches such as machine learning or artificial intelligence. But basically, just consider machine learning and artificial intelligence as tools: tools for reading and processing a raw material, which is data, and when there is a lot of that data, you call it Big Data. These are essentially tools. To hammer a nail, you can use a hammer, but if you need to hammer a thousand nails, you are probably going to use an electric hammer. Artificial intelligence and machine learning are just bigger tools that compensate for the limitations of the human brain.
Madelyn: What’s the potential of Big Data across multiple industries?
Gauthier: The potential is obviously big. It is related to the questions people are going to ask, and their ability to secure the raw data that’s going to feed into these Big Data quantum approaches. So the potential is big, but I think it’s going to be limited by the human power of being smart, curious, and innovative about the applications. Everybody can do massive analytics and draw some massive conclusions, but these conclusions, at some point, have to be meaningful; they have to answer a need. They have to be applicable to the reality of business, and the machine doesn’t know this. Only people will know, by asking the right questions, making sure the right data is employed, and making sure that whatever the conclusions are, the companies and organizations can pivot and apply these conclusions.
Madelyn: What does Big Data represent to the language industry?
Gauthier: This is actually a very interesting area. When you start studying a language, first of all, you have massive amounts of unstructured data, volume, and complexity. Then, understanding a language or translating a language is not a binary algorithm. I mean, there’s a lot of semantic understanding, ontology understanding; there are a lot of cases and subcases. All this is extremely complex. I’ve seen machine learning training on store reviews at a local company that specializes in customer reviews. It is very, very complex. It requires massive firepower… and many low-paid analysts to train the machine.
I think both process and power – memory capacity – and the ability to apply complex algorithms and machine learning to massive data sets are a huge opportunity for the language industry. Ten years ago, even though people had models and theorized about what you could do with languages and semantics, they just couldn’t do it, because they didn’t have the computers. Now we do. Every day and every month that passes, there is even more firepower available to do this processing. Basically, we are unleashing massive amounts of research and possibilities for the language industry.
Madelyn: How can we apply Big Data to Technical Communication and Localization projects?
Gauthier: Well, I think it relates to what we just discussed. Localization is a lot of unstructured data. I mean, obviously it is longitude, latitude. But then, it is about addresses that are not necessarily standardized, that you are going to have to recognize and fuzzy match to address references. As you start localizing people, behaviors – same thing – you still have to consider long and lat coordinates, but you are going to attach behaviors and comments. It creates massive amounts of information, much of which is unstructured. With the ability to do this massive language processing – correlation processing – basically, there’s nothing new under the sun, except for the fact that now we can run these analyses, and we can dream of doing so much more because we have the firepower.
That’s actually what Google is putting together, as they have a large stake in Uber. They know what you search, and now they know where you go physically. Soon, they’ll be able to suggest – based on your searches and your Uber usage patterns – points of interest you might want to go to. Because they understand the language you use for reviews, they know what you write, and they will even be able to help you select the best stores to go to around the area where the Uber is taking you.
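To make the idea of fuzzy matching non-standardized addresses against address references a little more concrete, here is a minimal sketch in Python. It is only an illustration: the reference addresses, the function names `normalize` and `fuzzy_match`, and the 0.6 similarity cutoff are all assumptions made for this example, and it uses Python’s standard `difflib` module rather than any tool mentioned in the interview.

```python
import difflib

# Hypothetical reference list of standardized addresses (illustrative data only).
REFERENCE_ADDRESSES = [
    "450 Serra Mall, Stanford, CA 94305",
    "1600 Amphitheatre Parkway, Mountain View, CA 94043",
    "1 Infinite Loop, Cupertino, CA 95014",
]


def normalize(address):
    """Lowercase and collapse whitespace so trivial differences do not affect matching."""
    return " ".join(address.lower().split())


def fuzzy_match(raw_address, references=REFERENCE_ADDRESSES, cutoff=0.6):
    """Return the most similar reference address, or None if none reaches the cutoff.

    The cutoff of 0.6 is an assumed value; a real project would tune it on its own data.
    """
    # Map each normalized reference back to its original spelling.
    normalized = {normalize(ref): ref for ref in references}
    hits = difflib.get_close_matches(
        normalize(raw_address), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[hits[0]] if hits else None


if __name__ == "__main__":
    # A misspelled, abbreviated address typed by a user.
    print(fuzzy_match("1600 amphitheater pkwy mountain view"))
    # Text that resembles none of the references.
    print(fuzzy_match("zzz qqq"))
```

Character-level similarity like this is only a starting point; real address data usually also needs abbreviation handling and a check against coordinates.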
Madelyn: What are the benefits of using Big Data?
Gauthier: First of all, let’s be careful when you use Big Data. Using Big Data doesn’t necessarily mean benefits. Sometimes using Big Data is actually going to make things far too complex, and there’s probably a much simpler question and a much simpler process that can yield the same results.
For me, there is no such thing as Big Data for Big Data. Big Data has to come into play when you are looking at a question for which the data you need has reached a level of volume and complexity that your brain, your mouse, and clicks just can’t touch. That’s really where Big Data kicks in. And this is where you are going to say: “Well, to dig a small hole in my garden I can use a shovel, but to dig a canal I’m going to use a mechanical shovel. I’m going to use machines.” Well, the same thing applies to data. When the volume gets too big, you are going to have to apply bigger tools. Your brain and mind will need to be supported by machines in the shape of algorithms such as machine learning and artificial intelligence.
Madelyn: Is there anything else about Big Data that you would like to add?
Gauthier: Just a word of caution. A lot of people are leading with Big Data, saying “we need to do Big Data,” but we have to bring it down to something much simpler. Once again, data is a means to an end, and what is that end? Let’s make sure we ask the relevant questions and that we are trying to solve the relevant challenges. Just by narrowing down and zeroing in on these questions or challenges, we will find the clear definition of the data that we need.
And then one of two things happens:
Either we are focused, so most of the time the data set we need won’t be that big; or the question we ask goes beyond what we had asked before, or the challenge we want to solve hasn’t been solved yet, and this is where we are going to call for more data. Instead of saying “Let’s do Big Data”, let’s say “OK, we need more data”, and instead of saying “Let’s do Machine Learning”, let’s have it triggered by the moment where we realize “My brain can’t see the signal through the noise, I’m going to need to be supported by a machine.” Big Data has to be a natural progression, not something to decide to do. Doing Big Data has to be the conclusion of a process that says conventional means are not going to get us an answer: “We’ve tried it, and it’s not enough. Let’s push the boundaries and let’s open up to these new techniques.” It has to be the extended continuation of a journey. You can’t just jump to Big Data; it doesn’t make sense.
Madelyn: Thank you, Gauthier!
We will keep you posted about the latest developments in the world of Big Data. In the meantime, follow us on Facebook, Twitter, and LinkedIn to receive blog article updates and interesting facts about Technical Communication and Localization with our #DidYouKnow? #FunFact of the week.