On Data Science Roles...
Recently, I have been bashing my head against the wall trying to figure out where I fit in the big world of data science. I consider myself a data scientist, and have done so for most of my career (7-ish years or so). But lately, ever since my title changed, my role and tasks have been in a bizarre transition, or better, in a constantly evolving void of multifaceted responsibilities that I struggle to put on a piece of paper, which makes it hard to decide where to focus my career development efforts. This is in no way a rant or a complaint. I simply thought it was a good moment to share my opinions on the current inventory of roles after a few years in the field.
In a previous post on this blog, I wrote about how I moved from a physics degree to data science in higher education. I started off as an Associate Research Scientist: a fancy title that helped me put my name and see my contributions on multiple peer-reviewed papers, something I am very proud of. A few years later, I got a promotion and a title change to something that felt odd to me at first but has started to stick with time: the infamous Data Science Engineer role.
What is a Data Science Engineer?
I have been asked about this in multiple conversations with people in the data space, and I always end up feeling like I could have explained it a bit better.
I have been trying to put together a clear and concise definition of this role, as it is not a common one to see on company job boards or on social networks like LinkedIn. That is why in this entry I’d like to provide a bit more insight into the profile of a Data Science Engineer and share my experience, to shed some light for anyone interested. You can read this as a blatant piece of self-promotion. Nonetheless, I do think it is important to help job seekers and companies alike understand that people like me exist and that there are no clear pathways in this “sexiest job of the 21st century” BS.
What can a Data Science Engineer do?
You’ll hate this answer, but pretty much anything. In my particular case, coming from a physics background, I think my biggest assets have been troubleshooting, researching, and trial and error. Stats were not something that came naturally to me, and I had to work longer hours to get the concepts into my head. To this day, I still need to review basic things to understand what is going on. From a software perspective, I never really learned Computer Science basics like Data Structures and Algorithms or Architecture, none of that. However, I did pick up basic (really basic) concepts pretty quickly once I started to get exposed to them, and of course, I was hooked.
Are you a Data Scientist, Data Analyst, Data Engineer, or Machine Learning Engineer?
None and all of them. One day I am crafting some interesting visuals for a paper, poster, or dashboard. The next day I am monitoring our ML pipelines and fixing the feature store or codebase in case there were errors in the batch run from the night before. Some afternoons I find myself opening a stats book, trying to remember how to interpret a logistic regression or how to perform propensity score matching for a specific ad-hoc request from a stakeholder. In my free time, I troubleshoot and develop R Shiny applications as a freelancer, and I do everything from making a button work to putting the app on AWS for publication and sharing.
Now here’s the problem. A data scientist, on paper, is focused primarily on building cool models, training algorithms in a Jupyter notebook, or reading a recent paper on some fancy new neural network architecture. I am not that person, though I have been from time to time. A data analyst crafts dashboards and knows metrics well enough to build engaging reports after arduous data transformation work done (mostly) in SQL. A Machine Learning Engineer more often comes from the Software Development space, with a broader knowledge of architecture and infrastructure, and a focus that extends beyond classic statistical learning to deep learning concepts. Finally, a data engineer focuses on building data pipelines, data models, and allocating resources, usually on the cloud. This is the kind of profile you’d have hired 10 years ago for “Big Data”.
So now, back to our original question: Who am I? Why am I here? I have been exposed to many of the above “job responsibilities” to a certain degree. My role has varied by project, by business need, and by employer. I would probably attribute this to the fact that I started in data when there was still no clear definition of roles. I take this as both an advantage and a liability. On the bright side, I have developed a set of skills that allows me to refer to myself as a Swiss Army knife of data. I can talk to different people within my organization and understand the technical jargon, which then allows me to translate it easily for non-technical stakeholders and other teams. The downside: I haven’t fully developed a single path along which I can continue growing with the proper focus, nor have I been capable of promoting my contributions other than say
At first, like many junior data scientists, I was obsessed with advanced mathematics and its application to real-world problems. I have been lucky enough to work on a variety of projects involving complex concepts from psychometric theory and advanced statistics; the math has been there by my side and I have witnessed my impact on tangible problems. However, as time went by, I started to feel like many of my contributions were limited to literature and academic discussions. I was slow at delivering value and outcomes because I was highly insecure about the rigor or sophistication of my work. It was at that point that I decided to focus on building stuff, however imperfect, incomplete, buggy, or raw it might be. I just needed to put things out there.
After a few months of prototyping multiple R Shiny apps and integrating cleaner code into my existing pipelines, I decided to move away from the classic Data Scientist role toward a more engineering-heavy position. Many of the skills required to succeed as an ML or Data Engineer come from having solid CS foundations, something a Physics degree or a data science position will rarely prepare you for. So, was this a smart move? Maybe not, as I knew I’d be climbing a steep learning curve. To be fair, my journey to become a data scientist was not short of challenges and rabbit holes, but at least the research and stats required to thrive were more “familiar” coming from physics.
To be continued...