Pirates Research Apprentice Application Details

January 25, 2025


The admin is Harrison Boyd, who lives and works in the Pittsburgh Metro area building useful things and following baseball. Connect on LinkedIn!

In the Pirates UI Developer Application Details article, I shared my application answers for the UI Developer job posting and my thoughts on why the Pirates created the position in 2024. In 2023, I applied to be a Research Apprentice and was invited to complete a coding challenge. Due to unforeseen circumstances, I was not able to start the coding challenge, and I never heard back.

Please take the time to read my previous article about the UI Developer job posting and my application for it: Pirates UI Developer Application Details.

Like the UI Developer application, the Research Apprentice application is fairly standard. Below are the questions and my answers. I did not record the exact questions, so I am paraphrasing them based on my answers. I did save my exact answers, so those are verbatim. There are one or two questions whose answers I did not keep, so I have left them out.

Question 1: Why do you want to work for the Pittsburgh Pirates

My response is below:

I am a lifelong Pirates fan and live in the Pittsburgh Metro. I follow several media outlets covering the Pittsburgh Pirates, including DK Pittsburgh Sports, Pirates Prospects, Pittsburgh Baseball Now, Rum Bunter, and Bucs Dugout. I often watch Pirates games with my dad at my parents' house. What I really want is to win, and the timing is right for me to move into a new position. I have a project going to production at the end of September, and I will probably not have as easy an offboarding in the next couple of years as I would now.

I cannot play on the field, but I am a senior software developer just getting into Data Science and Machine Learning. I am around two-thirds of the way through a Machine Learning Engineering Python bootcamp at Springboard, where I am working with PyBaseball and pandas DataFrames, and I have found that I love working with data all the way from collection, to cleaning, to EDA, to building and training a model, to deploying that model. In addition to my day-to-day job, I have dipped my toes into cPanel dedicated servers, MySQL, Postgres, AWS Amplify, AWS Lambda, payment processing, Netlify, and Heroku.

I am a web developer by trade and love learning new things. Together, my passion for learning and my passion for the Pittsburgh Pirates would make more than 40 hours of work a week seem not so much like work. I have been hearing news about the Pirates such as purchases of lots and restaurant space, a possible new TV deal, players like David Bednar seemingly really into marketing the team, and pitchers joining the Pirates to revamp their careers. To me, it seems like the Pirates are working on multiple fronts to find competitive advantages, and I would love to be part of that team. Really, I want to do whatever helps the team win and grow, whether it is R&D, Business Development, Player Development, Operations, Marketing, Technology, etc.

I am looking forward to continuing to learn Data Science and many more skills, hopefully with the Pittsburgh Baseball Club. The R&D aspect of the job description, as well as what I am hearing in the news about the Pirates trying new things to gain a competitive advantage, makes me think the Pirates aren't just competing on the field but have a drive to compete in all aspects of a baseball club. I really want to be part of that team and look forward to hearing back.

Question 2: Why do you want to work in Data Science and Machine Learning

My response is below:

I have already been a professional software engineer/developer for 8 years. My three strongest programming languages are Java, JavaScript, and Python, and I am fluent in the Angular and Spring Boot frameworks as well. I enjoy my time as an engineer and love creating reusable components. I was the primary engineer behind an Angular component library used by dozens of teams at a large retail bank in the Pittsburgh area.

I also love debugging problems. Just today I am working on fixing, deploying, and preparing a case to explain why we got a false positive on my team's performance test, which booked 15,000 credit cards in a test environment within a span of hours. All 15,000 accounts were booked at a third party as expected, and our team's Oracle database has a record for each of the 15,000, but around 2,000 of those records have a null account number. I received a master spreadsheet of all the booked accounts and can see that one type of account was booked but has a null account number in our Oracle database. Our SELECT FOR UPDATE query, which selects records of that type for update, does not have a LIMIT on it, so I am hypothesizing that Oracle dropped the transaction for the records that have null account numbers.

I could name many more interesting problems like this. Not only does the technical problem need to be solved, but explaining it to and working with stakeholders who do not have their hands on the code needs to be considered as well. The performance testing team says the tests they ran against our distributed system passed, since all 15,000 accounts were booked. I need to explain to them and other stakeholders that the result is a false positive: the tests do not consider the entire architecture, and they need to be rerun after I fix the issue with the query.

Besides seeing data work as the next step in advancing my career, I find cross-disciplinary fields fascinating. I am starting to dabble in robotics, which includes fields such as electronics, mechanics, computer science, physics, chemistry, physiology, computational simulation, manufacturing, supply chain, and so much more. Learning a cross-disciplinary field is difficult; however, it reveals applications and competitive advantages that are not visible from inside a single-field bubble.

I have a good foundation in software development with 3rd and 4th generation languages. I have also experienced headwinds and embrace them. In my opinion, the goal of engineering is to set up pipelines and delivery that make an infrastructure and framework work as if in a perfect vacuum, but there needs to be preparation for when reality shows that a perfect vacuum does not exist. I am ready to learn what comes next with the Pittsburgh Baseball Club.

Thanks, Harrison

Question 3: What experiences do you have working with data

My response is below:

My current project inserts, updates, and deletes records from related tables in a one-to-many relationship.

I write joins, usually a full outer join, to see the data inserted for a newly booked account. I have written SELECT FOR UPDATE queries, since multiple pods/instances could attempt to read and then update the same records. I have also run CREATE TABLE statements with primary key, foreign key, and JSON constraints. I have not yet worked with a NoSQL platform like MongoDB. I have created identity columns, unique constraints, and sequences used for Company IDs and Credit Card Application IDs. I have used the CHAR, VARCHAR2, BLOB, TIMESTAMP, and NUMBER types. I haven't yet run into a use for UNIONs professionally.
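To make the one-to-many setup described above concrete, here is a minimal sketch in Python with SQLite standing in for Oracle. The `company` and `card_application` tables and their columns are hypothetical. `INTEGER PRIMARY KEY AUTOINCREMENT` is only a rough analog of an Oracle identity column or sequence. The query uses a LEFT JOIN rather than a full outer join so it also runs on older SQLite builds, which lack FULL OUTER JOIN.

```python
import sqlite3

# SQLite stands in for Oracle; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

conn.executescript("""
CREATE TABLE company (
    company_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- rough analog of an identity column
    tax_id     TEXT NOT NULL UNIQUE,               -- unique constraint
    name       TEXT NOT NULL
);

CREATE TABLE card_application (
    application_id INTEGER PRIMARY KEY AUTOINCREMENT,
    company_id     INTEGER NOT NULL
                   REFERENCES company(company_id) ON DELETE CASCADE,  -- many applications per company
    account_number TEXT UNIQUE,  -- stays NULL until the account is booked
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")

company_id = conn.execute(
    "INSERT INTO company (tax_id, name) VALUES (?, ?)", ("00-0000000", "Example Co")
).lastrowid
conn.executemany(
    "INSERT INTO card_application (company_id, account_number) VALUES (?, ?)",
    [(company_id, "4000000000000001"), (company_id, None)],
)
conn.commit()

# One company, many applications; the join exposes the application still missing an account number.
rows = conn.execute("""
    SELECT c.name, a.application_id, a.account_number
    FROM company AS c
    LEFT JOIN card_application AS a ON a.company_id = c.company_id
    ORDER BY a.application_id
""").fetchall()
print(rows)

# The foreign key rejects an application pointing at a company that does not exist.
fk_rejected = False
try:
    conn.execute("INSERT INTO card_application (company_id) VALUES (999)")
except sqlite3.IntegrityError:
    fk_rejected = True
print("foreign key enforced:", fk_rejected)
```

On Oracle itself the same shape would use identity columns or sequences and native constraint syntax. The idea of a parent row, child rows keyed back to it, and constraints guarding both sides carries over.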

I have also uploaded millions of records to a MySQL instance running on a dedicated server. To do that, I broke the data up across several tables, created SQL dump files on my local machine, transferred them to the dedicated server over FTP, and ran the dump files on the server to create the tables with the needed data.
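The split-then-transfer approach above can be sketched as a small Python helper. It writes rows out as a series of SQL dump files, each small enough to transfer and load on its own. This is an illustration under assumptions, not the original process. The table name, columns, chunk size, and the deliberately minimal value escaping are all hypothetical. For real data, `mysqldump` or parameterized inserts are safer.

```python
import tempfile
from pathlib import Path


def sql_literal(value) -> str:
    """Render a value as a MySQL literal; deliberately minimal (NULL, numbers, strings)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes first, then double any single quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def write_chunked_dumps(table, columns, rows, out_dir, chunk_size=50_000):
    """Write rows to numbered .sql files, each one multi-row INSERT of at most chunk_size rows."""
    out_dir = Path(out_dir)
    paths = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        values = ",\n".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in chunk
        )
        path = out_dir / f"{table}_{start // chunk_size:04d}.sql"
        path.write_text(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n{values};\n",
            encoding="utf-8",
        )
        paths.append(path)
    return paths


# Made-up rows; the apostrophe exercises the quote escaping.
rows = [(1, "Smith", 95.1), (2, "O'Brien", None), (3, "Lopez", 97.3)]
with tempfile.TemporaryDirectory() as tmp:
    files = write_chunked_dumps("pitches", ["pitch_id", "pitcher", "velocity"], rows, tmp, chunk_size=2)
    names = [p.name for p in files]
    first_file = files[0].read_text(encoding="utf-8")
print(names)
print(first_file)
```

Each file could then be copied to the server and loaded with the MySQL client, for example `mysql -u USER -p DBNAME < pitches_0000.sql`. A very large chunk size can exceed the server's `max_allowed_packet`, so the chunk size would need tuning for real data.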

In college, we performed normalization exercises to understand when we needed to create a related entity, and we also ran nested SELECT statements. We also created MS Access databases with tables and junction tables for many-to-many relationships.
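A junction table like the ones mentioned above can be illustrated with a minimal sketch, again using SQLite from Python as a stand-in for MS Access. The player/team/season schema and its rows are hypothetical examples. The sketch shows both the join through the junction table and a nested SELECT.

```python
import sqlite3

# SQLite stands in for MS Access; players, teams, and seasons are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE player (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE team   (team_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Junction table: each row links one player to one team for one season.
-- The composite primary key stops the same pairing from being stored twice.
CREATE TABLE player_team (
    player_id INTEGER NOT NULL REFERENCES player(player_id),
    team_id   INTEGER NOT NULL REFERENCES team(team_id),
    season    INTEGER NOT NULL,
    PRIMARY KEY (player_id, team_id, season)
);
""")
conn.executemany("INSERT INTO player VALUES (?, ?)", [(1, "Player A"), (2, "Player B")])
conn.executemany("INSERT INTO team VALUES (?, ?)", [(1, "Team X"), (2, "Team Y")])
conn.executemany(
    "INSERT INTO player_team VALUES (?, ?, ?)",
    [(1, 1, 2023), (1, 2, 2024), (2, 1, 2024)],
)

# One direction of the many-to-many: every team Player A has been on.
teams_for_a = conn.execute("""
    SELECT t.name FROM team AS t
    JOIN player_team AS pt ON pt.team_id = t.team_id
    WHERE pt.player_id = 1
    ORDER BY pt.season
""").fetchall()

# The other direction, via a nested SELECT: every player who has been on Team X.
players_on_x = conn.execute("""
    SELECT name FROM player
    WHERE player_id IN (SELECT player_id FROM player_team WHERE team_id = 1)
    ORDER BY name
""").fetchall()

print(teams_for_a)
print(players_on_x)
```

The normalization payoff is that neither `player` nor `team` has to repeat the other's data. Each relationship lives in exactly one junction row.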

Thanks, Harrison

Question 4: What machine learning projects have you worked on

My response is below:

I haven't completed any machine learning projects yet. However, I am in the process of working on one for the Springboard MLE bootcamp. The project aims to predict the outcome of MLB games at a better rate than the Vegas moneyline odds. I am using features such as betting odds, recent home and away records, pitching metrics, and batting metrics. I have collected the data using PyBaseball, loading Statcast data into DataFrames. I also found a set of spreadsheets online with the opening and closing moneylines, pitchers, and outcomes from 2010 to 2019. I calculated how often the moneyline favorite won over those years and came to ~58%. I am using that as the benchmark for the performance of my custom model, which is still a work in progress. I will compare classification models such as Random Forest and XGBoost to see which produces the best results.
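The ~58% benchmark comes down to a simple rate: the share of games the moneyline favorite won. Below is a minimal sketch of that calculation. It assumes each spreadsheet row can be reduced to the home and away American moneylines (opening or closing, whichever line is being benchmarked) plus whether the home team won. The `Game` fields and sample rows are made up for illustration; they are not the spreadsheet's actual columns or data. The favorite is taken to be the side whose odds imply the higher win probability, and pick'em games with equal odds are skipped.

```python
from dataclasses import dataclass


@dataclass
class Game:
    """One game from a hypothetical odds spreadsheet (field names are illustrative)."""
    home_moneyline: int  # American odds, e.g. -150 or +130
    away_moneyline: int
    home_won: bool


def implied_probability(moneyline: int) -> float:
    """Convert American moneyline odds to the bookmaker's implied win probability."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def favorite_win_rate(games: list[Game]) -> float:
    """Share of games won by the moneyline favorite; pick'em games are skipped."""
    decided = 0
    favorite_wins = 0
    for game in games:
        home_p = implied_probability(game.home_moneyline)
        away_p = implied_probability(game.away_moneyline)
        if home_p == away_p:
            continue  # no favorite, so the game says nothing about the benchmark
        decided += 1
        home_is_favorite = home_p > away_p
        if home_is_favorite == game.home_won:
            favorite_wins += 1
    return favorite_wins / decided if decided else 0.0


# Made-up example rows, not real data.
sample = [
    Game(-150, 130, True),    # home favorite wins
    Game(-120, 100, False),   # home favorite loses
    Game(140, -160, False),   # away favorite wins
    Game(-110, -110, True),   # pick'em, skipped
]
print(favorite_win_rate(sample))
```

A custom model would then be scored on the same set of games and compared against this rate.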

Thanks, Harrison

Question 5: What experience do you have working on a team

My response is below:

I work on a team every day. I have been a developer lead for a few years, and I am currently a developer co-lead on my new team because the workload is so large. I have been on two immediate teams since 2018, and there have been differences between the two. On the first team, we built an Angular component library for a large retail bank. The team started with two developers and grew to five, and I was the dev lead when it grew from three to five. I tried to keep the team structure flat. When the team was two or three people, it was easier to stay flat. However, when the team grew to five, tasks seemed to funnel through me, creating a hierarchy. One good thing that came from that is that I learned a lot, to the point that I am considered one of the best front-end developers at the company. I have been asked, impromptu, to train new members in Angular development. I have also been sent to potential-client hackathons to show what CGI can do. It has been almost a year and a half since I did Angular development, and I still get questions about Angular and the Angular component library. The downside of virtually all tasks funneling through me is that other members did not develop as fast as they should have, and I lacked the time to mentor, so instead I was just doing their work. Still, our team was very performant most of the time, as we met our sprint velocity week in and week out. The client liked me and was very happy with what our team was delivering.

When the time was right for the team, I moved on to my current team. I am not the dev lead on this team. This team is also very performant and flexible. We have worked on proofs of concept (POCs) for work that is not part of our statement of work (SOW) at the client's request. Our team can do front-end work, back-end work, database work, and business process management work with the Flowable orchestrator. This team has a flat structure. I voluntarily take the lead on things, as do the other two senior developers. The two junior developers also take the lead on some things, which is different from my previous team. It is a good balance of me leading, answering to the dev lead, and mentoring the juniors. I also get mentored by, and mentor, the other seniors, depending on the topic. Team members seem to naturally take responsibility and become subject matter experts without being guided to. The downside of this is that I am not knowledgeable about every task that happens on the team.

When my first team was small, I worked as a business analyst, developer lead, developer, and QA. I enjoyed doing that since it helped me learn more. I believe it is good for a rookie to first try multiple aspects of a project because it makes for a better leader later on.

Conclusion

These questions and responses are typical of an initial screening for an entry-level data analyst. This job posting has since been taken down. It was removed before the UI Developer position was posted, which suggests to me where the Pirates are in building their data infrastructure. The apprentice position was more of a data engineer position than a data scientist position. A data engineer essentially designs, builds, and maintains the infrastructure that collects, stores, and processes raw data. The apprentice position was there to do the grunt work of building the data infrastructure. It seems the Pirates are past that point and are ready to share the insights and information found in their data with the coaches and players.


© 2025, Post Bucs