Onboarding guide
Bovi-Analytics onboarding guide
First of all, welcome!
You might get overwhelmed at the beginning of this journey, but we are there to guide you through!
Getting started at the Bovi-Analytics lab
First, we want to tell the world you are part of the bovi-analytics team so you need to introduce yourself!
Create a https://github.com profile (if you don’t have one) and request Miel to be added to the Github bovi-analytics team. That way you can have access to all repositories.
Create your own Quarto file (qmd extension) using R studio or any other text editor and a profile headshot https://github.com/Bovi-analytics/bovi-analytics-website
Profile qmd file => https://github.com/Bovi-analytics/bovi-analytics-website/tree/main/researchers
Lab headshot in png format => https://github.com/Bovi-analytics/bovi-analytics-website/tree/main/files/img/headshots
You can also do this within GitHub directly nowadays
Bovi-analytics team communication
We use different ways to communicate, directly and indirectly. I (Miel) don’t like emails for direct communication. Similarly, I avoid sending files that need editing across teammates. Therefore, I started using several tools:
Join Lab Teams Workspace
Ask Miel or Enhong for guest access if you do not have a Cornell account
We typically use this for project & file management (and plan teams meetings of course)
Join Lab Slack
- Ask Miel or Enhong for Slack invite (or use this link https://bovi-analytics.slack.com/ )
Slack was created before Teams was, and I have been using it for a while now. I prefer it over Teams for direct messaging.
Project management
I have been using several tools in the past, such as JIRA. Right now I’m using the Planner tool within the Office365 suite from Microsoft which is available for all Cornell personel and students via https://portal.office.com . There you can create a planner for a project using e.g. KanBan project management. I like it that you can link it to a Teams project channel.
Computational languages
The team uses multiple languages but the most popular once are:
R => for statistics, don’t use it for data intensive tasks!
- IDE = R-Studio
Python => data pipelines, compute intensive, AI
- IDE => PyCharm
Scala => scalable parallel processing of data
- IDE => IntelliJ IDEA
Computational frameworks
One of the most popular frameworks to perform ‘big data’ is https://spark.apache.org . It’s often used by Databricks. At a certain moment you might use it when you want to analyze data that doesn’t fit into your computer’s memory anymore.
Computational resources and access to them
Most of what we do computationally uses the Microsoft Azure cloud. You probably won’t need access to it, but it’s nice for you to understand what it is.
One of the most important cloud-based tools you probably will use is Databricks. The platform allows you to perform data science in the cloud, and especially work together easily across the team.
Start by trying it using the following guide => Community Edition of Databricks
Once you get use to this version we will introduce you to the entire platform here (Databricks Cornell), but it’s nice to start small
Data visualization
One of the very popular visualization tools we use is http://tableau.com. You can get a version of it using this tutorial (https://bovi-analytics.com/tutorials/tableau.html )!
Best practices
Try to get familiar with the FAIR principles (https://www.go-fair.org/fair-principles/ ) as they are an important part of the philosophy of the lab.
Because of that
Make sure to get used to work with version control using git. Follow some tutorials, try using it with R-studio, PyCharm, Intellij IDEA
Within everything you do, start applying a naming convention as soon as possible (See here https://bovi-analytics.com/tutorials/general-naming-convention.html ). Within your programming work, I can imagine you want to follow the guidelines within the programming language.