Onboarding guide

best practices
Author

Miel Hostens, et al.

Published

October 18, 2024

Bovi-Analytics onboarding guide

First of all, welcome!

You might feel overwhelmed at the beginning of this journey, but we are here to guide you through it!

Getting started at the Bovi-Analytics lab

First, we want to tell the world you are part of the Bovi-Analytics team, so you need to introduce yourself!

  1. Create a https://github.com profile (if you don’t have one) and ask Miel to add you to the GitHub Bovi-Analytics team. That way you will have access to all repositories.

  2. Create your own Quarto file (.qmd extension) using RStudio or any other text editor, and add it together with a profile headshot to https://github.com/Bovi-analytics/bovi-analytics-website

You can also do this directly within GitHub nowadays.

Bovi-analytics team communication

We use different ways to communicate, directly and indirectly. I (Miel) don’t like email for direct communication. Similarly, I avoid sending files back and forth between teammates for editing. Therefore, I started using several tools:

  • Join Lab Teams Workspace

    We typically use this for project and file management (and to plan Teams meetings, of course).

  • Join Lab Slack

    Slack was created before Teams was, and I have been using it for a while now. I prefer it over Teams for direct messaging.

Project management

I have used several tools in the past, such as JIRA. Right now I’m using the Planner tool within the Office365 suite from Microsoft, which is available to all Cornell personnel and students via https://portal.office.com. There you can create a planner for a project using, for example, Kanban project management. I like that you can link it to a Teams project channel.

Computational languages

The team uses multiple languages, but the most popular ones are:

  • R => for statistics; don’t use it for data-intensive tasks!

    • IDE => RStudio

  • Python => data pipelines, compute-intensive tasks, AI

    • IDE => PyCharm

  • Scala => scalable parallel processing of data

    • IDE => IntelliJ IDEA

Computational frameworks

One of the most popular frameworks for ‘big data’ processing is Apache Spark (https://spark.apache.org). It is often used through Databricks. At some point you might use it when you want to analyze data that no longer fits into your computer’s memory.
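To make the idea of working with data that does not fit into memory more concrete, here is a minimal plain-Python sketch. It is not Spark and not how the lab's pipelines are implemented; it only illustrates the underlying principle of reading records one at a time and keeping small running summaries instead of loading everything at once. The function name `mean_per_group` and the columns `cow_id` and `milk_kg` are hypothetical and chosen purely for illustration.

```python
import csv
import io
from collections import defaultdict


def mean_per_group(rows, key, value):
    """Stream over rows once, keeping only a running sum and count per group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += float(row[value])
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical example data; in practice this would be a large file
# opened with open(...), which csv.DictReader also reads row by row.
sample = io.StringIO("cow_id,milk_kg\nA,30\nB,20\nA,34\nB,22\n")
result = mean_per_group(csv.DictReader(sample), key="cow_id", value="milk_kg")
print(result)
```

Frameworks such as Spark apply the same principle, but spread the work over many machines and combine the partial results for you.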

Computational resources and access to them

  • Most of what we do computationally uses the Microsoft Azure cloud. You probably won’t need access to it, but it’s nice for you to understand what it is.

  • One of the most important cloud-based tools you will probably use is Databricks. The platform allows you to do data science in the cloud and, above all, to work together easily across the team.

Data visualization

One of the most popular visualization tools we use is Tableau (http://tableau.com). You can get a version of it by following this tutorial (https://bovi-analytics.com/tutorials/tableau.html)!

Best practices

  • Try to get familiar with the FAIR principles (https://www.go-fair.org/fair-principles/), as they are an important part of the philosophy of the lab.

  • Because of that:

    • Make sure you get used to working with version control using Git. Follow some tutorials and try using it with RStudio, PyCharm, or IntelliJ IDEA.

    • In everything you do, start applying a naming convention as soon as possible (see https://bovi-analytics.com/tutorials/general-naming-convention.html). In your programming work, you will probably want to follow the style guidelines of the programming language itself.