Tools
Which tools are commonly used by collaborators
Programming languages
In my early programming days I started with VB.NET, and I did most of my statistics in SAS throughout my PhD studies. However, as open source became mainstream I switched languages (maybe too often). Nowadays, I tend to advise using several languages. Once you know one, switching becomes easier. And there is always Stack Overflow!
[R] for smaller datasets that need quick-and-dirty frequentist statistics, often rendered as R Markdown (Rmd) notebooks and pushed to GitHub, e.g.
Scala for more complex data engineering projects, e.g.
Local processing (when data fits on-prem)
Development environments
For my local processing, I combine multiple IDEs. I tried one-size-fits-all approaches such as Visual Studio but ended up using the following options.
Python -> PyCharm -> Jupyter Notebook
[R] -> RStudio
Scala -> IntelliJ IDEA
Cloud processing (when data doesn’t fit on-prem)
Data processing framework
Python and [R] can be easily used through Google Colab.
Python, [R] and Scala can be run on Databricks Community Edition for students. Apply for the Community Edition here; be sure not to click any of the providers (Google, Amazon, Azure) but select ‘Get started with Community Edition’.
When I need larger-scale processing power, I love to use Apache Spark on Azure Databricks for distributed analysis and parallel processing.
Microsoft Azure is my current cloud platform.
Other tools
- GitHub as code repository; this website is hosted through GitHub Pages.
- This website is built using Quarto, a new tool that integrates well with RStudio and others.
- Markdown (https://www.markdownguide.org/).