Github weekly trending - DataExpert-io / data-engineer-handbook

by kuo • Jun 28, 2025

The Data Engineering Handbook

This repo has all the resources you need to become an amazing data engineer!

Getting started

If you are new to data engineering, start by following this 2024 breaking into data engineering roadmap.

If you are here for the free 6-week YouTube boot camp, you can check out:

  • introduction

  • software needed

For more applied learning:

  • Check out the projects section for more hands-on examples!

  • Check out the interviews section for more advice on how to pass data engineering interviews!

  • Check out the books section for a list of high-quality data engineering books.

  • Check out the communities section for a list of high-quality data engineering communities to join.

  • Check out the newsletter section to learn via email.

Resources

Great list of over 25 books

Top 3 must-read books are:

  • Fundamentals of Data Engineering

  • Designing Data-Intensive Applications

  • Designing Machine Learning Systems

Great list of over 10 communities to join:

Top must-join communities for DE:

  • DataExpert.io Community Discord

  • Data Talks Club Slack

  • Data Engineer Things Community

Top must-join communities for ML:

  • AdalFlow Discord

  • Chip Huyen MLOps Discord

Companies:

  • Orchestration

    • Mage

    • Astronomer

    • Prefect

    • Dagster

    • Airflow

    • Kestra

    • Shipyard

    • Hamilton

  • Data Lake / Cloud

    • Tabular

    • Microsoft

    • Databricks

    • Onehouse

    • Delta Lake

    • Ilum

  • Data Warehouse

    • Snowflake

    • Firebolt

    • Databend

  • Data Quality

    • dbt

    • Metaplane

    • Gable

    • Great Expectations

    • Streamdal

    • Coalesce

    • Soda

    • DQOps

    • HEDDA.IO

    • Dingo

  • Education Companies

    • DataExpert.io

    • LearnDataEngineering.com

    • AlgoExpert

    • ByteByteGo

  • Analytics / Visualization

    • Preset

    • Starburst

    • Metabase

    • Looker Studio

    • Tableau

    • Power BI

    • Hex

    • Apache Superset

    • Evidence

    • Redash

    • Lightdash

  • Data Integration

    • Cube

    • Fivetran

    • Airbyte

    • dlt

    • Sling

    • Meltano

  • Semantic Layers

    • Cube

    • dbt Semantic Layer

  • Modern OLAP

    • Apache Druid

    • ClickHouse

    • Apache Pinot

    • Apache Kylin

    • DuckDB

    • QuestDB

    • StarRocks

  • LLM application library

    • AdalFlow

    • LangChain

    • LlamaIndex

  • Real-Time Data

    • Aggregations.io

    • Responsive

    • RisingWave

    • Striim

  • Data Lineage

    • OpenLineage

Data engineering blogs from companies:

  • Netflix

  • Uber

  • Databricks

  • Airbnb

  • Amazon AWS Blog

  • Microsoft Data Architecture Blogs

  • Microsoft Fabric Blog

  • Oracle

  • Meta

  • Onehouse

Data Engineering Whitepapers:

  • A Five-Layered Business Intelligence Architecture

  • Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics

  • Big Data Quality: A Data Quality Profiling Model

  • The Data Lakehouse: Data Warehousing and More

  • Spark: Cluster Computing with Working Sets

  • The Google File System

  • Building a Universal Data Lakehouse

  • XTable in Action: Seamless Interoperability in Data Lakes

  • MapReduce: Simplified Data Processing on Large Clusters

  • Tidy Data

Social Media Accounts

Here's a mostly comprehensive list of data engineering creators (you need at least 5k followers on some platform to be added):

YouTube

| Name | YouTube Channel | Follower Count |
| --- | --- | --- |
| ByteByteGo | ByteByteGo | 1,000,000+ |
| Zach Wilson | Data with Zach | 150,000+ |
| Shashank Mishra | E-learning Bridge | 100,000+ |
| Seattle Data Guy | Seattle Data Guy | 100,000+ |
| TrendyTech | TrendyTech | 100,000+ |
| Darshil Parmar | Darshil Parmar | 100,000+ |
| Andreas Kretz | Andreas Kretz | 100,000+ |
| The Ravit Show | The Ravit Show | 100,000+ |
| Guy in a Cube | Guy in a Cube | 100,000+ |
| Adam Marczak | Adam Marczak | 100,000+ |
| nullQueries | nullQueries | 100,000+ |
| TECHTFQ by Thoufiq | TECHTFQ by Thoufiq | 100,000+ |
| SQLBI | SQLBI | 100,000+ |
| Alex Freberg | Alex The Analyst | 100,000+ |
| Ankur Ranjan | Big Data Show | 100,000+ |
| Prashanth Kumar Pandey | ScholarNest | 77,000+ |
| ITVersity | ITVersity | 67,000+ |
| Soumil Shah | Soumil Shah | 50,000 |
| Ansh Lamba | Ansh Lamba | 18,000+ |
| Azure Lib | Azure Lib | 10,000+ |
| Advancing Analytics | Advancing Analytics | 10,000+ |
| Kahan Data Solutions | Kahan Data Solutions | 10,000+ |
| Ankit Bansal | Ankit Bansal | 10,000+ |
| Mr. K Talks Tech | Mr. K Talks Tech | 10,000+ |
| Samuel Focht | Python Basics | 10,000+ |
| Mehdi Ouazza | Mehdio DataTV | 3,000+ |
| Alex Merced | Alex Merced Data | N/A |
| John Kutay | John Kutay | N/A |
| Emil Kaminski | Databricks For Professionals | 5,000+ |

LinkedIn

| Name | LinkedIn Profile | Follower Count |
| --- | --- | --- |
| Zach Wilson | Zach Wilson | 400,000+ |
| Chip Huyen | Chip Huyen | 250,000+ |
| Shashank Mishra | Shashank Mishra | 100,000+ |
| Seattle Data Guy | Ben Rogojan | 100,000+ |
| TrendyTech | Sumit Mittal | 100,000+ |
| Darshil Parmar | Darshil Parmar | 100,000+ |
| Andreas Kretz | Andreas Kretz | 100,000+ |
| ByteByteGo (Alex Xu) | Alex Xu | 100,000+ |
| Azure Lib (Deepak Goyal) | Deepak Goyal | 100,000+ |
| Alex Freberg | Alex Freberg | 100,000+ |
| SQLBI (Marco Russo) | Marco Russo | 50,000+ |
| Ankit Bansal | Ankit Bansal | 50,000+ |
| Marc Lamberti | Marc Lamberti | 50,000+ |
| Ankur Ranjan | Ankur Ranjan | 48,000+ |
| ITVersity (Durga Gadiraju) | Durga Gadiraju | 48,000+ |
| Prashanth Kumar Pandey | Prashanth Kumar Pandey | 37,000+ |
| Alex Merced | Alex Merced | 30,000+ |
| Ijaz Ali | Ijaz Ali | 24,000+ |
| Mehdi Ouazza | Mehdi Ouazza | 20,000+ |
| Ananth Packkildurai | Ananth Packkildurai | 18,000+ |
| Ansh Lamba | Ansh Lamba | 13,000+ |
| Manojkumar Vadivel | Manojkumar Vadivel | 12,000+ |
| Advancing Analytics | Simon Whiteley | 10,000+ |
| Li Yin | Li Yin | 10,000+ |
| Jaco van Gelder | Jaco van Gelder | 10,000+ |
| Joseph Machado | Joseph Machado | 10,000+ |
| Eric Roby | Eric Roby | 10,000+ |
| Simon Späti | Simon Späti | 10,000+ |
| Constantin Lungu | Constantin Lungu | 10,000+ |
| Lakshmi Sontenam | Lakshmi Sontenam | 9,500+ |
| Soumil Shah | Soumil Shah | 8,000+ |
| Arnaud Milleker | Arnaud Milleker | 7,000+ |
| Dimitri Visnadi | Dimitri Visnadi | 7,000+ |
| Lenny | Lenny A | 6,000+ |
| Dipankar Mazumdar | Dipankar Mazumdar | 5,000+ |
| Daniel Ciocirlan | Daniel Ciocirlan | 5,000+ |
| Hugo Lu | Hugo Lu | 5,000+ |
| Tobias Macey | Tobias Macey | 5,000+ |
| Marcos Ortiz | Marcos Ortiz | 5,000+ |
| Julien Hurault | Julien Hurault | 5,000+ |
| John Kutay | John Kutay | 5,000+ |
| Hassaan Akbar | Hassaan Akbar | 5,000+ |
| Subhankar | Subhankar | 5,000+ |
| Nitin | Nitin | N/A |
| Hassaan | Hassaan | 5,000+ |
| Javier de la Torre | Javier | 5,000+ |

X/Twitter

| Name | X/Twitter Profile | Follower Count |
| --- | --- | --- |
| ByteByteGo | alexxubyte | 100,000+ |
| Dan Kornas | @dankornas | 66,000+ |
| Zach Wilson | EcZachly | 30,000+ |
| Seattle Data Guy | SeattleDataGuy | 10,000+ |
| SQLBI | marcorus | 10,000+ |
| Joseph Machado | startdataeng | 5,000+ |
| Alex Merced | @amdatalakehouse | N/A |
| John Kutay | @JohnKutay | N/A |
| Mehdi Ouazza | mehd_io | N/A |

Instagram

| Name | Instagram Profile | Follower Count |
| --- | --- | --- |
| Sundas Khalid | sundaskhalidd | 300,000+ |
| Zach Wilson | eczachly | 150,000+ |
| Andreas Kretz | learndataengineering | 5,000+ |
| Alex Merced | @alexmercedcoder | N/A |

TikTok

| Name | TikTok Profile | Follower Count |
| --- | --- | --- |
| Zach Wilson | @eczachly | 70,000+ |
| Alex Freberg | @alex_the_analyst | 10,000+ |
| Mehdi Ouazza | @mehdio_datatv | N/A |

Great Podcasts

  • The Data Engineering Show

  • Data Engineering Podcast

  • DataTopics

  • The Data Engineering Side Of Data

  • DataWare

  • The Data Coffee Break Podcast

  • The Data Stack Show

  • Intricity101 Data Sharks Podcast

  • Drill to Detail with Mark Rittman

  • Analytics Power Hour

  • Catalog & Cocktails

  • Datatalks

  • Data Brew by Databricks

  • The Data Cloud Podcast by Snowflake

  • What's New in Data

  • Open||Source||Data by Datastax

  • Streaming Audio by Confluent

  • The Data Scientist Show

  • MLOps.community

  • Monday Morning Data Chat

  • The Data Chief

Great list of 20+ newsletters

Top must-follow newsletters for data engineering:

  • DataEngineer.io Newsletter

  • Joe Reis

  • Start Data Engineering

  • Data Engineering Weekly

Glossaries:

  • Data Engineering Vault

  • Airbyte Data Glossary

  • Data Engineering Wiki by Reddit

  • Secoda Glossary

  • Databricks Glossary

  • Airtable Glossary

  • Data Engineering Glossary by Dagster

Design Patterns

  • Cumulative Table Design

  • Microbatch Deduplication

  • The Little Book of Pipelines

  • Data Developer Platform
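The list above only names these patterns. As a rough illustration of the first two, here is a toy in-memory Python sketch, assuming a per-user daily activity count for cumulative table design and a "keep the first record per key" rule for microbatch deduplication. All names (`UserHistory`, `cumulate`, `dedupe_microbatches`) are hypothetical, the deduplication is a simplified take on the idea, and production versions typically run as SQL or Spark jobs over date-partitioned tables.

```python
# Toy, in-memory sketches of two patterns from the Design Patterns list.
# Hypothetical names throughout; real pipelines would express these as SQL
# or Spark jobs over date-partitioned tables.
from dataclasses import dataclass, field


@dataclass
class UserHistory:
    """One row of a hypothetical cumulative table: a user plus daily counts."""
    user_id: str
    activity: list[int] = field(default_factory=list)  # oldest day first


def cumulate(yesterday: dict[str, UserHistory],
             today_counts: dict[str, int]) -> dict[str, UserHistory]:
    """Fold one day of per-user counts into yesterday's cumulative snapshot.

    Behaves like a FULL OUTER JOIN of yesterday with today: users seen only
    yesterday get a 0 appended, and brand-new users get zero-padded history
    so every activity array stays the same length.
    """
    days_so_far = max((len(h.activity) for h in yesterday.values()), default=0)
    snapshot: dict[str, UserHistory] = {}
    for user_id in yesterday.keys() | today_counts.keys():
        prev = yesterday.get(user_id)
        history = list(prev.activity) if prev else [0] * days_so_far
        history.append(today_counts.get(user_id, 0))
        snapshot[user_id] = UserHistory(user_id, history)
    return snapshot


def dedupe_microbatches(batches: list[list[dict]], key: str) -> list[dict]:
    """Keep the first record per key: first within each batch, then across batches."""
    seen: set = set()
    output: list[dict] = []
    for batch in batches:
        # Step 1: deduplicate inside one microbatch (say, an hour of events).
        first_in_batch: dict = {}
        for record in batch:
            first_in_batch.setdefault(record[key], record)
        # Step 2: drop keys already emitted by earlier microbatches.
        for k, record in first_in_batch.items():
            if k not in seen:
                seen.add(k)
                output.append(record)
    return output


if __name__ == "__main__":
    day1 = cumulate({}, {"a": 3})
    day2 = cumulate(day1, {"b": 1})
    print(day2["a"].activity, day2["b"].activity)  # [3, 0] [0, 1]
    events = [[{"id": 1, "v": "x"}, {"id": 1, "v": "y"}],
              [{"id": 1, "v": "z"}, {"id": 2, "v": "w"}]]
    print(dedupe_microbatches(events, "id"))  # [{'id': 1, 'v': 'x'}, {'id': 2, 'v': 'w'}]
```

The point of the cumulative approach in this sketch is that each daily run reads only yesterday's snapshot plus today's data, instead of rescanning the full event history.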

Courses / Academies

  • DataExpert.io course: use code HANDBOOK10 for a discount!

  • LearnDataEngineering.com

  • Technical Freelancer Academy: use code zwtech for a discount!

  • IBM Data Engineering for Everyone

  • Qwiklabs

  • DataCamp

  • Udemy Courses from Shruti Mantri

  • Rock the JVM: teaches Spark (in Scala), Flink, and others

  • Data Engineering Zoomcamp by DataTalksClub

  • Efficient Data Processing in Spark

  • Scaler

  • DataTeams - Data Engineer hiring platform

  • Udemy Courses from Daniel Blanco

Certification Courses

  • Google Cloud Certified - Professional Data Engineer

  • Databricks - Certified Associate Developer for Apache Spark

  • Databricks - Data Engineer Associate

  • Databricks - Data Engineer Professional

  • Microsoft DP-203: Data Engineering on Microsoft Azure

  • Microsoft DP-600: Fabric Analytics Engineer Associate

  • Microsoft DP-700: Fabric Data Engineer Associate

  • AWS Certified Data Engineer - Associate
