Data quality profiling and exploratory data analysis for Pandas and Spark DataFrames in one line of code.
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
Bucketize an image based on exhaust data and AI-generated data. Keywords: industry-solutions, Azure, Azure Machine Learning services, computer vision, big data, big data analytics, machine learning, image recognition, manufacturing, quality control, Cognitive Services.
This project offers a ready-to-use AWS EMR template that combines Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
Plugin, developed at Pandora Media, offering views, operators, sensors, and more.
This project analyses student performance and correlates it with various attributes, then determines the most suitable algorithm from a set of candidates.
YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering an easy command-line interface and Python integration. Ideal for research, SEO, and data collection.
Supporting material for courses, Facultad de Minas, Universidad Nacional de Colombia
IoT and big data analytics using Apache Kafka, Spark, and other AWS services
This tutorial explains how to get real-time analytics of the energy produced and consumed by two solar-station simulators, using InfluxDB and Grafana hosted on Google Kubernetes Engine.
Repository for the Big Data Specialization from the University of California San Diego on Coursera
Real-time YouTube comment sentiment analysis using Kafka, Spark, and Streamlit dashboard.
SSVC Ore Miner - www.rapticore.com
EpiData IoT Data Science Platform - Community Edition
Code samples that helped me understand distributed storage and processing of big data using Apache Spark and Python.
SocialSituSecu is a project exploring social network security, computing, and intelligence based on social situational metadata. It is sponsored by National Natural Science Foundation of China Grant No. 61972133 and by the Project of Leading Talents in Science and Technology Innovation for Thousands of People Plan in Henan Province Grant No.204200…
This repository analyzes the multivariate workload data of Google Cluster machines.