large-dataset
There are 105 repositories under the large-dataset topic.
DiskFrame/disk.frame
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
fair-acc/chart-fx
A scientific charting library focused on performance-optimised real-time data visualisation at 25 Hz update rates for data sets ranging from a few tens of thousands up to 5 million data points.
sharmaroshan/Fraud-Detection-in-Online-Transactions
Detecting fraud in online transactions using anomaly detection techniques such as over-sampling and under-sampling. Because the ratio of frauds is less than 0.00005, simply applying a classification algorithm may result in overfitting.
extjfx/extjfx
JavaFX extensions
sileod/Discovery
Mining Discourse Markers for Unsupervised Sentence Representation Learning
Esri/large-network-analysis-tools
Tools and code samples for solving large network analysis problems in ArcGIS Pro
zzw922cn/TensorFlow-Input-Pipeline
TensorFlow Input Pipeline Examples based on multi-thread and FIFOQueue
privefl/bigreadr
R package to read large text files based on splitting + data.table::fread
kyegomez/EXA-1
An EXA-Scale repository of Multi-Modality AI resources, from papers and models to foundational libraries!
matteodelabre/saxophone
Fast and lightweight event-driven streaming XML parser in pure JavaScript
Imtiazkarimik23/SPEC5G
This repository contains the code and data of the paper titled "SPEC5G: A Dataset for 5G Cellular Network Protocol Analysis" published at AACL 2023.
ec-jrc/Thalassa
Large-scale visualizations of unstructured mesh data
MaxHalford/tuna
🐟 A streaming ETL for fish
guypeer8/csv-streamer
💧 A stream-based CSV aggregator for limiting RAM usage while processing large data sets.
vunguyentuan/react-virtual-slider
Virtual Slider/Carousel for React
hto/redis-large-hset-del
Delete large HSET keys on Redis.
zglu/ivis
IVIS is a chart editor for interactive visualisation, based on jQuery and HighCharts. Chart types include dot/scatter, 2D scatter, line, bar/column, pie, and heat map. It's powerful when analysing large data sets.
gjcampbell/ooffice
Some components for internal, line-of-business Angular apps
davidssmith/RawArray.jl
Raw array (RA) file format for simple, robust, and user-friendly N-dimensional array storage
imdeepmind/AmazonReview-LanguageGenerationDataset
Processed Amazon Review Dataset for Language Generation (Character Level)
JakobLS/100-million-rows-with-spark
Is it feasible to train a model on 100 million ratings using nothing more than a common laptop? Let's find out.
JJChenCharly/OrthoSLC
OrthoSLC: A pipeline to get Orthologous Genes using Reciprocal Best Blast Hit (RBBH) Single Linkage Clustering, independent of any relational database management system
tejas-gokhale/music_reco_deep_learning
Project for CMU 15-780 Graduate Artificial Intelligence
data-preservation-programs/open-panda
A platform for the world's largest open datasets, stored on a decentralized network
DevExpress-Examples/winforms-grid-load-refresh-detail-data-from-database
Dynamically load and refresh detail data (master-detail).
vjgpt/Home-Credit-Default-Risk
The objective of this competition is to use historical loan application data to predict whether or not an applicant will be able to repay a loan.
bugthesystem/cerebro
Finding the median in large sets of numbers split across N servers using ZeroMQ and Node.js (experimental)
ChelseaGH/KSL-Guide
KSL-Guide: A Large-scale Korean Sign Language Dataset Including Interrogative Sentences for Guiding the Deaf and Hard-of-Hearing, FG, 2021
mofesolapaul/tableau
An experiment to produce a custom-designed table for presenting large data sets
samridhishree/Machine-Learning-for-Large-Datasets
Machine Learning models for large datasets
jblasch/ECE428___K-L_Tool
VLSI with CAD: Python program which accepts file input and determines the minimum cutset
JoetheManHowie/NUSCAN
All the code required to reproduce the results in our paper "Scaling Up Structural Clustering to Large Probabilistic Graphs Using Lyapunov Central Limit Theorem"
Lizhecheng02/Kaggle-LLM-Detect_AI_Generated_Text
Detect whether the text is AI-generated by training a new tokenizer and combining it with tree classification models or by training language models on a large dataset of human & AI-generated texts.
m-wells/AlignedBinaryFormat.jl
Memory-mapping made easy.
ThomasByr/BioInfo-genome
🧬 Large-scale gene data analysis software
Gautamaggrawal/Realtime-logging
A real-time, web-based log-watching solution, akin to UNIX's tail -f command, that employs Django Channels for monitoring. It obviates page refreshes, efficiently streams updates, supports multiple connections, and shows the last 10 lines of the log.