Yelp/mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services

Python (license: NOASSERTION)

Issues

Python 3.12 support
#2222 opened 10 months ago by dotlambda
0
TypeError when writing to stderr within a job on Python 3
#2153 opened 5 years ago by riazjahangir
1
Read Specific Column From csv file
#2219 opened 2 years ago by Hetaksh
0
total sort
#2217 opened 3 years ago by heckboy-star
1
Failure to run mrjob on dataproc
#2216 opened 3 years ago by BradHolmes
0
code breaks locally but runs fine remotely on hadoop cluster
#2211 opened 3 years ago by my-umd
2
trying to run mr job python script
#2213 opened 3 years ago by iitspratham
0
Hadoop counter in mrjob
#2212 opened 3 years ago by ShunyangLi
0
Error when running on hadoop "Found 2 unexpected arguments on the command line"
#2201 opened 4 years ago by nadavdor15
1
running local mode error
#2168 opened 4 years ago by logique233
0
ignore unrecognized arguments
#2210 opened 4 years ago by dhuy237
1
Assign tags on EMR creation in single API call
#2207 opened 4 years ago by mgmarino
0
Can I write map and reduce in many different classes?
#2206 opened 4 years ago by dhuy237
0
Is it possible to prevent decompression and/or splitting in local or inline mode?
#2205 opened 4 years ago by anjackson
0
add_passthru_arg on hadoop
#2204 opened 4 years ago by lobequadrat
0
NameError: argments is not defined
#2190 opened 4 years ago by azzedineA
1
upgrade boto3/botocore to support StepConcurrencyLevel
#2193 opened 4 years ago by coyotemarin
2
useless return value from make_pooled_cluster() in pooling tests
#2196 opened 4 years ago by coyotemarin
0
max_clusters_in_pool option
#2192 opened 4 years ago by coyotemarin
0
add pool_timeout_minutes option
#2199 opened 4 years ago by coyotemarin
0
pool_wait_minutes shouldn't wait if pool is empty
#2198 opened 4 years ago by coyotemarin
0
add pool_jitter_seconds option
#2200 opened 4 years ago by coyotemarin
1
progress indicators are wrong when steps run simultaneously
#2195 opened 4 years ago by coyotemarin
1
integrate describe_cluster() calls with cluster cache
#2186 opened 4 years ago by coyotemarin
0
join pooled clusters based on yarn cluster metrics
#2191 opened 4 years ago by coyotemarin
2
fetching progress from resource manager shouldn't rely on SSH tunnel
#2194 opened 4 years ago by coyotemarin
0
concurrent steps on EMR clusters
#2185 opened 4 years ago by coyotemarin
6
support docker on EMR 6.x AMIs
#2179 opened 4 years ago by coyotemarin
3
Is there any way to connect to a remote Hadoop cluster?
#2180 opened 5 years ago by FlorinAndrei
1
PipeMapRed.waitOutputThreads(): subprocess failed with code 1
#2182 opened 5 years ago by sdiniz73
0
How to launch more than one reducer to execute a job?
#2181 opened 5 years ago by ParadoxZW
0
EMR: TERMINATED_WITH_ERRORS: The given SSH key name was invalid
#2178 opened 5 years ago by cicerojmm
0
Spark harness is not populating counters when counter-output-dir is not an S3 path
#2176 opened 5 years ago by 88manpreet
1
put most pooling info in cluster name
#2160 opened 5 years ago by coyotemarin
14
lock clusters with EMR tags, not S3
#2161 opened 5 years ago by coyotemarin
8
pooling should use any usable cluster before looking for more
#2164 opened 5 years ago by coyotemarin
1
tags should start with "mrjob:" not "__mrjob"
#2173 opened 5 years ago by coyotemarin
0
cluster locks are never released
#2162 opened 5 years ago by coyotemarin
1
don't list steps when pooling
#2159 opened 5 years ago by coyotemarin
1
default to 'python2.7', not 'python' when on Python 2
#2151 opened 5 years ago by coyotemarin
2
Support Python 3.8
#2150 opened 5 years ago by coyotemarin
1
extra_cluster_params should merge dictionaries
#2154 opened 5 years ago by coyotemarin
0
newer PyYAML doesn't work with Python 3.4
#2149 opened 5 years ago by coyotemarin
2
Suggestion: Add an option to accept empty lines in json
#2148 opened 5 years ago by trisch-me
1
Support --conf-path when using mrjob spark-submit
#2147 opened 5 years ago by mj3c
0
Deprecation warning due to invalid escape sequences in Python 3.8
#2146 opened 5 years ago by tirkarthi
0
Support not waiting on job completion
#2145 opened 5 years ago by mgmarino
1
Override SparkStep's input_path
#2144 opened 5 years ago by mj3c
2
How to use mrjob to read pyspark rdds or dataframe?
#2140 opened 5 years ago by Alxe1
0
import pandas will raise exception: mrjob returned non-zero exit status 256
#2139 opened 5 years ago by Alxe1
1