tobiasquinteiro/walkaround

Wave index wiped due to Full Text Search launch, re-indexing mapreduce fails due to bad data

Closed this issue · 12 comments

What steps will reproduce the problem?
1. Import waves from Google Wave
2. Add more waves
3. Try to check those waves

What is the expected output?  What do you see instead?
I expect there to be twenty-odd waves; instead there are none.

What browser and browser version are you using?  On what operating system?
Firefox 12.0, Mac OS X


What URL does your browser show when the problem occurs?  Did you compile
walkaround on your machine, or are you using a public instance?
Public server. URL is https://wavereactor.appspot.com/

Please provide any additional information below.
The waves were fine and intact yesterday.

Original issue reported on code.google.com by arcticCr...@gmail.com on 7 May 2012 at 4:30

It looks like App Engine's full text search index was cleared.  This affects 
all walkaround instances.  The solution is to go to /admin/mapreduce and run a 
"Re-index all waves" job.  This is currently running on wavereactor, your waves 
should reappear shortly.

Original comment by oh...@google.com on 7 May 2012 at 5:46

  • Changed state: Accepted

It seems the indexing stopped after about 23K waves.

Original comment by vega113 on 8 May 2012 at 6:40

Seems the indexing is choking on bad participant IDs that have spaces in them; 
I attempted a fix in 
http://code.google.com/p/walkaround/source/detail?r=b3059346344296f044929343c4f9be0aa7444406 
but haven't tested it.  Yuri, can you deploy this and re-run the mapreduce?

The stack trace is

com.google.walkaround.wave.server.servlet.ServerExceptionFilter sendError: 
IllegalArgumentException; overQuota=false; sending 500: Internal server error
java.lang.IllegalArgumentException: indexName must be ASCII visible printable: 
USRIDX5- wavegroupy@appspot.com
    at com.google.appengine.api.search.checkers.Preconditions.checkArgument(Preconditions.java:85)
    at com.google.appengine.api.search.checkers.IndexChecker.checkName(IndexChecker.java:40)
    at com.google.appengine.api.search.IndexSpec$Builder.setName(IndexSpec.java:45)
    at com.google.walkaround.wave.server.index.WaveIndexer.getIndex(WaveIndexer.java:567)
    at com.google.walkaround.wave.server.index.WaveIndexer.index(WaveIndexer.java:446)
    at com.google.walkaround.wave.server.index.WaveIndexer.indexConversation(WaveIndexer.java:304)
    at com.google.walkaround.wave.server.wavemanager.ReIndexMapper$Handler$1.run(ReIndexMapper.java:58)
    at com.google.walkaround.util.server.RetryHelper$3.run(RetryHelper.java:182)
    at com.google.walkaround.util.server.RetryHelper$3.run(RetryHelper.java:180)
    at com.google.walkaround.util.server.RetryHelper.runBodyOnce(RetryHelper.java:142)
    at com.google.walkaround.util.server.RetryHelper.run(RetryHelper.java:156)
    at com.google.walkaround.util.server.RetryHelper.run(RetryHelper.java:180)
    at com.google.walkaround.wave.server.wavemanager.ReIndexMapper$Handler.process(ReIndexMapper.java:53)
    at com.google.walkaround.wave.server.wavemanager.ReIndexMapper.map(ReIndexMapper.java:72)
    at com.google.walkaround.wave.server.wavemanager.ReIndexMapper.map(ReIndexMapper.java:43)
    at com.google.appengine.tools.mapreduce.v2.impl.handlers.Worker.processMapper(Worker.java:111)
    at com.google.appengine.tools.mapreduce.v2.impl.handlers.Worker.handleMapperWorker(Worker.java:289)
    at com.google.appengine.tools.mapreduce.MapReduceServlet.doPost(MapReduceServlet.java:190)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
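
The exception above comes from building a per-user index name out of a participant ID that starts with a space. The linked commit is not reproduced here; the following is only a minimal sketch of one possible workaround, assuming that "ASCII visible printable" means the characters '!' through '~'. The class and method names are hypothetical, and the "USRIDX5-" prefix is simply copied from the error message.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of walkaround or the App Engine SDK. */
final class IndexNameSketch {
  // Prefix copied from the index name shown in the error message (assumption).
  static final String PREFIX = "USRIDX5-";

  /** True if s is non-empty and every character is in the range '!'..'~'. */
  static boolean isVisiblePrintableAscii(String s) {
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < '!' || c > '~') {
        return false;
      }
    }
    return !s.isEmpty();
  }

  /**
   * Builds an index name from a participant ID by percent-encoding every
   * UTF-8 byte outside '!'..'~', plus '%' itself, so that two different IDs
   * never map to the same name.
   */
  static String toIndexName(String participantId) {
    StringBuilder sb = new StringBuilder(PREFIX);
    for (byte b : participantId.getBytes(StandardCharsets.UTF_8)) {
      int v = b & 0xff;
      if (v >= '!' && v <= '~' && v != '%') {
        sb.append((char) v);
      } else {
        sb.append(String.format("%%%02X", v));
      }
    }
    return sb.toString();
  }
}
```

With this sketch, the ID " wavegroupy@appspot.com" would become "USRIDX5-%20wavegroupy@appspot.com". Encoding rather than stripping the bad characters keeps the bad ID distinct from the well-formed one, at the cost of longer names; any length limit on index names is not handled here.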

Original comment by oh...@google.com on 9 May 2012 at 12:38

Done. Redeployed with updated code and initiated a new mapreduce job.

Original comment by vega113 on 9 May 2012 at 9:30

Seems like it got stuck again.

Original comment by vega113 on 10 May 2012 at 6:04

Meh, there seem to be participant IDs with newlines as well, not just spaces.  
Another attempt at working around this is in 
http://code.google.com/p/walkaround/source/detail?r=8d9e56efc0b756d279d4399801a23c9d81ad1004 .  
I filed http://code.google.com/p/walkaround/issues/detail?id=109 to catch 
invalid participant IDs before they get added to waves.
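
As an illustration of the kind of check issue 109 asks for, here is a minimal sketch of rejecting an ID before it is added to a wave. The rule it applies (only characters '!' through '~', and a non-empty domain after the last '@') is an assumption made for this example, not the validation that walkaround or the wave model actually performs, and the class name is hypothetical.

```java
/** Hypothetical validator; the rule is an assumption, not walkaround's. */
final class ParticipantIdCheckSketch {

  /**
   * Returns null if the ID passes the assumed rule, otherwise a short
   * description of the first problem found.
   */
  static String problemWith(String id) {
    if (id == null || id.isEmpty()) {
      return "empty ID";
    }
    for (int i = 0; i < id.length(); i++) {
      char c = id.charAt(i);
      // Rejects spaces, newlines, other control characters, and non-ASCII.
      if (c <= ' ' || c > '~') {
        return String.format("character U+%04X at offset %d", (int) c, i);
      }
    }
    int at = id.lastIndexOf('@');
    if (at < 0 || at == id.length() - 1) {
      return "missing domain";
    }
    return null;
  }
}
```

Reporting the offending code point and its offset, rather than just "invalid", also makes it easier to tell a leading space apart from an embedded newline when looking at existing data.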

For some waves, the model code also fails to create a conversation with a stack 
trace like below.  I committed a fix to this, too, 
http://code.google.com/p/walkaround/source/detail?r=b02d422a7e2a8906cfffa66eb7dfa658f1fe44ef .

Yuri, can you deploy another version?  I think the mapreduce will transparently 
start running the new code, no need to restart it.


com.google.walkaround.wave.server.servlet.ServerExceptionFilter sendError: 
IllegalArgumentException; overQuota=false; sending 500: Internal server error
java.lang.IllegalArgumentException: Failed to create conversation on wavelet 
[WaveId walkaround/w+WeNolo0kZlTLD0hD] [WaveletId 
walkaround/conv+WeNolo0kZlTLD0hD]
    at org.waveprotocol.wave.model.conversation.WaveletBasedConversation.<init>(WaveletBasedConversation.java:302)
    at org.waveprotocol.wave.model.conversation.WaveletBasedConversation.create(WaveletBasedConversation.java:271)
    at org.waveprotocol.wave.model.conversation.WaveBasedConversationView$ConversationContainer.getConversation(WaveBasedConversationView.java:73)
    at org.waveprotocol.wave.model.conversation.WaveBasedConversationView.createContainer(WaveBasedConversationView.java:246)
    at org.waveprotocol.wave.model.conversation.WaveBasedConversationView.<init>(WaveBasedConversationView.java:143)
    at org.waveprotocol.wave.model.conversation.WaveBasedConversationView.create(WaveBasedConversationView.java:123)
    at com.google.walkaround.wave.server.index.WaveIndexer.getConversation(WaveIndexer.java:514)
    at com.google.walkaround.wave.server.index.WaveIndexer.getConvFields(WaveIndexer.java:375)
    at com.google.walkaround.wave.server.index.WaveIndexer.indexConversation(WaveIndexer.java:286)
    at com.google.walkaround.wave.server.wavemanager.ReIndexMapper$Handler$1.run(ReIndexMapper.java:58)
    at com.google.walkaround.util.server.RetryHelper$3.run(RetryHelper.java:182)
    at com.google.walkaround.util.server.RetryHelper$3.run(RetryHelper.java:180)
    at com.google.walkaround.util.server.RetryHelper.runBodyOnce(RetryHelper.java:142)
    at com.google.walkaround.util.server.RetryHelper.run(RetryHelper.java:156)
    at com.google.walkaround.util.server.RetryHelper.run(RetryHelper.java:180)
    at com.google.walkaround.wave.server.wavemanager.ReIndexMapper$Handler.process(ReIndexMapper.java:53)
    at com.google.walkaround.wave.server.wavemanager.ReIndexMapper.map(ReIndexMapper.java:72)
    at com.google.walkaround.wave.server.wavemanager.ReIndexMapper.map(ReIndexMapper.java:43)
    at com.google.appengine.tools.mapreduce.v2.impl.handlers.Worker.processMapper(Worker.java:111)
    at com.google.appengine.tools.mapreduce.v2.impl.handlers.Worker.handleMapperWorker(Worker.java:289)
    at com.google.appengine.tools.mapreduce.MapReduceServlet.doPost(MapReduceServlet.java:190)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.walkaround.util.server.servlet.RequestStatsFilter.doFilter(RequestStatsFilter.java:95)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.walkaround.wave.server.servlet.ServerExceptionFilter.doFilter(ServerExceptionFilter.java:121)
    at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.google.apphosting.utils.servlet.ParseBlobUploadFilter.doFilter(ParseBlobUploadFilter.java:102)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.google.apphosting.runtime.jetty.SaveSessionFilter.doFilter(SaveSessionFilter.java:35)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.google.apphosting.utils.servlet.TransactionCleanupFilter.doFilter(TransactionCleanupFilter.java:43)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.handle(AppVersionHandlerMap.java:249)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
    at com.google.apphosting.runtime.jetty.RpcRequestParser.parseAvailable(RpcRequestParser.java:76)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:135)
    at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:446)
    at com.google.tracing.TraceContext$TraceContextRunnable.runInContext(TraceContext.java:449)
    at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)
    at com.google.tracing.TraceContext.runInContext(TraceContext.java:695)
    at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContextNoUnref(TraceContext.java:333)
    at com.google.tracing.TraceContext$AbstractTraceContextCallback.runInInheritedContext(TraceContext.java:325)
    at com.google.tracing.TraceContext$TraceContextRunnable.run(TraceContext.java:453)
    at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:251)
    at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.NullPointerException: StringMap cannot contain null keys
    at org.waveprotocol.wave.model.util.Preconditions.checkNotNull(Preconditions.java:126)
    at org.waveprotocol.wave.model.util.CollectionUtils$StringMapAdapter.put(CollectionUtils.java:155)
    at org.waveprotocol.wave.model.conversation.WaveletBasedConversationBlip.adaptThread(WaveletBasedConversationBlip.java:458)
    at org.waveprotocol.wave.model.conversation.WaveletBasedConversationBlip.create(WaveletBasedConversationBlip.java:191)
    at org.waveprotocol.wave.model.conversation.WaveletBasedConversationThread.adaptBlip(WaveletBasedConversationThread.java:324)
    at org.waveprotocol.wave.model.conversation.WaveletBasedConversationThread.create(WaveletBasedConversationThread.java:88)
    at org.waveprotocol.wave.model.conversation.WaveletBasedConversation.<init>(WaveletBasedConversation.java:299)
    ... 87 more
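
Both traces show a single wave throwing out of the mapper. The committed fixes are not reproduced here. As a general illustration only, the sketch below records a failing item and moves on instead of letting the exception propagate, which is one way to keep a bulk job from stalling on bad data. All names are hypothetical, and the Consumer stands in for whatever indexes one wave.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration; not walkaround's ReIndexMapper. */
final class SkipBadItemsSketch {

  /**
   * Runs the indexer on every wave ID and returns the IDs for which it
   * threw, so they can be inspected or retried after the job completes.
   */
  static List<String> indexAll(List<String> waveIds, Consumer<String> indexer) {
    List<String> failed = new ArrayList<>();
    for (String id : waveIds) {
      try {
        indexer.accept(id);
      } catch (RuntimeException e) {
        // Covers IllegalArgumentException and NullPointerException, the two
        // failures seen in the traces above. Record the ID and continue.
        failed.add(id);
      }
    }
    return failed;
  }
}
```

The trade-off is that swallowed failures are easy to overlook, so the returned list (or an equivalent log or counter) needs to be checked once the job finishes.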


Original comment by oh...@google.com on 10 May 2012 at 9:33

Original comment by oh...@google.com on 10 May 2012 at 9:33

  • Added labels: Priority-Critical
  • Removed labels: Priority-Medium

Well meant, but possibly bad advice based on gut feel and not founded on hard 
facts...

Now, I don't know your code and I haven't looked at the data, but are you sure 
the root cause of the crash is just blanks and newlines showing up in IDs?  On 
the one hand, it makes sense to harden the handling of IDs so the code is 
robust against bad ones and, as noted in issue 109, to keep bad IDs from 
getting into the system in the first place.  On the other hand, didn't this 
problem first surface while converting old waves to walkaround?  Could this be 
telling you that something is wildly misaligned in the parsing of the old wave 
data?  To me, the sensitivity to bad IDs smells like the canary telling you 
that something is more deeply wrong.  Try to reverse-engineer how a small 
surprise in the data being converted could deeply mess things up.  For example, 
would a stray null character in the old wave trip up the parsing of which field 
is which, so that you are picking up IDs that were never supposed to be IDs in 
the first place?

I base this on having seen many an S0C7 crash in IBM/360 mainframe programs 
that expected records to contain packed decimal data but ended up with a blank 
character scrogging the bits that were supposed to be the sign.  The real 
problem was finding how the bad data was getting into the field where there 
were supposed to be numbers.  Making the program simply not barf on non-numeric 
data in a place it was trying to treat as a number was rarely more than a 
workaround.

Original comment by r.drew.d...@gmail.com on 12 May 2012 at 3:07

It's possible in principle that our JSON or protobuf encoding/decoding logic is 
buggy and injects spurious spaces, but that logic is used for many other 
purposes as well; so unless we start seeing similar errors in other areas, it 
seems more likely that the problem is that users have added participants with 
bad IDs, either in Google Wave or in walkaround.

Original comment by oh...@google.com on 12 May 2012 at 10:25

Original comment by oh...@google.com on 12 May 2012 at 10:27

  • Changed title: Wave index wiped due to Full Text Search launch, re-indexing mapreduce fails due to bad data

I deployed the updated version and it seems to work fine this time. I think the 
issue can be closed.

Original comment by vega113 on 14 May 2012 at 10:55

Original comment by ohl...@gmail.com on 25 May 2012 at 8:16

  • Changed state: Fixed