Text Summarization using LSA in Apache Spark

Product Review Summarizer

Introduction

The idea of this project is to build a model for e-commerce data that summarizes a large number of customer reviews of a product to give an overview of that product. The result can be used to see which points many customers complain about or praise in a particular product, without reading all of the reviews. The main tasks involved in this project are data collection, data cleaning, implementation of two summarization algorithms, and generating the final summary in Apache Spark.

Dataset

Data Collection

The dataset used for this project was crawled from Amazon.com. It contains product reviews in a separate file for each product, and each file contains a maximum of 1000 reviews. Since the reviews are sorted by ranking, the first thousand reviews are more than sufficient for the summarization task. Each review in a file contains review_id, ratings, review_title, helpful_votes, total_votes and full_review. The reviews file for a product is named by its product_id, and all the metadata about the products is stored in a file called iteminfo.txt.

Note: Data was collected using the customer-review-crawler, which is forked from maifeng's crawler written in Java. The original version was outdated, so I had to rewrite all of the code used for data collection in this project. Please see the commit history for the changes that I made to the forked version.

Commit log comparison between maifeng and myself (iamprem)

Description of dataset

product_id.txt -- a file that contains reviews about a product.

review_id       -   Unique id given to a review
ratings         -   Integer value ranging from 1 to 5, describing the rating of the product
review_title    -   Punch line given by the reviewer for their review
helpful_votes   -   Number of people who found the review helpful (upvoted)
total_votes     -   Number of people who upvoted or downvoted the review
full_review     -   Full review given by the reviewer
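
As an illustration of the record layout above, here is a minimal parsing sketch. It assumes that each review occupies one line and that the six fields are tab-separated in the order listed, which is what the sample data suggests. The names `Review` and `parse_review` are hypothetical and are not part of the project's code. The rating is parsed as a float because the sample records store it as `5.0`.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """One review record, with fields in the order listed in the table above."""
    review_id: str
    rating: float        # stored as e.g. "5.0" in the sample data
    review_title: str
    helpful_votes: int
    total_votes: int
    full_review: str


def parse_review(line: str) -> Review:
    """Parse one tab-separated review line (assumed layout, see lead-in)."""
    # Split on the first five tabs only, so any tab inside the review text survives.
    parts = line.rstrip("\n").split("\t", 5)
    if len(parts) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(parts)}")
    review_id, rating, title, helpful, total, text = parts
    return Review(review_id, float(rating), title, int(helpful), int(total), text)
```

A line such as `R1<TAB>5.0<TAB>Great<TAB>10<TAB>12<TAB>Loved it.` would yield a `Review` with `helpful_votes == 10` and `total_votes == 12`.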

itemsinfo.txt -- a file that contains all the product metadata

product_id      -   Unique id for a product
product_name    -   Listed name of the product on Amazon.com
price           -   Price in US Dollars

Challenges in Data Collection

  • The main challenge in data collection was denial of service from Amazon.com, so no more than one request could be sent per second. This was handled by backing off for a minute whenever a 503 error occurred and then retrying. Although data collection is time-consuming, the summarization task doesn't depend on the number of data samples available, so reviews for only 25 products were retrieved for this task.
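
The back-off strategy described above can be sketched as follows. This is an illustrative Python version only: the actual crawler is written in Java, and the function name `fetch_with_backoff`, its parameters and the retry limit are assumptions made for the example. The one-minute back-off and the one-request-per-second pacing come from the description above.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, max_retries=5, backoff_seconds=60, min_interval=1.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a URL, backing off for a minute whenever the server answers 503.

    `opener` and `sleep` are injectable so the logic can be exercised
    without network access or real waiting (hypothetical design choice).
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as response:
                body = response.read()
            # Pause after each successful request to stay at one request per second.
            sleep(min_interval)
            return body
        except urllib.error.HTTPError as err:
            # Only 503 is retried; other errors, or running out of retries, propagate.
            if err.code != 503 or attempt == max_retries:
                raise
            sleep(backoff_seconds)
```

Injecting `opener` and `sleep` keeps the retry logic separate from the network call, so it can be checked with a fake server response.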

Sample Data

First five reviews of product "Samsung Galaxy Note Pro 12.2" (B00HWMPSK6)

R3M37E07KXEPVP	5.0	One hell of a tablet	1190	1214	*** Updated 5/8/2014 - if you're using Dolphin Browser with Adobe Flash player do yourself a favor and do NOT update Dolphin Browser if prompted to in the Google PlayStore. It seems like every time they update Dolphin there's a chance for Flash to no longer work without jumping through more hoops *** I come from a Windows OS background and I'm an ex-Apple iPhone (now Galaxy Note 2) and iPad user. I've been waiting a long time for a tablet like this to be released: a large screen, expandable memory and the freedom of Android. Thanks to my Note 2 I have a little over a years of experience with Android OS but I knew what to expect from the Note Pro 12.2. Originally I had my eyes set on the Samsung Galaxy Note 10.1 2014 Edition (32GB, Black), after months of waiting and playing I decided to bite the bullet and buy it. Just before doing so I read of a possible 12" model coming out in the following year so I figured another 6 weeks of waiting can't hurt and patience definitely paid off. Comparing the Note 10.1 2014 Edition and the Note Pro 12.2: One big issue that annoyed me with the Note 10.1 2014 Edition ("14E") was the ridiculous My Magazine app that never goes away. It constantly hogs system resources as it runs in the background even after forcing shut downs of the app, and disabling the Home button link to this app didn't help as it would just keep running in the background. I noticed that the 14E was constantly using 1.5-1.7GB of its 3GB RAM total at all times even after factory resets, disabling apps and reboots. Thankfully the Note Pro 12.2 ("NP12") doesn't have this issue as it hovers around 1.2GB of memory used after a reboot. Even having pages of the magazine feeds and widgets running it still allocates memory usage better and best of all: no pesky My Magazine app! Luckily I was able to compare both tablets side by side at the local Best Buy. 
This was a large factor in my decision to go with the NP12 over the 14E. Both tablets are about the same when it comes to hardware but there are other major differences between the two. Let's go over some of the features that the NP12 has over the 14E... 1. Over 2 extra inches of real estate on the screen. This means writing on the NP12 with the stylus while in portrait mode is very much like writing on a real piece of 8.5 by 11 inch piece of paper, aka standard notebook paper. The palm rejection does a great job of making sure only the stylus gets recognized for input (if you want that on). I find that a 10 inch screen is too cramped to comfortably write notes on. Do your research, this is a 12 inch tablet so don't be surprised and don't consider it a downside. 2. Larger, longer lasting battery rated at 13 hours (9,500 mAh) versus the 14E's 9 hours (8,220 mAh). My mixed use gets me anywhere from 8 to 12 hours before plugging it in for a charge, depending on what apps I'm using and what the brightness/volume is set at. Lots of gaming/graphic intensive apps will exhaust the battery quickly whereas reading PDFs, E-books, etc. is where the battery really stretches its legs. There are numerous settings and options for the tablet to use minimum energy which I will go over later. 3. Multitasking on the NP12 is fantastic. Multi Window allows you to run 2, 3 or 4 apps on screen at once (the 14E supports up to 2 apps side by side). There are many apps compatible with Multi Window. You can also resize the windows depending on where you need the most room on the screen. Chatting with friends on Hangouts, surfing on Dolphin Browser, watching your favorite movie and checking/writing Emails all on the same screen is productivity at it's finest. You can save Multi Window templates too for quick access so that you don't need to drag the apps one by one every time you want to multitask. Just select your template and all the apps open up automatically. 
I usually have 3 apps going at once: Video (or Music) at the top left, AquaMail at the bottom left and Dolphin Browser taking up the entire right side of the screen. Bonus feature: there is a trick that allows more than 4 apps open at once on the NP12, up to 9 apps total. First use Multi Window to load up whatever 4 apps you want on the screen, then hold the S-pen above the screen and press the pen's button which bring up Air View. On the Air View pop-up select Pen Window. Now use the S-pen to draw a box and then choose an app to launch in that box. The app loads up in the box you drew. It comes up as a floating window that can be resized, moved around on the screen and also minimized to a floating "bubble" that you can move around anywhere you want. You can do this 5 times total. So 4 apps via Multi Window and 5 apps via pen window. The Multi Window apps are all joined on the main screen while the 5 pen window apps are solo floating windows and can be minimized to a bubble to have them out of the way. Pretty cool! 4. KitKat 4.4: the new "Magazine Home" user interface (UI) is a little like Windows 8 but at the same time its different. After playing with it for a little while you'll actually want to use it and set up multiple pages filled with your news subscriptions, feeds, widgets, etc. I did not think I would like the dashboard/widget style UI but on a screen this size it works well and looks great. Basically you can set up different pages filled with all sorts of widgets ranging from: Application widgets (Email, Calendar, Music, Video etc.) Social widgets (Twitter, Flickr, YouTube etc.) News widgets (Art & Culture, Science & Technology, News, Style, Sports, Business, etc.) Each news category has dozens and dozens of publications to subscribe to and I find myself spending more time reading articles than I ever thought I would. If you're not a fan of the new UI the familiar Android desktop is still there with your standard icons and widgets. 
If you'd rather customize the tablet completely I highly recommend Nova Launcher. Those are the main differences to me. Now onto some general information about the tablet as well as apps I use and features worth checking out: Like every other phone, tablet and computer out there the NP12 does come with some bloatware. There are different types of bloatware so I'll cover each one: 1. Some preloaded apps can be uninstalled completely giving you back some minor storage space. This is a good thing. 2. Some preloaded apps cannot be uninstalled and instead can have "most" of their data cleared and then "turned off" - these apps remain installed on the tablet but are disabled. Some of the preloaded apps that can only be "turned off" are Chrome, e-Meeting, DropBox, Gmail, Google+, Twitter, Cisco WebEx and others. These are just some that I do not use myself so I turned them off. Even when turned off/disabled, these apps still take up some room but its very little, 5MB or less per app. 3. Some apps cannot be uninstalled nor can they be turned off. They can have most of their data cleared though. Either way this is just plain stupid. I can only think of two off the top of my head: Evernote and RemotePC. I don't use these particular apps at all so why can't I at least disable them so they don't show up in my apps list? Both of these apps use about 9MB of space combined. Again it's not a deal breaker for me but Samsung y u do dis?! I bought the 32GB model. The tablet needs about 6-6.5GB for the operating system and preloaded software/apps. Out of the box you have around 25.5GB of space to play with, this of course depends on what preloaded apps you choose to keep/update or disable/uninstall so your free space will vary. A great investment would be a SanDisk Ultra 64 GB microSDXC Class 10 UHS-1 Memory Card 30MB/s with Adapter SDSDQUA-064G-U46A, which gives you another 60GB~ of extra space for apps, movies, music, pictures, whatever you want to put on it. 
This SD card plus the tablet's free space adds up around 85GB which for me was more than enough. It's very easy adding your media files to the SD card, just plug the card into your PC with the supplied adapter and create folders like Music, Movies, etc. then copy/paste or drag and drop Artists folders into the Music folder, and videos into the Movies folder. I loaded music, movies and pictures on the SD card and I leave the tablet's internal storage for apps since applications seem to run faster off of the internal storage as opposed to off the SD card. If you think you need more space you can go big with SanDisk's newest 128GB card also found on Amazon. The screen is beautiful. High quality photographs and wallpapers at the native resolution (2560x1600 or bigger) look really, really good. If you're looking for some stunning wallpaper for your tablet check out: interfacelift.com/wallpaper/downloads/date/widescreen_16:10/2560x1600/ This model is WiFi ONLY! There is NO slot for a SIM card. The WiFi model also works with your MiFi device and can also connect to the internet via a mobile hotspot such as your iPhone or Galaxy smartphones. I use my Note 2 smartphone as a mobile hotspot and it works great. If you are looking for the 4G/LTE version you'll need to check out the model that Verizon offers. It's available through Verizon or here on Amazon, just search "note pro 12.2 verizon" GPS - yes the tablet has GPS built in, you do not need to be connected to the internet for GPS to work (such as the Google Maps app). Hancom Office is included with the NP12. This is Microsoft Office for Android, simple as that. You get full versions of word processing (Word), spreadsheets (Excel), and slideshow (PowerPoint). It all works very well with the large 12.2" screen and I find it very familiar after using MS Office for so long. You can even open existing word, excel and powerpoint documents with Hancom Office and continue editing and creating. 
Hancom Office is fantastic, however I must admit it was a little involved to get it up and running, so I will explain as best I can: After the initial setup of your NP12, go to your apps (bottom right, little white squares) and launch the "Samsung Apps" app, it will require an update so go ahead and download then install the update. After that is done, go back to your apps and launch Hancom Viewer (there is also a Hancom widget in one of the two default Magazine Home screens if you want to launch it that way). When you launch it a small white box should come up and it will say installing fonts and files, etc. so let it do that and when its done the Hancom Viewer app should open completely. On the left side select Office Download, then OK. This should open the Hancom Office Update Manager. When you do this there's a good chance the Update Manager will prompt you to update it so go ahead and do that. After its updated you can launch it again, the Hancom Office Update Manager is where all the Hancom apps are downloaded/updated/installed. You'll see 12 rows of Hancom apps such as Hcell, Hshow, Hwp, etc. Update and/or install all the ones that need to be done by pressing the rectangular button at the far right of each row. Each time you press the install or update button on the far right it will take you to Samsung Apps where you can press the Update button. Go ahead and update all 12 apps if needed, some may already be updated but will need to be installed (or vice versa). Once every button on all 12 rows is greyed out in Hancom Office Update Manager, then you know you're done! Now just go to your apps and launch whichever one you like. Hcell is excel, Hshow is powerpoint, and Hword is word processor. Hangouts is a cool app and it comes with the tablet. It lets you text/message anyone with an email address or phone number. If you are familiar with iMessage then you know what this is. *** UPDATED BELOW Dolphin Browser is a must for surfing the web. 
It is fast, supports tabs, and has a cool feature named speed dial. A speed dial is a bookmark placed on the Dolphin home screen for easy, one-tap access to the webpage. You can have pages of speed dials, its very useful and beats going down a list of bookmarks looking for a particular site. Perhaps the best feature of Dolphin Browser is that it supports Adobe Flash and Java. I bet your iPad can't do that! What is Adobe Flash? Many of the websites you visit require flash to display the content properly, some websites even have flash animation and to play those you will need the flash player. One example of flash required content is Youtube videos, and Youtube videos that are embedded in other websites. Here is how to enable Flash on the NP12: cultofandroid.com/49840/install-flash-player-android-4-4-kitkat/ *** NOTE: the link to download the Flash Player Installer on the website above is broken. Here is an updated working link for the Flash Player installer: downloadandroidfiles.org/Files/Apps%20%28APK%29/KitKat.Adobe.Flash.Player.11.1.apk When you're all done installing Flash player you should head over to the official Adobe Flash website to test the flash player so that you know it works. On the website you should see a bouncing red box. If not just refresh the page and you should see it the second time around. The Adobe page to test your flash player is: adobe.com/software/flash/about/ PROTIP: ONCE YOU GET DOLPHIN BROWSER AND FLASH PLAYER UP AND RUNNING, DO NOT UPDATE DOLPHIN BROWSER IN THE PLAYSTORE UNLESS ABSOLUTELY NECESSARY. It's not worth taking the chance of the update messing up Flash Player. Amazon Instant Videos - yes, you can watch them on this tablet. Yes, you'll need Dolphin Browser and Adobe Flash installed and correctly configured. If you followed the aforementioned links you should be all set by now. Here is another link specifically dealing with streaming Amazon Instant Videos on KitKat 4.4, which is the tablet's operating system. 
This worked for me and it should work for you!: the-digital-reader.com/nates-reviews/stream-amazon-instant-videos-android-tablet/ *** UPDATED ABOVE Splashtop 2 Remote Desktop is a remote desktop app that you can download for free and in my opinion works better than the RemotePC app that comes preloaded on the tablet. This app allows you to access your PC or Mac's desktop environment so that you basically have your entire PC/Mac on your tablet. It takes advantage of the 12.2" screen by filling it entirely with your computer's desktop. This is useful since you can run all your PC/Mac programs right on your tablet. It works really well and is probably my favorite app that I have come across. There is an optional paid upgrade to the app that let's you remote into your PC/Mac from anywhere. For now I am just using the free version to access my Windows PC, very cool and worth the download. If you want to connect this tablet to your HDTV, a projector or anything with an HDMI input, you'll need a Samsung MHL HDMI adapter. This adapter allows you to mirror your tablet's screen onto your HDTV or other source. I also noticed that you can play a movie on the tablet (which also displays on your TV) and still use the tablet for other apps while the movie plays uninterrupted in the background of the tablet but still in full view on the TV. In other words you can play your movies or shows on the big screen while surfing the net or whatever apps you want to use on the tablet. This adapter is available here on Amazon but I chose to buy mine off of ebay. If the one sold on Amazon does not say "shipped and sold by Amazon" there is a good chance you will get a cheap knockoff adapter that will not work, just read the reviews from other buyers for yourself. If you want to try your luck the adapter can be found here: Samsung ET-H10FAUWESTA Micro USB to HDMI 1080P HDTV Adapter Cable for Samsung Galaxy S3/S4 and Note 2 - Retail Packaging - White. 
The seller I found on ebay is selling the legitimate Samsung adapters that have the hologram sticker and QR Code on the box indicating that it is a genuine Samsung product. Mine arrived promptly and works great. Save yourself the headache and get a genuine one here: ebay.com/itm/350958988570 Bluetooth is available so you can connect a BT mouse and keyboard if you like. Samsung makes an S Action mouse especially for the Note Pro and Tab Pro line of tablets and it works very well. The mouse buttons are specifically linked to actions such as Recent apps, Multi Window, Back, Menu, etc. I got to play with this mouse recently and it's fun although I opted not to buy it as I don't see me using the mouse enough. If you are interested its available at Best Buy (search ET-MP900DBEGUJ) and here on amazon: Samsung S Mouse for Tablets (ET-MP900DBEGUJ) You can also connect USB peripherals to the tablet using a USB OTG cable like this: Black Color Micro USB 3.0 9pin OTG Host Flash Disk Cable for Samsung Galaxy Note3 N900 N9000 10cm. This allows you to connect a number of USB devices such as a mouse, keyboard, flash drive, external hard drive, even a PlayStation 3 controller to the tablet with the ability to access them - If you have a lot of movies on your external hard drive you can use the OTG cable to connect it to the tablet to watch movies. Another example is if you have a thumb drive on your keys, you can connect it to the tablet to access all your files. You can connect more than one USB device at a time by using a USB hub along with the OTG cable mentioned before. I don't recommend connecting more than two devices at the same time in the USB hub, especially if its a device that sucks up a lot of power like an external hard drive. Netflix, Vudu, Amazon Kindle, Google Earth all work great, no issues. If you want to stream video from the tablet to your Smart TV (like Apple AirPlay) check out the AllCast application made for Android devices. 
Extending battery life is simple: disable the vibration/haptic feedback and keyboard/stylus sounds as it really isn't necessary. Keep the brightness on Auto and the volume at half or less. When you're done using it for a while make sure you close all your apps and clear your RAM before putting it to sleep. This will greatly increase the standby time of the tablet so that when you pick it up again later it will still have lots of juice left in it. This is the procedure I do: Swipe down the top of the screen Clear your notifications by pressing the X under the sound bar Swipe down the top of the screen again Turn OFF WiFi (important) Press the "Recent apps" soft button (to the left of the Home button) Select Close all Press the Recent apps button again (no apps should be running now) Select Task Manager Select RAM manager on the left Press "clear memory" - do this AFTER closing your apps Press Home button Press Recent apps button a third time Select Close all (this closes Task Manager) Press Power button to put it in Sleep/Standby mode Doing the above every time sounds tedious but after a few times its second nature and you will notice the battery percentage stays where you left it (or it only drops only 1 or 2%) even after an entire 24 hours in standby. If you notice when opening/closing apps and switching between apps there is a slight visual delay, this is normal. It is meant to make the transition between apps look "pretty" - Microsoft OS and Apple iOS have something similar. 
This will eliminate the windows animation and transition effects which in a nutshell will make the tablet run much quicker and smoother: Swipe down and go to Settings Go to the General tab Scroll down to About Device Press the greyed out Build Number section continuously, 5-7 times Developer Options is now enabled right above About Device Under Developer Options, scroll down to the Drawing section Set all three of these settings to OFF: "Window animation scale" "Transition animation scale" "Animator duration scale" You'll notice how snappy it runs after doing that. Another available on/off toggle is the double-press Home button to bring up S-Voice. When you press the Home button twice quickly it will launch S-Voice. If you'd rather have this feature disabled read on: Launch S-Voice by pressing the Home button twice (or via apps > Samsung folder > S-voice) At the top right press the 3 small squares then select Settings On the next window the second row down is the "Open via the Home key" on/off toggle. Some apps that my girlfriend has recommended for drawing/painting/art. She is a graphic designer and bought her own Note Pro 12.2 soon after playing with mine so I will take her word for it. I have seen some of the pieces capable with this tablet and it is impressive. Also note that the files you create and save in some of these apps can be opened and worked on in Adobe Illustrator and Photoshop: Artflow Infinite Painter Sketchbook Pro Sketchbook for Galaxy (included with the tablet) Fresco Paint Pro Paperless Onto the accessories: I am using the Samsung 12.2" Book Cover with my NP12. It is well made, very sleek looking and does protect the back and corners well. The stand works well in both positions. Don't plan on taking too many pictures with this cover though because when opened the cover wraps around the back and blocks the camera lens. That is OK though because you should never be using a tablet to take pictures. Ever. Stop. No.... don't do it. 
You look ridiculous. I'm pretty sure there is a law against this - ipadisnotacamera.com - I rest my case. Anyway, I bought mine at the local Best Buy and you can find it on bestbuy.com - search for "samsung book cover 12.2" The IVSO Slim Smart Cover Case is nice too and protects the tablet a little better than the Samsung Book Cover but I find the IVSO's stand to be a little more flimsy and I don't like that it makes the tablet sit upside down (home button on top) when using it in your lap. On the plus side the camera lens isn't covered with this cover and there's a thin magnet that holds the cover to the tablet so it doesn't flop around. IVSO Samsung Galaxy Note Pro 12.2/Tab pro 12.2 Ultra Lightweight Slim Smart Cover Case with Auto Sleep/Wake Function-will only fit Samsung Galaxy Note Pro 12.2/Tab pro 12.2 Tablet (Black) I picked up this Galaxy Note Genuine Wacom Touch Pen 8pi Stylus (ET-S200EBEG) - Black. It is about the same thickness as a regular pen so its easier on your hand and it has an eraser on the end, too. Here's a tip: go to settings > controls > S-pen and at the top make sure "Turn off pen detection" is not checked, now you can use a second stylus without needing to remove the factory stylus from the tablet. If you're going to be using the S-pen a lot this is a very good investment. If you're looking for a great sleeve, check out the Merkury Innovations 12-Inch Solid Zipper Sleeve, Black/Blue (M-LL1090). It fits this tablet perfectly, even with the Samsung book cover on it and the quality is top notch. I love the fit and finish of this sleeve and how well it fits this particular tablet, the side pocket for the charge cable/external HDD is a bonus. Need a longer USB 3.0 charge cable? I bought one of these here on Amazon: Cable Matters SuperSpeed USB 3.0 Type A to Micro-B Cable in Blue 10 Feet. It works just as well as the factory cable for charging and is available in 6, 10 and 15 feet - useful if the nearest wall plug is too far away. 
I like choices so I bought another sleeve, this one has a handle though: Evecase 10.6~12 inch Tablet, Netbooks Ultraportable Neoprene Zipper Carrying Case with Dual Hidden Pocket & Handle - Black/ Red. This sleeve is great because of the handles so I can throw in the tablet and carry it around just like that. If using my backpack I'll use the other sleeve mentioned above inside my bag. This sleeve is just as nice as the other and also has a side pocket. The black/red color looks sharp. Last but not least: the User Manual. This has TONS of useful information about the tablet. Definitely worth it to look through this, in fact most questions that people keep asking about the tablet can be answered just by looking in the manual. The direct link to the PDF is: downloadcenter.samsung.com/content/UM/201403/20140312030602240/GEN_SM-P900_Galaxy_Tab_PRO_KK_English_User_Manual_NAE_F5.pdf If you have any questions just post a comment and I'll get back to you.
RU6XKJQEKQ6ZB	5.0	Not too big. great functionality for someone on the road or just at home	449	485	First let me say that I am an ex-apple customer that demanded more options and over time have switched from the iPhone to Note 3 and from an ipad to the Surface Pro and now to the Galaxy Note Pro 12.2. For those that have a Galaxy note phone, the Note Pro tablet is more than just a larger version of the same. The functionality is greater, the multitasking is excellent, and from both a personal and business use, it has met my every expectation. I will address a few areas of concern I had before purchase to try and help anyone that may be feeling the way I was. SIZE - my job has me on the road marketing every day. This tablet, to me, is portable enough to not be a problem or feel like I am lugging around something uncomfortable. I carry a clip board with a few forms and the note pro 12.2 in a slip cover case (until I can get my hands on the book cover case which is back ordered). When I read the initial reviews I was terrified to buy this thing sight unseen as many talked about it being too big or too heavy. It is not too big and the weight feels light next to the surface pro. If you are looking for a large tablet screen...This is for you. Also the large screen allows for the full size keyboard which is nice (this review is being typed on the screen) MAGAZINE UX - the difference in the updated interface had me scared. Is it like the new windows tiles? Is it completely different than what I am used to on the Galaxy Note 3 phone? Here is the deal. The Magazine UX is not bad. I am not a fan of the new Microsoft Windows tiles which i had on the Surface Pro. The UI is different and better in my opinion. You can set up your tablet to look almost identical to your Samsung Note phone if that is what you want. There has to be at least one screen of the magazine but to be honest, I have it set up to where I really like it. 
When I hit the button to go to the home screen it takes me to the same setup I have on my phone which is folders with different apps, time and weather, etc. 32 or 64 GB - Being an early adpoter, the 64gb model was not available on release. It is unlikely that you will be using this tablet to shoot a ton of HD video (which is the biggest storage hog) and with an addition of a 64GB sd card that you can swap out if needed, 32gb device memory works just fine. WiFi or 4G - that depends on your needs but if you can use your phone as a hotspot, why pay extra for a cell radio in the device plus pay more to add it to a cell plan? The answer for my needs was WiFi works perfect and tethers with my phone on the road. I can't say much about the battery life yet as I have only charged it once. It had about 50% charge when I pulled it out of the box after purchase. CONCLUSION - I was disappointed in the info updates on this device before the US release. All of the videos and reports were all from CES and there were no new hands on reports. I can tell you that this device has a premium price but also carries premium capabilities. If you just want to update a status on Facebook and play candycrush...This isn't for you unless you want to spend the money for the screen size. If you want a nice big screen to multitask and you are either familiar with Android or are willing to educate yourself on the possibilities. You will be impressed.
R17NYA6EFL1HHZ	5.0	Great Tablet! Well worth the price! w/ Bonus Zagg keyboard review!	207	223	April 2014 Update: It has been over a month of usage with the 12.2 and I have to admit my experience has become more and more positive. Things I am liking is the the stellar battery life, the way apps update to work better with 4.4os, the physical durability, the screen sharpness/brightness, and the internet browsing experience. I love the multi-day battery power you get. I realize even with the screen up at 80% brightness watching Netflix all day the battery still takes forever to drain. The only App that seemed to cause Battery drain and slow recharging issue was Word-With-Friends, Blurb Checkout and HP Printer Services. Once you delete those and make use of Task Manager to curb background activity everything improves. Also keeping GPS Antenna off helps. Apps seem to be updating regularly to work smoother on 4.4os and to also support the multi window display. So now more apps can be swiped into new windows when in multi window mode. By contrast I have some popular apps on my iPhone that are only now updating to work better with last year's iOS 7. I'm finding the feel of the tablet is great for daily use. I love the faux leather back and aluminum trim. You don't necessarily need a case like you would on a scratch prone-metallic iPad. The faux leather is soft to the touch and easy to grip, yet handles daily commuting bangs and scrapes well without any signs of wear and tear. One thing I didn't expect though is the faux leather to be a bit of a fingerprint magnet. I tried using other tablets since and I now realize how great the screen is. Apart from the slight fuzziness you get when importing low res pics (or browsing the Facebook/eBay app) hi res images, websites, eBook/Adobe Reader text and colors always look sharp and bright. I even take back what I said about the 10.1 Note Pro's screen being possibly better. The 12.2 has a greater overall screen. 
As long as you stay away from your Grandma's 1 MP pictures and don't expect extra sharp images on the Facebook and eBay app then this tablet will make you happy. Actually if you use Chrome or Internet to browse Facebook and eBay then the images will look sharper and will scaled up properly. This brings me to my final impressions for the 12.2: the web browsing experience is second to none! I'm very impressed with the consistent speed and page layout of the Chrome and Google web apps. This is the first tablet where I can say anyone's dependence on a full desktop PC or laptop PC for internet service will be greatly diminished. The only thorn in this positive is that Adobe Flash material may or may not load on certain websites but there is an Android Flash Player Patch (not found in the Play Store/Samsung Apps) that you can install thru your web browser to work around this. Positives aside there have been some hiccups with the 12.2. There is a slight interface delay when using the S pen on certain Apps. It works perfectly in S pen native apps. I don't find myself using S pen for what I normally do on my 12.2 so it is no bother to me but still worth mentioning. Also the Android/Samsung Gallery and File Management system can get cluttered if you tend to store a lot of documents and media. Add in the automatic Facebook/Dropbox media file import and you may find yourself lost in a sea of tricky to remove pictures that you never put on the tablet. However, coming from an iOs iPhone background where Apple manipulates your media and files for you I am actually enjoying the freedom of organizing my images and files the way I want. I don't really use the Magazine but, unlike widgets, the Magazine tiles do not auto update. [Yeah I checked with Samsung and] there is 'currently' no method for selecting them to automatically refresh with new stories and content. It has to be manually done for each tile every time you open Magazine. 
This actually defeats the purpose of what the feature is about-quickly exposing users to new and newsworthy information. On the flip side I'm curious how battery life will fare once Samsung allows background auto update of all those tiles. Over all I think anyone who chose this tablet will be happy. I still don't know why haters still rate the 12.2 low because Samsung wants $750 for a 12.2 tablet when in the same stores Samsung smart phones, iPads and iPhones sell for $799+. --------------------------------------------------------------------------- I originally wanted the Note 2014 10.1 for the holidays but decided to wait when the 12.2 appeared at CES. After seeing a some early online gadget reviews and videos I went into Best Buy held both the 10.1 and 12.2 and in under 3 minutes I was sold on the Pro 12.2. What I realized after looking at the 32GB 10.1 at $549 versus the 32GB 12.2 at $749 that for $200 more you get a lot more than just a bigger screen. After a month of daily use with my 12.2 did I make a good choice? Let's find out. PROS: The battery life is very good. I usually go all day with usage (bluetooth, wifi, apps running, screen brightness at 60%-auto brightness set to OFF) and never go below 70%. Some applications (words with friends) and widgets will drain battery quicker even if you are not directly using them so it is best to use the task and application manager in settings to see what is using your battery the most. Also GPS antenna being on all the time will kill the battery life. The first thing you will notice is that the 12.2 handles multiple tasks VERY quickly. It is not as quick as my iPhone 5S when opening heavy apps quickly like Facebook but because 99% of the time I am doing more than one thing at a time on my 12.2 any lag (minute as it maybe) is completely forgivable. In fact other than Facebook I have not notice anything to be slow on this tablet. YouTube, GPS-Google Earth/Maps, Web Browsing, Hancom Office, etc... 
can all run at 100% at the same time on the same screen. I had absolutely no issues with stuttering, slowing down or freezing up like some review videos have griped about. The software has updated since I took it out the box so that maybe the trick. Make sure to update your software and all the apps to get the most up to date performance. This tablet also doesn't get hot in my lap at all when doing heavy work. Netflix, YouTube, streaming Video playback is smooth, crystal clear Hi-Def and looks amazing! Images and text on Mazagine, web browsing apps, FlipBoard, emails, Adobe Reader, games, ebooks, etc... all look crisp, bright and vividly sharp. The Chrome and Internet web browsing apps are blazing fast! I never liked web browsing apps on tablets and mobile phones because it was never as quick and natural as internet on a PC or notebook but here it feels natural and fast. Pages are full pages and not scaled down versions of websites. I also found the Android web browsing experience MUCH BETTER than Safari on the iOS iPads. Contrary to some early YouTube reviews I did NOT find this tablet to be too big or heavy for one handed use. The average adult should be able to hold and use it in one hand while standing. The Note Pro 12.2 is the around the same weight as my fiancé's pre-iPadAir iPad3. At 8"+ by 11"+ inches the 12.2 Note Pro shares the same footprint as a standard sheet of paper (Go to your printer or office desk and pick up a standard sheet of paper it is the same size as this 12.2 tablet) If you are coming from an older 8" or 10" tablet then the Note Pro 12.2 will 'feel' much bigger but not much heavier. However if you are like me and never owned a tablet before while coming from a Laptop/Desktop background then the 12.2. will feel portable enough. In fact this tablet paired with the Zagg keyboard has already completely replaced my premium Lenovo Thinkpad for on the go Laptop duties. 
Speaking of Laptop PC Windows duties the Productivity Software that comes with the 12.2 is not the usual bloatware you would expect. In fact apps like Hancom and WebEx puts Windows Office to shame. These productivity apps put the practicality of Microsoft Office onto a tablet environment where everything runs smoothly (especially with S pen) without the costs, drama and high maintenance that you get with Microsoft Office products. I mainly use Hancom Word for writing (novels) and in this regard I recommend an external keyboard (like the Zagg) since the touch screen keyboard can hog some screen real estate. Other obvious positives to a 12.2 screen is that you can do way more stuff at the same time and this tablet does the job of multi windows smoothly. My only prior tablet experience was my fiance's iPad3 which interfaces only one app at a time. Having the 12.2 do more than one thing at once took some getting used to. It is fun and most practical. CONS: The 12.2 screen carries over the resolution from the Note 10.1's screen (why?). This means in apps like Gallery, Facebook and eBay some lower resolution photos do not scale up properly to fill out the bigger screen. Instead the Note Pro 12.2 tablet stretches out and enlarges images in these apps which results in slight fuzziness on those pictures. So if you are importing old school throw back pictures that you scanned back in 1999 then don't expect those pictures to look good here. On the Note 10.1 the same images would come out sharper on the smaller screen space. Lisa Gade at MobileTech has a brilliant side by side 10.1 vs 12.2 video on YouTube that shows this effect better than I can describe here. In fact isn't a big issue and most people who use hi-res pictures won't notice it at all. That said the camera on this tablet like all current tablets is quite crap compared to most high end cell phones or digital cameras. 
The camera app does have all the latest and greatest features but dont expect wallpaper quality pics. Use your samsung cell phone instead and import pics. Regarding the screen resolution I will add other apps like Adobe, Chrome/Internet, NYTimes and Magazine/FlipBoard have very crisp and clear images that are on par or better than iPad Retina's images so don't let that turn you away. My next gripe is in the box this tablet comes with nothing. You get the usb cord/charger combo, a quick start leaflet, and replacement nibs for the S pen. For $800 spent here I would have at least wanted headphones and a proper manual booklet (you have to search Samsung's website for the 178 page SM-P905 pdf file). Funny enough the SM-P905 manual when you download it has A LOT of important information regarding how to use this tablet properly. For one it warns "do not use a screen protector on the Note Tablets" which is the first thing we all buy for it. I use one anyway and it works fine with the S pen and touch controls. Be careful with where you buy your MicroSD card from. I went thru two eBay-bought SanDisk and Samsung 64gb cards, most likely fakes, that didn't work, died and almost damaged my Pro 12.2. I finally purchased the genuine Samsung 64gb card at my local Best Buy and it worked. Funny enough the eBay cards worked perfectly on my PC laptop and BlackBerry smartphone so that means the MicroSD card drive on this 12.2 tablet is very picky. One of my early concerns was out the box it took this tablet about 5 hours to charge from 70% to 100% and much longer to charge from near empty. To improve charging time I went to Application Manager->Running and deleted hidden background Bloatware like HP printer services, Blurb Checkout, etc... On kitkat 4.4 these sneaky apps remain running even after you sleep the tablet suckling away at your battery. With bloatware gone I now go from 30% to a full 100% charge in 4 to 5 hours. 
If that is too long for you then the LTE note pro 12.2 tablets will have batteries that charge much quicker but the downside is they will also die quicker than non-LTE models. Price was not a con for me but Samsung releases so many similarly priced tablets with a range of features that it makes me wonder if the Tab Pro 12.2 didn't come out at the same time would the retail for the Note Pro 12.2 be cheaper. Other than these minor gripes this is a perfect tablet for everyday use BONUS! ZAGG KEYBOARD REVIEW: I love that this keyboard works flawlessly, protects the screen when closed has great battery life. Also with its own leather stitched back matching the 12.2 perfectly BUT the Zagg is not a good case holder. The 12.2 lays facedown into the Zagg and clips on but the connection is loose and the tablet moves up and down inside the clips. Also removal of the tablet (as per the instructions) requires you to yank it up snapping back the holding clips open to release the tablet. The issue here is those clips are springy but still plastic and look like they will eventually break (I assume) from continued removal of the tablet. Furthermore you can easily pry or yank the wrong way and send the tablet flying across the room since it requires a bit of force. Also at $99.99 it is an overpriced keyboard accessory that isn't backlit and looks like it will wear out in a year or two of use. Already my spacebar key needs to be pressed really hard and in the center to work. I only purchased it at the time I picked my 12.2 because I needed a case right away and the only other case in Best Buy was $59.99. So I did the math and figured $40 bucks more got me a keyboard and stand for the 12.2. It does make a great stand. In retrospect I probably would not have bought the Zagg for more than $50 but I didn't really have a choice. It does its job with minimal features for the most part but do not expect a drop-proof case/holder or stellar desktop keyboard performance out of this product.
R1WWQDXQOZ5LER	5.0	Lovin' my BAT	84	88	Samsung proves what the ladies already know, bigger is better. The 12.2 Note Pro has received a high percentage of positive reviews on Amazon and other sites. Most professional reviewers, on the other hand, seem to think it's just too big, too heavy, too expensive, or it's just not an Ipad. Fortunately for me I've learned to take professional reviewers with a huge grain of salt. Thanks to reviews from real users on Amazon and balanced reviewers like Lisa Gade from Mobile Tech Review I was able to take the plunge when the Note Pro went on sale. And I love my Big Ass Tablet (BAT)! I wasn't sure I could use that word in my review, but I did a search on Amazon and found out they sell this Liquid Ass So I guess it's okay. I've owned two 7" tablets, two 10.1" tablets, and I recently bought the Samsung Tab Pro 8.4". The 7" were good for portability outside the home, while the 10.1" were good for couch surfing. I thought the 8.4" would be a good compromise between the two that I could use for both, and consolidate down to one size. However, after using the 8.4" for about a month I discovered my eyes aren't up to looking at a small screen for several hours at a time (reading/web surfing on the couch is my favorite past-time). The 8.4" has great resolution, but text is still small and zooming in just means you have to scroll a lot. However, it is great for portability outside the home, and I will keep it for that. The 12.2" offers no compromise web browsing and document reading. Rarely do I feel the need to zoom in. I'm also addicted to the S-pen. Love to use it for navigating my tablet, and I also prefer to write with the pen then use the onscreen keyboard. I guess it's Samsung tablets for me, until the competition catches up and produces something better. Some people might complain that the tablet is too heavy for couch surfing. 
I've solved this problem by modifying the Aerb case I got for the tablet Aerb® X Pro Series Samsung Galaxy Note Pro 12.2 Leather Case Cover Stand W Sleep Wake Function (for Galaxy Note Pro 12.2, Black Style A) By folding the cover of the case and securing it with four small binding clips I can use the case to prop up the tablet while lying on the couch or in bed. Weight becomes a non-issue and I only need to lightly grasp the tablet with one hand to keep it from tipping over. I've included some pictures to show you how I modified the case, for those that are interested (see the user supplied picture area of this product's web page). So far I'm completely satisfied with this tablet. My only complaint is the price. Wish it was more like $600. Even so, I'm livin' large and lovin' it! I hope Samsung keeps making this size tablet. It's ideal for my use.
R1T5127OGNXDR9	5.0	Simply Amazing!!	158	178	For those who've rated this low because of the size, do your research. Duh! It's a 12.2 inch tablet. I'm a cartoonist and really wasn't excited about spending $2K on a Wacom Cintiq Companion. I spent all of yesterday using this Samsung for drawing, web browsing, and calendar use. This is the best tablet I've ever owned and will use it daily. Yes, it's a bit on the expensive side, but for me, with how much I will be using it, and what I'll be using it for, I can more than justify the cost.

As the sample above shows, the reviews are not cleanly formatted like a news article, so a few decisions were made for data preparation, as described below.

Key Decisions in Data Preparation

  • Since the summarization is extractive (not abstractive), summary sentences are taken directly from the original review file. Only sentences with between 10 and 30 words are considered, which keeps long story lines written by users out of the final summary.
  • Only English alphabetic characters are considered in the summarization process. All special characters and numbers are ignored in both methods implemented in this project.
  • English stopwords are ignored and all other words are lemmatized.
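
The decisions above can be sketched as two small helpers. This is only an illustration, not the project's code: the names `is_candidate` and `clean_words` and the tiny stopword set are assumptions, the 10 and 30 bounds are treated as inclusive, and lemmatization (done with NLTK in the project) is left out to keep the example dependency-free.

```python
import re

# Tiny illustrative stopword set (assumption); the project uses NLTK's English stopwords.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "it", "this", "for", "with"}


def is_candidate(sentence, min_words=10, max_words=30):
    """Keep only sentences whose word count lies between min_words and max_words."""
    return min_words <= len(sentence.split()) <= max_words


def clean_words(sentence):
    """Lowercase the sentence, keep alphabetic runs only, and drop stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]
```

For instance, `clean_words("The 64GB card is great!")` drops the digits, the punctuation and the stopwords, leaving `['gb', 'card', 'great']`.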

Data Preparation

  • A review file of a product is loaded into a reviewRDD as (key, value) pairs of (review_id, full_review).
  • Each review in the reviewRDD consists of multiple sentences, so each review is further split into individual sentences to form a sentenceRDD. Each sentence is identified by a unique key. (A few sentences are removed at this step by following the key decisions described above.)
  • The sentenceRDD is processed (stopwords removed, remaining words lemmatized) to extract only meaningful words. Words with fewer than four letters are ignored, on the heuristic that longer words capture more meaning than shorter ones.
  • Each sentence in the sentenceRDD is replaced by its processed word list to form the wordlistRDD. The id (key) of each sentence is preserved with its word list, so the summary sentences can be looked up by key after the summarization techniques have run.
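
The flow from reviews to keyed word lists can be sketched in plain Python, with lists standing in for the RDDs. Everything named here is hypothetical: the key format `<review_id>_<index>`, the function names and the small stopword set are assumptions, splitting on `.` only is a guess suggested by the sample output, and lemmatization is omitted.

```python
import re

STOPWORDS = {"this", "that", "with", "have", "from"}  # illustrative subset (assumption)


def split_review(review_id, full_review):
    """Split one review into (sentence_key, sentence) pairs.

    The key format "<review_id>_<index>" is an assumption; any unique key works.
    """
    sentences = [s.strip() for s in full_review.split(".") if s.strip()]
    return [("%s_%d" % (review_id, i), s) for i, s in enumerate(sentences)]


def to_wordlist(sentence, min_len=4):
    """Keep alphabetic words with at least min_len letters that are not stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if len(w) >= min_len and w not in STOPWORDS]


def prepare(reviews, min_words=10, max_words=30):
    """reviews: list of (review_id, full_review). Returns a list of (sentence_key, wordlist)."""
    pairs = []
    for review_id, text in reviews:                          # stands in for reviewRDD
        for key, sentence in split_review(review_id, text):  # flatMap-like step: sentenceRDD
            if not min_words <= len(sentence.split()) <= max_words:
                continue                                     # length filter from the key decisions
            words = to_wordlist(sentence)                    # map-like step: wordlistRDD
            if words:
                pairs.append((key, words))
    return pairs
```

Because the key travels with the word list, the original sentence can be recovered later with a simple lookup by key.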

Summarization

Latent Semantic Analysis

After preparing the data as described above, the full vocabulary (row header) of the review file is obtained from the values() of wordlistRDD and the sentence ids (column header) from its keys(). From the word lists, a term-frequency vector is computed for each selected sentence, and the document-frequency vector is then computed from the term-frequency matrix (here a document means a sentence in the review file). The Inverse Document Frequency is computed from the document-frequency vector and the total number of sentences. The TF-IDF matrix is computed by multiplying the TF matrix by the IDF vector using numpy. The rows and columns of the TF-IDF matrix are words and sentences respectively.
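
A minimal numpy sketch of this construction follows. The function name `tfidf_matrix` is hypothetical, and the IDF form `log(N / df)` is an assumption, since the text does not say which variant is used.

```python
import numpy as np


def tfidf_matrix(wordlists):
    """wordlists: list of (sentence_id, [words]). Returns (vocab, sentence_ids, matrix).

    Rows are words and columns are sentences, as described above.
    """
    sentence_ids = [sid for sid, _ in wordlists]
    vocab = sorted({w for _, words in wordlists for w in words})
    row = {w: i for i, w in enumerate(vocab)}

    tf = np.zeros((len(vocab), len(wordlists)))
    for col, (_, words) in enumerate(wordlists):
        for w in words:
            tf[row[w], col] += 1.0               # raw term frequency

    df = (tf > 0).sum(axis=1)                    # number of sentences containing each word
    idf = np.log(float(len(wordlists)) / df)     # assumed IDF form: log(N / df)
    return vocab, sentence_ids, tf * idf[:, np.newaxis]
```

With this IDF form, a word that occurs in every sentence gets a weight of zero, so it cannot influence the decomposition that follows.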

After computing the TF-IDF matrix, I factorized it by Singular Value Decomposition (using numpy) and collected the key sentences from the right singular matrix. The decomposition results in three matrices: U, S and V-Transpose. U is the left singular vector matrix, S is the diagonal matrix of non-negative singular values sorted in descending order, and V-Transpose is the right singular vector matrix. The Latent Semantic Analysis summarization method chooses k concepts from the right singular matrix and, for each concept (a row vector in V-Transpose), adds the sentence with the largest value to the summary. This method is suitable when the number of topics/concepts in a given corpus of documents is known. In the review summarization task there is no predefined set of concepts, so I chose 10 concepts with the top 5 sentences each, so that the concepts captured by LSA in the final output can be interpreted clearly.
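
The selection step can be sketched as below, assuming a TF-IDF matrix with words as rows and sentences as columns. The function name `lsa_summary` and its parameters are hypothetical; `np.linalg.svd` is the numpy routine for the decomposition.

```python
import numpy as np


def lsa_summary(tfidf, sentence_ids, n_concepts=10, n_sentences=5):
    """Return, for each concept, the ids of the sentences with the largest entries in V-Transpose."""
    # full_matrices=False gives the compact SVD: tfidf = U * diag(S) * Vt
    u, s, vt = np.linalg.svd(tfidf, full_matrices=False)
    summary = []
    for concept in vt[:n_concepts]:                     # rows of Vt, strongest concept first
        top = np.argsort(concept)[::-1][:n_sentences]   # column indices, largest value first
        summary.append([sentence_ids[i] for i in top])
    return summary
```

One caveat: the sign of each singular vector is arbitrary, so some implementations rank by absolute value instead. The sketch follows the "largest value" wording of the text, and whether the project handles signs differently is not stated.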

Sample Output by LSA Summarizer for "Samsung Galaxy Note Pro 12.2"

The following sample output contains 5 sentences for each concept/topic in the reviews. They are obtained from the right singular vector matrix. Here,

  • Concept X clearly shows the positive feedback about the Hancom Office suite that comes with this tablet
  • Concept Y describes the screen of the tablet
  • Concept Z captures sentences that describe the keyboard and battery performance
    Concept X
    [Sentence 1] :	[u" Combined with any Bluetooth Keyboard and the Hancom Officer suite (it's free!), it's almost exactly like your Microsoft Office at home"]
    [Sentence 2] :	[u'2 is first and foremost a pseudo-laptop replacement computer, especially when you have specialized office apps like the (full version of) Hancom Office']
    [Sentence 3] :	[u' - The Hancom office app is only a viewer, but you can use OfficeSuite, WPS Office or QuickOfficeHD to have a near 95% Microsoft compatible experience']
    [Sentence 4] :	[u' The Hancom office is very useful, I now have no need to get Microsoft Office 2013 for my laptop']
    [Sentence 5] :	[u' Samsungs free office suite is just like working with Microsoft office']
    
    Concept Y
    [Sentence 1] :	[u' Pros: Large screen Fast computing processing speeds / large amount of Ram Beautiful looking screen with large resolution Easy setup with secure packaging Highly customizable with smooth operation']
    [Sentence 2] :	[u'2: 1) the large screen allows for effective split window use; 2) the large screen is great for reading technical textbooks']
    [Sentence 3] :	[u" Apps do not transition well from full screen to partial screen, and most are nearly useless in quarter-screen, so don't put too much faith in that multi-app functionality"]
    [Sentence 4] :	[u" The stylus is incredibly useful to write notes and the fact that the screen is so large really makes you feel like you're just using an actual notepad"]
    [Sentence 5] :	[u' Second, the on-screen keyboard keeps popping up! The BT keyboard is connected and working, but any pen-clicking on the screen slides up the on-screen keyboard as well']
    
    Concept Z
    [Sentence 1] :	[u' Battery life: At first average battery life between charges was about four hours']
    [Sentence 2] :	[u" I've read however that those using the Logitech Bluetooth keyboard are now experiencing issues with their Logitech keyboards after updating to lollipop"]
    [Sentence 3] :	[u' Second, the on-screen keyboard keeps popping up! The BT keyboard is connected and working, but any pen-clicking on the screen slides up the on-screen keyboard as well']
    [Sentence 4] :	[u' Now my Logitech Pro keyboard no longer works and the Samsung screen keyboard works randomly']
    [Sentence 5] :	[u" - Battery life and charging times could be better if you're used to an older Transformer that had a battery in the keyboard dock"]    

Note: A few concepts showed redundant information, so only three concepts were selected at random (in no particular order) for this example. Here X, Y and Z represent the 5th, 3rd and 7th concepts respectively in the actual output.

LSA can also be used to get the important words instead of sentences by performing SVD on the transpose of the TF-IDF matrix. The execution instructions below show how to choose between words and sentences when running the program.
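
A short sketch of the key-word variant, assuming the same words-by-sentences TF-IDF matrix. The function name `lsa_keywords` is hypothetical and the ranking rule simply mirrors the sentence case.

```python
import numpy as np


def lsa_keywords(tfidf, vocab, n_concepts=10, n_words=5):
    """Rank words per concept by running SVD on the transposed TF-IDF matrix."""
    # tfidf.T has sentences as rows and words as columns, so the rows of Vt now index words.
    u, s, vt = np.linalg.svd(tfidf.T, full_matrices=False)
    return [[vocab[i] for i in np.argsort(row)[::-1][:n_words]]
            for row in vt[:n_concepts]]
```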

TextRank

TextRank is a graph-based summarization algorithm. It starts with the same data preparation steps as LSA, up to producing the wordlistRDD. Here each word list represents a vertex of the graph. To add edges between the vertices, I created a graphRDD, which takes a word list (vertex) and all other vertices (all sentences from wordlistRDD) as input and creates the adjacency list of the vertex based on the similarity between it and every other sentence (vertex). If two vertices don't share any words, there is no edge between them; if they share some words, the edge between them has a weight equal to their similarity. The similarity score is computed by the formula below.
similarity formula
Note: In my implementation, I've added a smoothing factor of 1 to the denominator to avoid division by zero.
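
The similarity measure from the TextRank paper divides the number of words two sentences share by the sum of the logarithms of their lengths. The sketch below follows that definition. The function names are hypothetical, and the exact placement of the smoothing term (added once to the whole denominator) is an assumption.

```python
import math


def similarity(words_a, words_b):
    """Word-overlap similarity between two word lists, normalised by their log lengths."""
    if not words_a or not words_b:
        return 0.0
    overlap = len(set(words_a) & set(words_b))
    # The +1 is the smoothing term: without it, two one-word sentences
    # would give log(1) + log(1) = 0 in the denominator.
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)) + 1.0)


def adjacency(vertex_id, wordlists):
    """Weighted adjacency list of one sentence; pairs with no shared words get no edge."""
    own = wordlists[vertex_id]
    edges = {}
    for other_id, words in wordlists.items():
        if other_id == vertex_id:
            continue
        weight = similarity(own, words)
        if weight > 0:
            edges[other_id] = weight
    return edges
```

Here `wordlists` is a plain dict of sentence id to word list, standing in for the collected contents of wordlistRDD.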

After constructing the graph, I implemented the iterative TextRank algorithm (a modified version of PageRank) to compute the rank of each vertex. The top ‘k’ ranked sentences are then selected and added to the final summary of the reviews. The TextRank algorithm is derived from PageRank, and its mathematical expression is shown below.
TextRank Formula
Image Source: TextRank: Bringing Order into Texts by Rada Mihalcea and Paul Tarau
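
The weighted update from the TextRank paper gives each vertex a base score of (1 - d) plus d times the scores of its neighbours, each scaled by the edge weight divided by the neighbour's total edge weight. A minimal sketch, assuming an undirected graph stored as a dict of adjacency dicts and the paper's damping factor d = 0.85; the function names and the initial score of 1.0 are assumptions:

```python
def textrank(graph, iterations=10, d=0.85):
    """graph: {vertex: {neighbour: weight}} with symmetric weights. Returns {vertex: score}."""
    scores = {v: 1.0 for v in graph}
    out_weight = {v: sum(edges.values()) for v, edges in graph.items()}
    for _ in range(iterations):
        new_scores = {}
        for v, edges in graph.items():
            # Each neighbour u passes on a share of its score proportional to the edge weight.
            incoming = sum(w / out_weight[u] * scores[u] for u, w in edges.items())
            new_scores[v] = (1.0 - d) + d * incoming
        scores = new_scores
    return scores


def top_k(scores, k):
    """Vertex ids of the k highest-ranked sentences."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With this unnormalised form the scores average 1.0 over a graph without isolated vertices, so well-connected sentences end up above 1, which is consistent with the rank values of about 2 in the sample output below.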

Sample Output of TextRank

Example 1: Samsung Galaxy Note Pro 12.2
Rank: 2.3602	Sentence : [u' I like Android very much, but if Apple ever makes a tablet this size with a retina+ screen']
Rank: 2.3573	Sentence : [u' Samsung makes an S Action mouse especially for the Note Pro and Tab Pro line of tablets and it works very well']
Rank: 2.26      Sentence : [u'only had the tablet an hour now but I love the full screen keyboard and the new tile system is fantastic, I really like it']
Rank: 2.2016	Sentence : [u' As with all Galaxy Note products, this tablet comes with the S-Pen which works very well']
Rank: 2.199		Sentence : [u' I\'m tired of "sacrificing" size with a tablet and it\'s time to be \'normal\'! This is where the Samsung Galaxy Note Pro (SGNP) 12']
Rank: 2.1748	Sentence : [u" IOS has nothing on android and the iPad is a play toy compared to samsung's line of note tablets, the note 8, the note 10"]
Rank: 2.1524	Sentence : [u" I really liked the Yoga 2 laptops except the windows tablet apps don't handle the resolution well and the screen is too narrow in portrait mode"]
Rank: 2.1514	Sentence : [u" I've been waiting a long time for a tablet like this to be released: a large screen, expandable memory and the freedom of Android"]
Rank: 2.1503	Sentence : [u'I love this tablet! It has a huge screen and I am able to use it like a laptop']
Rank: 2.1501	Sentence : [u' On a positive note, it functions well as a keyboard, it matches the tablet nice and battery life is very good']

Example 2: Fire HD 8, 8" HD Display, Wi-Fi, 8 GB, Black
Rank: 2.4671	Sentence : [u" I also purchased to kindle fire children's tablets, I think they look good , case seems strong and great warranty"]
Rank: 2.4379	Sentence : [u" It's about TIME Amazon did this with their Kindle Fire devices! NOT SO GOOD: - Mediatek processor"]
Rank: 2.3362	Sentence : [u'I really wanted to like this tablet since my very old Kindle Fire needed replacing']
Rank: 2.3126	Sentence : [u' You buy a Kindle Fire for Amazon content & their ecosystem, and this is a darn good device for the price']
Rank: 2.2486	Sentence : [u' Had Amazon allowed the Google Play services to be integrated better, this tablet would have been a great device, especially for its low price tag']
Rank: 2.2433	Sentence : [u' I like having a tablet dedicated to the Amazon ecosystem, so I think I would stick with a Fire tablet vs']
Rank: 2.2399	Sentence : [u"I like everything about the Fire HD8, except you can't put apps from other than Amazon, on the Kindle, especially one called Xfinity Connect,which allows me to"]
Rank: 2.2325	Sentence : [u' Amazon owes it too the loyal Fire following that purchases apps and books constantly to make a good tablet with which to enjoy those apps and books']
Rank: 2.1829	Sentence : [u' I especially like the light weight of the new Kindle Fire as it is easy to hold when reading e-books or websites']
Rank: 2.0988	Sentence : [u' I am an Amazon user and fan but I was sadly let down with this device and would gladly purchase a quality tablet like my old HDX']

Demo

Tip: I used a tool called Byzanz to record the following GIFs

LSA in Action

Summarization using Latent Semantic Analysis is shown below. Note the similarity between sentences in each concept.
Summarization using LSA

TextRank in Action

Summary sentences using TextRank

Observations

  • Initially I tried running LSA on all sentences (without filtering out those with more than 30 or fewer than 10 words). Surprisingly, the first concept consisted of sentences with very few words, which carried no meaning at all.
  • Then I picked only sentences with more than 10 words. This resulted in the same long sentences appearing in multiple concepts (rows), because longer sentences contain more words and can therefore represent multiple concepts.
  • Negative reviews are mostly user specific, so they didn't show up much in the top concepts. There are also far fewer negative reviews than positive ones.
  • TextRank is fairly stable here and consistently prefers sentences that have more words and more edges.
  • TextRank and LSA are not very good choices for text summarization on their own, but they can be used as a first step in a summarization task.

Evaluation

  • Evaluating a text summary involves comparing many human-written summaries with the machine-generated summary by checking unigram co-occurrences, so I have not evaluated these two methods.
  • Also, ROUGE, the commonly used tool for evaluating summaries, is written in Perl, and limited time didn't allow me to explore using it for this task.

Installation

Dependencies

  • Python 2.6 or 2.7(not tested in 3.x)
  • Install pip
  • Numpy(version >1.4)
  • NLTK Library
  • Apache Spark

Install Numpy and NLTK

sudo pip install -U numpy
sudo pip install -U nltk

Download Stopwords from nltk data source

 #Pythonic way
 import nltk
 nltk.download('all-corpora')
     (or)
 #Command line way       
 python -m nltk.downloader all-corpora

Execution Instruction

Summarization using LSA

$ spark-submit lsa.py -s <inputfile>
    (or)
$ spark-submit lsa.py -w <inputfile>

Note: The -s and -w flags return the review's key sentences and key words respectively.
Example:
    $ spark-submit lsa.py -s hdfs-path-/dataset/reviews/B00HWMPSK6.txt

Summarization using TextRank

$ spark-submit textrank.py <iter-count> <summary-sent-count> <inputfile>

iter-count  -   Number of iterations for the TextRank algorithm. 10 iterations give results close to the converged state
summary-sent-count - Number of summary sentences in the final result.

Example: 
    $ spark-submit textrank.py 10 10 hdfs-path/data/reviews/B00HWMPSK6.txt

Output Interpretation

The outputs of the above executions are saved to text files in the following directories and also printed on the console (disable logs to get clear output).

yourHDFShome/output-lsa/
yourHDFShome/output-textrank/

Example:
    $ hadoop fs -cat output-lsa/*
    $ hadoop fs -cat output-textrank/*

Note: Delete the output folders before running again with a different input file.

Source Code

Not available for now

References

  1. Y. Gong, X. Liu: Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, New Orleans, Louisiana, United States, 2001, pp. 19-25
  2. Makbule Gulcin Ozsoy, Ferda Nur Alpaslan and Ilyas Cicekli. Text summarization using Latent Semantic Analysis. Journal of Information Science archive, Volume 37 Issue 4, August 2011, Pages 405-417.
  3. Josef Steinberger and Karel Ježek. Using Latent Semantic Analysis in Text Summarization and Summary Evaluation(2004), In Proc. ISIM ’04.
  4. R. Mihalcea and P. Tarau. TextRank - bringing order into texts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain, 2004.