fabianvf/python-rake

All phrases scored as 1.0?

Closed this issue · 11 comments

Python 3.6 venv >> pip install python-rake

import RAKE
Rake = RAKE.Rake(RAKE.SmartStopList())

text="The initiating oncogenic event in almost half of human lung adenocarcinomas is still unknown, a fact that complicates the development of selective targeted therapies. Yet these tumours harbour a number of alterations without obvious oncogenic function including BRAF-inactivating mutations. Researchers at the Spanish National Cancer Research Centre (CNIO) have demonstrated that the expression of an endogenous Braf (D631A) kinase-inactive isoform in mice (corresponding to the human BRAF(D594A) mutation) triggers lung adenocarcinoma in vivo, indicating that BRAF-inactivating mutations are initiating events in lung oncogenesis. The paper, published in Nature, indicates that the signal intensity of the MAPK pathway is a critical determinant not only in tumour development, but also in dictating the nature of the cancer-initiating cell and ultimately the resulting tumour phenotype."

Rake.run(text)
[('mapk pathway', 1.0), ('still unknown', 1.0), ('initiating oncogenic event', 1.0), ('spanish national cancer research centre', 1.0), ('demonstrated', 1.0), ('dictating', 1.0), ('kinase-inactive isoform', 1.0), ('ultimately', 1.0), ('selective targeted therapies', 1.0), ('vivo', 1.0), ('researchers', 1.0), ('development', 1.0), ('tumours harbour', 1.0), ('yet', 1.0), ('mice', 1.0), ('braf-inactivating mutations', 1.0), ('tumour development', 1.0), ('expression', 1.0), ('indicating', 1.0), ('cancer-initiating cell', 1.0), ('triggers lung adenocarcinoma', 1.0), ('signal intensity', 1.0), ('critical determinant', 1.0), ('resulting tumour phenotype', 1.0), ('complicates', 1.0), ('corresponding', 1.0), ('lung oncogenesis', 1.0), ('human lung adenocarcinomas', 1.0), ('paper', 1.0), ('mutation', 1.0), ('published', 1.0), ('cnio', 1.0), ('d594a', 1.0), ('number', 1.0), ('initiating events', 1.0), ('d631a', 1.0), ('fact', 1.0), ('endogenous braf', 1.0), ('nature', 1.0), ('indicates', 1.0), ('almost half', 1.0), ('also', 1.0), ('human braf', 1.0)]

All scored as 1.0 (also on imported text, from a file); am I doing something wrong? Thanks ...


Additional tests:

text="Halifax, an Atlantic Ocean port in eastern Canada, is the provincial capital of Nova Scotia. A major business centre, it’s also known for its maritime history. The city’s dominated by the hilltop Citadel, a star-shaped fort completed in the 1850s. Waterfront warehouses known as the Historic Properties recall Halifax’s days as a trading hub for privateers, notably during the War of 1812. Halifax, legally known as the Halifax Regional Municipality (HRM), is the capital of the province of Nova Scotia, Canada. The municipality had a population of 403,131 in 2016, with 316,701 in the urban area centred on Halifax Harbour. The regional municipality consists of four former municipalities that were amalgamated in 1996: Halifax, Dartmouth, Bedford, and the Municipality of Halifax County. Halifax is a major economic centre in Atlantic Canada with a large concentration of government services and private sector companies. Major employers and economic generators include the Department of National Defence, Dalhousie University, Saint Mary's University, the Halifax Shipyard, various levels of government, and the Port of Halifax. Agriculture, fishing, mining, forestry and natural gas extraction are major resource industries found in the rural areas of the municipality. Halifax was ranked by MoneySense magazine as the fourth best place to live in Canada for 2012, placed first on a list of 'large cities by quality of life' and placed second in a list of 'large cities of the future', both conducted by fDi Magazine for North and South American cities. Additionally, Halifax has consistently placed in the top 10 for business friendliness of North and South American cities, as conducted by fDi Magazine. For a city with more pubs and clubs per capita than almost any city in Canada, it’s fitting that our most famous brewmaster was also our mayor. Three times. Alexander Keith’s original 1820 brewery welcomes visitors with costumed guides, stories and, of course, good ale. 
Walk across the street from Keith’s Brewery to the Halifax waterfront boardwalk that follows the water’s edge alongside the world’s second largest ice-free harbour. Stretching from the Canadian Museum of Immigration at Pier 21 – the gateway into Canada for over one million immigrants – to Casino Nova Scotia, you’ll pass unique shops, restaurants, and in the warmer months, graceful tall ships. Hop aboard the ferry, North America's longest running saltwater ferry, in fact, and cross the harbour to the Dartmouth side which is filled with more locally-owned shops, galleries, cafés, restaurants, and pubs. A visit to Halifax is not complete without trying the fabled donair, the offical food of Halifax. Become a soldier for a day at Halifax Citadel National Historic Site. Visit a 200-year-old restored fishing village at Fisherman’s Cove. Hear captivating sea stories from small to the Titanic at the Maritime Museum of the Atlantic. Discover the stories of over 1 million immigrants who landed in Halifax at Pier 21. Explore the new Halifax Cental Library, named as one of CNN's 10 eye-popping new buildings in 2014. Skate or bike The Emera Oval. The long-track speed skating oval on the Halifax Commons is an outdoor activity destination in summer and in winter. Stroll through the beautiful Victorian flower gardens and grounds at Halifax Public Gardens. Take in one of Canada’s best walks along the Halifax Waterfront. Be inspired by Atlantic Canada’s largest art collection at the Art Gallery of Nova Scotia. Ride the oldest running saltwater ferry service in North America (second oldest in the world) when you take the ferry between Dartmouth and Halifax. Experience the craftsmanship of Canada's only mouth-blown, hand-cut crystal maker, NovaScotian Crystal on the Halifax Waterfront. Venture to McNabs Island, located at the mouth of the Halifax Harbour, for secluded trails, a beautiful beach, and a historic fort. 
Explore the oldest continuously running farmers' market in North America at the Halifax Seaport Farmers' Market. Visit Alderney Landing on the Dartmouth Waterfront and peruse the shops, art gallery, community theatre, and restaurants. For the golfer - you have plenty of golfing choices to make while golfing in Halifax Metro."

  • I crafted the text string above to have some degree of repetition.

Rake.run(text)
[('provincial capital', 1.0), ('national defence', 1.0), ('nova scotia', 1.0), ('skate', 1.0), ('street', 1.0), ('clubs per capita', 1.0), ('privateers', 1.0), ('population', 1.0), ('three times', 1.0), ('south american cities', 1.0), ('take', 1.0), ('galleries', 1.0), ('fisherman', 1.0), ('halifax regional municipality', 1.0), ('edge alongside', 1.0), ('fitting', 1.0), ('agriculture', 1.0), ('urban area centred', 1.0), ('business friendliness', 1.0), ('top 10', 1.0), ('plenty', 1.0), ('mining', 1.0), ('mcnabs island', 1.0), ('peruse', 1.0), ('complete without trying', 1.0), ('atlantic canada', 1.0), ('ll pass unique shops', 1.0), ('stretching', 1.0), ('large cities', 1.0), ('fishing', 1.0), ('future', 1.0), ('star-shaped fort completed', 1.0), ('maritime museum', 1.0), ('original 1820 brewery welcomes visitors', 1.0), ('hilltop citadel', 1.0), ('list', 1.0), ('walk across', 1.0), ('cnn', 1.0), ('quality', 1.0), ('filled', 1.0), ('ride', 1.0), ('notably', 1.0), ('days', 1.0), ('beautiful beach', 1.0), ('large concentration', 1.0), ('longest running saltwater ferry', 1.0), ('also', 1.0), ('largest art collection', 1.0), ('port', 1.0), ('restaurants', 1.0), ('market', 1.0), ('fourth best place', 1.0), ('gateway', 1.0), ('water', 1.0), ('alexander keith', 1.0), ('canadian museum', 1.0), ('various levels', 1.0), ('oldest continuously running farmers', 1.0), ('additionally', 1.0), ('named', 1.0), ('mouth-blown', 1.0), ('city', 1.0), ('department', 1.0), ('summer', 1.0), ('halifax harbour', 1.0), ('hear captivating sea stories', 1.0), ('placed first', 1.0), ('almost', 1.0), ('ferry', 1.0), ('follows', 1.0), ('novascotian crystal', 1.0), ('cross', 1.0), ('amalgamated', 1.0), ('cove', 1.0), ('beautiful victorian flower gardens', 1.0), ('new halifax cental library', 1.0), ('university', 1.0), ('10 eye-popping new buildings', 1.0), ('soldier', 1.0), ('course', 1.0), ('become', 1.0), ('1850s', 1.0), ('halifax public gardens', 1.0), ('winter', 1.0), ('1 million immigrants', 
1.0), ('warmer months', 1.0), ('maritime history', 1.0), ('craftsmanship', 1.0), ('brewery', 1.0), ('economic generators include', 1.0), ('capital', 1.0), ('hop aboard', 1.0), ('venture', 1.0), ('oldest running saltwater ferry service', 1.0), ('keith', 1.0), ('one million immigrants', 1.0), ('hand-cut crystal maker', 1.0), ('waterfront warehouses known', 1.0), ('day', 1.0), ('mayor', 1.0), ('halifax waterfront boardwalk', 1.0), ('municipality', 1.0), ('major economic centre', 1.0), ('regional municipality consists', 1.0), ('dartmouth waterfront', 1.0), ('historic properties recall halifax', 1.0), ('located', 1.0), ('government', 1.0), ('dartmouth', 1.0), ('canada', 1.0), ('halifax shipyard', 1.0), ('war', 1.0), ('also known', 1.0), ('forestry', 1.0), ('golfing', 1.0), ('trading hub', 1.0), ('best walks along', 1.0), ('explore', 1.0), ('dalhousie university', 1.0), ('halifax seaport farmers', 1.0), ('cafés', 1.0), ('harbour', 1.0), ('mouth', 1.0), ('legally known', 1.0), ('north america', 1.0), ('major business centre', 1.0), ('bike', 1.0), ('private sector companies', 1.0), ('hrm', 1.0), ('offical food', 1.0), ('titanic', 1.0), ('major employers', 1.0), ('visit', 1.0), ('small', 1.0), ('shops', 1.0), ('saint mary', 1.0), ('moneysense magazine', 1.0), ('natural gas extraction', 1.0), ('pubs', 1.0), ('bedford', 1.0), ('ranked', 1.0), ('locally-owned shops', 1.0), ('life', 1.0), ('discover', 1.0), ('province', 1.0), ('eastern canada', 1.0), ('stories', 1.0), ('stroll', 1.0), ('inspired', 1.0), ('immigration', 1.0), ('live', 1.0), ('famous brewmaster', 1.0), ('placed second', 1.0), ('visit alderney landing', 1.0), ('graceful tall ships', 1.0), ('halifax metro', 1.0), ('second largest ice-free harbour', 1.0), ('golfer', 1.0), ('outdoor activity destination', 1.0), ('pier 21', 1.0), ('dartmouth side', 1.0), ('art gallery', 1.0), ('world', 1.0), ('four former municipalities', 1.0), ('costumed guides', 1.0), ('fact', 1.0), ('long-track speed skating oval', 1.0), ('second 
oldest', 1.0), ('fabled donair', 1.0), ('secluded trails', 1.0), ('emera oval', 1.0), ('halifax commons', 1.0), ('north', 1.0), ('consistently placed', 1.0), ('fdi magazine', 1.0), ('good ale', 1.0), ('conducted', 1.0), ('atlantic', 1.0), ('government services', 1.0), ('historic fort', 1.0), ('community theatre', 1.0), ('200-year-old restored fishing village', 1.0), ('halifax county', 1.0), ('experience', 1.0), ('halifax citadel national historic site', 1.0), ('halifax', 1.0), ('make', 1.0), ('grounds', 1.0), ('landed', 1.0), ('one', 1.0), ('atlantic ocean port', 1.0), ('golfing choices', 1.0), ('halifax waterfront', 1.0), ('rural areas', 1.0), ('casino nova scotia', 1.0), ('major resource industries found', 1.0), ('dominated', 1.0), ('403', 0), ('2014', 0), ('131', 0), ('2012', 0), ('1996', 0), ('2016', 0), ('1812', 0), ('701', 0), ('316', 0)]

Rake = RAKE.Rake(RAKE.NLTKStopList())

Rake.run(text)
[('provincial capital', 1.0), ('national defence', 1.0), ('nova scotia', 1.0), ('skate', 1.0), ('street', 1.0), ('clubs per capita', 1.0), ('privateers', 1.0), ('population', 1.0), ('three times', 1.0), ('south american cities', 1.0), ('take', 1.0), ('galleries', 1.0), ('fisherman', 1.0), ('halifax regional municipality', 1.0), ('edge alongside', 1.0), ('fitting', 1.0), ('agriculture', 1.0), ('urban area centred', 1.0), ('business friendliness', 1.0), ('top 10', 1.0), ('plenty', 1.0), ('mining', 1.0), ('mcnabs island', 1.0), ('peruse', 1.0), ('complete without trying', 1.0), ('atlantic canada', 1.0), ('ll pass unique shops', 1.0), ('stretching', 1.0), ('large cities', 1.0), ('fishing', 1.0), ('future', 1.0), ('star-shaped fort completed', 1.0), ('maritime museum', 1.0), ('original 1820 brewery welcomes visitors', 1.0), ('hilltop citadel', 1.0), ('list', 1.0), ('walk across', 1.0), ('cnn', 1.0), ('quality', 1.0), ('filled', 1.0), ('ride', 1.0), ('notably', 1.0), ('days', 1.0), ('beautiful beach', 1.0), ('large concentration', 1.0), ('longest running saltwater ferry', 1.0), ('also', 1.0), ('largest art collection', 1.0), ('port', 1.0), ('restaurants', 1.0), ('market', 1.0), ('fourth best place', 1.0), ('gateway', 1.0), ('water', 1.0), ('alexander keith', 1.0), ('canadian museum', 1.0), ('various levels', 1.0), ('oldest continuously running farmers', 1.0), ('additionally', 1.0), ('named', 1.0), ('mouth-blown', 1.0), ('city', 1.0), ('department', 1.0), ('summer', 1.0), ('halifax harbour', 1.0), ('hear captivating sea stories', 1.0), ('placed first', 1.0), ('almost', 1.0), ('ferry', 1.0), ('follows', 1.0), ('novascotian crystal', 1.0), ('cross', 1.0), ('amalgamated', 1.0), ('cove', 1.0), ('beautiful victorian flower gardens', 1.0), ('new halifax cental library', 1.0), ('university', 1.0), ('10 eye-popping new buildings', 1.0), ('soldier', 1.0), ('course', 1.0), ('become', 1.0), ('1850s', 1.0), ('halifax public gardens', 1.0), ('winter', 1.0), ('1 million immigrants', 
1.0), ('warmer months', 1.0), ('maritime history', 1.0), ('craftsmanship', 1.0), ('brewery', 1.0), ('economic generators include', 1.0), ('capital', 1.0), ('hop aboard', 1.0), ('venture', 1.0), ('oldest running saltwater ferry service', 1.0), ('keith', 1.0), ('one million immigrants', 1.0), ('hand-cut crystal maker', 1.0), ('waterfront warehouses known', 1.0), ('day', 1.0), ('mayor', 1.0), ('halifax waterfront boardwalk', 1.0), ('municipality', 1.0), ('major economic centre', 1.0), ('regional municipality consists', 1.0), ('dartmouth waterfront', 1.0), ('historic properties recall halifax', 1.0), ('located', 1.0), ('government', 1.0), ('dartmouth', 1.0), ('canada', 1.0), ('halifax shipyard', 1.0), ('war', 1.0), ('also known', 1.0), ('forestry', 1.0), ('golfing', 1.0), ('trading hub', 1.0), ('best walks along', 1.0), ('explore', 1.0), ('dalhousie university', 1.0), ('halifax seaport farmers', 1.0), ('cafés', 1.0), ('harbour', 1.0), ('mouth', 1.0), ('legally known', 1.0), ('north america', 1.0), ('major business centre', 1.0), ('bike', 1.0), ('private sector companies', 1.0), ('hrm', 1.0), ('offical food', 1.0), ('titanic', 1.0), ('major employers', 1.0), ('visit', 1.0), ('small', 1.0), ('shops', 1.0), ('saint mary', 1.0), ('moneysense magazine', 1.0), ('natural gas extraction', 1.0), ('pubs', 1.0), ('bedford', 1.0), ('ranked', 1.0), ('locally-owned shops', 1.0), ('life', 1.0), ('discover', 1.0), ('province', 1.0), ('eastern canada', 1.0), ('stories', 1.0), ('stroll', 1.0), ('inspired', 1.0), ('immigration', 1.0), ('live', 1.0), ('famous brewmaster', 1.0), ('placed second', 1.0), ('visit alderney landing', 1.0), ('graceful tall ships', 1.0), ('halifax metro', 1.0), ('second largest ice-free harbour', 1.0), ('golfer', 1.0), ('outdoor activity destination', 1.0), ('pier 21', 1.0), ('dartmouth side', 1.0), ('art gallery', 1.0), ('world', 1.0), ('four former municipalities', 1.0), ('costumed guides', 1.0), ('fact', 1.0), ('long-track speed skating oval', 1.0), ('second 
oldest', 1.0), ('fabled donair', 1.0), ('secluded trails', 1.0), ('emera oval', 1.0), ('halifax commons', 1.0), ('north', 1.0), ('consistently placed', 1.0), ('fdi magazine', 1.0), ('good ale', 1.0), ('conducted', 1.0), ('atlantic', 1.0), ('government services', 1.0), ('historic fort', 1.0), ('community theatre', 1.0), ('200-year-old restored fishing village', 1.0), ('halifax county', 1.0), ('experience', 1.0), ('halifax citadel national historic site', 1.0), ('halifax', 1.0), ('make', 1.0), ('grounds', 1.0), ('landed', 1.0), ('one', 1.0), ('atlantic ocean port', 1.0), ('golfing choices', 1.0), ('halifax waterfront', 1.0), ('rural areas', 1.0), ('casino nova scotia', 1.0), ('major resource industries found', 1.0), ('dominated', 1.0), ('403', 0), ('2014', 0), ('131', 0), ('2012', 0), ('1996', 0), ('2016', 0), ('1812', 0), ('701', 0), ('316', 0)]

  • Updates:
    • same (basically) in Python 2.7 venv: all values 1.0, with numbers (integers, dates) scoring as 0.
    • same, in host (Python 3.6.3) env, RAKE installed via sudo pip3 install python-rake
    • OS: Arch Linux x86_64

@fabianvf you got this one?

@justinkterry I have a few big releases in the pipeline right now, but I can probably take a look around Monday or so.

@fabianvf Cool. I ask because I have a suspicion that it's (at least in part) some sort of mangling in PyPI given how correctly this is being used and the similar problems before, and I can't address those.

@victoriastuart Could you please give a more detailed report of the install and system parameters?

No problem:

  • HARDWARE | OS:

    • Arch Linux 64-bit
    • RAM: 32 GB
    • CPU: Intel Core i7-4790 CPU @ 3.60 GHz x 4 cores (hyper-threaded to 8 threads)
    • Linux kernel: 4.13.11-1
    • Architecture: x86_64
    • uname -a: Linux victoria 4.13.11-1-ARCH #1 SMP PREEMPT Thu Nov 2 10:25:56 CET 2017 x86_64 GNU/Linux
  • PYTHON:

Python 3.6.3
[victoria@victoria ~]$ p3
    [Anaconda Python 3.5 venv (source activate py35)]
(py35) [victoria@victoria ~]$ python --version
Python 3.5.3 :: Anaconda custom (64-bit)
(py35) [victoria@victoria ~]$ P
[P: python]
Python 3.5.3 |Anaconda custom (64-bit)| (default, Mar  6 2017, 11:58:13) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
(py35) [victoria@victoria ~]$ 
(py35) [victoria@victoria ~]$ pip list | grep -i rake
python-rake (1.4.0)
(py35) [victoria@victoria ~]$ conda list | grep -i rake
python-rake               1.4.0                     <pip>
(py35) [victoria@victoria ~]$

As mentioned, I tried the usual install (python-rake) via pip in host env (Python 3.6), as well as my Python 2.7 and 3.6 venv ...

Could you try it inside of virtualenv instead of anaconda?

[victoria@victoria ~]$ python-mkvirtualenv             
bash: python-mkvirtualenv: command not found

[victoria@victoria ~]$ python -m venv rake

[victoria@victoria ~]$ source rake/bin/activate

(rake) [victoria@victoria ~]$ P
[P: python]
Python 3.6.3 (default, Oct 24 2017, 14:48:20) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

(rake) [victoria@victoria ~]$ pip3 install python-rake
Collecting python-rake
  Using cached python-rake-1.4.0.tar.gz
Installing collected packages: python-rake
  Running setup.py install for python-rake ... done
Successfully installed python-rake-1.4.0

(rake) [victoria@victoria ~]$ P
[P: python]
Python 3.6.3 (default, Oct 24 2017, 14:48:20) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import RAKE
>>> Rake = RAKE.Rake(RAKE.SmartStopList())
>>> text="Halifax, an Atlantic Ocean port in eastern Canada, is the provincial capital of Nova Scotia. A major business centre, it’s also known for its maritime history. The city’s dominated by the hilltop Citadel, a star-shaped fort completed in the 1850s. Waterfront warehouses known as the Historic Properties recall Halifax’s days as a trading hub for privateers, notably during the War of 1812. Halifax, legally known as the Halifax Regional Municipality (HRM), is the capital of the province of Nova Scotia, Canada. The municipality had a population of 403,131 in 2016, with 316,701 in the urban area centred on Halifax Harbour. The regional municipality consists of four former municipalities that were amalgamated in 1996: Halifax, Dartmouth, Bedford, and the Municipality of Halifax County. Halifax is a major economic centre in Atlantic Canada with a large concentration of government services and private sector companies. Major employers and economic generators include the Department of National Defence, Dalhousie University, Saint Mary's University, the Halifax Shipyard, various levels of government, and the Port of Halifax. Agriculture, fishing, mining, forestry and natural gas extraction are major resource industries found in the rural areas of the municipality. Halifax was ranked by MoneySense magazine as the fourth best place to live in Canada for 2012, placed first on a list of 'large cities by quality of life' and placed second in a list of 'large cities of the future', both conducted by fDi Magazine for North and South American cities. Additionally, Halifax has consistently placed in the top 10 for business friendliness of North and South American cities, as conducted by fDi Magazine. For a city with more pubs and clubs per capita than almost any city in Canada, it’s fitting that our most famous brewmaster was also our mayor. Three times. Alexander Keith’s original 1820 brewery welcomes visitors with costumed guides, stories and, of course, good ale. 
Walk across the street from Keith’s Brewery to the Halifax waterfront boardwalk that follows the water’s edge alongside the world’s second largest ice-free harbour. Stretching from the Canadian Museum of Immigration at Pier 21 – the gateway into Canada for over one million immigrants – to Casino Nova Scotia, you’ll pass unique shops, restaurants, and in the warmer months, graceful tall ships. Hop aboard the ferry, North America's longest running saltwater ferry, in fact, and cross the harbour to the Dartmouth side which is filled with more locally-owned shops, galleries, cafés, restaurants, and pubs. A visit to Halifax is not complete without trying the fabled donair, the offical food of Halifax.  Become a soldier for a day at Halifax Citadel National Historic Site. Visit a 200-year-old restored fishing village at Fisherman’s Cove. Hear captivating sea stories from small to the Titanic at the Maritime Museum of the Atlantic. Discover the stories of over 1 million immigrants who landed in Halifax at Pier 21. Explore the new Halifax Cental Library, named as one of CNN's 10 eye-popping new buildings in 2014.  Skate or bike The Emera Oval. The long-track speed skating oval on the Halifax Commons is an outdoor activity destination in summer and in winter. Stroll through the beautiful Victorian flower gardens and grounds at Halifax Public Gardens. Take in one of Canada’s best walks along the Halifax Waterfront. Be inspired by Atlantic Canada’s largest art collection at the Art Gallery of Nova Scotia. Ride the oldest running saltwater ferry service in North America (second oldest in the world) when you take the ferry between Dartmouth and Halifax. Experience the craftsmanship of Canada's only mouth-blown, hand-cut crystal maker, NovaScotian Crystal on the Halifax Waterfront. Venture to McNabs Island, located at the mouth of the Halifax Harbour, for secluded trails, a beautiful beach, and a historic fort. 
Explore the oldest continuously running farmers' market in North America at the Halifax Seaport Farmers' Market. Visit Alderney Landing on the Dartmouth Waterfront and peruse the shops, art gallery, community theatre, and restaurants. For the golfer - you have plenty of golfing choices to make while golfing in Halifax Metro."
>>> Rake.run(text)
[('halifax', 1.0), ('atlantic ocean port', 1.0), ('eastern canada', 1.0), ('provincial capital', 1.0), ('nova scotia', 1.0), ('major business centre', 1.0), ('maritime history', 1.0), ('city', 1.0), ('dominated', 1.0), ('hilltop citadel', 1.0), ('star-shaped fort completed', 1.0), ('1850s', 1.0), ('waterfront warehouses', 1.0), ('historic properties recall halifax', 1.0), ('days', 1.0), ('trading hub', 1.0), ('privateers', 1.0), ('notably', 1.0), ('war', 1.0), ('legally', 1.0), ('halifax regional municipality', 1.0), ('hrm', 1.0), ('capital', 1.0), ('province', 1.0), ('canada', 1.0), ('municipality', 1.0), ('population', 1.0), ('urban area centred', 1.0), ('halifax harbour', 1.0), ('regional municipality consists', 1.0), ('municipalities', 1.0), ('amalgamated', 1.0), ('dartmouth', 1.0), ('bedford', 1.0), ('halifax county', 1.0), ('major economic centre', 1.0), ('atlantic canada', 1.0), ('large concentration', 1.0), ('government services', 1.0), ('private sector companies', 1.0), ('major employers', 1.0), ('economic generators include', 1.0), ('department', 1.0), ('national defence', 1.0), ('dalhousie university', 1.0), ('saint mary', 1.0), ('university', 1.0), ('halifax shipyard', 1.0), ('levels', 1.0), ('government', 1.0), ('port', 1.0), ('agriculture', 1.0), ('fishing', 1.0), ('mining', 1.0), ('forestry', 1.0), ('natural gas extraction', 1.0), ('major resource industries found', 1.0), ('rural areas', 1.0), ('ranked', 1.0), ('moneysense magazine', 1.0), ('fourth', 1.0), ('place', 1.0), ('live', 1.0), ('list', 1.0), ('large cities', 1.0), ('quality', 1.0), ('life', 1.0), ('future', 1.0), ('conducted', 1.0), ('fdi magazine', 1.0), ('north', 1.0), ('south american cities', 1.0), ('additionally', 1.0), ('consistently', 1.0), ('top 10', 1.0), ('business friendliness', 1.0), ('pubs', 1.0), ('clubs', 1.0), ('capita', 1.0), ('fitting', 1.0), ('famous brewmaster', 1.0), ('mayor', 1.0), ('times', 1.0), ('alexander keith', 1.0), ('original 1820 brewery welcomes visitors', 
1.0), ('costumed guides', 1.0), ('stories', 1.0), ('good ale', 1.0), ('walk', 1.0), ('street', 1.0), ('keith', 1.0), ('brewery', 1.0), ('halifax waterfront boardwalk', 1.0), ('water', 1.0), ('edge alongside', 1.0), ('world', 1.0), ('largest ice-free harbour', 1.0), ('stretching', 1.0), ('canadian museum', 1.0), ('immigration', 1.0), ('pier 21', 1.0), ('gateway', 1.0), ('million immigrants', 1.0), ('casino nova scotia', 1.0), ('ll pass unique shops', 1.0), ('restaurants', 1.0), ('warmer months', 1.0), ('graceful tall ships', 1.0), ('hop aboard', 1.0), ('ferry', 1.0), ('north america', 1.0), ('longest running saltwater ferry', 1.0), ('fact', 1.0), ('cross', 1.0), ('harbour', 1.0), ('dartmouth side', 1.0), ('filled', 1.0), ('locally-owned shops', 1.0), ('galleries', 1.0), ('cafés', 1.0), ('visit', 1.0), ('complete', 1.0), ('fabled donair', 1.0), ('offical food', 1.0), ('soldier', 1.0), ('day', 1.0), ('halifax citadel national historic site', 1.0), ('200-year-', 1.0), ('restored fishing village', 1.0), ('fisherman', 1.0), ('cove', 1.0), ('hear captivating sea stories', 1.0), ('small', 1.0), ('titanic', 1.0), ('maritime museum', 1.0), ('atlantic', 1.0), ('discover', 1.0), ('1 million immigrants', 1.0), ('landed', 1.0), ('explore', 1.0), ('halifax cental library', 1.0), ('named', 1.0), ('cnn', 1.0), ('10 eye-popping', 1.0), ('buildings', 1.0), ('skate', 1.0), ('bike', 1.0), ('emera oval', 1.0), ('long-track speed skating oval', 1.0), ('halifax commons', 1.0), ('outdoor activity destination', 1.0), ('summer', 1.0), ('winter', 1.0), ('stroll', 1.0), ('beautiful victorian flower gardens', 1.0), ('grounds', 1.0), ('halifax public gardens', 1.0), ('walks', 1.0), ('halifax waterfront', 1.0), ('inspired', 1.0), ('largest art collection', 1.0), ('art gallery', 1.0), ('ride', 1.0), ('oldest running saltwater ferry service', 1.0), ('oldest', 1.0), ('experience', 1.0), ('craftsmanship', 1.0), ('mouth-blown', 1.0), ('hand-cut crystal maker', 1.0), ('novascotian crystal', 1.0), 
('venture', 1.0), ('mcnabs island', 1.0), ('located', 1.0), ('mouth', 1.0), ('secluded trails', 1.0), ('beautiful beach', 1.0), ('historic fort', 1.0), ('oldest continuously running farmers', 1.0), ('market', 1.0), ('halifax seaport farmers', 1.0), ('visit alderney landing', 1.0), ('dartmouth waterfront', 1.0), ('peruse', 1.0), ('shops', 1.0), ('community theatre', 1.0), ('golfer', 1.0), ('plenty', 1.0), ('golfing choices', 1.0), ('make', 1.0), ('golfing', 1.0), ('halifax metro', 1.0), ('1812', 0), ('403', 0), ('131', 0), ('2016', 0), ('316', 0), ('701', 0), ('1996', 0), ('2012', 0), ('2014', 0)]
(rake) [victoria@victoria ~]$ deactivate
[victoria@victoria ~]$ 

SOLUTION

OK I dissected the code (print statements ...),

/home/victoria/rake/lib64/python3.6/site-packages/RAKE/RAKE.py

Here is the problem:

def separate_words(text):
    """
    Utility function to return a list of all words that are have a length greater than a specified number of characters.
    @param text The text that must be split in to words.
    @param min_word_return_size The minimum no of characters a word must have to be included.
    """
    #splitter = re.compile('/W+')
    splitter = re.compile('\W+')
    # ...

It appears that '/W+' is a Windows-based regex (I'm guessing); the Linux version (mine) is '\W+'

With the former I was getting this (e.g.)

phrase: oldest continuously running farmers
word_list in phrase: ['oldest continuously running farmers']

not this (e.g.)

phrase: oldest continuously running farmers
word_list in phrase: ['oldest', 'continuously', 'running', 'farmers']
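The difference between the two patterns can be reproduced with the standard `re` module alone. This is a minimal sketch, not the package's code: the variable names are made up for illustration. In Python's `re`, '/W+' matches a literal slash followed by one or more capital W characters, so it never matches inside an ordinary phrase, while '\W+' matches runs of non-word characters such as spaces.

```python
import re

phrase = "oldest continuously running farmers"

# '/W+' is a literal '/' followed by one or more 'W' characters.
# Nothing in the phrase matches, so split() returns it in one piece.
broken_splitter = re.compile('/W+')
broken_words = broken_splitter.split(phrase)

# r'\W+' is one or more non-word characters (spaces, punctuation),
# so split() breaks the phrase into its individual words.
fixed_splitter = re.compile(r'\W+')
fixed_words = fixed_splitter.split(phrase)

print(broken_words)  # ['oldest continuously running farmers']
print(fixed_words)   # ['oldest', 'continuously', 'running', 'farmers']
```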

Correct output (I presume) now:

[('long-track speed skating oval', 23.5), ('oldest running saltwater ferry service', 19.916666666666664), ('halifax citadel national historic site', 17.628205128205128), ('longest running saltwater ferry', 15.583333333333332), ('oldest continuously running farmers', 15.166666666666668), ('star-shaped fort completed', 15.0), ('major resource industries found', 15.0), ('hand-cut crystal maker', 15.0), ('ll pass unique shops', 14.666666666666666), ('original 1820 brewery welcomes visitors', 14.5), ('beautiful victorian flower gardens', 14.5), ('hear captivating sea stories', 14.0), ('largest ice-free harbour', 13.75), ('historic properties recall halifax', 13.628205128205128), ('urban area centred', 9.0), ('major economic centre', 9.0), ('private sector companies', 9.0), ('economic generators include', 9.0), ('natural gas extraction', 9.0), ('graceful tall ships', 9.0), ('outdoor activity destination', 9.0), ('largest art collection', 8.833333333333334), ('locally-owned shops', 8.666666666666666), ('major business centre', 8.5), ('south american cities', 8.5), ('halifax public gardens', 8.461538461538462), ('halifax seaport farmers', 8.461538461538462), ('restored fishing village', 8.0), ('halifax cental library', 7.961538461538462), ('regional municipality consists', 7.8), ('visit alderney landing', 7.666666666666667), ('casino nova scotia', 7.5), ('halifax waterfront boardwalk', 7.161538461538462), ('atlantic ocean port', 7.0), ('halifax regional municipality', 6.7615384615384615), ('historic fort', 6.666666666666666), ('hilltop citadel', 5.5), ('national defence', 5.5), ('emera oval', 5.5), ('major employers', 5.0), ('novascotian crystal', 5.0), ('beautiful beach', 5.0), ('nova scotia', 4.5), ('large cities', 4.5), ('business friendliness', 4.5), ('art gallery', 4.333333333333334), ('halifax harbour', 4.211538461538462), ('waterfront warehouses', 4.2), ('halifax waterfront', 4.161538461538462), ('maritime history', 4.0), ('trading hub', 4.0), ('large 
concentration', 4.0), ('saint mary', 4.0), ('rural areas', 4.0), ('moneysense magazine', 4.0), ('fdi magazine', 4.0), ('famous brewmaster', 4.0), ('costumed guides', 4.0), ('good ale', 4.0), ('edge alongside', 4.0), ('canadian museum', 4.0), ('million immigrants', 4.0), ('warmer months', 4.0), ('hop aboard', 4.0), ('fabled donair', 4.0), ('offical food', 4.0), ('maritime museum', 4.0), ('1 million immigrants', 4.0), ('10 eye-popping', 4.0), ('mcnabs island', 4.0), ('secluded trails', 4.0), ('community theatre', 4.0), ('halifax county', 3.9615384615384617), ('halifax shipyard', 3.9615384615384617), ('halifax commons', 3.9615384615384617), ('halifax metro', 3.9615384615384617), ('dartmouth waterfront', 3.7), ('north america', 3.6), ('provincial capital', 3.5), ('government services', 3.5), ('dalhousie university', 3.5), ('alexander keith', 3.5), ('dartmouth side', 3.5), ('mouth-blown', 3.5), ('golfing choices', 3.5), ('oldest', 3.3333333333333335), ('eastern canada', 3.333333333333333), ('atlantic canada', 3.333333333333333), ('ferry', 2.75), ('shops', 2.6666666666666665), ('brewery', 2.5), ('harbour', 2.25), ('port', 2.0), ('fishing', 2.0), ('stories', 2.0), ('atlantic', 2.0), ('halifax', 1.9615384615384615), ('municipality', 1.8), ('visit', 1.6666666666666667), ('north', 1.6), ('capital', 1.5), ('dartmouth', 1.5), ('university', 1.5), ('government', 1.5), ('keith', 1.5), ('mouth', 1.5), ('golfing', 1.5), ('canada', 1.3333333333333333), ('city', 1.0), ('dominated', 1.0), ('1850s', 1.0), ('days', 1.0), ('privateers', 1.0), ('notably', 1.0), ('war', 1.0), ('legally', 1.0), ('hrm', 1.0), ('province', 1.0), ('population', 1.0), ('municipalities', 1.0), ('amalgamated', 1.0), ('bedford', 1.0), ('department', 1.0), ('levels', 1.0), ('agriculture', 1.0), ('mining', 1.0), ('forestry', 1.0), ('ranked', 1.0), ('fourth', 1.0), ('place', 1.0), ('live', 1.0), ('list', 1.0), ('quality', 1.0), ('life', 1.0), ('future', 1.0), ('conducted', 1.0), ('additionally', 1.0), 
('consistently', 1.0), ('top 10', 1.0), ('pubs', 1.0), ('clubs', 1.0), ('capita', 1.0), ('fitting', 1.0), ('mayor', 1.0), ('times', 1.0), ('walk', 1.0), ('street', 1.0), ('water', 1.0), ('world', 1.0), ('stretching', 1.0), ('immigration', 1.0), ('pier 21', 1.0), ('gateway', 1.0), ('restaurants', 1.0), ('fact', 1.0), ('cross', 1.0), ('filled', 1.0), ('galleries', 1.0), ('cafés', 1.0), ('complete', 1.0), ('soldier', 1.0), ('day', 1.0), ('200-year-', 1.0), ('fisherman', 1.0), ('cove', 1.0), ('small', 1.0), ('titanic', 1.0), ('discover', 1.0), ('landed', 1.0), ('explore', 1.0), ('named', 1.0), ('cnn', 1.0), ('buildings', 1.0), ('skate', 1.0), ('bike', 1.0), ('summer', 1.0), ('winter', 1.0), ('stroll', 1.0), ('grounds', 1.0), ('walks', 1.0), ('inspired', 1.0), ('ride', 1.0), ('experience', 1.0), ('craftsmanship', 1.0), ('venture', 1.0), ('located', 1.0), ('market', 1.0), ('peruse', 1.0), ('golfer', 1.0), ('plenty', 1.0), ('make', 1.0), ('1812', 0), ('403', 0), ('131', 0), ('2016', 0), ('316', 0), ('701', 0), ('1996', 0), ('2012', 0), ('2014', 0)]

Addendum, for anyone not used to this software (as I was / am):

Usage:

import RAKE

Rake = RAKE.Rake(RAKE.SmartStopList())

# or, optionally:
# Rake = RAKE.Rake(RAKE.SmartStopList(),5,5,1)
# see RAKE.py for details, or (e.g.)
# https://www.airpair.com/nlp/keyword-extraction-tutorial
    
text="Halifax, an Atlantic Ocean port in eastern Canada, is the provincial capital of Nova Scotia. A major business centre, it’s also known for its maritime history. The city’s dominated by the hilltop Citadel, a star-shaped fort completed in the 1850s. Waterfront warehouses ..."

Rake.run(text)

A great description of RAKE scoring (degree; frequency; ...) is found here: https://codelingo.wordpress.com/2017/05/26/keyword-extraction-using-rake/
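To see why un-split phrases all come out as 1.0, here is a simplified sketch of the degree/frequency scoring as it is commonly described for RAKE. It is an illustration under stated assumptions, not the code in RAKE.py: the function name `rake_scores` is hypothetical, and the input is assumed to be phrases that have already been split into word lists.

```python
from collections import defaultdict

def rake_scores(phrases):
    """Score phrases (given as lists of words) by summed degree/frequency.

    Hypothetical helper for illustration only; not part of python-rake.
    """
    freq = defaultdict(int)    # how many times each word occurs
    degree = defaultdict(int)  # co-occurrence count, including the word itself
    for words in phrases:
        for word in words:
            freq[word] += 1
            degree[word] += len(words)
    word_score = {word: degree[word] / freq[word] for word in freq}
    # A phrase's score is the sum of the scores of its words.
    return {" ".join(words): sum(word_score[w] for w in words)
            for words in phrases}

# Phrases correctly split into words: shared words raise the scores.
split_ok = [["halifax", "harbour"], ["halifax"], ["harbour"]]
print(rake_scores(split_ok))
# {'halifax harbour': 3.0, 'halifax': 1.5, 'harbour': 1.5}

# Phrases left un-split: each phrase acts as a single unique "word",
# so degree equals frequency and every score collapses to 1.0.
not_split = [["halifax harbour"], ["halifax"], ["harbour"]]
print(rake_scores(not_split))
# {'halifax harbour': 1.0, 'halifax': 1.0, 'harbour': 1.0}
```

The second call mirrors the symptom reported above: when the splitter never splits, every phrase is treated as one token whose degree equals its frequency.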

This RAKE package looks great -- thanks to all involved in keeping it going (#1) -- much appreciated! :-)

It isn't a Windows/Linux version issue (Python uses its own regex engine); it's just that, in retrospect, I forgot to validate the code after I made the fix related to that, and we don't have automated tests yet. @fabianvf Can you change the / in the main branch?

Ahhhso ... :-p

Good detective work, thanks for the issue and fix! Pushed the changed regex to master and tagged a new release (1.4.1); it should hit PyPI as soon as the Travis build finishes.