We've already used Pandas to import data in previous lessons, but in this lesson we'll take a closer look at what's actually going on.
You will be able to:
- Import data from csv files and Excel files
- Understand and explain key arguments for imports
- Save information to csv and Excel files
- Access data within a Pandas DataFrame (print() and .head())
As usual, we import pandas under the standard alias pd
import pandas as pd
There are a few main functions for importing data into a pandas DataFrame, including:
- pd.read_csv()
- pd.read_excel()
- pd.read_json()
- pd.DataFrame.from_dict()
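Of the functions above, pd.read_json() and pd.DataFrame.from_dict() don't come up in the examples below, so here is a minimal sketch of both (the column names and values are made up for illustration):

```python
import pandas as pd
from io import StringIO

# Build a DataFrame directly from a dictionary of column -> values
data = {'name': ['Alice', 'Bob'], 'score': [88, 92]}
df = pd.DataFrame.from_dict(data)

# Round-trip through JSON: serialize the frame, then read it back
json_str = df.to_json()
df_from_json = pd.read_json(StringIO(json_str))
```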
Most of these methods are fairly straightforward; you use .read_csv() for csv files, .read_excel() for Excel files (both the older .xls and the newer .xlsx formats) and .read_json() for JSON files. That said, there are a few nuances you should know about. The first is that .read_csv() can be used for any plain-text delimited file. This may include (but is not limited to) pipe-delimited files (.psv) and tab-separated files (.tsv).
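For instance, a pipe-delimited file can be loaded by overriding the delimiter argument. Here we first write a tiny made-up file so the example is self-contained (the filename and data are hypothetical):

```python
import pandas as pd

# Write a tiny pipe-delimited file (made-up data, for illustration only)
with open('scores.psv', 'w') as f:
    f.write('name|score\nAlice|88\nBob|92\n')

# read_csv handles it once we override the delimiter
df = pd.read_csv('scores.psv', delimiter='|')
```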
Let's look at an example by investigating a file, bp.txt, stored in the Data folder.
df = pd.read_csv('Data/bp.txt', delimiter='\t')
df.head()
Pt | BP | Age | Weight | BSA | Dur | Pulse | Stress | |
---|---|---|---|---|---|---|---|---|
0 | 1 | 105 | 47 | 85.4 | 1.75 | 5.1 | 63 | 33 |
1 | 2 | 115 | 49 | 94.2 | 2.10 | 3.8 | 70 | 14 |
2 | 3 | 116 | 49 | 95.3 | 1.98 | 8.2 | 72 | 10 |
3 | 4 | 117 | 50 | 94.7 | 2.01 | 5.8 | 73 | 99 |
4 | 5 | 112 | 51 | 89.4 | 1.89 | 7.0 | 72 | 95 |
This example shows that the data was tab delimited (\t), so an appropriate file extension could have also been .tsv. Once we've loaded the dataset, we can export it to any format we would like with the related methods:
- df.to_csv()
- df.to_excel()
- df.to_json()
- df.to_dict()
There are also several other options available, but these are the most common.
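We don't demonstrate df.to_dict() elsewhere in this lesson, so here is a quick sketch of how it behaves with its most common orientations (the DataFrame is a throwaway example):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# The default orientation maps each column name to a {row_label: value} dict
as_dict = df.to_dict()

# orient='records' gives one dict per row instead, handy for JSON APIs
as_records = df.to_dict(orient='records')
```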
Another feature that you may have to employ is skipping rows when there is metadata stored at the top of a file. You can do this using the optional parameter skiprows. Similarly, if you only want to load a portion of a large file as an initial preview, you can use the nrows parameter.
df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', nrows=100)
df.head()
GEO.id | GEO.id2 | GEO.display-label | HD01_VD01 | HD02_VD01 | HD01_VD02 | HD02_VD02 | HD01_VD03 | HD02_VD03 | HD01_VD04 | ... | HD01_VD32 | HD02_VD32 | HD01_VD33 | HD02_VD33 | HD01_VD34 | HD02_VD34 | HD01_VD35 | HD02_VD35 | HD01_VD36 | HD02_VD36 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Id | Id2 | Geography | Estimate; Total: | Margin of Error; Total: | Estimate; Total: - Management, business, scien... | Margin of Error; Total: - Management, business... | Estimate; Total: - Management, business, scien... | Margin of Error; Total: - Management, business... | Estimate; Total: - Management, business, scien... | ... | Estimate; Total: - Natural resources, construc... | Margin of Error; Total: - Natural resources, c... | Estimate; Total: - Production, transportation,... | Margin of Error; Total: - Production, transpor... | Estimate; Total: - Production, transportation,... | Margin of Error; Total: - Production, transpor... | Estimate; Total: - Production, transportation,... | Margin of Error; Total: - Production, transpor... | Estimate; Total: - Production, transportation,... | Margin of Error; Total: - Production, transpor... |
1 | 0500000US01001 | 01001 | Autauga County, Alabama | 33267 | 2306 | 48819 | 1806 | 55557 | 4972 | 63333 | ... | 31402 | 5135 | 35594 | 3034 | 36059 | 3893 | 47266 | 13608 | 19076 | 4808 |
2 | 0500000US01003 | 01003 | Baldwin County, Alabama | 31540 | 683 | 49524 | 1811 | 57150 | 6980 | 63422 | ... | 35603 | 3882 | 30549 | 1606 | 29604 | 4554 | 35504 | 6260 | 24182 | 3580 |
3 | 0500000US01005 | 01005 | Barbour County, Alabama | 26575 | 1653 | 41652 | 2638 | 51797 | 5980 | 52775 | ... | 37847 | 11189 | 26094 | 4884 | 25339 | 4900 | 37282 | 6017 | 16607 | 3497 |
4 | 0500000US01007 | 01007 | Bibb County, Alabama | 30088 | 2224 | 40787 | 2896 | 50069 | 12841 | 67917 | ... | 45952 | 5622 | 28983 | 3401 | 31881 | 2317 | 26580 | 2901 | 23479 | 4942 |
5 rows × 75 columns
We could manually remove this extra header row:
df = df.drop(0)
df.head(2)
GEO.id | GEO.id2 | GEO.display-label | HD01_VD01 | HD02_VD01 | HD01_VD02 | HD02_VD02 | HD01_VD03 | HD02_VD03 | HD01_VD04 | ... | HD01_VD32 | HD02_VD32 | HD01_VD33 | HD02_VD33 | HD01_VD34 | HD02_VD34 | HD01_VD35 | HD02_VD35 | HD01_VD36 | HD02_VD36 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0500000US01001 | 01001 | Autauga County, Alabama | 33267 | 2306 | 48819 | 1806 | 55557 | 4972 | 63333 | ... | 31402 | 5135 | 35594 | 3034 | 36059 | 3893 | 47266 | 13608 | 19076 | 4808 |
2 | 0500000US01003 | 01003 | Baldwin County, Alabama | 31540 | 683 | 49524 | 1811 | 57150 | 6980 | 63422 | ... | 35603 | 3882 | 30549 | 1606 | 29604 | 4554 | 35504 | 6260 | 24182 | 3580 |
2 rows × 75 columns
Or if we knew from the start, we could use the skiprows argument:
df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', skiprows=1, nrows=100)
df.head()
Id | Id2 | Geography | Estimate; Total: | Margin of Error; Total: | Estimate; Total: - Management, business, science, and arts occupations: | Margin of Error; Total: - Management, business, science, and arts occupations: | Estimate; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: | Margin of Error; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: | Estimate; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: - Management occupations | ... | Estimate; Total: - Natural resources, construction, and maintenance occupations: - Installation, maintenance, and repair occupations | Margin of Error; Total: - Natural resources, construction, and maintenance occupations: - Installation, maintenance, and repair occupations | Estimate; Total: - Production, transportation, and material moving occupations: | Margin of Error; Total: - Production, transportation, and material moving occupations: | Estimate; Total: - Production, transportation, and material moving occupations: - Production occupations | Margin of Error; Total: - Production, transportation, and material moving occupations: - Production occupations | Estimate; Total: - Production, transportation, and material moving occupations: - Transportation occupations | Margin of Error; Total: - Production, transportation, and material moving occupations: - Transportation occupations | Estimate; Total: - Production, transportation, and material moving occupations: - Material moving occupations | Margin of Error; Total: - Production, transportation, and material moving occupations: - Material moving occupations | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0500000US01001 | 1001 | Autauga County, Alabama | 33267 | 2306 | 48819 | 1806 | 55557 | 4972 | 63333 | ... | 31402 | 5135 | 35594 | 3034 | 36059 | 3893 | 47266 | 13608 | 19076 | 4808 |
1 | 0500000US01003 | 1003 | Baldwin County, Alabama | 31540 | 683 | 49524 | 1811 | 57150 | 6980 | 63422 | ... | 35603 | 3882 | 30549 | 1606 | 29604 | 4554 | 35504 | 6260 | 24182 | 3580 |
2 | 0500000US01005 | 1005 | Barbour County, Alabama | 26575 | 1653 | 41652 | 2638 | 51797 | 5980 | 52775 | ... | 37847 | 11189 | 26094 | 4884 | 25339 | 4900 | 37282 | 6017 | 16607 | 3497 |
3 | 0500000US01007 | 1007 | Bibb County, Alabama | 30088 | 2224 | 40787 | 2896 | 50069 | 12841 | 67917 | ... | 45952 | 5622 | 28983 | 3401 | 31881 | 2317 | 26580 | 2901 | 23479 | 4942 |
4 | 0500000US01009 | 1009 | Blount County, Alabama | 34900 | 2063 | 46593 | 2963 | 47003 | 6189 | 50991 | ... | 42489 | 7176 | 32969 | 3767 | 31814 | 4551 | 41375 | 5280 | 26755 | 2963 |
5 rows × 75 columns
Related to skiprows is the header parameter. This specifies the row where the column names are located, and the load starts from that point:
df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', header=1)
df.head()
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-6-a2d61668e2c4> in <module>()
----> 1 df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', header=1)
      2 df.head()
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 2: invalid continuation byte
Encoding errors like the one above are always frustrating. They have to do with how the strings within the file itself are encoded. The most common encoding other than utf-8 that you are likely to come across is latin-1.
df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', header=1, encoding='latin-1')
df.head()
Id | Id2 | Geography | Estimate; Total: | Margin of Error; Total: | Estimate; Total: - Management, business, science, and arts occupations: | Margin of Error; Total: - Management, business, science, and arts occupations: | Estimate; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: | Margin of Error; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: | Estimate; Total: - Management, business, science, and arts occupations: - Management, business, and financial occupations: - Management occupations | ... | Estimate; Total: - Natural resources, construction, and maintenance occupations: - Installation, maintenance, and repair occupations | Margin of Error; Total: - Natural resources, construction, and maintenance occupations: - Installation, maintenance, and repair occupations | Estimate; Total: - Production, transportation, and material moving occupations: | Margin of Error; Total: - Production, transportation, and material moving occupations: | Estimate; Total: - Production, transportation, and material moving occupations: - Production occupations | Margin of Error; Total: - Production, transportation, and material moving occupations: - Production occupations | Estimate; Total: - Production, transportation, and material moving occupations: - Transportation occupations | Margin of Error; Total: - Production, transportation, and material moving occupations: - Transportation occupations | Estimate; Total: - Production, transportation, and material moving occupations: - Material moving occupations | Margin of Error; Total: - Production, transportation, and material moving occupations: - Material moving occupations | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0500000US01001 | 1001 | Autauga County, Alabama | 33267 | 2306 | 48819 | 1806 | 55557 | 4972 | 63333 | ... | 31402 | 5135 | 35594 | 3034 | 36059 | 3893 | 47266 | 13608 | 19076 | 4808 |
1 | 0500000US01003 | 1003 | Baldwin County, Alabama | 31540 | 683 | 49524 | 1811 | 57150 | 6980 | 63422 | ... | 35603 | 3882 | 30549 | 1606 | 29604 | 4554 | 35504 | 6260 | 24182 | 3580 |
2 | 0500000US01005 | 1005 | Barbour County, Alabama | 26575 | 1653 | 41652 | 2638 | 51797 | 5980 | 52775 | ... | 37847 | 11189 | 26094 | 4884 | 25339 | 4900 | 37282 | 6017 | 16607 | 3497 |
3 | 0500000US01007 | 1007 | Bibb County, Alabama | 30088 | 2224 | 40787 | 2896 | 50069 | 12841 | 67917 | ... | 45952 | 5622 | 28983 | 3401 | 31881 | 2317 | 26580 | 2901 | 23479 | 4942 |
4 | 0500000US01009 | 1009 | Blount County, Alabama | 34900 | 2063 | 46593 | 2963 | 47003 | 6189 | 50991 | ... | 42489 | 7176 | 32969 | 3767 | 31814 | 4551 | 41375 | 5280 | 26755 | 2963 |
5 rows × 75 columns
You can also load only specific columns if you just want a subset of the features. The usecols parameter accepts either column indices or column names:
df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', usecols=[0,1,2,5,6], encoding='latin-1')
df.head(2)
GEO.id | GEO.id2 | GEO.display-label | HD01_VD02 | HD02_VD02 | |
---|---|---|---|---|---|
0 | Id | Id2 | Geography | Estimate; Total: - Management, business, scien... | Margin of Error; Total: - Management, business... |
1 | 0500000US01001 | 01001 | Autauga County, Alabama | 48819 | 1806 |
df = pd.read_csv('Data/ACS_16_5YR_B24011_with_ann.csv', usecols=['GEO.id', 'GEO.id2'], encoding='latin-1')
df.head(2)
GEO.id | GEO.id2 | |
---|---|---|
0 | Id | Id2 |
1 | 0500000US01001 | 01001 |
You can also load specific sheets from Excel files! This can be done by index number (if you don't specify one, the first sheet is loaded by default):
df1 = pd.read_excel('Data/Yelp_Selected_Businesses.xlsx')
df1.head()
Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | |
---|---|---|---|---|---|---|---|---|---|
0 | business_id | cool | date | funny | review_id | stars | text | useful | user_id |
1 | RESDUcs7fIiihp38-d6_6g | 0 | 2015-09-16 | 0 | gkcPdbblTvZDMSwx8nVEKw | 5 | Got here early on football Sunday 7:30am as I ... | 0 | SKteB5rgDlkkUa1Zxe1N0Q |
2 | RESDUcs7fIiihp38-d6_6g | 0 | 2017-09-09 | 0 | mQfl6ci46mu0xaZrkRUhlA | 5 | This buffet is amazing. Yes, it is expensive,... | 0 | f638AHA_GoHbyDB7VFMz7A |
3 | RESDUcs7fIiihp38-d6_6g | 0 | 2013-01-14 | 0 | EJ7DJ8bm7-2PLFB9WKx4LQ | 3 | I was really looking forward to this but it wa... | 0 | -wVPuTiIEG85LwTK46Prpw |
4 | RESDUcs7fIiihp38-d6_6g | 0 | 2017-02-08 | 0 | lMarDJDg4-e_0YoJOKJoWA | 2 | This place....lol our server was nice. But fo... | 0 | A21zMqdN76ueLZFpmbue0Q |
df2 = pd.read_excel('Data/Yelp_Selected_Businesses.xlsx', sheet_name=2)
df2.head()
Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | |
---|---|---|---|---|---|---|---|---|---|
0 | business_id | cool | date | funny | review_id | stars | text | useful | user_id |
1 | YJ8ljUhLsz6CtT_2ORNFmg | 1 | 2013-04-25 | 0 | xgUz0Ck4_ciNaeIk-H8GBQ | 5 | I loved this place. Easily the most hipsters p... | 1 | 6cpo8iqgnW3jnozhmY7eAA |
2 | YJ8ljUhLsz6CtT_2ORNFmg | 0 | 2014-07-07 | 0 | Au7MG4QlAxqq9meyKSQmaw | 5 | So my boyfriend and I came here for my birthda... | 0 | 8bFE3u1dMoYXkS7ORqlssw |
3 | YJ8ljUhLsz6CtT_2ORNFmg | 0 | 2015-12-04 | 0 | 8IQnZ54nenXjlK-FGZ82Bg | 5 | I really enjoyed their food. Went there for th... | 1 | bJmE1ms0MyZ6KHjmfZDWGw |
4 | YJ8ljUhLsz6CtT_2ORNFmg | 2 | 2016-07-06 | 1 | XY42LMhKoXzwtLoku4mvLA | 5 | A complete Vegas experience. We arrived right ... | 3 | PbccpC-I-8rxzF2bCDh8YA |
Or the name of the sheet itself
df = pd.read_excel('Data/Yelp_Selected_Businesses.xlsx', sheet_name='Biz_id_RESDU')
df.head()
Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | |
---|---|---|---|---|---|---|---|---|---|
0 | business_id | cool | date | funny | review_id | stars | text | useful | user_id |
1 | RESDUcs7fIiihp38-d6_6g | 0 | 2015-09-16 | 0 | gkcPdbblTvZDMSwx8nVEKw | 5 | Got here early on football Sunday 7:30am as I ... | 0 | SKteB5rgDlkkUa1Zxe1N0Q |
2 | RESDUcs7fIiihp38-d6_6g | 0 | 2017-09-09 | 0 | mQfl6ci46mu0xaZrkRUhlA | 5 | This buffet is amazing. Yes, it is expensive,... | 0 | f638AHA_GoHbyDB7VFMz7A |
3 | RESDUcs7fIiihp38-d6_6g | 0 | 2013-01-14 | 0 | EJ7DJ8bm7-2PLFB9WKx4LQ | 3 | I was really looking forward to this but it wa... | 0 | -wVPuTiIEG85LwTK46Prpw |
4 | RESDUcs7fIiihp38-d6_6g | 0 | 2017-02-08 | 0 | lMarDJDg4-e_0YoJOKJoWA | 2 | This place....lol our server was nice. But fo... | 0 | A21zMqdN76ueLZFpmbue0Q |
You can also load an entire Excel workbook (which is a collection of spreadsheets) with the pd.ExcelFile class:
workbook = pd.ExcelFile('Data/Yelp_Selected_Businesses.xlsx')
workbook.sheet_names
['Biz_id_RESDU',
'Biz_id_4JNXU',
'Biz_id_YJ8lj',
'Biz_id_ujHia',
'Biz_id_na4Th']
df = workbook.parse(sheet_name=1)
df.head()
Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | |
---|---|---|---|---|---|---|---|---|---|
0 | business_id | cool | date | funny | review_id | stars | text | useful | user_id |
1 | 4JNXUYY8wbaaDmk3BPzlWw | 0 | 2012-06-10 | 0 | wl8BO_I-is-JaMwMW5c_gQ | 4 | I booked a table here for brunch and it did no... | 0 | fo4mpUqgXL2mJqALc9AvbA |
2 | 4JNXUYY8wbaaDmk3BPzlWw | 0 | 2012-01-20 | 0 | cf9RrqHY9eQ9M53OPyXLtg | 4 | Came here for lunch after a long night of part... | 0 | TVvTtXwPXsvrg2KJGoOUTg |
3 | 4JNXUYY8wbaaDmk3BPzlWw | 0 | 2017-05-10 | 0 | BvmhSQ6WFm2Jxu01G8OpdQ | 5 | Loved the fried goat cheese in tomato sauce al... | 0 | etbAVunw-4kwr6VTRweZpA |
4 | 4JNXUYY8wbaaDmk3BPzlWw | 0 | 2014-05-03 | 0 | IoKp9n1489XohTV_-EJ0IQ | 5 | Love the outdoor atmosphere. Price was right, ... | 0 | vKXux2Xx3xcicTgYZoR0pg |
Once we've loaded data that we want to export back out, we use the .to_csv() or .to_excel() methods of the DataFrame object.
df.to_csv('NewSavedView.csv', index=False) #Notice how we have to pass index=False if we do not want it included in our output
df.to_excel('NewSavedView.xlsx')
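To see the effect of index=False, we can read the saved file back in. This sketch uses a small throwaway DataFrame and a hypothetical filename, just for illustration:

```python
import pandas as pd

# A small throwaway DataFrame, just to show the effect of index=False
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.to_csv('check.csv', index=False)

# Reading the file back shows only the original columns; without
# index=False, an extra unnamed index column would appear as well
reloaded = pd.read_csv('check.csv')
```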
We've spent some time looking into how data importing with Pandas works and some of the methods you can use to manage the import and access the data. In the next lesson, let's get some hands-on practice!