How to extract data from a CSV file with Python

I had a problem: I needed to extract specific data from a CSV file that was 6GB in size! No human has the time to go through a file like that manually to locate the needed data. Well… if someone were paying me by the hour, I’d gladly do it! But that wasn’t the case.

You know who does have time to go through that CSV? Python! Sure, any other programming language could handle the task, but in this case, I chose to use Python. Here’s a breakdown of the code:

First, we import the pandas library, a powerful Python tool for working with table-like data (spreadsheets, database tables, CSV files). Instead of typing pandas every time, we import it under the alias pd, which is the usual convention. You could do this without a library, but I went with pandas for simplicity and efficiency.

import pandas as pd
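As mentioned above, you could do this without a library. For comparison, here is a minimal sketch using only Python’s built-in csv module (this is not the code I used). The function name filter_rows and the tiny sample data are made up for illustration. Because it streams one row at a time, it never holds the whole 6GB file in memory.

```python
import csv
import io


def filter_rows(src, dst, county, state):
    """Copy rows whose County and State match from src to dst.

    src and dst are open text file objects. Rows are read and written one
    at a time, so memory use stays small no matter how big the file is.
    Returns the number of rows kept.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        # Every value is a string here; no type inference like pandas does.
        if row["County"] == county and row["State"] == state:
            writer.writerow(row)
            kept += 1
    return kept


# Made-up sample standing in for LargeData.csv.
sample = io.StringIO(
    "County,State,Id\n"
    "Towner,North Dakota,1\n"
    "Cass,North Dakota,2\n"
    "Towner,Texas,3\n"
)
out = io.StringIO()
count = filter_rows(sample, out, "Towner", "North Dakota")
print(count)
print(out.getvalue())
```

For real files, open both with newline='' as the csv module documentation recommends, e.g. `with open('LargeData.csv', newline='') as src, open('FilteredData.csv', 'w', newline='') as dst:` and then call `filter_rows(src, dst, 'Towner', 'North Dakota')`.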

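This line reads a CSV file named 'LargeData.csv' and loads it into a DataFrame called df. A DataFrame is like an in-memory spreadsheet: rows and columns of data you can filter, analyze, and modify. Because the whole file is loaded into memory at once, a 6GB CSV can need more RAM than its size on disk, so make sure your machine has enough to spare.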

df = pd.read_csv('LargeData.csv')

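Note: a bare file name like this is looked up relative to the current working directory (usually the folder you run the script from), which isn’t always the folder the script lives in. If your CSV file is somewhere else, you’ll need to specify its full path. In my case, the file was in the same directory, so no extra work was needed.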
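If the file is too big to fit in memory, one alternative (a different technique from the single read_csv call above) is to read it in chunks: when you pass the chunksize parameter, read_csv returns an iterator of smaller DataFrames that you can filter one at a time. This is a minimal sketch; the function name filter_in_chunks, the default of 100,000 rows per chunk, and the sample data are my own assumptions, so tune the chunk size to your machine.

```python
import io

import pandas as pd


def filter_in_chunks(source, county, state, chunksize=100_000):
    """Read a CSV in chunks and keep only rows matching county and state.

    Only one chunk, plus the matches found so far, is in memory at a time.
    chunksize=100_000 is an arbitrary default, not a recommendation.
    """
    matches = []
    # With chunksize set, read_csv yields DataFrames of up to chunksize rows.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        mask = (chunk["County"] == county) & (chunk["State"] == state)
        matches.append(chunk[mask])
    if not matches:
        # The reader produced no chunks at all.
        return pd.DataFrame()
    # ignore_index=True renumbers the combined rows from 0.
    return pd.concat(matches, ignore_index=True)


# Made-up sample; chunksize=2 forces several chunks on this tiny input.
sample = io.StringIO(
    "County,State,Id\n"
    "Towner,North Dakota,1\n"
    "Cass,North Dakota,2\n"
    "Towner,Texas,3\n"
    "Towner,North Dakota,4\n"
)
result = filter_in_chunks(sample, "Towner", "North Dakota", chunksize=2)
print(result)
```

On the real file, something like `filter_in_chunks('LargeData.csv', 'Towner', 'North Dakota').to_csv('FilteredData.csv', index=False)` should give you the same matching rows as the original script.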

Now, here’s where the magic happens! We filter the DataFrame to include only the rows where:

  • The County column is 'Towner', AND
  • The State column is 'North Dakota'

The result is stored in a new DataFrame called filtered_df:

filtered_df = df[(df['County'] == 'Towner') & (df['State'] == 'North Dakota')]
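To see what that filter line does, here’s a small sketch on a made-up three-row DataFrame that uses the same column names. Each comparison produces a Series of True/False values, and & combines them row by row. The parentheses matter: in Python, & binds more tightly than ==, so leaving them out changes how the expression is parsed. Python’s `and` won’t work here either, because pandas refuses to reduce a whole Series to a single True or False.

```python
import pandas as pd

# Made-up data with the same column names as LargeData.csv.
df = pd.DataFrame({
    "County": ["Towner", "Cass", "Towner"],
    "State": ["North Dakota", "North Dakota", "Texas"],
})

# Each comparison gives one True/False per row.
county_match = df["County"] == "Towner"
state_match = df["State"] == "North Dakota"

# & keeps a row only where both comparisons are True.
mask = county_match & state_match
filtered_df = df[mask]

print(mask.tolist())
print(filtered_df)
```

Note that filtered_df keeps the original row label (0) rather than renumbering, which is why index=False comes up later when saving.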

If you’re not sure what the column names are in your CSV, you can use the following line to display them:

print(df.columns)
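With a 6GB file, loading everything just to check the column names is slow. Here’s a minimal sketch that reads only the header row by passing nrows=0 to read_csv. The helper name csv_columns is hypothetical, and the sample text stands in for the real file.

```python
import io

import pandas as pd


def csv_columns(source):
    """Return the column names of a CSV without loading its data rows."""
    # nrows=0 reads no data rows, only the header.
    return pd.read_csv(source, nrows=0).columns.tolist()


sample = io.StringIO("County,State,Id\nTowner,North Dakota,1\n")
columns = csv_columns(sample)
print(columns)
```

On the real file you’d call `print(csv_columns('LargeData.csv'))`.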

Once we have the data in filtered_df, we can write it to a new CSV file. This line does exactly that:

filtered_df.to_csv("FilteredData.csv", index=False)

The index=False part tells pandas not to save the DataFrame’s index (the row labels, which after filtering are the rows’ original positions in the big file) as an extra column.
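Here’s a quick way to see the difference index=False makes. When to_csv is called without a file path, it returns the CSV text as a string, which makes the two outputs easy to compare. The two-row DataFrame is made up; its index labels 0 and 3 mimic the gaps left after filtering.

```python
import pandas as pd

# Made-up filtered result; labels 0 and 3 are what survive a filter.
filtered_df = pd.DataFrame(
    {"County": ["Towner", "Towner"], "Id": [1, 4]},
    index=[0, 3],
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = filtered_df.to_csv()
without_index = filtered_df.to_csv(index=False)

print(with_index)
print(without_index)
```

The first output starts with an unnamed column holding 0 and 3; the second contains only the real data columns.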

And that’s it! Once you run this Python script, a new CSV file with the filtered data should appear in the current working directory (normally the folder you ran the script from). I hope this helps if you’re struggling to extract data from a large CSV file! Here’s the whole script in one piece:

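import pandas as pd

# Load the whole CSV into memory (path is relative to the working directory)
df = pd.read_csv('LargeData.csv')

# Keep only the rows for Towner County, North Dakota
filtered_df = df[(df['County'] == 'Towner') & (df['State'] == 'North Dakota')]

# Save the matching rows to a new CSV, without the index column
filtered_df.to_csv("FilteredData.csv", index=False)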
