Web scraping for non-profit researchers: The breakdown
Why It Matters
So long as the tool is used within ethical and legal boundaries, web scraping can be a valuable research tool for non-profits.

Web scraping, the practice of extracting data from web pages using a tool or bot, doesn’t have the best reputation: it can run the risk of violating data privacy rights or infringing copyright.
In commercial settings, web scraping can be used to monitor price fluctuations and competitor prices, to track consumer sentiment about a product or a campaign, or to automate data collection.
However, within the right legal bounds and ethical use frameworks, web scraping can also prove useful and relatively inexpensive for researchers. Journalists, for example, can apply it to the collection and analysis of real-time data, such as housing availability and food prices.
For those in the non-profit sector looking to automate or enhance their research capacity, web scraping can be a valuable tool to explore, said Leena Yahia, lead and researcher on non-profit digital resilience at Imagine Canada.
Depending on an organization’s level of technical expertise, an off-the-shelf scraping tool such as WebHarvy or Nimble could do the trick; at the other end of the spectrum, so could a whole team of in-house data scientists.
Viet Vu, manager of economic research at The Dais, says that for web scraping to be useful as a research tool, three conditions need to be satisfied: the data needs to exist online in large quantities, the language needs to be reasonably consistent, and so does the format in which the data appears.
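As a rough illustration of what those conditions look like in practice, here is a minimal sketch of a scrape in Python. The listings page, URL, and CSS class names are hypothetical placeholders, and any real scrape should first check the site’s terms of service and robots.txt.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical listings page; replace with a site you are permitted to scrape,
# and check its robots.txt and terms of service first.
URL = "https://example.org/job-postings"

response = requests.get(URL, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

postings = []
# Assumes each posting sits in a consistently named container --
# this is the "consistent format" condition Vu describes.
for card in soup.select("div.posting"):
    title = card.select_one("h2.title")
    description = card.select_one("p.description")
    if title and description:
        postings.append({
            "title": title.get_text(strip=True),
            "description": description.get_text(strip=True),
        })

print(f"Collected {len(postings)} postings")
```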
It is challenging to discuss technology and digital tools in a vacuum, so let’s look at an applied example in a research context.
In July 2024, The Dais and Imagine Canada, along with a number of other partners, published research about the state of technology talent in Canada’s non-profit sector.
Using web scraping methods, the researchers extracted data from 300,000 job descriptions. Then, using text classification and natural language processing, they separated non-profit technology jobs from the rest. Along with the obvious wording signalling a non-profit employer, Vu and his team found that non-profit postings were also more likely to mention societal issues, such as women’s rights and LGBTQ+ rights.
Once the data was parsed, the research team could compare the dataset of non-profit technology jobs with other sectors, drawing comparisons about the salaries and skills advertised in each.
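The team’s actual classification relied on text classification and natural language processing; the exact models aren’t described here, so the sketch below stands in with a deliberately simple keyword filter. The keyword list and the sample postings are illustrative assumptions, not the study’s method.

```python
# Simplified, keyword-based stand-in for the text classification step.
# The real research used more sophisticated natural language processing;
# the keyword list and posting format here are illustrative only.
NONPROFIT_SIGNALS = [
    "non-profit", "nonprofit", "charity", "charitable",
    "women's rights", "lgbtq+", "community impact", "social mission",
]

def looks_like_nonprofit(description: str) -> bool:
    """Return True if a job description uses non-profit language."""
    text = description.lower()
    return any(signal in text for signal in NONPROFIT_SIGNALS)

postings = [
    {"title": "Data Analyst", "description": "Join our charity's women's rights program."},
    {"title": "Data Analyst", "description": "Help us grow quarterly e-commerce revenue."},
]

nonprofit_jobs = [p for p in postings if looks_like_nonprofit(p["description"])]
other_jobs = [p for p in postings if not looks_like_nonprofit(p["description"])]

print(f"{len(nonprofit_jobs)} non-profit postings, {len(other_jobs)} others")
```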
Vu said using job postings as their main source of data provided rich information. Since new jobs are posted daily, postings not only offer an abundance of information but also indicate what non-profits are looking for now and in the future.
Although Vu regularly uses web scraping in his research, he warns non-profit organizations against using the tool in ways that don’t respect client trust, safety, and dignity. Yahia echoes that, adding that non-profits still need to adhere to data privacy and intellectual property legislation.
“Let’s say you are a charity giving conditional cash transfers as an experiment for basic income. You’ve asked people to come back after a month and tell you how happy they are,” Vu posited. “But then, let’s say you are actually tracking what they are up to, what they tweet about, and what they post on Instagram.
“Even though the purpose is to evaluate the success of your basic income experiment, it’s likely not going to preserve the trust that you need with the people and community that you serve,” he added.
The risk with adopting each and every shiny new tool, he said, is that the technology comes first and the mission and problem second.
“If you see new technology as a hammer, you’re going to start to see everything else as a nail, and then you’re going to try to bang that hammer on it, even if it’s not the most appropriate thing.”