About IronWebScraper for .NET

The C# framework for extracting clean, structured data from HTML web applications.

IronWebScraper for .NET is a C# web scraping library that allows developers to simulate and automate human browsing behavior to extract content, files and images from web applications as native .NET objects. IronWebScraper manages politeness and multithreading in the background, leaving your application easy to understand and maintain.

IronWebScraper Features

  • Powerful Scraping Engine Under Your Control - Just write a single C# web-scraper class to scrape thousands or even millions of web pages into C# class instances, JSON or downloaded files. IronWebScraper allows you to code concise, linear workflows simulating human browsing behavior. IronWebScraper will run your code as a swarm of virtual web browsers, massively parallel, yet polite and fault tolerant.
  • Simple, Flexible Logic - IronWebScraper must be programmed to know how to handle each “type” of page it encounters. This is achieved in a very concise manner using CSS Selectors or XPath expressions and can be fully customized in C#. This freedom allows you to decide which pages to scrape within a website, and what to do with the data extracted. Each method can be debugged and watched neatly in Visual Studio.
  • Fast and Polite Behavior - IronWebScraper deals with multithreading and web requests to allow for hundreds of concurrent threads without the developer needing to manage them. Politeness can be set to throttle requests, reducing the risk of excessive load on target web servers.
  • Create Virtual User Identities - IronWebScraper can use one or multiple “identities” - sessions that simulate real-world human requests. Each request may programmatically or randomly be assigned its own Identity, User Agent, Cookies, Logins and even IP address. Requests are automatically treated as unique by the combination of URL, parse method and post variables.
  • Action Replay - IronWebScraper uses advanced caching to allow developers to change their code “on the fly” and replay every previous request without contacting the internet. Every scrape job is autosaved and can be resumed in the event of an exception or power outage.
  • Rapid Installation with Microsoft Visual Studio - IronWebScraper puts web scraping tools in your own hands quickly with a Visual Studio installer. Whether installing directly from NuGet within Visual Studio or downloading the DLL, you’ll be set up in no time. Just one DLL and no dependencies.
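
The request-uniqueness rule mentioned under Create Virtual User Identities (URL plus parse method plus post variables) can be pictured with a small, library-independent sketch. The `RequestRegistry` class and its `BuildKey` and `TryRegister` methods below are hypothetical names invented for illustration; they are not part of the IronWebScraper API, and the library's actual mechanism may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: not an IronWebScraper type.
// Tracks which (URL, parse method, post variables) combinations were already queued.
public sealed class RequestRegistry
{
    private readonly HashSet<string> _seen = new HashSet<string>(StringComparer.Ordinal);

    // Combines the three parts named in the feature list into one key.
    // Post variables are sorted by name so that their order does not matter.
    public static string BuildKey(string url, string parseMethod,
                                  IDictionary<string, string> postVariables)
    {
        var pairs = postVariables
            .OrderBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => Uri.EscapeDataString(p.Key) + "=" +
                         Uri.EscapeDataString(p.Value ?? string.Empty));
        return url + "\n" + parseMethod + "\n" + string.Join("&", pairs);
    }

    // Returns true the first time a combination is seen, false for a duplicate.
    public bool TryRegister(string url, string parseMethod,
                            IDictionary<string, string> postVariables)
    {
        return _seen.Add(BuildKey(url, parseMethod, postVariables));
    }

    // Convenience overload for requests without post variables.
    public bool TryRegister(string url, string parseMethod)
    {
        return TryRegister(url, parseMethod, new Dictionary<string, string>());
    }
}
```

With this sketch, registering `("https://example.com/a", "ParseList")` twice returns `true` and then `false`, while the same URL with a different parse method name, or with different post variables, is accepted as a new request.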