General Introduction
Aggregator is an open source project for building a free proxy pool by crawling available proxy nodes from a variety of sources. The platform has a flexible plug-in system, so users can implement site-specific behavior through plug-ins when a target site has special requirements. The project is intended mainly for learning crawling techniques; using it for any illegal activity is prohibited.
Function List
- Proxy pool construction: Automatically crawls and aggregates proxy nodes from multiple sources to form a high-quality proxy pool.
- Plug-in system: Supports user-defined plug-ins to handle the specific needs of different websites.
- Automatic operation: Includes features such as automatic check-in, automatic registration, and subscription aggregation to simplify user operations.
- Multi-protocol support: Supports a variety of proxy protocols, such as HTTP, HTTPS, SOCKS, and so on.
- Open Source and Community Support: The project is open source and users are free to modify and extend the functionality and get support through the community.
Usage Guide
Installation Process
- Prepare the environment: Ensure that Python 3.6 or later is installed.
- Clone the project: Run the following command to clone the project locally:
  git clone https://github.com/wzdnzd/aggregator
- Install dependencies: Go to the project directory and run the following command to install the required dependencies:
  pip install -r requirements.txt
- Edit the configuration file: Modify config.yaml as necessary to set the crawl targets and proxy pool parameters.
- Run the project: Run the following command to start crawling proxy nodes:
  python collect.py
  Then run the following command to process and aggregate the proxies:
  python process.py
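The two run commands can also be chained from a single script so that processing starts only if crawling succeeds. The following is a minimal sketch, assuming that collect.py and process.py need no required arguments (the text shows none) and that the repository was cloned into a local directory. The names build_commands and run_pipeline are hypothetical helpers, not part of the project.

```python
import subprocess
import sys
from pathlib import Path

# Script names and their order (collect, then process) come from the steps above.
STEPS = ["collect.py", "process.py"]


def build_commands(project_dir):
    """Return one command per step, using the current Python interpreter."""
    root = Path(project_dir).resolve()
    return [[sys.executable, str(root / script)] for script in STEPS]


def run_pipeline(project_dir):
    """Run each step in order and stop at the first failure."""
    if sys.version_info < (3, 6):
        raise RuntimeError("Python 3.6 or later is required")
    root = Path(project_dir).resolve()
    for command in build_commands(root):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so process.py never runs after a failed collect.py.
        subprocess.run(command, cwd=str(root), check=True)
```

Calling run_pipeline("aggregator") from the directory that contains the clone would run both steps in sequence. The directory name here is an assumption based on the repository URL.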
Usage Process
- Start the crawler: Run the following command to start crawling proxy nodes. The system crawls them according to the settings in the configuration file.
  python collect.py
- Process the data: Run the following command to process and filter the crawled proxy nodes, which keeps the quality of the proxy pool high.
  python process.py
- Use plug-ins: Depending on the needs of the target site, write a plug-in or take an existing one and place it in the plugins directory. The system loads and executes it automatically.
- Automatic operation: Configure automatic check-in, automatic registration, and other functions, then run the corresponding scripts to automate these operations.
- View results: After processing completes, the proxy pool data is saved to a specified file and can be used as needed.
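Once the results file exists, it can be read back and sorted by protocol before use. The text does not describe the output format, so the sketch below assumes a plain text file with one proxy URL per line (for example http://host:port). That layout, the scheme names in SUPPORTED, and the helper names are all assumptions for illustration, not the project's actual format or API.

```python
from urllib.parse import urlsplit

# Assumption: these scheme names stand in for "HTTP, HTTPS, SOCKS".
SUPPORTED = {"http", "https", "socks4", "socks5"}


def parse_proxy_lines(lines):
    """Keep well-formed proxy URLs; skip blanks, comments and duplicates."""
    seen = set()
    proxies = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        try:
            parts = urlsplit(line)
            port = parts.port  # raises ValueError for a non-numeric port
        except ValueError:
            continue
        if parts.scheme not in SUPPORTED or not parts.hostname or port is None:
            continue
        if line not in seen:
            seen.add(line)
            proxies.append(line)
    return proxies


def group_by_protocol(proxies):
    """Map each scheme to the list of proxy URLs that use it."""
    groups = {}
    for proxy in proxies:
        groups.setdefault(urlsplit(proxy).scheme, []).append(proxy)
    return groups


def load_proxy_file(path):
    """Read the (hypothetical) one-URL-per-line results file."""
    with open(path, encoding="utf-8") as handle:
        return parse_proxy_lines(handle)
```

Grouping by scheme makes it easy to pick only the proxies a given client can use. For instance, Python's built-in urllib handles HTTP and HTTPS proxies but has no native SOCKS support.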
Detailed Operation Procedure
- Proxy pool construction: The system regularly crawls proxy nodes from multiple sources, then screens and verifies them to keep the proxy pool high quality and available.
- Plug-in system: Users can write custom plug-ins based on the specific needs of the target site and place them in the plugins directory. The system loads and executes these plug-ins automatically.
- Automatic operation: Set up automatic check-in, automatic registration, and other functions through the configuration file, and the system performs these operations periodically to simplify routine tasks.
- Multi-protocol support: The system supports a variety of proxy protocols such as HTTP, HTTPS, and SOCKS. Users can choose the protocol that suits their needs.
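To make the idea of automatic plug-in loading concrete, the sketch below shows one common way a Python program can discover and run modules dropped into a directory. It is not the project's actual loader. The plug-in interface is not described here, so the sketch assumes each plug-in is a .py file that may expose a run() function; that hook name and the helpers load_plugins and run_plugins are hypothetical.

```python
import importlib.util
from pathlib import Path


def load_plugins(plugin_dir):
    """Import every public .py file in plugin_dir and return {name: module}."""
    plugins = {}
    for path in sorted(Path(plugin_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip private helpers such as __init__.py
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the plug-in's top-level code
        plugins[path.stem] = module
    return plugins


def run_plugins(plugins):
    """Call the hypothetical run() hook of each plug-in that defines one."""
    results = {}
    for name, module in plugins.items():
        hook = getattr(module, "run", None)
        if callable(hook):
            results[name] = hook()
    return results
```

With this layout, adding support for a new site only requires placing a new file in the directory; no central registry has to be edited. Because loading a plug-in executes its code, only trusted files should be placed there.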