The Ultimate Guide to Python Async IO: From asyncio to Production
I. Core Concepts of Asynchronous Programming
The asyncio module, introduced in Python 3.4 (with the async/await syntax following in 3.5), fundamentally changed how IO-bound applications are written:
```python
import asyncio

async def fetch_data(url):
    print(f"Start fetching {url}")
    await asyncio.sleep(1)  # simulate an IO operation
    print(f"Finished fetching {url}")
    return f"data from {url}"

async def main():
    # Run the two tasks concurrently (not in parallel: a single
    # thread interleaves them whenever one awaits)
    task1 = asyncio.create_task(fetch_data("https://api1"))
    task2 = asyncio.create_task(fetch_data("https://api2"))
    results = await asyncio.gather(task1, task2)
    print(results)

asyncio.run(main())
```
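Because both tasks sleep at the same time, the whole run takes about one second rather than two. A quick timing check makes this visible; this is a minimal sketch reusing `fetch_data` from above:

```python
import asyncio
import time

async def timed_main():
    start = time.perf_counter()
    # gather() also accepts bare coroutines and wraps them in tasks itself
    await asyncio.gather(fetch_data("https://api1"), fetch_data("https://api2"))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")  # ~1s, not ~2s

asyncio.run(timed_main())
```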
Three core advantages: high concurrency, low resource consumption, and concise code.
II. Production-Grade Async Programming Techniques
1. Concurrency control
```python
async def batch_fetch(urls, max_concurrent=5):
    # The semaphore caps how many fetches are in flight at once
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch(url):
        async with semaphore:
            return await fetch_data(url)

    return await asyncio.gather(*[fetch(url) for url in urls])
```
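A usage sketch (the URL list is illustrative): with `max_concurrent=5`, at most five `fetch_data` calls run at any moment, and `gather` returns results in the same order as the input URLs:

```python
# 20 simulated URLs at ~1s each: finishes in ~4s instead of ~20s
urls = [f"https://api{i}" for i in range(20)]
results = asyncio.run(batch_fetch(urls))
print(len(results))  # 20 results, in input order
```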
2. Timeout handling
```python
async def fetch_with_timeout(url, timeout=3):
    try:
        return await asyncio.wait_for(
            fetch_data(url),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        print(f"{url} request timed out")
        return None
```
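In production the two techniques are usually combined, so every request is both concurrency-limited and time-bounded. A minimal sketch (the `batch_fetch_safe` name is ours, not from the original code):

```python
async def batch_fetch_safe(urls, max_concurrent=5, timeout=3):
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch(url):
        async with semaphore:
            # Timed-out requests return None instead of raising,
            # so one slow URL can't fail the whole batch
            return await fetch_with_timeout(url, timeout=timeout)

    return await asyncio.gather(*[fetch(url) for url in urls])
```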
III. Hands-On: A High-Performance Web Crawler
```python
import asyncio
import aiohttp

async def crawl_page(session, url):
    try:
        async with session.get(url) as response:
            html = await response.text()
            return parse_html(html)
    except Exception as e:
        print(f"Failed to crawl {url}: {e}")
        return None

async def batch_crawl(urls):
    connector = aiohttp.TCPConnector(limit=10)  # cap the connection pool
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [crawl_page(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Usage example
urls = [f"https://example.com/page{i}" for i in range(1, 101)]
results = asyncio.run(batch_crawl(urls))
```
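`parse_html` is not defined in the article; here is a minimal placeholder, assuming `beautifulsoup4` is installed and you only need the page title (adapt the extraction to your use case):

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def parse_html(html):
    # Placeholder parser: return the page title, or None if absent
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.string if soup.title else None
```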
IV. Performance Comparison with Traditional Synchronous Code
| Scenario | Synchronous | Asynchronous |
|---|---|---|
| Crawling 100 web pages | 45.7 s | 3.2 s |
| CPU usage | 15-20% | 5-8% |
| Memory consumption | 320 MB | 85 MB |
Test environment: Python 3.10 / 100 concurrent requests
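The benchmark script itself isn't included in the article; a minimal sketch of how such a comparison could be reproduced (`requests` assumed installed; absolute numbers depend heavily on network and hardware):

```python
import asyncio
import time

import aiohttp
import requests  # pip install requests

URLS = [f"https://example.com/page{i}" for i in range(1, 101)]

def crawl_sync():
    # One request at a time; total time is roughly the sum of all latencies
    return [requests.get(u).text for u in URLS]

async def crawl_async():
    async with aiohttp.ClientSession() as session:
        async def get(u):
            async with session.get(u) as r:
                return await r.text()
        # All requests in flight together; total time is roughly the slowest one
        return await asyncio.gather(*[get(u) for u in URLS])

start = time.perf_counter()
crawl_sync()
print(f"sync:  {time.perf_counter() - start:.1f}s")

start = time.perf_counter()
asyncio.run(crawl_async())
print(f"async: {time.perf_counter() - start:.1f}s")
```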
V. Solutions to Common Problems
- Calling async code from synchronous code: use `asyncio.run()` or `nest_asyncio`
- Running blocking IO inside a coroutine: `loop.run_in_executor()`
- Handling task cancellation: `task.cancel()` combined with catching `asyncio.CancelledError`
- Sharing state across coroutines: use the `contextvars` module

The last three items are sketched together below.
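A minimal combined sketch, under the assumption that `blocking_io` and `request_id` are illustrative names rather than anything from the article:

```python
import asyncio
import contextvars
import time

request_id = contextvars.ContextVar("request_id", default=None)

def blocking_io():
    time.sleep(2)  # stands in for any blocking call (file IO, a legacy client, ...)
    return "done"

async def main():
    loop = asyncio.get_running_loop()

    # Blocking IO inside a coroutine: hand it to the default thread pool
    # so the event loop stays free to run other tasks
    result = await loop.run_in_executor(None, blocking_io)
    print(result)

    # Task cancellation: cancel() only *requests* cancellation; awaiting
    # the task then raises CancelledError, which we catch to confirm
    task = asyncio.create_task(asyncio.sleep(10))
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        print("task cancelled")

    # Cross-coroutine state: each task created with create_task() gets a
    # copy of the current context, so ContextVar values flow into child
    # tasks without being passed as arguments
    request_id.set("req-42")

    async def child():
        print(f"child sees request_id = {request_id.get()}")  # "req-42"

    await asyncio.create_task(child())

asyncio.run(main())
```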