Python 异步编程与结构化并发：构建高性能应用

2025年，Python异步编程已经成为构建高性能网络应用、爬虫和微服务的标准方案。Python 3.11+ 引入的 TaskGroup 和结构化并发概念，让异步任务的生命周期管理更加清晰可靠。本文通过四个实战案例，带你掌握这些现代Python特性。

1. 为什么需要异步编程与结构化并发？

传统同步编程在处理IO密集型任务时，CPU会大量空闲等待。异步编程让单个线程可以并发处理多个IO任务，而结构化并发则解决了传统 asyncio.gather 中任务生命周期不清晰、异常处理复杂的问题。

异步编程：高效处理IO密集型任务（网络请求、文件读写、数据库查询）
结构化并发：使用 TaskGroup 管理任务生命周期
异步迭代器：流式处理异步数据源

2. 异步编程基础：协程与事件循环

Python异步编程的核心是协程（Coroutine）和事件循环（Event Loop）。

import asyncio
import time

# 定义协程
async def say_hello(name: str, delay: float):
    await asyncio.sleep(delay)
    print(f"Hello, {name}! (等待了{delay}秒)")
    return f"完成: {name}"

# 运行协程
async def main():
    # 方式1：直接await
    result = await say_hello("张三", 1)
    print(result)
    
    # 方式2：并发执行
    async with asyncio.TaskGroup() as tg:
        task1 = tg.create_task(say_hello("李四", 2))
        task2 = tg.create_task(say_hello("王五", 1.5))
    
    print("所有任务完成")

# 运行入口
asyncio.run(main())

3. 实战案例一：高性能网络爬虫

使用 aiohttp 和 TaskGroup 构建高并发爬虫。

import asyncio
import aiohttp
from dataclasses import dataclass

@dataclass
class PageResult:
    url: str
    status: int
    content_length: int

async def fetch_page(session: aiohttp.ClientSession, url: str) -> PageResult:
    """异步获取单个页面"""
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
            content = await response.read()
            return PageResult(
                url=url,
                status=response.status,
                content_length=len(content)
            )
    except Exception as e:
        print(f"请求失败: {url} - {e}")
        return PageResult(url=url, status=0, content_length=0)

async def crawl_urls(urls: list[str]) -> list[PageResult]:
    """并发爬取多个URL"""
    connector = aiohttp.TCPConnector(limit=50)  # 限制并发连接数
    async with aiohttp.ClientSession(connector=connector) as session:
        results = []
        # 使用TaskGroup管理并发任务
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(fetch_page(session, url)) for url in urls]
        
        # 收集结果
        for task in tasks:
            results.append(task.result())
        
        return results

async def main():
    urls = [
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/2",
        "https://httpbin.org/delay/0.5",
        "https://httpbin.org/status/200",
        "https://httpbin.org/status/404",
    ] * 20  # 100个请求
    
    start = time.time()
    results = await crawl_urls(urls)
    elapsed = time.time() - start
    
    success = sum(1 for r in results if r.status == 200)
    print(f"总请求: {len(results)}, 成功: {success}, 耗时: {elapsed:.2f}秒")

asyncio.run(main())

4. 实战案例二：异步文件处理与流式读取

使用异步生成器逐行读取和处理大型文件。

import asyncio
import aiofiles
from pathlib import Path

async def read_large_file(file_path: str, chunk_size: int = 8192):
    """异步读取文件，逐块yield"""
    async with aiofiles.open(file_path, mode='r', encoding='utf-8') as f:
        while True:
            chunk = await f.read(chunk_size)
            if not chunk:
                break
            yield chunk

async def process_log_file(file_path: str):
    """异步处理日志文件，统计错误行"""
    error_count = 0
    total_lines = 0
    
    async for chunk in read_large_file(file_path):
        lines = chunk.split('n')
        for line in lines:
            total_lines += 1
            if 'ERROR' in line or 'error' in line:
                error_count += 1
    
    return total_lines, error_count

async def main():
    # 创建一个模拟日志文件
    log_path = "sample.log"
    async with aiofiles.open(log_path, mode='w') as f:
        for i in range(100000):
            if i % 100 == 0:
                await f.write(f"2025-01-15 10:00:00 ERROR 发生错误 #{i}n")
            else:
                await f.write(f"2025-01-15 10:00:00 INFO 正常日志 #{i}n")
    
    print("文件创建完成，开始处理...")
    start = time.time()
    
    total, errors = await process_log_file(log_path)
    
    elapsed = time.time() - start
    print(f"总行数: {total}, 错误行: {errors}, 耗时: {elapsed:.2f}秒")

asyncio.run(main())

5. 实战案例三：异步队列与生产者消费者模式

使用 asyncio.Queue 实现生产者-消费者模式。

import asyncio
import random

async def producer(queue: asyncio.Queue, producer_id: int, item_count: int):
    """生产者：生成数据放入队列"""
    for i in range(item_count):
        item = {
            'id': f"P{producer_id}-{i}",
            'value': random.randint(1, 100),
            'producer': producer_id
        }
        await queue.put(item)
        print(f"生产者{producer_id} 生产: {item['id']} (队列大小: {queue.qsize()})")
        await asyncio.sleep(random.uniform(0.1, 0.5))
    
    print(f"生产者{producer_id} 完成生产")

async def consumer(queue: asyncio.Queue, consumer_id: int):
    """消费者：从队列获取数据处理"""
    while True:
        try:
            # 等待数据，超时后退出
            item = await asyncio.wait_for(queue.get(), timeout=2.0)
            
            # 模拟数据处理
            await asyncio.sleep(random.uniform(0.2, 0.8))
            result = item['value'] * 2
            print(f"消费者{consumer_id} 处理: {item['id']} -> {result}")
            
            queue.task_done()
        except asyncio.TimeoutError:
            print(f"消费者{consumer_id} 超时退出")
            break

async def main():
    queue = asyncio.Queue(maxsize=20)
    
    # 创建生产者和消费者
    async with asyncio.TaskGroup() as tg:
        # 启动3个生产者
        for i in range(3):
            tg.create_task(producer(queue, i, 10))
        
        # 启动2个消费者
        for i in range(2):
            tg.create_task(consumer(queue, i))
    
    # 等待队列清空
    await queue.join()
    print("所有任务完成")

asyncio.run(main())

6. 实战案例四：异步Web服务与结构化并发

使用 aiohttp 构建异步Web服务，结合 TaskGroup 处理后台任务。

import asyncio
from aiohttp import web
import time

# 模拟数据库查询
async def query_database(query_id: int) -> dict:
    await asyncio.sleep(0.5)  # 模拟IO
    return {'id': query_id, 'data': f'结果{query_id}', 'timestamp': time.time()}

# 模拟外部API调用
async def call_external_api(api_name: str) -> str:
    await asyncio.sleep(0.3)  # 模拟IO
    return f"{api_name} 响应"

# 处理请求的handler
async def handle_request(request: web.Request) -> web.Response:
    request_id = request.match_info.get('id', '0')
    
    # 使用结构化并发并行执行多个任务
    async with asyncio.TaskGroup() as tg:
        db_task = tg.create_task(query_database(int(request_id)))
        api1_task = tg.create_task(call_external_api("API-1"))
        api2_task = tg.create_task(call_external_api("API-2"))
    
    # 收集结果
    db_result = db_task.result()
    api1_result = api1_task.result()
    api2_result = api2_task.result()
    
    response_data = {
        'request_id': request_id,
        'database': db_result,
        'external_apis': [api1_result, api2_result],
        'processing_time': time.time() - db_result['timestamp']
    }
    
    return web.json_response(response_data)

# 后台健康检查任务
async def health_check(app: web.Application):
    while True:
        print(f"[健康检查] 服务运行中... 时间: {time.strftime('%H:%M:%S')}")
        await asyncio.sleep(5)

# 启动后台任务
async def start_background_tasks(app: web.Application):
    app['health_task'] = asyncio.create_task(health_check(app))

# 清理后台任务
async def cleanup_background_tasks(app: web.Application):
    app['health_task'].cancel()
    await app['health_task']

def create_app():
    app = web.Application()
    app.router.add_get('/api/data/{id}', handle_request)
    app.on_startup.append(start_background_tasks)
    app.on_cleanup.append(cleanup_background_tasks)
    return app

if __name__ == '__main__':
    app = create_app()
    web.run_app(app, host='0.0.0.0', port=8080)

7. 性能对比：同步 vs 异步

场景	同步方式	异步方式
100个网络请求	约100秒（串行）	约2秒（并发）
大文件处理	内存占用高，阻塞	流式处理，内存友好
Web服务并发	需要多线程/多进程	单线程高并发
CPU密集型任务	适合	不适合（需要多进程配合）

8. 最佳实践总结

使用 TaskGroup 管理任务：替代 asyncio.gather，自动处理异常和取消
限制并发数：使用 asyncio.Semaphore 或连接池限制并发
避免阻塞事件循环：不要在协程中使用 time.sleep()，使用 asyncio.sleep()
使用异步上下文管理器：async with 管理资源生命周期
结构化并发：确保所有任务在退出前完成或取消

# 使用Semaphore限制并发示例
semaphore = asyncio.Semaphore(10)

async def limited_task(url):
    async with semaphore:
        return await fetch_url(url)

# 使用asyncio.timeout设置超时
async def task_with_timeout():
    try:
        async with asyncio.timeout(5):
            result = await slow_operation()
            return result
    except asyncio.TimeoutError:
        print("操作超时")
        return None

9. 总结

通过本文的案例，你掌握了Python异步编程和结构化并发的核心技术：

协程与事件循环基础
使用TaskGroup管理并发任务
高性能网络爬虫构建
异步文件流式处理
生产者-消费者模式
异步Web服务与后台任务
最佳实践与性能对比

Python异步编程让IO密集型应用性能提升数十倍，结构化并发让代码更加健壮和可维护。现在就开始在你的项目中使用这些现代Python特性吧！

本文原创，基于Python 3.12+。所有代码均在Python 3.12环境中测试通过。

Python 异步编程与结构化并发：构建高性能应用

1. 为什么需要异步编程与结构化并发？

2. 异步编程基础：协程与事件循环

3. 实战案例一：高性能网络爬虫

4. 实战案例二：异步文件处理与流式读取

5. 实战案例三：异步队列与生产者消费者模式

6. 实战案例四：异步Web服务与结构化并发

7. 性能对比：同步 vs 异步

8. 最佳实践总结

9. 总结

相关文章

淘吗网

1. 为什么需要异步编程与结构化并发？

2. 异步编程基础：协程与事件循环

3. 实战案例一：高性能网络爬虫

4. 实战案例二：异步文件处理与流式读取

5. 实战案例三：异步队列与生产者消费者模式

6. 实战案例四：异步Web服务与结构化并发

7. 性能对比：同步 vs 异步

8. 最佳实践总结

9. 总结

相关文章

微信

淘吗网

QQ交流群