1. The Job-Shop Scheduling Problem
Job-shop scheduling is NP-hard, so traditional methods struggle to find optimal solutions at realistic problem sizes. Reinforcement learning can instead learn a near-optimal scheduling policy through repeated trial-and-error interaction with the shop floor environment.
2. Problem Modeling
State space
`S = (machine_status, job_queue, processing_time)` — the status of each machine, the queue of waiting jobs, and their processing times.
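A tabular method needs this tuple mapped to a discrete Q-table row index. One minimal sketch of such an encoding, assuming binary busy/idle flags per machine and a capped queue length (how processing times are bucketed is left out here; the function name and parameters are illustrative, not from the original):

```python
def encode_state(machine_busy, queue_len, max_queue=10):
    """Map (machine status flags, queue length) to a discrete state index.

    machine_busy: tuple of 0/1 flags, one per machine (assumption:
    a small, fixed machine count so the index space stays tabular).
    """
    idx = 0
    for b in machine_busy:              # pack machine-status bits
        idx = idx * 2 + b
    # cap the queue length so the index space stays bounded
    return idx * (max_queue + 1) + min(queue_len, max_queue)
```

With two machines and `max_queue=10` this yields at most `2**2 * 11 = 44` distinct states, which keeps the Q-table small.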
Action space
`A = {assign_job_1, assign_job_2, ..., assign_job_n}` — choose the next job to dispatch for processing.
Reward function
`R = -makespan - tardiness_penalty + on_time_bonus` — penalize long makespans and late jobs, and reward jobs that finish on time.
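One concrete way to instantiate this reward from job completion times and due dates (the function name, penalty weights, and bonus are illustrative assumptions, not values from the original):

```python
def reward(completion_times, due_dates, late_penalty=2.0, on_time_bonus=1.0):
    """Sketch of the reward: negative makespan, minus a per-unit
    tardiness penalty for each late job, plus a flat bonus per
    on-time job. Weights are hypothetical tuning parameters."""
    makespan = max(completion_times)
    r = -makespan
    for c, d in zip(completion_times, due_dates):
        if c > d:
            r -= late_penalty * (c - d)   # tardiness penalty
        else:
            r += on_time_bonus            # on-time reward
    return r
```

Because the makespan term dominates, the agent is pushed toward short schedules first, with tardiness shaping the tie-breaks.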
3. Q-Learning Implementation
```python
import numpy as np

class QScheduler:
    def __init__(self, n_states, n_actions):
        self.n_actions = n_actions
        self.q_table = np.zeros((n_states, n_actions))
        self.alpha = 0.1    # learning rate
        self.gamma = 0.9    # discount factor
        self.epsilon = 0.1  # exploration rate

    def choose_action(self, state):
        # epsilon-greedy: explore with probability epsilon, otherwise
        # pick the action with the highest Q-value for this state
        if np.random.random() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q_table[state]))

    def update(self, state, action, reward, next_state):
        # Q-learning update toward the bootstrapped target
        best_next = np.max(self.q_table[next_state])
        self.q_table[state, action] += self.alpha * (
            reward + self.gamma * best_next - self.q_table[state, action]
        )
```
4. Training Loop
```python
for episode in range(1000):
    state = env.reset()
    done = False
    while not done:
        action = scheduler.choose_action(state)
        next_state, reward, done = env.step(action)
        scheduler.update(state, action, reward, next_state)
        state = next_state
```
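The loop assumes an `env` exposing Gym-style `reset()` and `step()` returning `(next_state, reward, done)`. A minimal toy single-machine environment, just enough to exercise the loop (the class name and dynamics are illustrative assumptions, not the author's environment):

```python
class ToySchedulingEnv:
    """Toy single-machine environment: each action picks one of the
    remaining jobs; the episode ends when all jobs are scheduled.
    Reward is the negative processing time of the chosen job, with a
    fixed penalty for re-picking an already-scheduled job."""

    def __init__(self, proc_times=(3, 1, 2)):
        self.proc_times = proc_times

    def reset(self):
        self.remaining = set(range(len(self.proc_times)))
        return 0  # single aggregate state, enough for this sketch

    def step(self, action):
        if action in self.remaining:
            self.remaining.remove(action)
            reward = -self.proc_times[action]
        else:
            reward = -10  # penalize choosing an already-scheduled job
        done = not self.remaining
        return 0, reward, done
```

Swapping in a real job-shop simulator only requires keeping this `reset`/`step` interface.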
5. Performance Comparison

| Method | Avg. makespan (relative to FCFS = 100%) | Machine utilization |
|---|---|---|
| FCFS | 100% | 75% |
| Heuristic rules | 85% | 85% |
| Q-Learning | 78% | 92% |