时雨小径 May the Spirit be with you

HackerRank - Task Scheduling

Recently I was leaning Python by going through the problems at HackerRank.com, it original was the interviewstreet, at which I had submitted some codes, they upgrade the site but didn't provide code backups for downloading, that upset me :(, however, something good there is, it makes me have to re-code the Python version without any existed C++ codes.

I'm thinking about posting analysis for some problems I solved, they could be interesting, tricky, or successfully frustrating me too much that I won't let it go easily :)

Here is the link to the problem Task Scheduling.

There is a straight forward thought, that is the tasks with smaller deadlines should be completed asap, it can be proved by contradiction.
[I think the description of the problem is just trying to mislead people(?), there is no switch needed]
Let $t_1$,$t_2$ be two tasks with deadline $d_1$ , $d_2$ and time cost $m_1$ , $m_2$, $t_1$ is finished at time $c_1$, and $t_2$ is finished at time $c_2$, instantly, we have $c_2 = m_1+m_2$.

Statement: if $d_1 < d_2$ and $t_1$ finished after $t_2$, the max overshoot might be less than that when $t_1$ finished before $t_2$

Proof:
$ts_1$ is the max overshoot if $t_1$ finished first, $ts_2$ is that otherwise.
\begin{eqnarray}
ts_1 = max(c_1 - d_1 , c_2 - d_2) \\
ts_2 = max(c_3 - d_2 , c_4 - d_1)
\end{eqnarray}
where $c_2 = c_4 = m_1 + m_2$
since $c_3 < c_4$ and $d_2 > d_1$, then $ts_2 = c_4 - d_1$
as the statement, to have $ts_2 < ts_1$, things need to satisfy:
\begin{eqnarray}
c_4 - d_1 < c_1 - d_1 \\
or\\
c_4 - d_1 < c_2 - d_2
\end{eqnarray}
which is
\begin{eqnarray}
m_1 + m_2 < c_1 \\
or\\
d_1 > d_2
\end{eqnarray}
and none of above could be true, therefore, we have the contradiction.

The basic solution is to just sort the list and do the calculation, however, what they ask is not just one final answer, but $n$ answers with dynamically increasing set of tasks, this turns the problem into an interval query'n'update problem, what my solution is:

Sort the whole set of tasks first, and each of them have its rank, given the order of input, put the tasks input the spot according to its rank $i$, then the answer would be:
$max\Big\{ \quad max\{S_p - d_p | p \in [1 , i)\} ,\quad S_i - d_i , \quad max\{ S_q - d_q | q \in (i , n]\} \Big\}$

where $S_i = \sum_{k=1}^{i}m_k$

I used Binary Indexed Tree to deal with $S_i$, the update and query can be done in $O(log_n)$, this is not too complicated, the more difficult thing for me is to update and query the part in the back of $i$, though I know a kind of Segment Tree is capable to do this in $O(log_n)$, which is acceptable, I haven't write trees with those operations before, it took me some time to figure out how does it work, and after I finished a working code with c++ and was trying to transfer it to Python, the running efficiency of Python was like a nightmare(about 50 times slower than c++…), I spent hours to optimize the Python codes to make it finally barely accepted at the HackerRank, (well, kinda meaningless though…you don't have to eat steak with Python)

Here is Python code which is the slower version but easier for reading:
I call it P++, lol.

import sys
import math

stdin = sys.stdin

Mid= lambda x : ((x[0] + x[1])>>1)
Lowbit = lambda x : (x&(-x))

used = {}

class Task:
    def __init__(self , _d , _m , _index):
        self.d = int(_d)
        self.m = int(_m)
        self.index = int(_index)
    def __lt__(self , other):
        if (self.d == other.d):
            return self.m > other.m
        else:
            return self.d < other.d
    def __repr__(self):
        return str((self.d , self.m))
class Node:
    def __init__(self):
        self.val = 0
        self.idx = 0
        self.add = 0
        self.cover = (0 , 0)
        self.lChild = None
        self.rChild = None
    def __lt__(self , other):
        if (self.idx not in used):
            return True
        if (other.idx not in used):
            return False
        return self.val < other.val
    def __repr__(self):
        return str(self.val)
    def build(self , cover):
        self.cover = cover
        self.idx = cover[0]
        if (cover[0] == cover[1]):
            return
        mid = Mid(cover)
        self.lChild = Node()
        self.rChild = Node()
        self.lChild.build( (cover[0] , mid) )
        self.rChild.build( (mid + 1 , cover[1]) )
    def addDown(self):

        self.lChild.addValue(self.add)
        self.rChild.addValue(self.add)
        self.add = 0
    def addValue(self , add):
        self.val += add
        self.add += add
        
    def update(self , cover, add):
        if (cover[0] <= self.cover[0] and cover[1] >= self.cover[1]):
            self.addValue(add)
            return
        if (self.add):
            self.addDown()
        mid = Mid(self.cover)
        if (cover[0] <= mid):
            self.lChild.update(cover , add)
        if (cover[1] > mid):
            self.rChild.update(cover , add)

        ret = max( self.lChild , self.rChild )
        self.val = ret.val
        self.idx = ret.idx

    def query(self , cover):
        if (cover[0] == self.cover[0] and cover[1] == self.cover[1]):
            return self
        if (self.add):
            self.addDown()
        mid = Mid(self.cover)
        if (cover[1] <= mid):
            return self.lChild.query(cover)
        elif (cover[0] > mid):
            return self.rChild.query(cover)
        else:
            vl = self.lChild.query( (cover[0] , mid) )
            vr = self.rChild.query( (mid + 1 , cover[1]) )
            return max(vl , vr)

class BinIdxTree:
    def __init__(self , size):
        self.S = [0 for i in range(size+10)]
    def add(self , x , c):
       
        while (x <= n):
            self.S[x] += c
            x += Lowbit(x)
    def getsum(self , x):
        res = 0
        while (x > 0):
            res += self.S[x]
            x -= Lowbit(x)
        return res
        
       
line = stdin.readline()
n = int(line)

segTreeRoot = Node()
segTreeRoot.build((1 , n))
bTree = BinIdxTree(n)

taskList = []
rank = [0 for i in range(n+1)]

for i in range(n):
    line = stdin.readline()
    [d , m] = [int(x) for x in line.split()]
    t = Task(d , m , i)
    taskList.append(t)

taskList.sort()

for i in range(n):
    rank[ taskList[i].index ] = i

ans = 0
for i in range(n):
    p = rank[i] + 1
    bTree.add(p , taskList[p-1].m)
    used[p] = True
    segTreeRoot.update((p , p) ,  - taskList[p-1].d)
    segTreeRoot.update((p , n) , taskList[p-1].m)
    res = bTree.getsum(p) - taskList[p-1].d
    res = 0 if res < 0 else res
    if (p + 1 <= n):
        resNode = segTreeRoot.query((p + 1 , n))
        p = resNode.idx
        if (p in used):
            s =bTree.getsum(p) - taskList[p-1].d
            res = s if s > res else res
    ans = res if res > ans else ans
    print ans