findSubstrings y breakSubstring en Data.ByteString

En la fuente de Data/ByteString.hs dice que la función findSubstrings ha quedado en desuso en favor de breakSubstring. Sin embargo, creo que el findSubstrings que se implementó utilizando el algoritmo KMP es mucho más eficiente que el algoritmo utilizado en breakSubstring que es ingenuo. ¿Alguien tiene alguna idea de por qué se hizo esto?findSubstrings y breakSubstring en Data.ByteString

Aquí está la implementación antigua:

{-# DEPRECATED findSubstrings "findSubstrings is deprecated in favour of breakSubstring." #-} 

{- 
{- This function uses the Knuth-Morris-Pratt string matching algorithm. -} 

findSubstrings [email protected](PS _ _ m) [email protected](PS _ _ n) = search 0 0 
where 
    patc x = pat `unsafeIndex` x 
    strc x = str `unsafeIndex` x 

    -- maybe we should make kmpNext a UArray before using it in search? 
    kmpNext = listArray (0,m) (-1:kmpNextL pat (-1)) 
    kmpNextL p _ | null p = [] 
    kmpNextL p j = let j' = next (unsafeHead p) j + 1 
        ps = unsafeTail p 
        x = if not (null ps) && unsafeHead ps == patc j' 
          then kmpNext Array.! j' else j' 
        in x:kmpNextL ps j' 
    search i j = match ++ rest -- i: position in string, j: position in pattern 
    where match = if j == m then [(i - j)] else [] 
      rest = if i == n then [] else search (i+1) (next (strc i) j + 1) 
    next c j | j >= 0 && (j == m || c /= patc j) = next c (kmpNext Array.! j) 
      | otherwise = j 
-}

y aquí está la nueva ingenua uno:

findSubstrings :: ByteString --^String to search for. 
      -> ByteString --^String to seach in. 
      -> [Int] 
findSubstrings pat str 
    | null pat   = [0 .. length str] 
    | otherwise  = search 0 str 
where 
    STRICT2(search) 
    search n s 
     | null s    = [] 
     | pat `isPrefixOf` s = n : search (n+1) (unsafeTail s) 
     | otherwise   =  search (n+1) (unsafeTail s)

Fuente

2011-06-13 sob7y

La razón para el cambio fue que la aplicación de KMP era en realidad más ineficiente que la versión ingenua, que se implementa teniendo en cuenta el rendimiento.

Fuente

2011-06-13 16:37:09

¿No es mejor el KMP en términos de complejidad de tiempo? – sob7y

Es, teóricamente. Estoy diciendo que, en la práctica, esta implementación fue tan ineficiente que le gustó la versión ingenua de todas las pruebas que realicé. –

findSubstrings y breakSubstring en Data.ByteString

Respuesta

Cuestiones relacionadas