I have an entire HTML page inside one variable ($product_info) and I am trying to get the following values into separate variables:

<h1 itemprop="name">Product name</h1>
<span id="prvat">£285.60</span>
<span id="spc">142020EB</span>

I am trying to use the following PHP code, but it is not outputting the expected result:

$product_info =
('
<h1 itemprop="name">Product name</h1>
<span id="prvat">£285.60</span>
<span id="spc">142020EB</span>
');

$product_name = preg_match('/<h1 itemprop="name">(.*)<\/h1>/', $product_info);
$price = preg_match('/<span id="prvat">(.*)<\/span>/', $product_info);
$product_code = preg_match('/<span id="spc">(.*)<\/span>/', $product_info);

echo ("Product Name = ".$product_name."<br>Price = ".$price."<br>Product Code = ".$product_code);

This is the output

Product Name = 1
Price = 1
Product Code = 1

Can someone point me in the right direction, please?

Alan Moore

5 Answers

I humbly suggest using an HTML parser, DOMDocument in particular:

$product_info = '
<h1 itemprop="name">Product name</h1>
<span id="prvat">£285.60</span>
<span id="spc">142020EB</span>
';

$dom = new DOMDocument();
$dom->loadHTML($product_info);
$xpath = new DOMXPath($dom);

$product_name = $xpath->evaluate('string(//h1[@itemprop="name"]/text())');
$price = $xpath->evaluate('string(//span[@id="prvat"]/text())');
$product_code = $xpath->evaluate('string(//span[@id="spc"]/text())');

echo "
Product Name =  $product_name <br/>
Price =         $price <br/>
Product Code =  $product_code
";
Kevin

For the record, preg_match returns:

1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.

Which is why you get 1 in every variable.

The correct code would be

preg_match('/<h1 itemprop="name">(.*)<\/h1>/', $product_info, $product_name);

Although if you're parsing an entire HTML document, an HTML parser is definitely the way to go.
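To make the difference between the return value and the captured text concrete, here is a minimal sketch using the markup from the question. The names $matched and $m are my own, chosen only for illustration.

```php
<?php
$product_info = '
<h1 itemprop="name">Product name</h1>
<span id="prvat">£285.60</span>
<span id="spc">142020EB</span>
';

// The return value only reports whether the pattern matched (1, 0, or false on error).
$matched = preg_match('/<h1 itemprop="name">(.*)<\/h1>/', $product_info, $m);

// The captured text lives in the third argument:
// $m[0] is the whole match, $m[1] is the first parenthesized group.
$product_name = ($matched === 1) ? $m[1] : null;

echo "Return value = $matched\n";
echo "Product Name = $product_name\n";
```

So the variable passed as the third argument ends up holding an array, and the value you actually want sits at index 1.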

Joe

You weren't far off. You just weren't giving PHP a variable to store the results in. The third argument, $matches, is optional, and it is where the matched text is stored. The return value itself is only 1 or 0 (or false on error), depending on whether $subject contains a match.

preg_match ( string $pattern , string $subject [, array &$matches ] )

preg_match manual

If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

This should do it.

preg_match('/<h1 itemprop="name">(.*)<\/h1>/', $product_info, $product_name);
preg_match('/<span id="prvat">(.*)<\/span>/', $product_info, $price);
preg_match('/<span id="spc">(.*)<\/span>/', $product_info, $product_code);

Then just print_r() your results.
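If you would rather end up with plain strings than arrays, something along these lines might work. This is only a sketch: first_capture() is a hypothetical helper name of my own, the nullable return type assumes PHP 7.1 or later, and I swapped (.*) for the non-greedy (.*?) so that a second closing tag on the same line would not be swallowed, which is a change from the patterns above.

```php
<?php
// Hypothetical helper: returns the first captured group, or null when there is no match.
function first_capture(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $matches) === 1 ? $matches[1] : null;
}

$product_info = '
<h1 itemprop="name">Product name</h1>
<span id="prvat">£285.60</span>
<span id="spc">142020EB</span>
';

// Each call pulls the text between the opening and closing tag.
$product_name = first_capture('/<h1 itemprop="name">(.*?)<\/h1>/', $product_info);
$price        = first_capture('/<span id="prvat">(.*?)<\/span>/', $product_info);
$product_code = first_capture('/<span id="spc">(.*?)<\/span>/', $product_info);

echo "Product Name = $product_name<br>Price = $price<br>Product Code = $product_code";
```

A null result tells you the pattern did not match, so you can tell "no match" apart from an empty capture.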

Gary

preg_match takes three arguments here: the first is your regular expression, the second is the string to search, and the last one stores all captured results.

So you should use:
if (preg_match('/<h1 itemprop="name">(.*)<\/h1>/', $product_info, $result)) { $product_name = $result[1]; }

Joey

Look here: http://en.wikipedia.org/wiki/Dyck_language

You need to check for matching brackets, which is a job for a pushdown automaton rather than a finite state machine (well, regex might work regardless :D). Just to show another way to handle these types of languages:

C#:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        Console.ForegroundColor = ConsoleColor.Green;
        Stopwatch timer = new Stopwatch();
        timer.Start();
        // The original used VerweisKeller, a German name for my own stack implementation; the .NET Stack<T> is used here instead
        Stack<char> k = new Stack<char>();

        bool fail = false;
        string a = "[[[((())[[((([()])))[()()]([])]([([([])])])])]]][[[((())[[(([()]))[()()][((()))][][()()]()[][(())][][]([])]([([([])])])])]]]";
        char[] tape = a.ToCharArray();
        int countChars = 0;
        Console.WriteLine("{0} \n", a);

        for (int i = 0; i < tape.Length && !fail; ++i)
        {
            switch (tape[i])
            {
                // Opening brackets are pushed and wait for their partner.
                case '(':
                case '[':
                    k.Push(tape[i]);
                    break;

                // A closing bracket must match the bracket on top of the stack.
                case ')':
                    fail = !checkClosingBracketsRound(k);
                    break;

                case ']':
                    fail = !checkClosingBracketsSquared(k);
                    break;

                default:
                    break;
            }
            ++countChars;
        }
        // The input is accepted only if nothing failed and no bracket is left open.
        if (!fail && k.Count == 0)
            Console.WriteLine("accepted");
        else
            Console.WriteLine("not accepted");
        Console.WriteLine(countChars);
        timer.Stop();
        Console.WriteLine("Time: {0}", timer.Elapsed);
        Console.ReadKey();
    }

    private static bool checkClosingBracketsSquared(Stack<char> k)
    {
        if (k.Count > 0 && k.Peek() == '[')
        {
            k.Pop();
            return true;
        }
        return false;
    }

    private static bool checkClosingBracketsRound(Stack<char> k)
    {
        if (k.Count > 0 && k.Peek() == '(')
        {
            k.Pop();
            return true;
        }
        return false;
    }
}
Anzzi