
Best way to manage long-running PHP scripts?

I have a PHP script that takes a long time (5-30 minutes) to complete. Just in case it matters, the script is using curl to scrape data from another server. This is the reason it's taking so long; it has to wait for each page to load before processing it and moving to the next. I want to be able to initiate the script and let it be until it's done, which will set a flag in a database table. What I need to know is how to be able to end the http request before the script is finished running. Also, is a php script the best way to do this?
Update, +12 years - security note: While this is still a good way to invoke long-running code, it is good for security to limit or even disable the ability of PHP in the web server to launch other executables. And since this decouples the behaviour of the long-running thing from that which started it, in many cases it may be more appropriate to use a daemon or a cron job.

Original answer: Certainly it can be done with PHP, however you should NOT do this as a background task - the new process has to be dissociated from the process group where it is initiated.

Since people keep giving the same wrong answer to this FAQ, I've written a fuller answer here: http://symcbean.blogspot.com/2010/02/php-and-long-running-processes.html

From the comments: The short version is shell_exec('echo /usr/bin/php -q longThing.php | at now'); but the reasons "why" are a bit long for inclusion here.
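A minimal sketch of that one-liner wrapped in a helper, assuming a Linux host with at(1) installed; build_at_command is a hypothetical name, not part of the original answer:

```php
<?php
// Hypothetical helper for the at(1) approach: hand the script to at so it
// runs in its own session, detached from the web server's process group.
function build_at_command(string $script): string
{
    // escapeshellarg() guards the path against shell metacharacters.
    return 'echo /usr/bin/php -q ' . escapeshellarg($script) . ' | at now';
}

// Fire and forget:
// shell_exec(build_at_command('longThing.php'));
```

The actual shell_exec call is left commented out so the sketch is safe to run anywhere.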

January 7, 2024

As many have said, this is not the best way, but it may help:

ignore_user_abort(1); // run the script in the background even if the user closes the browser
set_time_limit(1800); // run for 30 minutes
// your long-running script here

January 7, 2024

If you have a long script, then divide the work per page with the help of an input parameter for each task (then each page acts like a thread), i.e. if the page has a long loop over 1 lac (100,000) product_keywords, then instead of the loop write the logic for one keyword and pass that keyword in from magic or cornjobpage.php (in the following example). For a background worker, I think you should try this technique: it will help to call as many pages as you like. All pages will run at once independently, without waiting for each page's response, asynchronously.

cornjobpage.php // mainpage

<?php
post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue");
//post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue2");
//post_async("http://localhost/projectname/otherpage.php", "Keywordname=anyValue");
//call as many pages as you like; all will run at once independently without waiting for each response.
?>
<?php
/*
 * Executes a PHP page asynchronously so the current page does not have to
 * wait for it to finish running.
 */
function post_async($url, $params)
{
    $post_string = $params;
    $parts = parse_url($url);
    $fp = fsockopen($parts['host'], isset($parts['port']) ? $parts['port'] : 80, $errno, $errstr, 30);
    if (!$fp) {
        return; // connection failed
    }
    $out  = "GET " . $parts['path'] . "?$post_string" . " HTTP/1.1\r\n"; // you can use POST instead of GET if you like
    $out .= "Host: " . $parts['host'] . "\r\n";
    $out .= "Content-Type: application/x-www-form-urlencoded\r\n";
    $out .= "Content-Length: " . strlen($post_string) . "\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    fclose($fp);
}
?>

testpage.php

<?php
echo $_REQUEST["Keywordname"]; // case1 Output > testValue
?>

PS: if you want to send URL parameters in a loop, then follow this answer: https://stackoverflow.com/a/41225209/6295712

January 7, 2024

What I ALWAYS use is one of these variants (because different flavors of Linux have different rules about handling output / some programs output differently):

Variant I: @exec('./myscript.php 1>/dev/null 2>/dev/null &');

Variant II: @exec('php -f myscript.php 1>/dev/null 2>/dev/null &');

Variant III: @exec('nohup myscript.php 1>/dev/null 2>/dev/null &');

You might have to install "nohup". But for example, when I was automating FFMPEG video conversions, the output interface somehow wasn't 100% handled by redirecting output streams 1 & 2, so I used nohup AND redirected the output.
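The variants can be folded into one helper; the function name and the nohup switch are assumptions for illustration, not part of the original answer:

```php
<?php
// Sketch combining Variants II and III: build the detached command string,
// discarding both output streams so the web request can return immediately.
function background_command(string $script, bool $useNohup = false): string
{
    $cmd = 'php -f ' . escapeshellarg($script) . ' 1>/dev/null 2>/dev/null &';
    // nohup keeps the job alive if the parent's session goes away,
    // as in the FFMPEG case described above.
    return $useNohup ? 'nohup ' . $cmd : $cmd;
}

// @exec(background_command('myscript.php'));       // Variant II
// @exec(background_command('myscript.php', true)); // Variant III
```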

January 7, 2024

Use a proxy to delegate the request.

January 7, 2024

I've done something similar with Perl, a double fork(), and detaching from the parent process. All the HTTP fetching work should be done in the forked process.

January 7, 2024

I would like to propose a solution that is a little different from symcbean's, mainly because I have the additional requirement that the long-running process needs to run as another user, and not as the apache / www-data user.

First solution, using cron to poll a background task table:

- The PHP web page inserts a row into a background task table with state 'SUBMITTED'.
- cron runs once every 3 minutes, as another user, running a PHP CLI script that checks the background task table for 'SUBMITTED' rows.
- The PHP CLI script updates the state column of the row to 'PROCESSING' and begins processing; after completion, the state is updated to 'COMPLETED'.

Second solution, using the Linux inotify facility:

- The PHP web page updates a control file with the parameters set by the user, and also assigns a task id.
- A shell script (running as a non-www user) running inotifywait waits for the control file to be written.
- After the control file is written, a close_write event is raised and the shell script continues.
- The shell script executes the PHP CLI to do the long-running process.
- The PHP CLI writes its output to a log file identified by the task id, or alternatively updates progress in a status table.
- The PHP web page can poll the log file (based on task id) to show progress of the long-running process, or it can query the status table.

Some additional info can be found in my post: http://inventorsparadox.blogspot.co.id/2016/01/long-running-process-in-linux-using-php.html
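The first solution's state transitions can be sketched in memory; in the real setup $tasks would be the database table and claim_next_task() would run inside the cron user's PHP CLI script (all names here are illustrative assumptions):

```php
<?php
// In-memory sketch of the cron-polled background task table.
function claim_next_task(array &$tasks): ?int
{
    foreach ($tasks as $id => $state) {
        if ($state === 'SUBMITTED') {
            $tasks[$id] = 'PROCESSING'; // the cron worker claims the row
            return $id;
        }
    }
    return null; // nothing to do this cycle
}

function complete_task(array &$tasks, int $id): void
{
    $tasks[$id] = 'COMPLETED';
}
```

In the database version, the SUBMITTED-to-PROCESSING update should happen in a single statement (or transaction) so two overlapping cron runs cannot claim the same row.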

January 7, 2024

I realize this is a quite old question but would like to give it a shot. This script tries to address both making the initial kick-off call finish quickly and chopping the heavy load down into smaller chunks. I haven't tested this solution.

<?php
/**
 * crawler.php located at http://mysite.com/crawler.php
 */

// Make sure this script will keep on running after we close the connection
// with it.
ignore_user_abort(TRUE);

function get_remote_sources_to_crawl() {
  // Do a database or a log file query here.
  $query_result = array(
    1 => 'http://exemple.com',
    2 => 'http://exemple1.com',
    3 => 'http://exemple2.com',
    4 => 'http://exemple3.com',
    // ... and so on.
  );
  // Returns the first one on the list, as array($id, $url).
  foreach ($query_result as $id => $url) {
    return array($id, $url);
  }
  return FALSE;
}

function update_remote_sources_to_crawl($id) {
  // Update my database or log file list so the $id record won't show up
  // on my next call to get_remote_sources_to_crawl().
}

$crawling_source = get_remote_sources_to_crawl();

if ($crawling_source) {
  list($id, $url) = $crawling_source;
  // Run your scraping code on $url here.
  if ($your_scraping_has_finished) {
    // Update your database or log file.
    update_remote_sources_to_crawl($id);

    $ctx = stream_context_create(array(
      'http' => array(
        // I am not quite sure, but I reckon the timeout set here actually
        // starts rolling after the connection to the remote server is made,
        // limiting only how long the downloading of the remote content
        // should take. So as we are only interested in triggering this
        // script again, 5 seconds should be plenty of time.
        'timeout' => 5,
      ),
    ));

    // Open a new connection to this script and close it after 5 seconds in.
    file_get_contents('http://' . $_SERVER['HTTP_HOST'] . '/crawler.php', FALSE, $ctx);

    print 'The cronjob kick off has been initiated.';
  }
}
else {
  print 'Yay! The whole thing is done.';
}

January 7, 2024

You could send it as an XHR (Ajax) request. Unlike a normal HTTP request, the client will not usually have any timeout for XHRs.

January 7, 2024

I agree with the answers that say this should be run in a background process. But it's also important that you report on the status so the user knows that the work is being done. When receiving the PHP request to kick off the process, you could store in a database a representation of the task with a unique identifier. Then, start the screen-scraping process, passing it the unique identifier. Report back to the iPhone app that the task has been started and that it should check a specified URL, containing the new task ID, to get the latest status. The iPhone application can now poll (or even "long poll") this URL. In the meantime, the background process would update the database representation of the task as it worked with a completion percentage, current step, or whatever other status indicators you'd like. And when it has finished, it would set a completed flag.
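The polling flow above can be sketched as a status record that the background process keeps updating and the status URL serves back as JSON; the field names are illustrative assumptions:

```php
<?php
// Minimal sketch of the task status record stored in the database,
// keyed by the unique task identifier handed back to the client.
function update_progress(array $task, int $percent, string $step): array
{
    $task['percent']   = max(0, min(100, $percent)); // clamp to 0..100
    $task['step']      = $step;
    $task['completed'] = ($task['percent'] === 100); // completed flag
    return $task;
}

$task = ['id' => 'abc123', 'percent' => 0, 'step' => 'queued', 'completed' => false];
$task = update_progress($task, 40, 'scraping page 2 of 5');
// The status endpoint would simply: echo json_encode($task);
```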

January 7, 2024

PHP may or may not be the best tool for the job, but you know how to use it, and the rest of your application is written with it. These two qualities, combined with the fact that PHP is "good enough", make a pretty strong case for using it over Perl, Ruby, or Python.

If your goal is to learn another language, then pick one and use it. Any language you mentioned will do the job, no problem. I happen to like Perl, but what you like may be different.

Symcbean has some good advice about how to manage background processes at his link.

In short, write a CLI PHP script to handle the long bits. Make sure that it reports status in some fashion. Make a PHP page to handle status updates, either using AJAX or traditional methods. Your kickoff script will start the process running in its own session, and return confirmation that the process is going.

Good luck.

January 7, 2024

Yes, you can do it in PHP. But in addition to PHP it would be wise to use a Queue Manager. Here's the strategy:

1. Break up your large task into smaller tasks. In your case, each task could be loading a single page.
2. Send each small task to the queue.
3. Run your queue workers somewhere.

Using this strategy has the following advantages:

- For long running tasks it has the ability to recover in case a fatal problem occurs in the middle of the run -- no need to start from the beginning.
- If your tasks do not have to be run sequentially, you can run multiple workers to run tasks simultaneously.

You have a variety of options (this is just a few):

- RabbitMQ (https://www.rabbitmq.com/tutorials/tutorial-one-php.html)
- ZeroMQ (http://zeromq.org/bindings:php)
- If you're using the Laravel framework, queues are built-in (https://laravel.com/docs/5.4/queues), with drivers for Amazon SQS, Redis, and Beanstalkd.
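Step 1 of the strategy, one queue message per page, can be sketched as follows; the job shape is an assumption, and with a library like php-amqplib each element would then be published to the broker:

```php
<?php
// Turn the big scrape into one small job per page. A worker pops one job,
// fetches the page, and acknowledges it, so a crash mid-run only loses
// the single page that was in flight.
function make_jobs(array $urls): array
{
    $jobs = [];
    foreach ($urls as $i => $url) {
        $jobs[] = ['id' => $i, 'type' => 'fetch_page', 'url' => $url];
    }
    return $jobs;
}

$jobs = make_jobs(['http://example.com/1', 'http://example.com/2']);
// foreach ($jobs as $job) { /* publish json_encode($job) to the queue */ }
```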

January 7, 2024

No, PHP is not the best solution. I'm not sure about Ruby or Perl, but with Python you could rewrite your page scraper to be multi-threaded and it would probably run at least 20x faster. Writing multi-threaded apps can be somewhat of a challenge, but the very first Python app I wrote was a multi-threaded page scraper. And you could simply call the Python script from within your PHP page by using one of the shell execution functions.

January 7, 2024

You can start a background job with exec or system, and do the work there.

Also, there are better approaches to scraping the web than the one you're using. You could use a threaded approach (multiple threads doing one page at a time), or one using an event loop (one thread doing multiple pages at a time). My personal approach with Perl would be using AnyEvent::HTTP.

ETA: symcbean explains how to detach the background process properly here.
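In PHP itself, the event-loop style maps to the curl_multi functions; as a testable sketch, here is just the batching step, with the concurrent fetch left as a comment (the batch size and function name are illustrative assumptions):

```php
<?php
// Split the URL list into groups that would each be fetched concurrently
// with the curl_multi_* functions (one batch in flight at a time).
function batch_urls(array $urls, int $size): array
{
    return array_chunk($urls, $size);
}

// foreach (batch_urls($allUrls, 10) as $batch) {
//     // create one curl handle per URL, add each to a curl_multi handle,
//     // run curl_multi_exec until done, then collect the responses
// }
```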

January 7, 2024

The quick and dirty way would be to use the ignore_user_abort function in PHP. This basically says: don't care what the user does, run this script until it is finished. This is somewhat dangerous if it is a public facing site (because it is possible that you end up having 20+ versions of the script running at the same time if it is initiated 20 times).

The "clean" way (at least IMHO) is to set a flag (in the db, for example) when you want to initiate the process, and run a cronjob every hour (or so) to check if that flag is set. If it IS set, the long running script starts; if it is NOT set, nothing happens.
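The cron-side check of the "clean" way fits in a tiny sketch; the running-guard is an addition to avoid the 20-copies problem mentioned above, and all names are illustrative:

```php
<?php
// Decide whether the hourly cron run should launch the long-running script.
// $flagSet mirrors the DB flag; $alreadyRunning guards against overlap
// (e.g. checked via a lock file or a PROCESSING row).
function should_start(bool $flagSet, bool $alreadyRunning): bool
{
    return $flagSet && !$alreadyRunning;
}
```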

January 7, 2024

If the desired output of your script is some processing, not a webpage, then I believe the desired solution is to run your script from the shell, simply as php my_script.php

January 7, 2024
