Ticket #314 (new Bugs)

Opened 11 months ago

Last modified 44 hours ago

Liquidsoap locks up during transitions using dynamic requests

Reported by: omeron
Owned by: admin
Priority: 1
Milestone:
Component: Liquidsoap
Version: 0.9.1+svn
Keywords: transitions, crossfading
Cc:
Mac OSX: no
Linux: yes
NetBSD: no
Other Operating System: no
FreeBSD: no

Description

This started with ticket 311, which tracks down lockups. After we set the conservative parameter to true on the various sources, the lockups related to metadata rewriting stopped, but those related to transitions continue. Disabling transitions makes the lockups disappear. The problem may relate to negative remaining times, and resolving ticket 311 may fix it. We felt it better, however, to open this new ticket to track this specific issue with transitions.

Attachments

gv.liq (3.0 KB) - added by omeron 11 months ago.
The script which triggers the lockups.
log.tar.bz2 (37.4 KB) - added by omeron 11 months ago.
An archive of a log stopped soon after a lockup.
liquidsoap.log.crash8.tar.bz2 (100.3 KB) - added by omeron 10 months ago.
A log with both a hard and soft lockup, and a gdb backtrace after the hard lockup. See notes at beginning.
gdb.log (4.9 KB) - added by omeron 10 months ago.
A gdb stack trace including all threads after a hard lockup
ioring.patch (4.7 KB) - added by mrpingouin 10 months ago.
fix deadlock in ioring

Change History

Changed 11 months ago by omeron

The script which triggers the lockups.

Changed 11 months ago by omeron

An archive of a log stopped soon after a lockup.

Changed 11 months ago by toots

OK, thanks!

We've just spent some hours with David trying to reproduce the issue, without success. We now believe it would help to have access to a frozen Liquidsoap instance, so that we can investigate your case directly...

I guess we should coordinate later on IRC or by mail.

Changed 11 months ago by mrpingouin

Hi omeron & toots,

I tried to reproduce this bug. To speed up the process, I switched to output.dummy and set("root.sync",false). This uncovered some bugs (which I have since fixed) and what seems to be a segfault with faad (I haven't looked into it). But I don't get any freeze!

Would it be possible to try to reproduce the problem in root.sync=false mode? From your log, it seems that you're not using faad. It seems you're not even using any external decoder (flac), right?

In any case, reproducing the problem more quickly (and, if possible, with a simpler script) would be very useful for finding it and for checking when it's fixed.

PS: You can try to reproduce with or without the recent fixes; they probably aren't related, but please say which version you're running.

PPS: To be precise I'm running a slightly simplified script without replaygain, and an empty playlist.

Changed 10 months ago by omeron

A log with both a hard and soft lockup, and a gdb backtrace after the hard lockup. See notes at beginning.

Changed 10 months ago by omeron

Yes, I use flac to decode FLAC files. Attached is a most interesting log, which contains both a soft and a hard lockup. I believe we are actually tracking two issues: soft lockups relate to transitions or something connected with them, while hard lockups relate to something else.

I first had a soft lockup. When this happens, decoding, queueing and everything else continue as normal, but all outputs cease. This happened around 21:01, as noted in the log. Normally I'd abort and restart, but this time I let it run, in case anyone needed anything. To my shock, it resumed after midnight, and the update triggered at that point.

A hard lockup happened later. When this happens, everything stops and I have to kill -9 the process. Before doing so, however, I took a gdb backtrace, which is at the end of the log. I hope this helps.

Changed 10 months ago by toots

Thanks for this information!

Was the gdb trace taken before you triggered the shutdown?

You may also use:

thread apply all bt

in order to get a per-thread backtrace.

I will be very busy until tomorrow, but I have another test script running and I hope to see it lock up at some point...

Changed 10 months ago by omeron

A gdb stack trace including all threads after a hard lockup

Changed 10 months ago by mrpingouin

Thanks a lot for those files! They allowed me to spot a bug, which is very probably the one you hit too. Hopefully, this is the only "hard lockup" bug you're having. As for the soft lockup, I have no idea.

In any case, your gdb trace says something interesting: two threads are waiting on a condition. Those threads are the ALSA thread that consumes data from your output.alsa(), and the streaming thread that feeds it. I looked at the corresponding code, which is shared by most buffered I/O operators, and it didn't seem very robust.

First, it's easy to reproduce. Just play a playlist through output.alsa(), stressing ALSA as much as possible:
{{{
src/liquidsoap -t scripts/utils.liq 'set("alsa.periods",2) set("frame.size",256) output.alsa(mksafe(playlist("~/media/audio/jazz")))'
}}}
For me this would freeze immediately, but with my patch it has been running fine for several songs.

Although the problem is easily fixed by applying a rigid coding style, it was actually a bit hard to understand exactly why the code went wrong. Some of the loose coding style is justified for subtle reasons, but it was a mistake to believe that everything was under control there (probably my mistake, a long time ago). Here is the bad scenario (with N=2, which is the default):

  • the reader takes the mutex
  • the reader sees that it should wait
  • the writer can write, does so, signals the reader
  • the writer can write, does so, signals the reader (the writer didn't need to take the mutex at any time)
  • the reader waits (and thus releases the mutex)
  • the writer takes the mutex, sees that it should wait, and waits
  • everybody waits forever!
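The scenario above is a lost wakeup: a condition variable does not remember signals sent while nobody is waiting on it, so the reader's check (step 2) and its wait (step 5) must not be separated by the writer's signals. A minimal Python sketch of that effect in isolation follows; Python stands in for Liquidsoap's OCaml, and the function name is hypothetical, not code from the ticket. Python's notify() must be called with the lock held, so the writer's mutex-free signalling cannot be reproduced literally; only the "signal arrives before the wait" part is modelled.

```python
import threading


def lost_wakeup_demo(timeout=0.2):
    """Hypothetical helper: is a signal sent *before* wait() still delivered?"""
    cond = threading.Condition()

    # Writer side (steps 3 and 4): signal twice while nobody is waiting yet.
    with cond:
        cond.notify()
        cond.notify()

    # Reader side (step 5): it decided earlier that it must wait and only
    # now calls wait(). The earlier signals left no trace, so wait() can
    # only return by timing out, which makes it return False.
    with cond:
        woken = cond.wait(timeout=timeout)
    return woken


if __name__ == "__main__":
    print(lost_wakeup_demo())  # prints False: both wakeups were lost
```

In the real bug there is no timeout, so the reader sleeps forever, and once the ring is full the writer does too.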

The attached patch is the minimal change that rules out this scenario. Apart from that, it has plenty of comments about what else to clean up or fix. I'll take care of that tomorrow.
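The patch itself is not reproduced here. As an illustration of the kind of "rigid coding style" that rules out this interleaving, here is a minimal Python sketch of a bounded ring shared by one writer and one reader, assuming the usual discipline: every test or update of the shared state and every signal happen with the mutex held, and every wait sits in a loop that re-checks its condition. RingBuffer, write, read, demo and the capacity parameter are hypothetical names, not Liquidsoap's ioring API.

```python
import threading


class RingBuffer:
    """Bounded FIFO shared by one writer thread and one reader thread.

    Hypothetical sketch, not Liquidsoap's ioring. Every test or update of
    `count`, and every notify(), happens while `lock` is held. Because
    Condition.wait() releases the lock atomically, no signal can slip in
    between "I must wait" and "I wait".
    """

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.buf = [None] * capacity
        self.head = 0   # index of the oldest stored item
        self.count = 0  # number of items currently stored
        self.lock = threading.Lock()
        # Both conditions share the same mutex.
        self.not_empty = threading.Condition(self.lock)
        self.not_full = threading.Condition(self.lock)

    def write(self, item):
        with self.lock:
            # Loop, not `if`: being woken up does not prove there is room.
            while self.count == self.capacity:
                self.not_full.wait()
            self.buf[(self.head + self.count) % self.capacity] = item
            self.count += 1
            self.not_empty.notify()  # signalled with the lock held

    def read(self):
        with self.lock:
            while self.count == 0:
                self.not_empty.wait()
            item = self.buf[self.head]
            self.buf[self.head] = None
            self.head = (self.head + 1) % self.capacity
            self.count -= 1
            self.not_full.notify()
            return item


def demo(n=10_000, capacity=2):
    """Push n integers through a small ring and return what the reader saw."""
    ring = RingBuffer(capacity)
    received = []

    def reader():
        for _ in range(n):
            received.append(ring.read())

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    for i in range(n):
        ring.write(i)
    t.join(timeout=10)
    return received


if __name__ == "__main__":
    print(len(demo()))
```

Keeping the capacity at 2, as in the ticket's default, makes the writer block on a full ring as often as possible, which is the situation in which the unlocked version froze.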

Thanks again for your patience and help on this bug. This is an important fix in a crucial piece of code. I hope that it will solve your problem, but in any case it's a good thing to have fixed!

Changed 10 months ago by mrpingouin

fix deadlock in ioring
